A question about {} -> {} syntax in Learning queries of UPPAAL #227
Dear All,

What is the meaning of the `{...} -> {...}` part of the learning queries in UPPAAL, and what is the difference between the variables placed on the two sides?

Best wishes
Hi Peyman,

The variables `i`, `j`, `d`, and `f` relate to the part of the state space that is observable to the learning agent.

As an example, consider the bouncing ball in Figure 4 of *Teaching Stratego to Play Ball*. We may want to derive a controller that is ignorant of the velocity of the ball (as this might be hard to estimate in the application); we can describe this delimitation in the learning query.
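As a minimal sketch of such a query, where the strategy name, the cost expression `cost`, the horizon `120`, and the variable name `Ball.h` are illustrative assumptions rather than anything fixed by the syntax:

```
strategy HeightOnly = minE (cost) [<=120] {} -> {Ball.h} : <> time >= 120
```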
where `h` is the height of the ball. This is opposed to a query that also observes the velocity, and hence the full state of the ball.
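Under the same illustrative assumptions, with `Ball.v` standing in for the velocity:

```
strategy FullState = minE (cost) [<=120] {} -> {Ball.h, Ball.v} : <> time >= 120
```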
Theoretically this induces a Partially Observable EMDP for which Q-learning is not guaranteed to converge. However, it does (at least to a useful degree) in many practical cases; see e.g. *Playing Wordle with Uppaal Stratego*, where we use this heavily.

**Difference between {i,j} and {d,f}**

The difference between the left-hand and the right-hand side relates to the partition-refinement method described in *Teaching Stratego to Play Ball*. The observation space is split in two: observations in the left-hand side are treated as discrete values, while observations in the right-hand side are treated as continuous ones. In fact, if you only add observations in the left-hand side ({i,j}), you will be training a plain lookup table. If you only place observations in the right-hand side ({d,f}), you will be training function approximations (think neural network, but a different technology). If you mix the two, you will create a table (over the {i,j} observations) of function approximators (over the {d,f} observations).

So taking the bouncing ball as an example:
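The following sketch keeps the same illustrative names; `fint()` is UPPAAL's double-to-integer conversion, used here only as one way to make the observed velocity integral:

```
strategy Mixed = minE (cost) [<=120] {fint(Ball.v)} -> {Ball.h} : <> time >= 120
```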
This trains an agent that, for each (integral) velocity, has a function approximator over the height.

**Why not always use the {i,j} side?**

The sample complexity (i.e. the effort needed to get decent controllers) is conjectured to be exponential in the number of dimensions.

**Why not always use the {d,f} side?**

The table is only valid for observations seen during training, so there might be holes in your table. So be VERY careful with doubles on this side.

Hope that helps.