Property is not satisfied in Learning Queries #241

Szpilman2 · 2023-12-20T10:09:57Z


Dear UPPAAL Team,

I trust this message reaches you in good health.

I have been working on a system model using UPPAAL. My model is attached:
thirdTry.zip

The model primarily comprises two templates: software and hardware. I am encountering an issue with strategy synthesis.

The query strategy reach = control: A<> software.u4 successfully generates a strategy, but the learning query minE(energy) [<=250] : <> software.u4 reports that the property is not satisfied. Could you please help me understand the underlying issue with my model and suggest potential solutions?

Your assistance is greatly appreciated.

Best regards.

Answered by petergjoel


petergjoel · 2024-01-15T08:12:50Z

Collaborator

When the minE query fails, it means that the learning algorithm cannot generate enough training samples that satisfy the "sample termination objective" (in your case, <> software.u4).

In your particular case, you have hit the timing-control limitation of the learning algorithm: it can only learn non-lazy controller strategies.

In the software component, the control decisions at u1, u2 and u3 all require the controller to propose the specific delay that should elapse before the edge is taken. This is exactly "lazy" behavior: the controller postpones the use of an edge.
As a result, the learning algorithm proposes "do nothing" in, e.g., software.u1, which eventually lets the environment move the system to error (or the system deadlocks because of the safety strategy).
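
To make the limitation concrete, here is a deliberately simplified Python toy, not UPPAAL semantics or any UPPAAL API. It assumes the controller is consulted only once, on entering u1, and can answer only "go" or "wait". The guard and deadline values (GUARD_MIN, DEADLINE) and all function names are hypothetical; they are not taken from the attached model.

```python
# Toy illustration only: this is not UPPAAL's semantics or API.
# GUARD_MIN and DEADLINE are hypothetical values, not taken from the model.
GUARD_MIN = 5    # controllable edge u1 -> u2 is enabled once x >= GUARD_MIN
DEADLINE = 10    # environment forces u1 -> error when x reaches DEADLINE


def run_non_lazy(decide):
    """Ask the controller once, on entering u1 with clock x = 0.

    The controller may only answer "go" (take the edge now) or "wait"
    (do nothing); it cannot say "go after d time units".
    """
    x = 0
    if decide(x) == "go" and x >= GUARD_MIN:
        return "u2"
    # Either the controller waited or the edge was not enabled yet.
    # Nothing prompts the controller again, so the environment wins the race.
    return "error"


def run_lazy(delay):
    """Reference behaviour the learner cannot express: pick a delay up front."""
    return "u2" if GUARD_MIN <= delay < DEADLINE else "error"
```

In this toy, no answer given at x = 0 reaches u2, whereas a controller that could name a delay between the guard and the deadline would.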

There are a couple of ways to get around it. The most common is to add a scheduler component (completely under the environment's control) that emits "ticks". Each tick moves the software component from u1_wait into a committed location u1_choice, where the controller can choose either to return to u1_wait or to progress to u2_wait.
Importantly, all outgoing edges of u1_wait must be uncontrollable.
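
The tick-based workaround can be sketched as a small Python toy. It is only an illustration of the idea, not UPPAAL semantics: the tick period, guard and deadline (TICK, GUARD_MIN, DEADLINE) and the function names are hypothetical and not taken from the attached model. The point is that the delay choice becomes a sequence of immediate "stay" / "go" decisions, one per tick.

```python
# Toy illustration only: not UPPAAL semantics. TICK, GUARD_MIN and DEADLINE
# are hypothetical values chosen for the example.
TICK = 1
GUARD_MIN = 5
DEADLINE = 10


def run_with_ticks(choose):
    """Simulate u1_wait / u1_choice with an environment-owned tick scheduler.

    choose(x) is called in the committed u1_choice location and returns
    "stay" (back to u1_wait) or "go" (on to u2_wait). No time passes there.
    """
    x = 0
    while x + TICK < DEADLINE:
        x += TICK                      # uncontrollable: scheduler emits a tick
        # Now in u1_choice (committed): the controller must decide immediately.
        if choose(x) == "go" and x >= GUARD_MIN:
            return ("u2_wait", x)
        # "stay", or "go" while the guard is still closed: back to u1_wait.
    return ("error", DEADLINE)         # deadline reached before progressing


def go_when_enabled(x):
    """Example controller: progress at the first tick where the guard holds."""
    return "go" if x >= GUARD_MIN else "stay"
```

With this structure the controller never has to name a delay. Choosing "stay" several times in a row plays the role of waiting, at the granularity of the tick period.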

These restrictions are discussed in the early UPPAAL Stratego papers: "Uppaal Stratego" and "On Time with Minimal Expected Cost!".

It is an interesting problem to study the extension of the learning algorithms to also deal with continuous and timed action spaces.
