Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[failure handling] A subclass of FailureHandling #179

Merged
merged 9 commits into from
Aug 27, 2024

Conversation

sunava
Copy link
Collaborator

@sunava sunava commented Aug 7, 2024

Introduces the RetryMonitor class, a subclass of FailureHandling, which implements a retry mechanism specifically designed to work with a Monitor. This class aims to improve the robustness of task execution by allowing retries in case the monitoring condition is triggered.

@sunava sunava requested a review from artnie August 7, 2024 11:12
Copy link
Contributor

@artnie artnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a minimal working example or at least a code snippet on how to extend an action perfom with this monitor. Then I can test and approve.

src/pycram/failure_handling.py Outdated Show resolved Hide resolved
LucaKro and others added 2 commits August 15, 2024 04:20
[RetryMonitor] updated RetryMonitor to the most up to date version, and added test cases
@sunava
Copy link
Collaborator Author

sunava commented Aug 16, 2024

Adds "RetryMonitor", which retries a plan up until max_tries times, while refreshing the monitoring of that plan. An example of that can be found in "examples/retry_monitor.ipynb", which showcases how RetryMonitor is to be used, and also how to manually update variables at runtime while using RetryMonitor.

# This line uses the already existing Monitor expression in PyCram
plan = ParkArmsAction([Arms.BOTH]) + MoveTorsoAction([0.25]) >> Monitor(monitor)

# Now we can use the RetryMonitor expression to retry our plan up to max_tries times.
state, [ParkArmsReturnValue, MoveTorsoReturnValue] = RetryMonitor(plan, max_tries=5).perform()

The .perform() method of the Monitor class was adjusted slightly to allow for smooth and consistent refreshing of the monitor, as the old "check_condition" implementation seemed to sometimes cause the condition not to be checked properly after it was refreshed, due to the sleep loop.

Furthermore, it is now Raising a PlanFailure error if the condition is met, which is caught by the RetryMonitor.

Lastly, the class FailureHandling now inherits from Language, to allow passing instances of FailureHandling to be monitored as well, and designator_description is now expected to be "Union[DesignatorDescription, Monitor]".


Description of the changes:
The state is the state of execution specified in the documentation, and List[Any] is a list of results of the children executed in the correct order. This allows us to use RetryMonitor as follows, given that "sleep" and "print_height" simply return their parameters after execution:

s_time = Code(lambda: sleep(sleep_time))
p_height = Code(lambda: print_height(height))
move_torso = Code(lambda: MoveTorsoAction([height]).resolve().perform())

plan = (p_height + move_torso + s_time) * 2 >> Monitor(monitor)
h, _, s = RetryMonitor(plan, max_tries=5).perform()
print(f"Used height {h}")
print(f"Used sleep time {s}")

While this should not change the behaviour or usage of these functions if you have not used their return values thus far, if you did, you will need to adjust them accordingly, to correctly unpack the return values. If before for example you used:

state = SomeExpression + SomeOtherExpression

You should now use:

state, _ = SomeExpression + SomeOtherExpression

We can define different recovery behaviours for different error messages as follows:

recover1 = Code(lambda: recovery1())
recover2 = Code(lambda: recovery2())

, [, milk_desig] = RetryMonitor(plan, max_tries=5, recovery={NotALanguageExpression: recover1,
PerceptionObjectNotFound: recover2}).perform()

LucaKro and others added 2 commits August 16, 2024 13:23
[Tests] adjusted test cases of language to reflect changes made for RetryMonitor
src/pycram/language.py Show resolved Hide resolved
src/pycram/language.py Outdated Show resolved Hide resolved
src/pycram/failure_handling.py Outdated Show resolved Hide resolved
src/pycram/failure_handling.py Outdated Show resolved Hide resolved
src/pycram/failure_handling.py Outdated Show resolved Hide resolved
src/pycram/failure_handling.py Outdated Show resolved Hide resolved
src/pycram/failure_handling.py Outdated Show resolved Hide resolved

def reset_interrupted(child):
if hasattr(child, "interrupted"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very unsafe way of doing things. Why is child.interrupted not guaranteed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wasnt aware that its unsafe, removed it as it should not be needed anymore. where needed im using try-except now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, try except is not the solution i was looking for. If you expect the child to be of a certain type that has an interrupted attribute you should specify that in the typing and can expect that every child has it. Then just access the attribute. If someone passed in a wrong type he would get a yellow marking of the python linter and if executed anyway an attribute error that. Is that not intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the problem is that the type here may for example be a "Code()" language object, but due to python limitations, as i understand it, there are no ways to properly interrupt these code blocks. Thats why currently, the interrupt method of Code() raises a NotImplementedError. Also, this should never be called by hand, only through the Monitor, which can only do monitoring for Language objects. and these objects either have a interrupt method or it raises a NotImplementedError

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomsch420 pls answer

def reset_interrupted(child):
if hasattr(child, "interrupted"):
child.interrupted = False
for sub_child in getattr(child, "children", []):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

src/pycram/failure_handling.py Show resolved Hide resolved
src/pycram/language.py Outdated Show resolved Hide resolved
cond = self.condition.get_value()
if cond:
for child in self.children:
if hasattr(child, 'interrupt'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe as above

@sunava sunava requested a review from tomsch420 August 21, 2024 12:51
@tomsch420 tomsch420 merged commit b020755 into cram2:dev Aug 27, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants