-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for preemption in actions #284
Comments
@jacobperron can we move the discussion here? |
Yes, this is a good place to consolidate discussion. Thanks! |
@naiveHobo @jacobperron any progress on this? |
I don't have bandwidth to work on this, but maybe sometime for G-turtle. It would be good to get input from some of the original authors of the design document of actions. cc/ @gbiggs @sloretz |
That would be helpful, this represents a pretty substantial bug in navigation2 that we need to resolve far before g-turtle. Ideally, I'd like a proposed solution with an implementation in review before the end of June. I'm also open to getting directly involved here myself but I think I'm probably more useful after the design discussion takes place and helping with the implementations. |
Patches like ros2/rclcpp#1117 (and the connected PRs) change API and so I don't think adding them to Foxy is likely to happen. I haven't had a chance to take a closer look into the situation, but if it can't be worked around in navigation2 or patched in rclcpp_action in an API (preferably ABI) compatible way for Foxy, then it will have to wait for G-turtle. |
To clarify, I'm proposing we work on this change for G-turtle. |
Ah yes, sorry, I interpreted that as working on it around g-turtle. I'm OK with it being in g-turtle's release. Though, I think this is a breaking change that is worth making mid-distribution (assuming it doesn't change default behavior). I'd usually agree we shouldn't change it, but this particular issue at least on the nav side is pretty serious and there's no feasible work around. But lets work through the issue first and then we can figure it out later. Because this is LTS is the reason I want this in since this will represent a really serious bug for years to come after its been fixed. If it was non-LTS, I wouldn't care as much since it would go out of scope relatively quickly. |
The proposal is to add a preempt branch to the existing goal state machine. This would allow all the existing action servers to keep running without breaking anything but also give the ability for an action server to initiate a preemption when a previous goal is being handled and a new goal arrives. Would this change then be classified as a breaking change? It looks to me that this is just an addition to the existing capabilities. I'd love your inputs to this design proposal. |
@naiveHobo Thanks for providing an updated diagram (visuals are useful!). After more thought, I'm not sure that we want to add the proposed "preempted" state. Taking a step back, the only difference in the proposed "preempted" state and the "aborted' state is semantics (ie. why did the action server stop the goal?). It appears to me that there could be many reasons that a server stops the execution of a goal, and we certainly can't be expected to capture specific cases for all applications in this general state machine. In the context of ros-navigation/navigation2#1652, the navigation2 behavior tree relies on the difference of an action goal terminating because it was overridden by a higher priority goal (e.g. a new path from the planner) versus some other reason (e.g. a path could not be found). If it were some other case, I would say that the result message should contain information to communicate the reason for an aborted goal. But, I see how having a layer of abstraction on top of ROS 2 actions complicates things; the simple action server would need to be adapted to communicate the desired terminal state information, regardless of the type of action. On the other hand, it might be worth adding the new "preempt" goal state to make implementing N-goal action servers (like the simple action server) more straight-forward. I don't think I've thought enough about adapting the simple action server to resolve ros-navigation/navigation2#1652.
It depends on how it is implemented. One of our goals is to maintain ABI compatibility within a ROS distribution. While we may be able to maintain ABI compatibility, I'm wary about breaking application code that may be relying on the goal state (e.g. what happens when an application receives an unexpected goal state "preempted"?). As a concrete example, the command line tool |
Small thing, but I think in your diagram the arrows with "preempt" should be "preempting" to be in line with the other ones, but not a big deal. I don't think that there should be a transition between cancelling and preempt. If its in the middle of canceling, we shouldn't be able to preempt it, that's probably bad form (and also if its being canceled, that's at odds with the semantic meaning of preemption). I think the only state to transition to preempted is from executing.
Just to flip that around a bit, the only difference between "aborted" and "canceled" is also semantics (ei. why did the action stop the goal?). The way it exits is semantically important to represent as you have already identified by separating the concepts of a failure to process and a request to stop processing. Creating a separate concept of stop processing because superseded completes the exit criteria. This would also be usable for not just 1-goal-actions but also N goal actions where you have a limited number of workers to process goals and you want the most recent ones to go at the expense of older ones. When you exit, then its good to know that you didn't fail to process, or request it, but rather something else was more important so it dropped that goal.
The result code of the action is what we're looking at and what we should look at. If you suggest that we rather include all that same information in the result of the action server, then what's the point at all of having the While preemption has clear uses outside of the 1-goal-at-a-time action servers, ignoring all that for a moment, a super common use of Anyhow, I think you get my point, don't mean to harp too much on it. I think the other (bad) option would be to remove the |
since there is a clear difference between |
I'm having trouble understanding the motivation for a preempt state. It looks like As far as I understand the motivation is to enable After reading ros-navigation/navigation2#1652 , I can't tell what it is the Stepping back, I wonder whether
This is an interesting way to look at it. In the spirit of flipping things around 🙃 , what if instead of adding more states, we removed existing ones? Say the state machine for a goal only has three states: Accepted, Executing, and Done.
Canceling is no longer a state in the state machine. When the goal is Done, the server includes a ResultCode explaining why the goal was moved to Done with some common values defined:
This would enable others to define their own result code values without waiting on the action design to change, such that they can be proved and added to a list of enums in the next release. |
I don't really want to discuss the specifics of BtActionNode since its off topic and I think it will digress this discussion to unimportant nit-picking. The specific way we handle the action server isn't really important for this request, I don't think. Regardless of implementation details, the concept of supporting preemption was important in ROS1 and missing in ROS2 & its useful context.
I suppose we could do that, but I don't see what that buys us, other than making the action server implementation take more responsibility over communicating state knowledge. That might be nice for power users, but probably not nice for other users. If the reason you mention that is because you don't want to deal with more changes in the future, I don't know that after preemption there is any other serious ask where we might want to extend this, given ~10 years of ROS1 actionlib existing. If we didn't come up with more in that phase, I don't expect anything more to come in the next ~10 years of ROS2 actions, this request is just to match feature parity. Reducing the action state machine I don't think is a good or bad idea intrinsically, it just is a shift in perspective of responsibility. However if you reduce the state machine but keep the |
Hopefully it makes it easier to implement an action server because it means fewer functions to be aware of.
Simplifying the state machine means there would only be one method instead of those four.
Simplifying might also mean the name of
Action servers can already implement preemption in ROS 2 by aborting the current goal when they accept a new goal.
Some amount of discussion of Maybe
The only potentially interesting places to check are |
I don't think I agree with this, although I do agree with your point that we don't want to handle all the application reasons for stopping a goal in the action state machine. The reasons why the server stopped a goal are going to be quite specific. It was either cancelled externally, or it was cancelled internally. The latter (internal cancellation) is going to be due to not being able to achieve the goal any more, or the goal being superseded if the server is implemented with the capability to stop one goal in favour of another. The former (external cancellation) are going to be application-specific and we cannot do anything more to support them, other than perhaps providing a pass-through string that allows a client cancelling a goal to say why it's cancelling it, for other clients to know (I do not think this is a good idea). Preemption, as a concept, may fall under external cancellation or internal cancellation, depending on your point of view. For example (non-exhaustive list):
Depending on your application design, all of the above cases could be treated as "cancel the/a current goal, then start this goal" (external cancellation) or "I've received a new goal so I'll dump this existing one" (internal cancellation). The way that preemption can be read differently depending on the application design makes me think that adding it as an explicit state to the actions state machine is not a good idea. The semantics of what it means are too application-dependent, in my opinion. However, I do agree that there is benefit in providing a way for an action server to communicate to clients that a goal was stopped because another goal is going to be executed, whether the reason for that was a server-side decision or a client-side decision. It helps when there are multiple clients of a single server (if you have one client and one server, you can track this on the client side easily enough, but that's not the case we should design for). The proposal to drop down to one terminal state and make more use of the result value sounds good, on the face of it. It both simplifies the state machine and makes it easier to provide additional reasons for why the goal reached that state in the future. However I think there is value in having separate states for "goal finished" and "goal not finished". It makes the distinction between successful completion and not completing (in a good way or a bad way) clear. |
This makes me want to know: are you preempting goals a lot, or is it that you want to know they were preempted? Preemption-from-the-client is possible in the current design by cancelling the current goal. Knowing a goal was preempted is also possible if that is the only time your application cancels goals. So I wonder if what you, @fujitatomoya, are really asking for is that the server automatically preempts running goals (also possible now) or if you want to distinguish between preempted and cancelled-for-other-reasons? It's this latter one that is the issue here, I think, |
@sloretz so you're just proposing we change the API so that there's only 1 exit function now (and then we extend the
That assumes that the goal when returns a result has access to state information about the new goal, which I think breaks some important encapsulation. Goal 17 shouldn't rely on or require any knowledge about Goal 23. It also assumes that there's only 1 client talking to the server for there to be some internal state tracking that Goal 17 and 23 came from the same place for Goal 17 to be told "ok, you ended because I just sent 23". From the long posts, what I'm still getting from it is a general agreement on the |
Probably preempting because the target pose has changed or refined over time.
I think that's what we're all discussing here |
@SteveMacenski I don't understand. Are you talking about the action server or action client? The action server must know about all of the goals it is executing, otherwise how would it decide to preempt one? |
both, as @SteveMacenski mentioned, it is for pose, behavior and so on.
yes, at least the client has to be notified that the goal is preempted (not aborted). |
@sloretz sorry, maybe a full example is required.
Action servers can abort old goal when a new one comes in, lets call the old goal Goal 17 and the new one Goal 23. For the situation with 1 action client: When Goal 17 result comes back, you're then requiring the action client application to be aware of the state of every recent goal so that it returns a failed state code, that its because a new one was processed. You're asking for the application to have to retain state about the nature of many goals because the state returned by the action server is incomplete. I think that breaks some important concepts of encapsulation of individual goals as independent actions. It also basically requires an application to have to maintain its own separate state for goals because it can't trust the actual results, which is pretty hacky. For the situation with multiple action clients: when Goal 23 preempts Goal 17, and they were each applied by a different client, there's really no way for 17 to know why it was cancelled / failed. We'd then have to have each of these clients linked together to transmit information to let each other know what's happening so that they can properly handle the incomplete failed terminal code. Same arguments for encapsulation and hacky solutions. |
Yes, in general I am supportive of having either a single |
I also support the idea to simplify the goal states, and/or allowing the user to pass their own return code. I do think that there is still value in communicating "succeeded" and "aborted" states (or "completed" and "terminated", whatever we decide to call them). Having this pair of states will let tools (like It might be good at this point to discuss the API we imagine with a single I'm thinking we want to add an additional "user" return code to GoalStatus. Here's a couple example definitions we can poke at: With single terminated state
With "succeeded" and "aborted" states
|
I am against having a user defined return code if there is only one "finished" state. It would make it hard for tools to know in a generic way whether a goal actually finished or not. If we have succeeded and aborted states, then I see value in having a user-defined return code, but I don't understand why it can't just be in the action definition. That's where application-specific stuff goes, isn't it? Even if we do have one at the API level, I think it should be in addition to the built-in return code, not overwrite it. |
To back up a little, in this case, do you imagine GoalStatus and ResultCode are identical? They seem to service a very similar, if not same, purpose. I'm just wanting to make sure as we're talking about GoalStatus that this is also being given to the clients in the form of result codes as well when they receive a "finished" notification on the In this case, the Both of these examples lack a preemption status. I don't understand what is achieved here.
I think we need to be really explicit on what we're talking about. When I'm thinking about 1 "finished" [thing], what I'm talking about is 1 finished API for the server, e.g. |
To clarify, a goal result is made up of two pieces: 1) the
Sorry, I think I misinterpreted the above discussion. I was thinking about reducing the number of hard-coded terminal states and use the "return code" as a way to pass additional information. But, I believe @sloretz's proposal was to relax the possible number of terminal states (which makes way more sense). I.e. we can keep the existing definition of |
I'm still confused by this point. When we designed actions for ROS 2, we kept the "PREEMPTED" and "PREEMPTING" states from ROS 1 (ROS 1 defintion), but chose to rename them "CANCELED" and "CANCELING" as a preference. I guess maybe the real difference boils down to in ROS 1, the action server could trigger the transition to PREEMPTING, whereas in ROS 2 it must be triggered by an action client request? |
An action server is free to cancel a currently-executing goal if it wants to. What is lacking is the ability to report to clients that a goal was cancelled because a new goal is being executed instead. In other words, a new goal preempted a currently-executing goal. |
Exactly what @gbiggs stated. OK GoalStatus == ResultCode, just making sure. There are 3 terminal failure statuses in my opinion that we've been discussing:
|
That is exactly our requirement! |
I guess I'm missing how these two failure modes were communicated differently in ROS 1 actionlib. Edit: I guess this feature doesn't exist in ROS 1. |
There has been a lot of good discussion so far about implementation. I don't see a description of the use case yet. Why does the Action client need to know its goal was preempted? Existing uses in ROS 1 haven't answered that question. While action clients had a preempted state in ROS 1, I didn't find any ROS 1 client side use of this knowledge that was different from the aborted state.
@fujitatomoya in your use case, why does the action client need to know the goal was preempted? What does the client do when the goal is preempted that is different from what it would do if it was aborted?
@SteveMacenski I'm not sure we're talking about the same thing. It's true action action clients in ROS 2 aren't able to tell if their goal was replaced by another goal on the server. I was only saying action servers can implement preemption. The server may choose to abort 17 when it accepts 23, and the client for 17 would get notice that its goal was aborted. Why does
@jacobperron Good point, I does look like they're indistinguishable once the goal became active. Once the goal reaches the |
I'll just respond to my tagged element. It needs to know because in the case that it was cancelled due to failure, we will enter a recovery state which would involve probably stopping the robot if not already stopped and clearing state or taking aggressive measures to resolve. If its just a preemption because there's a new goal, then we don't want to do anything and keep chugging along. Our architecture makes substantial use of action servers preempting each other with new information when required. Every time a new plan comes from the planner server, the controller server is preempted and given it to now track. That means without it, every 1 hz or something the robot would stop which is really bad. Without specifying preemption vs failure, its then not possible for us to process outright failures properly because the failures are currently being assumed to only be from preemption because the APIs are making us misuse them. |
thanks for checking on this,
as application perspective, above is really different i guess. |
friendly ping; |
Another friendly ping :-) |
And a friendly ping from me. |
And another warm ping from me |
Currently, there's no predefined way to handle preemption in
rclcpp_actions
. When a client sends a new goal when a previous goal is running, the old goal's state is set to ABORTED. This means that there's no way to differentiate between an abort caused by a true failure in the server and an abort caused by a request for preemption.A client should be able to distinguish between the result codes received on true abort and abort due to preemption.
There are some ongoing discussions on the tickets and PRs listed below:
ros2/rclcpp#1104
ros2/rclcpp#1117
ros2/rcl_interfaces#105
ros-navigation/navigation2#1652
The text was updated successfully, but these errors were encountered: