Trajectory replay: Fix a few corner cases #6380

Open
wants to merge 2 commits into
base: main

li-boxuan
Collaborator

@li-boxuan li-boxuan commented Jan 21, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fix the handling of two corner cases in the trajectory replay feature.


Give a summary of what the PR does, explaining any non-trivial design decisions

Two corner cases were missing in the previous PR #6215:

  1. When the trajectory contains a wait_for_response message, replay gets stuck waiting for the user's response, which doesn't make sense in the middle of a replay. This is demonstrated in demo2.json and demo3.json.
  2. A trajectory dumped from the GUI contains environmental actions, which should be skipped during replay. This is demonstrated in demo1.json (Note: trajectory export from the GUI is not available yet; demo1.json was downloaded using the PR (feat) Add button to export trajectory on chat panel #6378).
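The two cases above can be sketched as a preprocessing pass over a loaded trajectory. This is only an illustration under stated assumptions, not the code in this PR: the `Event` dataclass, the `prepare_for_replay` helper, and the `"environment"` source value are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a recorded trajectory event."""
    id: int
    source: str  # assumed values: "agent", "user", "environment"
    action: str
    content: Optional[str] = None
    wait_for_response: bool = False


def prepare_for_replay(events: list[Event]) -> list[Event]:
    """Return a copy of the trajectory that can be replayed without pausing."""
    # Case 2: drop events emitted by the environment; they are not replayed.
    kept = [e for e in events if e.source != "environment"]

    prepared = []
    last = len(kept) - 1
    for i, event in enumerate(kept):
        # Case 1: a waiting message that is not the last event already has
        # its recorded reply later in the trajectory, so do not pause on it.
        if event.action == "message" and event.wait_for_response and i != last:
            event = replace(event, wait_for_response=False)
        prepared.append(event)
    return prepared
```

The input list is left untouched (`replace` builds new objects), and a waiting message at the very end keeps its flag, so control can return to a real user once the recording runs out.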

demo1.json - GUI mode: downloaded from web GUI
demo2.json - Headless mode: after demo1 replay, add a user message, and finish
demo3.json - Headless mode: a replay of demo2. Note: demo2.json and demo3.json only differ in step id, timestamp, and hostname.
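One way to check the claim that two trajectories differ only in volatile fields is to strip those fields and compare what remains. The sketch below assumes each trajectory file is a JSON list of event objects and that the volatile values live under keys named `id`, `timestamp`, and `hostname`; the key names and the helper functions are assumptions for illustration. A hostname embedded inside free text would not be caught by this approach.

```python
import json
from typing import Any

# Assumed names of the fields that legitimately differ between runs.
VOLATILE_KEYS = {"id", "timestamp", "hostname"}


def strip_volatile(value: Any) -> Any:
    """Recursively drop volatile keys from dicts nested anywhere in value."""
    if isinstance(value, dict):
        return {
            key: strip_volatile(item)
            for key, item in value.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [strip_volatile(item) for item in value]
    return value


def same_trajectory(a: list, b: list) -> bool:
    """True if two trajectories match once volatile fields are ignored."""
    return strip_volatile(a) == strip_volatile(b)


def load_trajectory(path: str) -> list:
    """Load a trajectory, assuming the file holds a JSON list of events."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With the demo files at hand, `same_trajectory(load_trajectory("demo2.json"), load_trajectory("demo3.json"))` would be the corresponding check.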


Link of any specific issues this addresses

Part of #6049


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e1a7c46-nikolaik \
  --name openhands-app-e1a7c46 \
  docker.all-hands.dev/all-hands-ai/openhands:e1a7c46

@li-boxuan li-boxuan requested review from xingyaoww and enyst January 21, 2025 06:34
if isinstance(event, MessageAction) and event.wait_for_response:
    # For any message waiting for response that is not the last
    # event, we override wait_for_response to False, as a response
    # would have been included in the next event, and we don't
Collaborator

a response would have been included in the next event

The next event is a MessageAction with source='user', which is the response. Is this the event you mean, or do you mean the next source='agent' event?

I ask because I'm curious about something: I feel like if the replay process ends, then we close the controller, it will be saved in a new trajectory, and that should reflect perfectly what happened, just like the initial trajectory: so IMHO it should contain... 🤔

  • the agent actions, including this MessageAction
  • the correct user actions, including task, including... the response to this MessageAction? The response is a MessageAction with source='user'

(I mean enough events should be retrieved so that an agent with this history can continue normally, with all information it had in the past. Or do you see a reason why that won't work?)

Collaborator Author

@li-boxuan li-boxuan Jan 21, 2025

The next event is a MessageAction with source='user', which is the response. Is this event you mean

Yes, literally the next event.

I mean enough events should be retrieved so that an agent with this history can continue normally, with all information it had in the past

I think that's what I've been trying to achieve? Do you see any place that would break this assumption? The response from user is indeed included in the trajectory. For example, in demo2.json, step 16 contains the user response.

The logic here is to NOT pause the control flow. The controller "replays" the "recorded" user response from the trajectory, rather than a new user response from the actual user.

Collaborator

I see and I agree, thank you, we're on the same page on the goal. I have some small hesitation though, but I have yet to look in detail at the .json files, so please take it with a grain of salt, and feel free to ignore it atm (I'll look closer at it tonight):

  • I don't see clearly how the controller can replay the "recorded" user response, since this code says that all actions with source='user' are not replayable. What am I missing?

  • idk, it also seems to me that we're getting an extra MessageAction that wasn't in history before? The null/null message is new in demo2. Unless I'm hallucinating worse than my Opus. 😅

I wonder if there's an alternative: during replay, interpret wait_for_response as "don't wait, read next message". But it might be more complex.
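A rough sketch of that alternative, to make the idea concrete. It is not taken from the codebase: `Message` and `replay` are hypothetical, and a real version would live in the controller's event loop. The recorded `wait_for_response` flag is left as-is, and the replay loop answers it from the log instead of pausing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a recorded MessageAction."""
    source: str  # "agent" or "user"
    content: Optional[str]
    wait_for_response: bool = False


def replay(recorded: list[Message]) -> list[Message]:
    """Replay recorded messages, answering each wait_for_response from the log."""
    pending = deque(recorded)
    history: list[Message] = []
    while pending:
        message = pending.popleft()
        history.append(message)
        if message.source == "agent" and message.wait_for_response:
            # A live run would pause here for the user. During replay,
            # read the recorded reply instead of waiting.
            if pending and pending[0].source == "user":
                history.append(pending.popleft())
            else:
                # No recorded reply follows (e.g. this was the last event):
                # stop replaying and hand control back to the real user.
                break
    return history
```

Because the flag is never rewritten, the replayed history keeps the same events as the recording, which avoids introducing an extra message as a side effect.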

Collaborator Author

I don't see clearly how the controller can replay the "recorded" user response, since this code says that all actions with source='user' are not replayable. What am I missing?

You are right, I was cheating! It's not being "replayed" because there's nothing to replay. It's skipped from replay manager perspective.

it also seems to me that we're getting an extra MessageAction that wasn't in history before? The null/null message is new in demo2

Yeah that might be a side-effect of setting wait_for_response = False. Let me think about your alternative.

Collaborator Author

I wonder if there's an alternative: during replay, interpret wait_for_response as "don't wait, read next message". But it might be more complex.

This sounds like the right way to do stuff, but... it means more coupling between agent controller and replay manager 💭

@enyst
Collaborator

enyst commented Jan 21, 2025

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image, and wait_for_response = true ?

Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?
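A small check like the following could flag such events when inspecting a trajectory. It assumes the trajectory is a list of event dicts shaped like the one quoted above; `find_empty_waiting_messages` is a hypothetical helper, not part of the project.

```python
import json


def find_empty_waiting_messages(events: list[dict]) -> list[int]:
    """Return ids of agent message actions with no content that wait for a response."""
    ids = []
    for event in events:
        args = event.get("args") or {}
        if (
            event.get("action") == "message"
            and event.get("source") == "agent"
            and args.get("content") is None
            and args.get("wait_for_response") is True
        ):
            ids.append(event["id"])
    return ids


# The event quoted above, parsed from its JSON form.
suspicious = json.loads(
    '{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", '
    '"action": "message", "args": {"content": null, "image_urls": null, '
    '"wait_for_response": true}, "timeout": 120}'
)
print(find_empty_waiting_messages([suspicious]))  # prints [13]
```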

@li-boxuan
Collaborator Author

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image, and wait_for_response = true ?

Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?

That's correct!

@li-boxuan
Collaborator Author

As an aside, I do realize this could become a bug farm... and I'll make sure to add some E2E tests before checking in the user-facing replay functionality in #6348

2 participants