Trajectory replay: Fix a few corner cases #6380

li-boxuan · 2025-01-21T06:34:01Z

End-user friendly description of the problem this fixes or functionality that this introduces

Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fix the handling of two corner cases in the trajectory replay feature.

Give a summary of what the PR does, explaining any non-trivial design decisions

Two corner cases were missed in the previous PR #6215:

1. When there's a wait_for_response message, replay gets stuck waiting for the user's response, which doesn't make sense in the middle of a replay. This is demonstrated in demo2.json and demo3.json.
2. A trajectory dumped from the GUI contains environmental actions, which should be skipped during replay. This is demonstrated in demo1.json (note: trajectory export from the GUI is not available yet; demo1.json was downloaded using PR #6378, "(feat) Add button to export trajectory on chat panel").

demo1.json - GUI mode: downloaded from web GUI
demo2.json - Headless mode: after demo1 replay, add a user message, and finish
demo3.json - Headless mode: a replay of demo2. Note: demo2.json and demo3.json only differ in step id, timestamp, and hostname.
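
For readers who want the two corner cases in concrete form, here is a minimal sketch of the filtering described above. It is not the code in openhands/controller/replay.py: the `Event` dataclass, its field names, and `prepare_for_replay` are hypothetical stand-ins chosen for illustration, and the "environment" source value is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified stand-in for a trajectory event.
    id: int
    source: str  # assumed values: "agent", "user", "environment"
    action: str  # e.g. "message", "run"
    content: Optional[str] = None
    wait_for_response: bool = False


def prepare_for_replay(events: List[Event]) -> List[Event]:
    """Return the agent events to replay, in their original order."""
    prepared = []
    last_index = len(events) - 1
    for index, event in enumerate(events):
        if event.source != "agent":
            # Corner case 2: user and environment events are skipped.
            continue
        if (
            event.action == "message"
            and event.wait_for_response
            and index != last_index
        ):
            # Corner case 1: the response is already recorded as a later
            # event, so do not pause the replay to wait for a live user.
            event = replace(event, wait_for_response=False)
        prepared.append(event)
    return prepared
```

Only a waiting message that is the very last event keeps its flag, since there is no recorded response to fall back on at that point.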

Link of any specific issues this addresses

Part of #6049

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e1a7c46-nikolaik   --name openhands-app-e1a7c46   docker.all-hands.dev/all-hands-ai/openhands:e1a7c46

enyst · 2025-01-21T06:58:30Z

openhands/controller/replay.py

+                if isinstance(event, MessageAction) and event.wait_for_response:
+                    # For any message waiting for response that is not the last
+                    # event, we override wait_for_response to True, as a response
+                    # would have been included in the next event, and we don't


> a response would have been included in the next event

The next event is a MessageAction with source='user', which is the response. Is this the event you mean, or do you mean the next source=agent event?

I ask because I'm curious about something: I feel like if the replay process ends, then we close the controller, it will be saved in a new trajectory, and that should reflect perfectly what happened, just like the initial trajectory: so IMHO it should contain... 🤔

the agent actions, including this MessageAction

the correct user actions, including task, including... the response to this MessageAction? The response is a MessageAction with source='user'

(I mean enough events should be retrieved so that an agent with this history can continue normally, with all information it had in the past. Or do you see a reason why that won't work?)

> The next event is a MessageAction with source='user', which is the response. Is this the event you mean

Yes, literally the next event.

> I mean enough events should be retrieved so that an agent with this history can continue normally, with all information it had in the past

I think that's what I've been trying to achieve? Do you see any place that would break this assumption? The response from user is indeed included in the trajectory. For example, in demo2.json, step 16 contains the user response.

The logic here is to NOT pause the control flow. The controller "replays" the "recorded" user response from the trajectory, rather than a new user response from the actual user.

I see and I agree, thank you, we're on the same page on the goal. I do have a small hesitation, though I have yet to look at the .json files in detail, so please take it with a grain of salt, and feel free to ignore it atm (I'll look closer at it tonight):

I don't see clearly how the controller can replay the "recorded" user response, since this code says that all actions with source='user' are not replayable. What am I missing?

idk, it also seems to me that we're getting an extra MessageAction that wasn't in history before? The null/null message is new in demo2. Unless I'm hallucinating worse than my Opus. 😅

I wonder if there's an alternative: during replay, interpret wait_for_response as "don't wait, read next message". But it might be more complex.

> I don't see clearly how the controller can replay the "recorded" user response, since this code says that all actions with source='user' are not replayable. What am I missing?

You are right, I was cheating! It's not being "replayed" because there's nothing to replay. It's skipped from the replay manager's perspective.

> it also seems to me that we're getting an extra MessageAction that wasn't in history before? The null/null message is new in demo2

Yeah that might be a side-effect of setting wait_for_response = False. Let me think about your alternative.

> I wonder if there's an alternative: during replay, interpret wait_for_response as "don't wait, read next message". But it might be more complex.

This sounds like the right way to do stuff, but... it means more coupling between agent controller and replay manager 💭
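
To make the alternative discussed above concrete, here is a rough sketch of "don't wait, read next message". It leaves wait_for_response untouched and instead has the replay loop ask the recorded trajectory for the reply. Everything here is hypothetical: `Event`, `RecordedTrajectory`, and `replay` are illustrative names, not the agent controller or replay manager from this PR.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for a trajectory event.
    id: int
    source: str  # assumed values: "agent", "user", "environment"
    action: str
    wait_for_response: bool = False


class RecordedTrajectory:
    def __init__(self, events: List[Event]) -> None:
        self._pending: Deque[Event] = deque(events)

    def next_agent_action(self) -> Optional[Event]:
        # Drop leading non-agent events, then hand out the next agent one.
        while self._pending and self._pending[0].source != "agent":
            self._pending.popleft()
        return self._pending.popleft() if self._pending else None

    def recorded_response(self) -> Optional[Event]:
        # Only a user event that is literally next counts as the response.
        if self._pending and self._pending[0].source == "user":
            return self._pending.popleft()
        return None


def replay(events: List[Event]) -> List[Event]:
    trajectory = RecordedTrajectory(events)
    replayed: List[Event] = []
    while (action := trajectory.next_agent_action()) is not None:
        replayed.append(action)
        if action.wait_for_response:
            response = trajectory.recorded_response()
            if response is None:
                break  # nothing recorded: a live user would have to answer
            replayed.append(response)
    return replayed
```

The coupling concern shows up in `replay`: the loop that drives the agent has to know about the recorded trajectory in order to fetch the reply, instead of the flag simply being rewritten up front.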

enyst · 2025-01-21T07:01:23Z

Looking at demo2, this seems strange:

{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

A MessageAction with null content, null image, and wait_for_response = true ?

Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?
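
A quick way to check a dumped trajectory for this kind of event is sketched below. It assumes the events are plain dicts shaped like the one quoted above; `find_empty_waiting_messages` is a hypothetical helper, not part of the codebase.

```python
import json
from typing import Any, Dict, List


def find_empty_waiting_messages(events: List[Dict[str, Any]]) -> List[int]:
    """Return ids of agent messages that wait for a response but carry no content."""
    ids = []
    for event in events:
        args = event.get("args") or {}
        if (
            event.get("source") == "agent"
            and event.get("action") == "message"
            and args.get("content") is None
            and args.get("wait_for_response") is True
        ):
            ids.append(event["id"])
    return ids


# The event quoted above, wrapped in a list for the example.
sample = json.loads(
    '[{"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", '
    '"action": "message", "args": {"content": null, "image_urls": null, '
    '"wait_for_response": true}, "timeout": 120}]'
)
suspicious_ids = find_empty_waiting_messages(sample)
```

For a file such as demo2.json, the same function could be applied to the result of `json.load`, assuming the file holds a JSON array of events.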

li-boxuan · 2025-01-21T08:14:30Z

> Looking at demo2, this seems strange:

> {"id": 13, "timestamp": "2025-01-20T22:23:36.002374", "source": "agent", "action": "message", "args": {"content": null, "image_urls": null, "wait_for_response": true}, "timeout": 120}

> A MessageAction with null content, null image, and wait_for_response = true ?

> Aaahh I think I see how that happened, you literally said it, you added something. The previous is a MessageAction with content where the agent is asking the user a question, but its wait_for_response = false... because this PR is setting it false, right?

That's correct!

li-boxuan · 2025-01-21T08:25:27Z

As an aside, I do realize this would become a bug farm... and I'll make sure to add some E2E tests before checking in the user-facing replay functionality in #6348

Trajectory replay: Fix a few corner cases

7bbad2d

li-boxuan requested review from xingyaoww and enyst January 21, 2025 06:34

enyst reviewed Jan 21, 2025

Fix a typo in comment

e1a7c46
