
Conversation

finbarrtimbers (Collaborator) commented Oct 3, 2025:

The only reason we had the generate thread previously was to work around a locking issue when updating the weights on the actor. It turns out that, with vLLM v1, the locking issue goes away!

Step time decreases significantly on the runs with inflight_updates=True (wandb):

[Screenshot (2025-10-06): wandb chart comparing step time across the runs]

Runs with inflight_updates=True:

Runs with inflight_updates=False:
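
For context, a minimal sketch of the pattern being removed versus the new one. Everything here (FakeEngine, ThreadedGenerator, InlineGenerator) is a hypothetical stand-in, not the actual open-instruct or vLLM code: previously a dedicated generate thread plus a lock kept weight updates from racing an in-progress engine step; since vLLM v1 tolerates inflight updates, the caller can drive generation directly and update weights without the lock.

    import threading
    import time

    class FakeEngine:
        """Stand-in for the vLLM engine, only here to make the sketch runnable."""

        def step(self):
            time.sleep(0.01)  # pretend to decode one batch of tokens
            return ["<token>"]

        def load_weights(self, weights):
            time.sleep(0.05)  # pretend to copy new weights onto the actor

    class ThreadedGenerator:
        """Old pattern: a dedicated generate thread, serialized with a lock."""

        def __init__(self, engine):
            self.engine = engine
            self._lock = threading.Lock()
            self._stop = threading.Event()
            self._thread = threading.Thread(target=self._loop, daemon=True)
            self._thread.start()

        def _loop(self):
            while not self._stop.is_set():
                with self._lock:  # generation holds the lock...
                    self.engine.step()

        def update_weights(self, weights):
            with self._lock:  # ...so weight updates must wait behind it
                self.engine.load_weights(weights)

        def stop(self):
            self._stop.set()
            self._thread.join()

    class InlineGenerator:
        """New pattern: no generate thread; the caller drives the loop directly."""

        def __init__(self, engine):
            self.engine = engine

        def run(self, num_steps):
            for _ in range(num_steps):
                self.engine.step()

        def update_weights(self, weights):
            self.engine.load_weights(weights)  # inflight update, no lock needed

    if __name__ == "__main__":
        old = ThreadedGenerator(FakeEngine())
        old.update_weights({})  # blocks until the generate thread releases the lock
        old.stop()

        new = InlineGenerator(FakeEngine())
        new.run(num_steps=3)
        new.update_weights({})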

finbarrtimbers marked this pull request as ready for review October 6, 2025 15:33
hamishivi (Collaborator) left a comment:

Looks good to me generally, but I worry about error handling now.

hamishivi (Collaborator) commented on this snippet:

    total_processed = 0
    iteration_count = 0

    while not self._should_exit():

Similar question to above: what happens if some other process dies and the vLLM worker should stop?

finbarrtimbers (Collaborator, author) replied:

Then it gets killed when Ray shuts down:
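
For illustration only, a minimal Ray sketch (hypothetical, not the code referenced above) of that fate-sharing behavior: actors created by the driver are torn down when the driver exits or ray.shutdown() is called, so the worker loop does not need its own cross-process shutdown signal.

    import time

    import ray

    @ray.remote
    class VllmLikeWorker:
        """Hypothetical stand-in for the vLLM actor's processing loop."""

        def run_forever(self):
            while True:  # no explicit exit path for external failures
                time.sleep(1.0)

    if __name__ == "__main__":
        ray.init()
        worker = VllmLikeWorker.remote()
        worker.run_forever.remote()  # fire-and-forget; the actor just loops
        time.sleep(2.0)              # ...training happens elsewhere...
        ray.shutdown()               # tearing down Ray kills the actor process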

hamishivi (Collaborator) commented on this snippet:

    output.request_id, output, complete_output, current_time
    )

    if self.verbose and iteration_count % 100 == 0:

I never really used this logging, so I'm fine with removing it, but was there any other impetus for doing so?

finbarrtimbers (Collaborator, author) replied:

Truthfully, I asked Claude to remove all the debug logging (I had a bunch of debug logging in an earlier version of this PR) and it also removed this. I thought it was fine so I kept it. Open to changing this if you prefer.

hamishivi (Collaborator) replied:

No, all good! I was just wondering why the change.

hamishivi (Collaborator) left a comment:

Tested and seems good to me!

finbarrtimbers added this pull request to the merge queue Oct 7, 2025
Merged via the queue into main with commit e320ff0 Oct 7, 2025
3 checks passed