Conversation

@finbarrtimbers finbarrtimbers commented Aug 30, 2025

As part of #859, I made a number of major changes. To be a better software engineer, I'm breaking them out into separate PRs (and also because there's a bug in #859 that I can't identify).

This refactors how we pass data through the queues to and from LLMRayActor: instead of passing batches, we now pass individual prompts. This shouldn't affect the observable behaviour of the system in any way.
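For readers skimming the diff, here is a minimal sketch of the new queue contract, using a stdlib queue.Queue as a stand-in for the Ray queue feeding LLMRayActor; `PromptRequest` and its fields are illustrative assumptions, not necessarily the actual definitions in queue_types.py:

```python
# Illustrative sketch only -- PromptRequest and its field names are
# assumptions, not the repo's actual queue_types.py definitions.
from dataclasses import dataclass
from queue import Queue  # stand-in for the Ray queue used in the repo


@dataclass
class PromptRequest:
    prompt_token_ids: list[int]
    dataset_index: int


prompt_queue: Queue = Queue()

# Before: whole batches went into the queue, e.g.
#   prompt_queue.put([req0, req1, ..., reqN])
# After: each prompt is enqueued individually.
for i, ids in enumerate([[1, 2, 3], [4, 5], [6]]):
    prompt_queue.put(PromptRequest(prompt_token_ids=ids, dataset_index=i))


def next_inference_batch(q: Queue, max_size: int) -> list[PromptRequest]:
    """Drain up to max_size individual prompts for one inference call."""
    batch = []
    while len(batch) < max_size and not q.empty():
        batch.append(q.get())
    return batch


print(next_inference_batch(prompt_queue, max_size=2))
```

The payoff is that batch composition becomes a consumer-side decision: the actor assembles whatever batch it needs from the stream of prompts, rather than unpacking and re-splitting batches chosen by the producer.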

Runs:

@finbarrtimbers finbarrtimbers marked this pull request as ready for review September 3, 2025 20:14
@hamishivi
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the data passing mechanism to use individual prompts in queues instead of batches, a significant and positive change for code clarity. The changes are consistent across the codebase, including updates to data types, function signatures, and tests. I've identified one test case, test_uneven_distribution_no_empty_batches, that appears to have been missed during the refactoring and will fail with the new implementation; a suggested fix is provided. Additionally, the new test file for vllm_utils3.py is a great addition.
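To make the distribution point concrete, here is a toy illustration (not the repo's accumulate_inference_batches logic) of why a shared queue of individual prompts behaves differently from pre-split batches when the prompt count doesn't divide evenly among workers:

```python
# Toy illustration only -- not the repo's actual distribution logic.
# Five prompts across four workers: naive batch pre-splitting mishandles
# the remainder, while a shared per-prompt queue does not.
from collections import deque

prompts = list(range(5))
num_workers = 4

# Old style: pre-split into equal chunks; the remainder is dropped here.
chunk = len(prompts) // num_workers  # == 1
pre_split = [prompts[i * chunk:(i + 1) * chunk] for i in range(num_workers)]

# New style: workers pull single prompts from a shared queue until empty.
shared = deque(prompts)
pulled = [[] for _ in range(num_workers)]
worker = 0
while shared:
    pulled[worker].append(shared.popleft())
    worker = (worker + 1) % num_workers

print(pre_split)  # [[0], [1], [2], [3]] -- prompt 4 lost by the naive split
print(pulled)     # [[0, 4], [1], [2], [3]] -- all prompts served
```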

@hamishivi hamishivi left a comment

Looks good! One minor comment around num_eval_samples.

"""RUNTIME VALUE: The number of training_steps to train"""
local_eval_every: int = 100
"""Run evaluation after this many training steps. This controls in-loop evals, which reuse the generation/reward verifier setup. Set to -1 to disable."""
num_eval_samples: int = 32

Nice catch! Actually, I wonder if we should remove this and rely on dataset_eval_mixer_list to control the eval set size.
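As a hypothetical sketch of the two options being weighed here (the helper name and cap logic are illustrative assumptions, not the repo's actual eval wiring):

```python
# Hypothetical sketch -- cap_eval_set and its semantics are assumptions
# for illustration, not the repo's implementation.
def cap_eval_set(eval_dataset: list, num_eval_samples: int) -> list:
    """Option 1: keep num_eval_samples as an explicit cap (-1 = use all)."""
    if num_eval_samples > 0:
        return eval_dataset[:num_eval_samples]
    return eval_dataset


# Option 2, per the comment above: drop the cap and size the eval set
# upstream via dataset_eval_mixer_list when the eval mix is built.
print(len(cap_eval_set(list(range(100)), num_eval_samples=32)))  # 32
print(len(cap_eval_set(list(range(100)), num_eval_samples=-1)))  # 100
```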

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Sep 5, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 5, 2025
…e queue, not batches. (#972)

* Passes single prompts through.

* Updated queue_types.py to match.

* Fixed issue with code.

* Fixed queue sizing issue.

* Updated length of tool use experiment.

* Merged conflicts

* Corrected inference batch size calculation.

* Fixed merge errors.

* Undid changes to test file.

* Undid changes.

* Cleaned up code

* Another attempt to fix the dataset_index bug.

* Another attempt to fix the dataset_index bug.

* Added assert statements.

* Removed debugging code.

* Fewer changes.

* Undid changes

* Cleaned up PR.

* Fixed change in sort order

* Clean up PR

* Cleaned up code.

* Cleaned up PR.

* Fixed issue.

* Fixed linter errors.

* Updated tool_grpo_fast.sh to use new workspace.

* Removed redundant test.

* Added back whitespace.

* Ran linter.

* Refactored code.

* Cleaned up PR.

* Fixed linter error.

* Removed logging.

* Removed logging statement.

* Attempt at fix mask mismatch issue.

* Tests should pass now.

* Updated timing code.

* Ran linter.

* Added timing.

* Timing is fast now.

* Remove timing instrumentation code

- Remove detailed timing instrumentation from accumulate_inference_batches
- Remove enhanced Timer class duration property and initialization timing
- Remove process_from_queue timing breakdown logging
- Preserve all functional changes including single-prompt processing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Added lots of debugging statements.

* Ran linter. Fixed bug.

* Added test file

* Removed whitespace

* Updated script.

* Cleaned up code.

* Removed debugging code.

* Fixed failing test.

* Set timeout for tests. They should take 5 minutes to run.

* Now, tests should pass.

* Now tests should pass.

* Linter passes.

* Now, tests should pass.

* Now, tests should actually pass.

---------

Co-authored-by: Claude <[email protected]>
@finbarrtimbers finbarrtimbers removed this pull request from the merge queue due to a manual request Sep 5, 2025
@finbarrtimbers finbarrtimbers added this pull request to the merge queue Sep 5, 2025
Merged via the queue into main with commit af3f921 Sep 5, 2025
3 checks passed