Modify grpo_fast.py so that we now pass individual prompts through the queue, not batches. #972
Conversation
- Remove detailed timing instrumentation from accumulate_inference_batches
- Remove enhanced Timer class duration property and initialization timing
- Remove process_from_queue timing breakdown logging
- Preserve all functional changes, including single-prompt processing

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
/gemini review
Code Review
This pull request refactors the data-passing mechanism to use individual prompts in queues instead of batches, which is a significant and positive change for code clarity. The changes are consistent across the codebase, including updates to data types, function signatures, and tests. I've identified one test case, test_uneven_distribution_no_empty_batches, that appears to have been missed during the refactoring and will fail with the new implementation; a suggested fix is provided. The new test file for vllm_utils3.py is also a welcome addition.
hamishivi left a comment
Looks good! One minor comment around num_eval_samples
open_instruct/grpo_fast.py
Outdated
    """RUNTIME VALUE: The number of training_steps to train"""
    local_eval_every: int = 100
    """Run evaluation after this many training steps. This controls in-loop evals, which reuse the generation/reward verifier setup. Set to -1 to disable."""
    num_eval_samples: int = 32
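As a minimal sketch (not the repo's actual implementation), the cadence flag above might gate in-loop evals like this; `should_run_eval` is a hypothetical helper name:

```python
# Hypothetical sketch of how local_eval_every could gate in-loop evaluation.
def should_run_eval(training_step: int, local_eval_every: int) -> bool:
    # -1 disables in-loop evaluation entirely
    if local_eval_every == -1:
        return False
    return training_step % local_eval_every == 0

# With the default local_eval_every=100, evals fire at steps 100, 200, 300, ...
eval_steps = [s for s in range(1, 301) if should_run_eval(s, 100)]
```

`num_eval_samples` would then cap how many prompts each such eval draws, which is why the comment below asks whether `dataset_eval_mixer_list` should control the eval set size instead.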
nice catch! Actually, I wonder if we should remove this and rely on dataset_eval_mixer_list to control eval set size.
…e queue, not batches. (#972)

* PAsses single prompts through.
* Updated queue_types.py to match.
* Fixed issue with code.
* Fixed queue sizing issue.
* Updated length of tool use experiment.
* Merged conflicts
* Corrected inference batch size calculation.
* Fixed merge errors.
* Undid changes ot test file.
* UNdid changes.
* Cleaned up code
* Another attempt to fix the dataset_index bug.
* Another attempt to fix the dataset_index bug.
* Added assert statements.
* Removed debugging code.
* Less changes
* Undid changes
* Cleaned up PR.
* Fixed change in sort order
* Clean up PR
* Cleaned up code.
* Cleaned up PR.
* Fixed issue.
* Fixed linter errors.
* Updated tool_grpo_fast.sh to use new workspace.
* Removed redundant test.
* Added back whitespace.
* Ran linter.
* Refactored code.
* Cleaned up PR.
* Fixed linter error.
* Removed logging.
* Removed logging statement.
* Attempt at fix mask mismatch issue.
* Tests should pass now.
* Updated timing code.
* Ran linter.
* Added timing.
* Timing is fast now.
* Remove timing instrumentation code
* Added lots of debugging statements.
* Ran linter. Fixed bug.
* Added test file
* Removed whitespace
* Updated script.
* Cleaned up code.
* Removed debugging code.
* Fixed failing test.
* Set timeout for tests. They should take 5 minutes to run.
* Now, tests should pass.
* now tests should pass.
* Linter passes.
* now, tests should pass
* now, tests should actually pass

---------

Co-authored-by: Claude <[email protected]>
As part of #859, I made a number of major changes. To be a better software engineer, I'm breaking them out into separate PRs (and also because there's a bug in #859 that I can't identify).
This refactors the way we pass data in the queues to and from LLMRayActor. Instead of passing batches, we now pass individual prompts. This shouldn't affect the observable behaviour of the system in any way.

Runs:
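The before/after shape of the refactor can be sketched as follows. This is an illustrative sketch only, using the stdlib `queue.Queue` as a stand-in for the Ray queue, and `submit_prompts` / `accumulate_inference_batch` are hypothetical names (the PR's actual accumulation lives in `accumulate_inference_batches`):

```python
import queue

# Before: one queue item per batch. After: one item per individual prompt,
# and the consumer re-accumulates batch_size results before training, so
# observable behaviour is unchanged.

def submit_prompts(q: queue.Queue, prompts: list) -> None:
    # Enqueue one entry per prompt, tagging each with its dataset index
    # so ordering can be reconstructed downstream.
    for i, prompt in enumerate(prompts):
        q.put({"dataset_index": i, "prompt": prompt})

def accumulate_inference_batch(q: queue.Queue, batch_size: int) -> list:
    # Drain single-prompt items until a full batch is assembled.
    return [q.get() for _ in range(batch_size)]

q = queue.Queue()
submit_prompts(q, ["a", "b", "c", "d"])
batch = accumulate_inference_batch(q, batch_size=4)
```

Because accumulation restores full batches on the consumer side, downstream training code sees the same data it did before the refactor.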