Modify grpo_fast.py so that we now pass individual prompts through the queue, not batches. #972
Conversation
- Remove detailed timing instrumentation from accumulate_inference_batches
- Remove enhanced Timer class duration property and initialization timing
- Remove process_from_queue timing breakdown logging
- Preserve all functional changes, including single-prompt processing

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
/gemini review
Code Review
This pull request refactors the data-passing mechanism to use individual prompts in queues instead of batches, which is a significant and positive change for code clarity. The changes are consistent across the codebase, including updates to data types, function signatures, and tests. I've identified one test case, test_uneven_distribution_no_empty_batches, that appears to have been missed during the refactoring and will fail with the new implementation; a suggested fix is provided. The new test file for vllm_utils3.py is also a welcome addition.
hamishivi left a comment
Looks good! One minor comment around num_eval_samples
open_instruct/grpo_fast.py
Outdated
    """RUNTIME VALUE: The number of training_steps to train"""
    local_eval_every: int = 100
    """Run evaluation after this many training steps. This controls in-loop evals, which reuse the generation/reward verifier setup. Set to -1 to disable."""
    num_eval_samples: int = 32
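As a minimal sketch (not the repo's actual implementation), the cadence flag above might gate in-loop evals like this; `should_run_eval` is a hypothetical helper name:

```python
# Hypothetical sketch of how local_eval_every could gate in-loop evaluation.
def should_run_eval(training_step: int, local_eval_every: int) -> bool:
    # -1 disables in-loop evaluation entirely
    if local_eval_every == -1:
        return False
    return training_step % local_eval_every == 0

# With the default local_eval_every=100, evals fire at steps 100, 200, 300, ...
eval_steps = [s for s in range(1, 301) if should_run_eval(s, 100)]
```

`num_eval_samples` would then cap how many prompts each such eval draws, which is why the comment below asks whether `dataset_eval_mixer_list` should control the eval set size instead.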
nice catch! Actually, I wonder if we should remove this and rely on dataset_eval_mixer_list to control eval set size.
…e queue, not batches. (#972)

* PAsses single prompts through.
* Updated queue_types.py to match.
* Fixed issue with code.
* Fixed queue sizing issue.
* Updated length of tool use experiment.
* Merged conflicts
* Corrected inference batch size calculation.
* Fixed merge errors.
* Undid changes ot test file.
* UNdid changes.
* Cleaned up code
* Another attempt to fix the dataset_index bug.
* Another attempt to fix the dataset_index bug.
* Added assert statements.
* Removed debugging code.
* Less changes
* Undid changes
* Cleaned up PR.
* Fixed change in sort order
* Clean up PR
* Cleaned up code.
* Cleaned up PR.
* Fixed issue.
* Fixed linter errors.
* Updated tool_grpo_fast.sh to use new workspace.
* Removed redundant test.
* Added back whitespace.
* Ran linter.
* Refactored code.
* Cleaned up PR.
* Fixed linter error.
* Removed logging.
* Removed logging statement.
* Attempt at fix mask mismatch issue.
* Tests should pass now.
* Updated timing code.
* Ran linter.
* Added timing.
* Timing is fast now.
* Remove timing instrumentation code
* Added lots of debugging statements.
* Ran linter. Fixed bug.
* Added test file
* Removed whitespace
* Updated script.
* Cleaned up code.
* Removed debugging code.
* Fixed failing test.
* Set timeout for tests. They should take 5 minutes to run.
* Now, tests should pass.
* now tests should pass.
* Linter passes.
* now, tests should pass
* now, tests should actually pass

---------

Co-authored-by: Claude <[email protected]>
As part of #859, I made a number of major changes. To be a better software engineer, I'm breaking them out into separate PRs (and also because there's a bug in #859 that I can't identify).
This refactors the way we pass data in the queues to and from LLMRayActor. Instead of passing batches, we now pass individual prompts. This shouldn't affect the observable behaviour of the system in any way.

Runs:
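The before/after shape of the refactor can be sketched as follows. This is an illustrative sketch only, using the stdlib `queue.Queue` as a stand-in for the Ray queue, and `submit_prompts` / `accumulate_inference_batch` are hypothetical names (the PR's actual accumulation lives in `accumulate_inference_batches`):

```python
import queue

# Before: one queue item per batch. After: one item per individual prompt,
# and the consumer re-accumulates batch_size results before training, so
# observable behaviour is unchanged.

def submit_prompts(q: queue.Queue, prompts: list) -> None:
    # Enqueue one entry per prompt, tagging each with its dataset index
    # so ordering can be reconstructed downstream.
    for i, prompt in enumerate(prompts):
        q.put({"dataset_index": i, "prompt": prompt})

def accumulate_inference_batch(q: queue.Queue, batch_size: int) -> list:
    # Drain single-prompt items until a full batch is assembled.
    return [q.get() for _ in range(batch_size)]

q = queue.Queue()
submit_prompts(q, ["a", "b", "c", "d"])
batch = accumulate_inference_batch(q, batch_size=4)
```

Because accumulation restores full batches on the consumer side, downstream training code sees the same data it did before the refactor.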