Now, LLMRayActor returns logprobs, and we calculate some stats about them vs the trainer logprobs in grpo_fast.py. #1041
Merged
37 commits
8f2145c  Now, llmrayactor returns logprobs (finbarrtimbers)
4632792  Updated code (finbarrtimbers)
fb6028d  CLeaned up PR. (finbarrtimbers)
c275203  Cleaned up PR. (finbarrtimbers)
966c78b  Updated logprob code (finbarrtimbers)
1ef90d8  Fixed code (finbarrtimbers)
090d104  now uses nan (finbarrtimbers)
5ca9d0b  Now, we filter nans (finbarrtimbers)
54728e9  Cleaned up code. (finbarrtimbers)
bdaf060  Fixed tests (finbarrtimbers)
2d4f540  Updated code (finbarrtimbers)
280b4f8  Added vllm logprobs (finbarrtimbers)
4f871d6  Cleaned up code (finbarrtimbers)
a6ea5da  Undo changes to script. (finbarrtimbers)
3d7c852  Fixed bug in logprobs (finbarrtimbers)
03c4207  fixed failing tests (finbarrtimbers)
c8c3afe  Merge branch 'main' into vllm-logprobs (finbarrtimbers)
f7c6bca  Added importance sampling ratio (finbarrtimbers)
82d9535  Added back comment (finbarrtimbers)
deedd09  Removed comment (finbarrtimbers)
b493408  Test config (finbarrtimbers)
dfd10b8  Add truncated importance sampling with debug assertions to identify N… (finbarrtimbers)
9236c9e  Fix NaN handling in truncated importance sampling (finbarrtimbers)
4d6fa40  added reverse kl (finbarrtimbers)
8eaee10  Merge branch 'main' into vllm-logprobs (finbarrtimbers)
4f07ed6  Simplified code (finbarrtimbers)
61f1441  changes to debug mask (finbarrtimbers)
ddf2425  more logging (finbarrtimbers)
8f4448a  more logging (finbarrtimbers)
2b7e531  Addedcomment describing (finbarrtimbers)
707cf10  Add diagnostic logging to vllm_utils3 to detect logprobs length misma… (finbarrtimbers)
fc036c4  Fix vLLM logprobs N-1 mismatch by only appending EOS for empty responses (finbarrtimbers)
88fcfcc  Address review comments. (finbarrtimbers)
4afb6dd  Updated scripts (finbarrtimbers)
dbd3cb2  Updated scripts (finbarrtimbers)
5dea688  Cleaned up PR. (finbarrtimbers)
ed7bb9c  Added assert (finbarrtimbers)
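The PR title and several commit messages (NaN filtering, importance sampling ratio, truncated importance sampling, a KL term) describe comparing the logprobs returned by the generator with those recomputed by the trainer. The diff itself is not available here, so the following is only a minimal standalone sketch of that kind of comparison, not the code in grpo_fast.py. The function name `logprob_stats`, the `cap` parameter, the dictionary keys, and the choice to skip NaN positions are all assumptions made for illustration; which KL direction the PR calls "reverse" is not stated, so the sketch reports a plain sample estimate over generator-sampled tokens.

```python
import math


def logprob_stats(gen_logprobs, train_logprobs, cap=2.0):
    """Compare per-token logprobs from a generator and a trainer.

    Hypothetical helper: positions where either value is NaN are skipped,
    and the importance ratio exp(train - gen) is truncated at `cap`.
    Returns None when no usable positions remain.
    """
    # Keep only positions where both logprobs are real numbers.
    pairs = [
        (g, t)
        for g, t in zip(gen_logprobs, train_logprobs)
        if not (math.isnan(g) or math.isnan(t))
    ]
    if not pairs:
        return None

    # Log of the importance ratio pi_train / pi_gen for each token.
    diffs = [t - g for g, t in pairs]
    # Truncated importance ratios: min(ratio, cap).
    truncated = [min(math.exp(d), cap) for d in diffs]
    n = len(pairs)

    return {
        "num_tokens": n,
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        "mean_truncated_ratio": sum(truncated) / n,
        # Sample estimate of E_gen[log pi_gen - log pi_train].
        "kl_gen_to_train": sum(-d for d in diffs) / n,
    }


if __name__ == "__main__":
    gen = [-1.0, float("nan"), -2.0]
    train = [-1.0, -0.5, -1.5]
    print(logprob_stats(gen, train, cap=1.5))
```

The NaN position in the usage example is dropped before any statistic is computed, so a single missing logprob does not poison the averages.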