Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding script for processing many intermediate checkpoints at once for offline evals #731

Open
wants to merge 79 commits into
base: main
Choose a base branch
from

Conversation

IanMagnusson
Copy link
Contributor

Making a draft PR for this so we can consider merging this in to main. It would be nice if we could do this so we don't run into version issues if we train models in the future that are not compatible with the version of the code forked here.

@jenahwang jenahwang requested a review from dirkgr October 11, 2024 23:40
@jenahwang jenahwang marked this pull request as ready for review October 11, 2024 23:40
Copy link
Member

@dirkgr dirkgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soldni , don't you already have a checkpoint converter script that runs in Beaker?

.gitignore Outdated
@@ -1,3 +1,6 @@
# beaker yaml
guided-trout-2f805b9.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

log.txt Outdated
@@ -0,0 +1,10 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't commit temp output.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry. removed

@@ -308,6 +314,9 @@ def main():
upload_local_checkpoint(local_checkpoint_dir, args.destination_dir)

print(f"Converted checkpoint saved to {args.destination_dir}")
if args.cleanup_local_dir:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a reason not to do this?

Copy link

@jenahwang jenahwang Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the if statement & the flag.

requirements.txt Outdated
@@ -0,0 +1,7 @@
torch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use requirements.txt in OLMo. We use pyproject.toml.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed --- it was created to troubleshoot.

@jenahwang
Copy link

@soldni , don't you already have a checkpoint converter script that runs in Beaker?

He does. What this one does is very similar but the focus is on batch conversion and wildcard acceptance. And it was written for oe-eval consistent ranking project with expediting its pipeline in mind.

@soldni
Copy link
Member

soldni commented Oct 26, 2024

@jenahwang would it be possible to merge in your changes to the other script? or consolidate the two?

it's confusing to have two conversion scripts, and it doubles maintenance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants