Training code #420

Closed
borgr opened this issue Feb 1, 2024 · 1 comment · Fixed by #421
Labels: type/documentation (an issue or pull request related to documentation)

Comments

borgr commented Feb 1, 2024

📚 The doc issue

In the release there is a link to "training code", but it just leads to the GitHub repository, which doesn't document how one can reproduce the training.

Suggest a potential alternative/fix

Is it reasonable for someone to rerun your training? Is it even possible? If so, can you document it? (Even a note saying that it is complicated and that one should prefer, e.g., Pythia would be good to know...)

borgr added the type/documentation label Feb 1, 2024
epwalsh (Member) commented Feb 1, 2024

Hey @borgr, at the moment the biggest obstacle is accessing the preprocessed training data. Without that, you'd have to preprocess it on your own using the tools in Dolma, which takes some time. To address that, we're copying the preprocessed data from a private S3 bucket to a public R2 bucket (no egress costs). Once that's done, we'll update the paths in the training configs and add an example to the README.

epwalsh linked a pull request Feb 1, 2024 that will close this issue