Hi, when I use dolma to tokenize the OLMoE-mix-0924 data, it takes a very long time, probably more than a week. Is this normal?
I think you can parallelize it across more processes. cc @soldni for more insights.
We have set --processes to 32, but the DCLM subset still takes a very long time. I don't know whether there is another way to speed it up.
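One thing that may be worth checking, as a rough sketch rather than a confirmed fix: if each process works through whole input files (an assumption here, not something established in this thread), a few very large files can leave most of the 32 processes idle near the end. Splitting the file list into size-balanced groups, for example to run separate tokenization jobs on separate machines, is one way to spread the work. The helper below is hypothetical and is not part of dolma; `balance_shards` and its arguments are illustrative names only.

```python
import heapq
from typing import Dict, List


def balance_shards(file_sizes: Dict[str, int], num_workers: int) -> List[List[str]]:
    """Greedily assign files to workers so total bytes per worker are roughly equal.

    Hypothetical helper: files are visited largest first, and each one goes to
    the worker that currently has the smallest total size.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (total_bytes_assigned, worker_index).
    heap = [(0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(num_workers)]

    # Largest files first, so the small ones can even out the totals at the end.
    for path, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (total + size, idx))

    return groups


if __name__ == "__main__":
    # Made-up file names and sizes, purely for illustration.
    sizes = {"part-0.jsonl.gz": 10, "part-1.jsonl.gz": 9, "part-2.jsonl.gz": 2, "part-3.jsonl.gz": 1}
    for worker, files in enumerate(balance_shards(sizes, 2)):
        print(worker, files)
```

Each resulting group could then be passed to its own tokenization run. Whether this helps depends on where the time actually goes (disk reads, decompression, or the tokenizer itself), so it is worth measuring on a small sample first.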