[LLM] llm.c training for GPT 2 #3611
nit: I like the fact that we have different YAMLs for pre-processing and training, but one concern is that it may alienate Lambda-only, Azure-only, or Fluidstack-only users, since they won't have any cloud object store access to write the preprocessed data to.
If it's not too complicated, can we add a one-shot YAML that does all of the pre-processing and training in a single file?
Good point! Just separated it into two different sections: one for a combined YAML and another for the pipeline. Wdyt?
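A minimal sketch of what such a combined one-shot YAML could look like, assuming SkyPilot's standard task format. The accelerator choice and the llm.c commands are illustrative (adapted from the llm.c README), not necessarily the exact recipe in this PR:

```yaml
# Hypothetical one-shot task: pre-process and train on the same node, so the
# tokenized shards stay on local disk and no cloud object store is needed.
name: llmc-gpt2-oneshot

resources:
  accelerators: A100:8

setup: |
  git clone https://github.com/karpathy/llm.c.git || true
  cd llm.c
  pip install -r requirements.txt  # deps for the data/tokenizer scripts
  make train_gpt2cu

run: |
  cd llm.c
  # Stage 1 -- pre-processing: download and tokenize FineWeb to local disk.
  python dev/data/fineweb.py --version 10B
  # Stage 2 -- training: read the local shards directly, no object store hop.
  mpirun -np 8 ./train_gpt2cu \
    -i "dev/data/fineweb10B/fineweb_train_*.bin" \
    -j "dev/data/fineweb10B/fineweb_val_*.bin" \
    -o log124M -b 64 -t 1024 -d 524288
```

Since both stages share one `run` section, the preprocessed data never leaves the VM, which is what makes this usable on Lambda-, Azure-, or Fluidstack-only accounts that have no object store to stage data through.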
Since cost + time seem like a big motivation behind this work ("Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20"), should we mention that here? Perhaps we can show the optimizer output?
Good point. Added the comparison to the sentence. How does it look to you?
nit: if we have graphs similar to Karpathy's, it would be nice to put them here :) No worries if not.
Added the training loss curve. The eval figure requires an additional dependency, which I unfortunately did not install before the current training run.