Summary:
Adds torchao quantization APIs to gpt-fast, along with some minor tweaks.
Test Plan:
(in progress)
export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --tasks wikitext --compile
For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int8.pth
wikitext: {'word_perplexity,none': 7.900496793735154, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4718578218273202, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5576383170121927, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4-hqq
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --tasks wikitext --compile
For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4-hqq.pth
wikitext: {'word_perplexity,none': 8.44187872159186, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4902143610748824, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.575519871235033, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --tasks wikitext --compile
For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4.pth
wikitext: {'word_perplexity,none': 8.59031159441983, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4950796712267396, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5802223661766339, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
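As a sanity check on the eval numbers above: in lm-evaluation-harness output, bits_per_byte is the base-2 log of byte_perplexity (equivalently, byte_perplexity == 2 ** bits_per_byte), so the two metrics in each run can be cross-checked against each other. A minimal stdlib-only sketch, using the values copied from the three runs above:

```python
import math

# (byte_perplexity, bits_per_byte) pairs from the three eval runs above
results = {
    "torchao-int8":     (1.4718578218273202, 0.5576383170121927),
    "torchao-int4-hqq": (1.4902143610748824, 0.575519871235033),
    "torchao-int4":     (1.4950796712267396, 0.5802223661766339),
}

for mode, (byte_ppl, bpb) in results.items():
    # bits_per_byte should equal log2(byte_perplexity) up to float rounding
    assert abs(math.log2(byte_ppl) - bpb) < 1e-6, mode
```

This only validates internal consistency of the reported metrics, not the quantization itself; word_perplexity additionally depends on the bytes-per-word ratio of the wikitext corpus, so it is not derivable from the other two numbers alone.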
Reviewers:
Subscribers:
Tasks:
Tags: