
int4 gptq shape fix #142

Merged: 3 commits into gh/HDCharles/6/base on Mar 27, 2024
Conversation

HDCharles
Copy link
Contributor

@HDCharles HDCharles commented Mar 20, 2024

Stack from ghstack (oldest at bottom):

Summary: redoing 5bf70c1 in a way that doesn't get reverted. Note: a device issue also needed to be fixed.
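For context on why int4 quantization imposes shape constraints at all: two 4-bit values are stored per byte, so the packed dimension must divide evenly. A minimal pure-Python sketch (the helper name `pack_int4_pairs` is hypothetical, not the repository's actual code):

```python
def pack_int4_pairs(values):
    """Pack a flat list of 4-bit integers (0..15) into bytes, two per byte.

    The length must be even -- the same kind of divisibility constraint
    that packed int4 weight tensors must satisfy along the packed axis.
    """
    if len(values) % 2 != 0:
        raise ValueError("int4 packing needs an even number of elements")
    # high nibble from even indices, low nibble from odd indices
    return bytes((hi << 4) | lo for hi, lo in zip(values[::2], values[1::2]))

packed = pack_int4_pairs([1, 2, 3, 4])
assert len(packed) == 2  # packed axis is half the original length
```

A weight matrix whose inner dimension is odd (or, with grouped GPTQ quantization, not a multiple of the groupsize) cannot be packed this way without padding, which is the class of shape bug this PR addresses.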

Test Plan:

```sh
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5
```
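A shape fix of this kind typically pads dimensions up to the nearest multiple of the groupsize. gpt-fast has a `find_multiple` utility for this; the sketch below assumes its behavior from context and is not copied from the repository:

```python
def find_multiple(n: int, k: int) -> int:
    """Round n up to the nearest multiple of k.

    Returns n unchanged when it is already a multiple; otherwise pads up.
    Used to make weight dimensions satisfy int4/GPTQ groupsize constraints.
    """
    if n % k == 0:
        return n
    return n + k - (n % k)

# an already-aligned dimension is left alone
assert find_multiple(22016, 32) == 22016
# a misaligned dimension gets padded up to the next multiple
assert find_multiple(100, 32) == 128
```

With `g32` (groupsize 32, as in the checkpoint name above), any linear layer whose input dimension is not a multiple of 32 would be padded this way before packing.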


Redoing 5bf70c1 in a way that doesn't get reverted

Test Plan: sh run.sh


[ghstack-poisoned]
HDCharles added a commit that referenced this pull request Mar 20, 2024
ghstack-source-id: 4307b5c7904d59fdbd40a0455687ae90e84585d4
Pull Request resolved: #142
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 20, 2024
model.py (review comment, now outdated and resolved)
Redoing 5bf70c1 in a way that doesn't get reverted

Test Plan: same quantize.py/eval.py commands as in the PR description
[ghstack-poisoned]
HDCharles added a commit that referenced this pull request Mar 26, 2024
ghstack-source-id: 1b4a8b43482ff27c8a300b571b2e3e81a13b29e4
Pull Request resolved: #142
Summary: redoing 5bf70c1 in a way that doesn't get reverted. Note: a device issue also needed to be fixed.

Test Plan: same quantize.py/eval.py commands as in the PR description
[ghstack-poisoned]
@HDCharles HDCharles changed the title from "Summary: redoing" to "int4 gptq shape fix" Mar 26, 2024
@HDCharles HDCharles mentioned this pull request Mar 26, 2024
@HDCharles HDCharles requested a review from cpuhrsch March 27, 2024 18:40
@HDCharles HDCharles merged commit 2e76737 into gh/HDCharles/6/base Mar 27, 2024
1 check passed
HDCharles added a commit that referenced this pull request Mar 27, 2024
ghstack-source-id: d3928b07e8be7e1e9e98b43584e125b6d60770d6
Pull Request resolved: #142
@HDCharles HDCharles deleted the gh/HDCharles/6/head branch March 27, 2024 18:49
griff4692 pushed a commit to AnswerDotAI/cold-compress that referenced this pull request May 20, 2024
Summary: redoing pytorch-labs/gpt-fast@5bf70c1 in a way that doesn't get reverted (same commit message as above)
ghstack-source-id: d3928b07e8be7e1e9e98b43584e125b6d60770d6
Pull Request resolved: pytorch-labs/gpt-fast#142
3 participants