Summary: redoing 5bf70c1 in a way that doesn't get reverted

Test Plan:

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 5
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext --limit 5

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 1b4a8b43482ff27c8a300b571b2e3e81a13b29e4
Pull Request resolved: #142
HDCharles committed Mar 26, 2024
1 parent c955dac commit 0ad385c
Showing 5 changed files with 513 additions and 23 deletions.
4 changes: 2 additions & 2 deletions GPTQ.py
@@ -150,9 +150,9 @@ def __init__(
         }

         # trace model for one input
-        one_input = [multi.values[0] for multi in inputs]
+        one_input = tuple([multi.values[0].cpu() for multi in inputs])
         exported_model = torch._dynamo.export(
-            model, aten_graph=True, pre_dispatch=True, tracing_mode="fake"
+            model.cpu(), aten_graph=True, pre_dispatch=True, tracing_mode="fake"
         )(*one_input)
         super().__init__(exported_model.graph_module)
         self.new_state_dict = model.state_dict()
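
For context, a minimal standalone sketch of the updated tracing path. It assumes inputs is a list of MultiInput-style wrappers whose .values hold the tensors captured during calibration; the helper name trace_for_one_input is hypothetical, while the export call itself mirrors the diff above.

import torch

def trace_for_one_input(model, inputs):
    # Take the first captured example for each argument and move it to CPU so it
    # matches the CPU-resident model during fake-mode tracing (the change in this commit).
    one_input = tuple(multi.values[0].cpu() for multi in inputs)
    # torch._dynamo.export returns a callable; invoking it with the example
    # inputs yields an export result whose graph_module is the traced graph.
    exported = torch._dynamo.export(
        model.cpu(), aten_graph=True, pre_dispatch=True, tracing_mode="fake"
    )(*one_input)
    return exported.graph_module

The graph_module returned here is what the super().__init__ call in the diff consumes.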
