Update: SparseGPT recipes #1142

Merged (4 commits) on Feb 12, 2025

12 changes: 4 additions & 8 deletions examples/finetuning/example_alternating_recipe.yaml
@@ -4,12 +4,10 @@ initial_sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
- targets: [
- "re:model.layers.\\d+$"
- ]
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
initial_training_stage:
run_type: train
pruning_modifiers:
@@ -22,12 +20,10 @@ next_sparsity_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
- targets: [
- "re:model.layers.\\d+$"
- ]
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
next_training_stage:
run_type: train
pruning_modifiers:
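Note: the substantive change in these recipes is that `targets` now selects every Linear submodule directly, instead of matching whole decoder layers with a regex, while the LM head is excluded through `ignore`. A minimal sketch of what that selection amounts to; `linear_targets` is a hypothetical helper written for illustration, not part of llmcompressor:

import re

import torch


def linear_targets(model: torch.nn.Module, ignore_pattern: str = r".*lm_head"):
    """Yield the names of torch.nn.Linear modules not excluded by the ignore regex."""
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear) and not re.match(ignore_pattern, name):
            yield name


# Toy model: one "decoder layer" Linear plus an lm_head that should be skipped.
toy = torch.nn.ModuleDict(
    {
        "layers": torch.nn.ModuleList([torch.nn.Linear(8, 8)]),
        "lm_head": torch.nn.Linear(8, 8),
    }
)
print(list(linear_targets(toy)))  # ['layers.0']
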

@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
finetuning_stage:
run_type: train
finetuning_modifiers:
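For context, mask_structure "2:4" requests semi-structured sparsity: in every contiguous group of four weights, two are zeroed. The sketch below only illustrates the resulting mask shape with a simple magnitude criterion; SparseGPT itself selects and compensates pruned weights using second-order information, so this is not the algorithm, just the pattern it produces:

import torch


def mask_2of4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each group of 4 along the last dim."""
    rows, cols = weight.shape
    groups = weight.reshape(rows, cols // 4, 4)
    keep = groups.abs().topk(2, dim=-1).indices  # indices of the 2 largest entries
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(rows, cols)


w = torch.randn(8, 16)
pruned = mask_2of4(w)
# every group of 4 now holds at most 2 nonzero entries
assert (pruned.reshape(8, -1, 4) != 0).sum(dim=-1).max() <= 2
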

@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
finetuning_stage:
run_type: train
finetuning_modifiers:
3 changes: 2 additions & 1 deletion src/llmcompressor/modifiers/obcq/base.py
@@ -33,9 +33,10 @@ class SparseGPTModifier(SparsityModifierMixin, Modifier):
| SparseGPTModifier:
| sparsity: 0.5
| mask_structure: "2:4"
- | sequential_update: True
| dampening_frac: 0.001
| block_size: 128
+ | targets: ['Linear']
+ | ignore: ['re:.*lm_head']

Lifecycle:
- on_initialize
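The docstring sample now mirrors the updated recipe defaults. As a rough sketch, the same stanza expressed directly in Python would look something like the following, assuming SparseGPTModifier accepts these fields as keyword arguments (the field names are taken from the recipe keys above; this is not code from the PR):

from llmcompressor.modifiers.obcq import SparseGPTModifier

# Mirrors the documented recipe sample above.
modifier = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    dampening_frac=0.001,
    block_size=128,
    targets=["Linear"],
    ignore=["re:.*lm_head"],
)
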
24 changes: 20 additions & 4 deletions src/llmcompressor/modifiers/obcq/sgpt_mixin.py
@@ -139,10 +139,26 @@ def on_initialize(self, state: "State", **kwargs) -> bool:

for name, module in get_prunable_layers(layer).items():
name = f"{layer_name}.{name}"
- if not match_targets(name, self.ignore)[0]:
-     self._module_names[module] = name
-     self._module_sparsities[module] = layer_sparsity
-     self.register_hook(module, self.calibrate_module, "forward")
+
+ if match_targets(name, self.ignore)[0]:
+     continue
+
+ # HACK: previously, embeddings were not quantized because they were not
+ # accessible by the layer compressor. For now, we manually ignore it,
+ # but in the FUTURE this should be ignored by the user
+ if isinstance(module, torch.nn.Embedding):
+     continue
+
+ if name.endswith("lm_head"):
+     logger.warning(
+         "`lm_head` was previously auto-ignored by SparseGPT and Wanda "
+         "modifiers and is not advised. Please add `re:.*lm_head` to "
+         "your ignore list if this was unintentional"
+     )
+
+ self._module_names[module] = name
+ self._module_sparsities[module] = layer_sparsity
+ self.register_hook(module, self.calibrate_module, "forward")

# infer and run pipeline
model_name = state.model.__class__.__name__
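The new control flow skips ignored modules first, then skips embeddings, and only warns (rather than silently skipping) when an lm_head is about to be pruned. The ignore entries appear to follow the usual convention where a "re:" prefix switches to regex matching. A self-contained sketch of that convention; `matches_ignore` below is an illustration, not the library's `match_targets`:

import re
from typing import Iterable


def matches_ignore(name: str, ignore: Iterable[str]) -> bool:
    """Entries prefixed with "re:" are treated as regexes; others must match exactly."""
    for pattern in ignore:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], name):
                return True
        elif pattern == name:
            return True
    return False


assert matches_ignore("model.lm_head", ["re:.*lm_head"])
assert not matches_ignore("model.layers.0.mlp.gate_proj", ["re:.*lm_head"])
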
3 changes: 2 additions & 1 deletion tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml
@@ -3,4 +3,5 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
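A sparsity-only recipe like this one is typically applied through the oneshot entrypoint. A minimal usage sketch, assuming the llmcompressor.transformers oneshot API; the model checkpoint, calibration dataset, and output directory are placeholders, not part of this PR:

from llmcompressor.transformers import oneshot

# Placeholders: any causal LM checkpoint and calibration dataset will do.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe="tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml",
    max_seq_length=512,
    num_calibration_samples=512,
    output_dir="./tinyllama-sparse-2of4",
)
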

@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:

@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:
3 changes: 2 additions & 1 deletion tests/e2e/vLLM/recipes/WNA16_2of4/2of4_w4a16_recipe.yaml
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
- sequential_update: false
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:
1 change: 0 additions & 1 deletion tests/e2e/vLLM/test_vllm.py
@@ -130,7 +130,6 @@ def test_vllm(self):
session.reset()

if SKIP_HF_UPLOAD.lower() != "yes":
-
    logger.info("================= UPLOADING TO HUB ======================")

    stub = f"{HF_MODEL_HUB_NAME}/{self.save_dir}-e2e"

@@ -2,6 +2,6 @@ pruning_stage:
obcq_modifiers:
SparseGPTModifier:
sparsity: 0.5
- sequential_update: true
mask_structure: "2:4"
- targets: ['re:model.layers.\d*$']
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]

@@ -2,9 +2,9 @@ pruning_stage:
obcq_modifiers:
SparseGPTModifier:
sparsity: 0.5
- sequential_update: true
mask_structure: "2:4"
- targets: ['re:model.layers.\d*$']
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
quant_stage:
quant_modifiers:
QuantizationModifier:

@@ -3,10 +3,10 @@ test_oneshot_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
- target_ids: ["attention_mask", "position_ids"]
+ targets: ["Linear"]
+ ignore: ["re:.*lm_head"]
test_train_stage:
pruning_modifiers:
ConstantPruningModifier:

@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
- sequential_update: True
percdamp: 0.01
mask_structure: "0:0"
targets: ["model.layers.0"]

@@ -11,7 +11,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [

@@ -18,7 +18,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: ["model.layers.0"]
1 change: 0 additions & 1 deletion tests/llmcompressor/transformers/obcq/recipes/sparse.yaml
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.3
block_size: 128
- sequential_update: False
percdamp: 0.01
targets: ["model.layers.0", "model.layers.1"]
mask_structure: "0:0"

@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "2:4"
targets: [

@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [

@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]

@@ -9,7 +9,6 @@ recipe: |
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]

@@ -10,7 +10,6 @@ recipe: |
SparseGPTModifier:
sparsity: 0.5
block_size: 128
- sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]