Issues: vllm-project/llm-compressor
How to speed up quantization: quantizing Llama-3-70B to W8A8 takes almost 20 hours
enhancement · New feature or request
#968 · opened Dec 11, 2024 by moonlightian
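For reference, a minimal sketch of the W8A8 oneshot flow this issue describes, assuming llm-compressor 0.3.x and a hypothetical output directory. GPTQ runtime scales roughly with calibration samples times sequence length, so those two knobs dominate wall-clock time.

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumption: the 70B checkpoint in question

# Fewer calibration samples and shorter sequences cut GPTQ runtime almost
# linearly, at some cost in accuracy.
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir="llama3-70b-w8a8",  # hypothetical output path
)
```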
KeyError: 'model.layers.0.self_attn.k_scale'
bug · Something isn't working
#967 · opened Dec 10, 2024 by wxsms
[Bug]: Quantizing the Llama-3-70B model to INT8 W8A8 with llm-compressor raises ValueError: Failed to invert hessian due to numerical instability
bug · Something isn't working
#966 · opened Dec 10, 2024 by rexmxw02
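The usual mitigation for this inversion failure, sketched below under the assumption of the GPTQModifier recipe the examples use, is to raise dampening_frac, which scales the diagonal regularization added to the Hessian before inversion.

```python
from llmcompressor.modifiers.quantization import GPTQModifier

# dampening_frac scales the value added to the Hessian diagonal before
# inversion; raising it above the 0.01 default trades a little quantization
# accuracy for numerical stability.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head"],
    dampening_frac=0.1,
)
```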
Version 0.3.0 takes a long time to quantize and eventually fails with OOM
bug · Something isn't working
#965 · opened Dec 10, 2024 by okwinds
Error when quantizing Llama 3.3 70B to FP8
bug · Something isn't working
#963 · opened Dec 6, 2024 by Syst3m1cAn0maly
Can I load the stage_quantization model using SparseAutoModelForCausalLM?
bug · Something isn't working
#962 · opened Dec 6, 2024 by jiangjiadi
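For context, SparseAutoModelForCausalLM wraps the Hugging Face loader, so loading a stage checkpoint would look like the sketch below; the local path is hypothetical.

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM

# "output/stage_quantization" is a hypothetical path to the directory the
# multi-stage recipe's quantization stage saved.
model = SparseAutoModelForCausalLM.from_pretrained(
    "output/stage_quantization",
    device_map="auto",
    torch_dtype="auto",
)
```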
How to recover the quantization stage from the finetuning stage after an error
bug · Something isn't working
#957 · opened Dec 5, 2024 by jiangjiadi
About LoRA finetuning of 2:4 sparse and sparse-quantized models
enhancement · New feature or request
#952 · opened Dec 4, 2024 by arunpatala
Quantization + sparsification: model outputs zeros
bug · Something isn't working
#942 · opened Nov 28, 2024 by nirey10
Error when loading a 2of4 model using vLLM
bug · Something isn't working
#926 · opened Nov 19, 2024 by jiangjiadi
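A minimal load-and-generate check in vLLM, for reference; the checkpoint path is hypothetical.

```python
from vllm import LLM, SamplingParams

# Hypothetical path to a compressed 2:4 (2of4) checkpoint.
llm = LLM(model="output/llama3-2of4-w4a16")
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```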
Finetuning in the 2:4 sparsity W4A16 example fails with multiple GPUs
bug · Something isn't working
#911 · opened Nov 13, 2024 by arunpatala
Does llm-compressor support MiniCPM3, which uses the MLA architecture?
enhancement · New feature or request
#860 · opened Oct 22, 2024 by piamo
Is it possible to quantize to FP8 W8A16 without calibration data?
enhancement · New feature or request
#858 · opened Oct 21, 2024 by us58
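The closest data-free path in llm-compressor is the FP8_DYNAMIC scheme (static weight scales, activation scales computed at runtime), which skips calibration entirely; a sketch, assuming an 8B Llama checkpoint and a hypothetical output directory:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8_DYNAMIC quantizes weights statically and computes activation scales at
# runtime, so no calibration dataset is passed to oneshot.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumption: any causal LM works here
    recipe=recipe,
    output_dir="llama3-8b-fp8-dynamic",  # hypothetical output path
)
```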
Perplexity (ppl) calculation of local sparse model: NaN issue
bug · Something isn't working
#853 · opened Oct 19, 2024 by HengJayWang
SmoothQuant doesn't respect ignored modules for VLMs
bug · Something isn't working
#687 · opened Sep 26, 2024 by mgoin
KV cache quantization example causes a problem
bug · Something isn't working
#660 · opened Sep 25, 2024 by weicheng59
[USAGE] FP8 W8A8 (+KV) with LoRA adapters
enhancement · New feature or request
#164 · opened Sep 11, 2024 by paulliwog
YAML parsing fails with a custom mapping provided to a SmoothQuantModifier recipe
bug · Something isn't working
#105 · opened Aug 22, 2024 by aatkinson
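For comparison, the Python-side shape that custom mappings take: a list of pairs, each pairing a list of activation-layer regexes with the smoothing-layer regex. The regexes below are illustrative for a Llama-style model, not the reporter's recipe.

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Each mapping pairs the layers whose input activations are smoothed with the
# preceding layer that absorbs the inverse scales.
recipe = SmoothQuantModifier(
    smoothing_strength=0.8,
    mappings=[
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
    ],
)
```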
Layers not skipped with ignore=["re:.*"]
bug · Something isn't working
#91 · opened Aug 15, 2024 by horheynm
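For reference, a sketch of the ignore syntax this issue exercises, shown here with the FP8 modifier as an assumed recipe; entries are module names or "re:"-prefixed regexes.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

# ignore accepts plain module names and "re:"-prefixed regexes; "re:.*"
# matches everything, so no layer should end up quantized.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["re:.*"],
)
```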