smoothquant fp8 support #25

huangtingwei9988 · 2024-08-24T11:21:00Z

We have implemented SmoothQuant FP8 quantization. Compared with the INT8 model, the FP8 model shows significantly better model performance. Compared with the AutoFP8 method, SmoothQuant FP8 reaches a slightly lower perplexity on WikiText2 and performs better on some zero-shot tasks.
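For readers unfamiliar with the idea, here is a minimal sketch of the two steps involved, not the code in this PR. It assumes the smoothing factor from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1-alpha), a weight laid out as [in_features][out_features], per-tensor scaling, and a pure-Python simulation of FP8 E4M3 rounding. The names `smooth_scales`, `round_to_e4m3`, and `fake_quant_fp8` are hypothetical and exist only for this illustration.

```python
import math

E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 (fn variant) format


def smooth_scales(act_absmax, weight, alpha=0.5):
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1-alpha).

    Assumes every channel has a non-zero activation and weight maximum.
    """
    scales = []
    for j, a_max in enumerate(act_absmax):
        w_max = max(abs(w) for w in weight[j])  # row j multiplies input channel j
        scales.append((a_max ** alpha) / (w_max ** (1.0 - alpha)))
    return scales


def round_to_e4m3(x):
    """Round x to the nearest E4M3 value (3 mantissa bits), saturating at 448."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)      # a = m * 2**e with 0.5 <= m < 1
    e = max(e, -5)            # below 2**-6 the format is subnormal (fixed step)
    step = 2.0 ** (e - 4)     # spacing between neighbouring E4M3 values
    return sign * min(round(a / step) * step, E4M3_MAX)


def fake_quant_fp8(matrix):
    """Per-tensor scale into the E4M3 range, round, then scale back."""
    absmax = max(abs(v) for row in matrix for v in row)
    scale = (absmax / E4M3_MAX) or 1.0  # avoid dividing by zero for all-zero input
    return [[round_to_e4m3(v / scale) * scale for v in row] for row in matrix]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Toy data: input channel 1 carries activation outliers.
X = [[0.1, 20.0], [-0.2, -16.0]]
W = [[0.5, -0.3], [0.02, 0.01]]
s = smooth_scales([0.2, 20.0], W, alpha=0.5)

# Move the outlier magnitude from activations into weights: X / s and diag(s) W.
X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
W_s = [[w * s[j] for w in W[j]] for j in range(len(W))]

Y_ref = matmul(X, W)                                      # full-precision output
Y_fp8 = matmul(fake_quant_fp8(X_s), fake_quant_fp8(W_s))  # simulated FP8 output
```

Smoothing leaves the full-precision product unchanged, since (X / s)(diag(s) W) = X W. It only redistributes dynamic range so that both operands are easier to quantize. The `alpha` argument plays the role of the smooth strength parameter mentioned in the TODO list.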

Model Performance

We evaluated model performance on WikiText2 and five zero-shot tasks. Results currently cover only llama2-7b.
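As a reminder of what the perplexity figure measures, here is a minimal sketch of token-level perplexity: the exponential of the mean negative log-likelihood. It is an illustration only. The function name `perplexity` is hypothetical, and the evaluation harness may normalize differently (for example per word or per byte).

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# A model that assigns probability 0.25 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Lower is better: a lower perplexity means the quantized model assigns higher probability to the reference text.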

TODO list

Support for more models
More model evaluations
Automatic search for the smooth strength parameter

init support for smoothquant fp8 and use lm_eval to eval

84c1dcc
