Add quanto install and instructions (#1976)
* chore: add quanto install option

* docs: add quanto to README

* Apply suggestions from code review

Co-authored-by: Ella Charlaix <[email protected]>

dacorvo and echarlaix authored Sep 9, 2024
1 parent 2335ec2 commit e604af3
Showing 2 changed files with 32 additions and 0 deletions.
31 changes: 31 additions & 0 deletions README.md
@@ -268,3 +268,34 @@ You can find more examples in the [documentation](https://huggingface.co/docs/op
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).


### Quanto

[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend.

You can quantize a model using either the Python API or the `optimum-cli`.

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
```
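As a rough illustration of what quantizing weights to 4 bits involves, the sketch below rounds a list of floats to integers in the range [-8, 7] with one shared scale, then maps them back. This is a generic symmetric scheme written in plain Python for intuition only; it is an assumption made for the example and not a description of how Quanto implements `qint4`. The names `quantize_int4` and `dequantize` are hypothetical.

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    # Avoid dividing by zero when every weight is 0.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.05, 0.0, -0.77]
quantized, scale = quantize_int4(weights)
restored = dequantize(quantized, scale)

# Largest absolute difference between an original weight and its restored value.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

The rounding error per weight is bounded by half the scale, which is the precision traded away for the smaller storage footprint.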

The quantized model can be saved using `save_pretrained`:

```python
qmodel.save_pretrained('./Llama-3.1-8B-quantized')
```

It can later be reloaded using `from_pretrained`:

```python
from optimum.quanto import QuantizedModelForCausalLM

qmodel = QuantizedModelForCausalLM.from_pretrained('./Llama-3.1-8B-quantized')
```

You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
1 change: 1 addition & 0 deletions setup.py
@@ -88,6 +88,7 @@
"graphcore": "optimum-graphcore",
"furiosa": "optimum-furiosa",
"amd": "optimum-amd",
"quanto": ["optimum-quanto>=0.2.4"],
"dev": TESTS_REQUIRE + QUALITY_REQUIRE,
"tests": TESTS_REQUIRE,
"quality": QUALITY_REQUIRE,
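
With the `quanto` extra defined, `pip install optimum[quanto]` should pull in `optimum-quanto` 0.2.4 or newer. A minimal sketch for checking the installed version from Python follows; it uses only the standard library's `importlib.metadata`, the helper names `parse_version` and `quanto_status` are hypothetical, and the version parsing is deliberately rough (pre-release suffixes are ignored).

```python
from importlib import metadata

MIN_VERSION = (0, 2, 4)  # mirrors the "optimum-quanto>=0.2.4" pin in setup.py


def parse_version(text):
    """Turn '0.2.4' or '0.2.5.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def quanto_status():
    """Report whether optimum-quanto is installed and meets the minimum version."""
    try:
        installed = metadata.version("optimum-quanto")
    except metadata.PackageNotFoundError:
        return "optimum-quanto is not installed; try: pip install optimum[quanto]"
    if parse_version(installed) < MIN_VERSION:
        return f"optimum-quanto {installed} is older than 0.2.4"
    return f"optimum-quanto {installed} satisfies the pin"


print(quanto_status())
```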