Due diligence
I have done my due diligence in trying to find the answer myself.
Topic
The PyTorch implementation
Question
Hi there,
First, I would like to thank you for this amazing project :)
Question: I would like to know if an int8 version of the models is planned for PyTorch.
I see it is supported for MLX and Rust.
I tried to run python -m moshi.server -h, but I don't see any parameter to load the model in lower precision. With the default settings I get an OOM on my RTX 4060 (8 GB).
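For context, here is the kind of thing I had in mind — a minimal sketch of PyTorch's built-in dynamic int8 quantization applied to a generic module. The toy model below is a placeholder, not the actual moshi loading code, and as far as I know the dynamic-quantized Linear kernels only run on CPU, so this alone would not reduce GPU memory:

```python
import torch
import torch.nn as nn

# Toy stand-in for a model; the real moshi loading path is different,
# and this is not an officially supported flow.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
model.eval()

# Dynamic int8 quantization: weights are stored as int8, activations are
# quantized on the fly at inference time. Note: PyTorch's dynamic-quantized
# kernels currently execute on CPU.
quantized = torch.ao.quantization.quantize_dynamic(
    model,
    {nn.Linear},       # only quantize Linear layers
    dtype=torch.qint8,
)

x = torch.randn(1, 4096)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 4096])
```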
Thank you for the great work
I would strongly recommend against downloading and running code from random people commenting on issues. For safety reasons I have deleted the comment.
At the moment only the Rust backend supports int8 quantization. This might change in the future. In any case, it probably still wouldn't fit on an 8GB GPU, as there are also the weights of the Mimi codec, the depth transformer, and the KV cache.
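To make this concrete, here is a rough back-of-envelope estimate; the ~7B parameter count for the main transformer is an assumption for illustration, not an official figure:

```python
# Rough VRAM estimate for int8 weights alone (illustrative, assumed sizes).
main_params = 7e9                    # temporal transformer, assumed ~7B params
weights_gb = main_params * 1 / 1e9   # 1 byte per int8 weight -> ~7.0 GB

# On top of that come the Mimi codec weights, the depth transformer, the
# KV cache, plus the CUDA context and activation buffers, so an 8 GB card
# is already over budget before inference even starts.
print(f"int8 main weights alone: ~{weights_gb:.1f} GB")
```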
@adefossez, thank you for your answer.
Yes, I tried the Rust backend and also got an OOM.
I assume a lot of people have an 8GB GPU.
To your knowledge, would it be possible to run the model in int4 quant?
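My rough math for why int4 seems plausible, again assuming ~7B parameters for the main model:

```python
# Illustrative arithmetic only; the parameter count is an assumption.
main_params = 7e9                   # assumed parameter count
int4_gb = main_params * 0.5 / 1e9   # 4 bits = 0.5 bytes per weight -> ~3.5 GB
print(f"~{int4_gb:.1f} GB for int4 main weights")
# That would leave a few GB for Mimi, the depth transformer, and the KV
# cache, though whether it actually fits depends on those components' sizes.
```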
What is the minimum GPU memory requirement? Would you mind providing an int8 PyTorch version, and if so, when? Thank you for your wonderful work!