int8 for pytorch #70

Open
boehm-e opened this issue Sep 19, 2024 · 3 comments

boehm-e commented Sep 19, 2024

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

The PyTorch implementation

Question

Hi there,
First, I would like to thank you for this amazing project :)

Question: I would like to know whether an int8 version of the models is planned for PyTorch.

I see it is supported for MLX and Rust.
I tried running `python -m moshi.server -h`, but I don't see any parameter for loading the model in lower precision. With the default settings I get an OOM on my RTX 4060 (8 GB).

Thank you for the great work!
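In case it helps others in the meantime, one generic way to experiment with int8 in plain PyTorch is to swap `nn.Linear` layers for bitsandbytes 8-bit linears. This is only a sketch, not a supported moshi code path, and `load_moshi_model` below is a hypothetical placeholder:

```python
import torch
import bitsandbytes as bnb

def to_int8_linears(module: torch.nn.Module) -> None:
    """Recursively replace nn.Linear layers with bitsandbytes 8-bit linears."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear):
            qlin = bnb.nn.Linear8bitLt(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                has_fp16_weights=False,  # store/run the weights in int8
            )
            # Copy the float weights; with has_fp16_weights=False they are
            # quantized to int8 when the module is moved to a CUDA device.
            qlin.load_state_dict(child.state_dict())
            setattr(module, name, qlin)
        else:
            to_int8_linears(child)

# Hypothetical usage -- load_moshi_model is a placeholder, not a real API:
# model = load_moshi_model()
# to_int8_linears(model)
# model.to("cuda")  # int8 quantization happens on this device transfer
```

Note this only shrinks the linear-layer weights; activations and the KV cache are untouched.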

boehm-e added the question label on Sep 19, 2024
kyutai-labs deleted a comment on Sep 19, 2024
adefossez (Collaborator) commented Sep 19, 2024

I would strongly recommend against downloading and running code from random people commenting on issues. For safety reasons I have deleted the comment.

At the moment only the Rust backend supports int8 quantization. That might change in the future. In any case, it still probably wouldn't fit on an 8 GB GPU, as there are also the weights of the Mimi codec, the depth transformer, and the KV cache.
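For rough scale, a back-of-envelope sketch (the ~7B parameter count for the temporal transformer, and every other figure here, is an assumption):

```python
# Back-of-envelope VRAM estimate; every figure here is an assumption.
temporal_params = 7e9  # assumed ~7B-parameter temporal transformer
int8_bytes_per_param = 1

weights_gb = temporal_params * int8_bytes_per_param / 1e9
print(f"int8 temporal transformer weights alone: ~{weights_gb:.0f} GB")
# ~7 GB before adding the Mimi codec weights, the depth transformer,
# and a KV cache that grows with sequence length -- so 8 GB is tight.
```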

adefossez added the may_implement label on Sep 19, 2024
adefossez self-assigned this on Sep 19, 2024
boehm-e (Author) commented Sep 19, 2024

@adefossez, thank you for your answer.
Yes, I tried the Rust backend and also got an OOM.
I assume a lot of people have an 8 GB GPU.
To your knowledge, would it be possible to run the model with int4 quantization?
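For rough scale, extending the same back-of-envelope arithmetic to int4 (still assuming a ~7B-parameter temporal transformer):

```python
# Same rough arithmetic at int4 (0.5 bytes per parameter); assumptions only.
temporal_params = 7e9  # assumed ~7B-parameter temporal transformer
weights_gb = temporal_params * 0.5 / 1e9
print(f"int4 temporal transformer weights: ~{weights_gb:.1f} GB")
# ~3.5 GB of weights leaves headroom on 8 GB, but Mimi, the depth
# transformer, and the KV cache still have to fit on top.
```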

meicale commented Sep 22, 2024

> I would strongly recommend against downloading and running code from random people commenting on issues. For safety reasons I have deleted the comment.
>
> At the moment only the Rust backend supports int8 quantization. That might change in the future. In any case, it still probably wouldn't fit on an 8 GB GPU, as there are also the weights of the Mimi codec, the depth transformer, and the KV cache.

What is the minimum GPU memory requirement? Would you mind providing an int8 PyTorch version, and if so, when? Thank you for your wonderful work!
