Conversation Stopped After a While #51

Open
AtakanTekparmak opened this issue Sep 18, 2024 · 5 comments
Labels: bug, may_implement

Comments

@AtakanTekparmak

Backend impacted

The MLX implementation

Operating system

Mac OS X

Hardware

Metal with MLX

Description

After going through a conversation for a while (and playing some music), I got the error messages shown in the extra information below. The conversation stopped after that, with no reply from moshi.

Extra information

[Info] accepted connection
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2

Environment

  • Operating system version: macOS Sonoma 14.6.1
  • Mac model: M1 Max 64GB
@AtakanTekparmak added the bug label on Sep 18, 2024
@adefossez
Collaborator

Yes, that's expected: once it reaches the max cache size, the conversation will stop. That should correspond to roughly 5 minutes of conversation. We will try to extend that in the future.
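For context on the numbers: assuming Moshi's ~12.5 Hz frame rate, 4096 cached steps is about 5.5 minutes, consistent with the estimate above. Below is a minimal sketch of the failure mode in plain Rust (no candle dependency; the struct and constant names are illustrative, not the actual rustymimi code):

```rust
// A cache preallocated for 4096 steps rejects any write once
// start + len would exceed its capacity, which is the condition
// behind the "narrow invalid args" error above.
const MAX_SEQ_LEN: usize = 4096;

struct FixedCache {
    filled: usize, // number of steps already written
}

impl FixedCache {
    fn append(&mut self, len: usize) -> Result<(), String> {
        let start = self.filled;
        if start + len > MAX_SEQ_LEN {
            // Same condition as the log line:
            // "narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2"
            return Err(format!(
                "narrow invalid args start + len > dim_len: [{}, 32], dim: 0, start: {}, len:{}",
                MAX_SEQ_LEN, start, len
            ));
        }
        self.filled += len;
        Ok(())
    }
}

fn main() {
    let mut cache = FixedCache { filled: 0 };
    // The mimi encoder runs at twice the moshi frame rate, hence len = 2 per step.
    for step in 0.. {
        if let Err(e) = cache.append(2) {
            println!("error in encoder thread {} (after {} steps)", e, step);
            break;
        }
    }
}
```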

@darkacorn

Pruning should work .. I mean, LLM context handling removes tokens from the start too and prunes it down .. maybe we can do something like that?
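A rough sketch of that idea, with a flat `Vec<f32>` as a stand-in for the real KV tensors and hypothetical `MAX_SEQ_LEN`/`KEEP` constants (not values from the moshi codebase):

```rust
const MAX_SEQ_LEN: usize = 4096;
const KEEP: usize = 3072; // window retained after a prune (hypothetical value)

/// Append new frames, pruning the oldest entries once the cache overflows.
fn append_with_pruning(cache: &mut Vec<f32>, frames: &[f32]) {
    cache.extend_from_slice(frames);
    if cache.len() > MAX_SEQ_LEN {
        // Drop from the start so only the most recent KEEP entries remain.
        cache.drain(..cache.len() - KEEP);
    }
}
```

One caveat: discarding a prefix of the KV cache shifts absolute positions, so the position handling (e.g. rotary offsets) would need to account for the dropped tokens.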

@adefossez
Collaborator

Not clear yet if we are going to fix this or not. I'll keep the issue open until we decide :)
There is no technical challenge, mostly just bandwidth to implement and test it!

@LaurentMazare
Member

LaurentMazare commented Sep 22, 2024

PR #102 should make the MLX conversations ~2x longer (the culprit was that rustymimi only had a max-seq-len of 4096, but the mimi transformer runs at twice the frame rate of the main moshi transformer).
Longer term, we'll be adding a rotating KV cache (huggingface/candle#2493); this should make infinite conversations possible, though as Alex mentioned, the model is only trained for ~5 minutes of conversation with the current training pipeline.
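For intuition, here is a minimal ring-buffer sketch of the rotating-cache idea; the struct is illustrative, not candle's actual API:

```rust
// Instead of erroring at capacity, the write index wraps around and the
// oldest entry is overwritten in place.
struct RotatingCache {
    buf: Vec<f32>,   // stand-in for a K or V tensor, one value per step
    capacity: usize, // e.g. 4096
    written: usize,  // total number of steps written so far
}

impl RotatingCache {
    fn append(&mut self, frame: f32) {
        let slot = self.written % self.capacity; // wrap instead of failing
        if self.buf.len() < self.capacity {
            self.buf.push(frame); // still filling up: plain append
        } else {
            self.buf[slot] = frame; // full: overwrite the oldest entry
        }
        self.written += 1;
    }
}
```

Attention over a rotated buffer then needs masks and positions that respect the wrap-around, so queries still see entries in the right temporal order.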

@alkeryn

alkeryn commented Oct 9, 2024

@LaurentMazare now that the PR is merged, would it be possible to trim all tokens older than n minutes?
That way, even if it loses context, it can at least stay somewhat coherent.
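A small sketch of that suggestion, assuming ~12.5 temporal steps per second (Moshi's frame rate); the constant and helper are illustrative, not part of the moshi codebase:

```rust
const STEPS_PER_SECOND: f64 = 12.5; // Moshi's frame rate (assumed here)

/// Keep only cache entries newer than `minutes`, dropping everything older.
fn trim_older_than(cache: &mut Vec<f32>, minutes: f64) {
    let keep = (minutes * 60.0 * STEPS_PER_SECOND) as usize; // e.g. 4 min -> 3000 steps
    if cache.len() > keep {
        cache.drain(..cache.len() - keep);
    }
}
```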
