Conversation Stopped After a While #51

Open
AtakanTekparmak opened this issue Sep 18, 2024 · 5 comments
Labels: bug, may_implement

Comments

@AtakanTekparmak

Backend impacted

The MLX implementation

Operating system

Mac OS X

Hardware

Metal with MLX

Description

After going through a conversation for a while (and playing some music), I got the error messages shown in the extra information below. The conversation stopped after that, with no reply from moshi.

Extra information

[Info] accepted connection
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2
error in encoder thread narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2

Environment

  • Operating system version: macOS Sonoma 14.6.1
  • Mac model: M1 Max 64GB
@AtakanTekparmak added the bug label on Sep 18, 2024
@adefossez
Collaborator

Yes, that's expected: once it reaches the max cache size, the conversation will stop. That should correspond to roughly 5 minutes of conversation. We will try to extend that in the future.
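For context on the numbers: assuming Moshi's ~12.5 Hz frame rate, 4096 cached steps is about 5.5 minutes, consistent with the estimate above. Below is a minimal sketch of the failure mode in plain Rust (no candle dependency; the struct and constant names are illustrative, not the actual rustymimi code):

```rust
// A cache preallocated for 4096 steps rejects any write once
// start + len would exceed its capacity, which is the condition
// behind the "narrow invalid args" error above.
const MAX_SEQ_LEN: usize = 4096;

struct FixedCache {
    filled: usize, // number of steps already written
}

impl FixedCache {
    fn append(&mut self, len: usize) -> Result<(), String> {
        let start = self.filled;
        if start + len > MAX_SEQ_LEN {
            // Same condition as the log line:
            // "narrow invalid args start + len > dim_len: [4096, 32], dim: 0, start: 4096, len:2"
            return Err(format!(
                "narrow invalid args start + len > dim_len: [{}, 32], dim: 0, start: {}, len:{}",
                MAX_SEQ_LEN, start, len
            ));
        }
        self.filled += len;
        Ok(())
    }
}

fn main() {
    let mut cache = FixedCache { filled: 0 };
    // The mimi encoder runs at twice the moshi frame rate, hence len = 2 per step.
    for step in 0.. {
        if let Err(e) = cache.append(2) {
            println!("error in encoder thread {} (after {} steps)", e, step);
            break;
        }
    }
}
```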

@darkacorn

Pruning should work .. I mean, LLM context handling removes tokens from the start too and prunes it down .. maybe we can do something like that?
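A rough sketch of that idea, with a flat `Vec<f32>` as a stand-in for the real KV tensors and hypothetical `MAX_SEQ_LEN`/`KEEP` constants (not values from the moshi codebase):

```rust
const MAX_SEQ_LEN: usize = 4096;
const KEEP: usize = 3072; // window retained after a prune (hypothetical value)

/// Append new frames, pruning the oldest entries once the cache overflows.
fn append_with_pruning(cache: &mut Vec<f32>, frames: &[f32]) {
    cache.extend_from_slice(frames);
    if cache.len() > MAX_SEQ_LEN {
        // Drop from the start so only the most recent KEEP entries remain.
        cache.drain(..cache.len() - KEEP);
    }
}
```

One caveat: discarding a prefix of the KV cache shifts absolute positions, so the position handling (e.g. rotary offsets) would need to account for the dropped tokens.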

@adefossez
Collaborator

Not clear yet if we are going to fix this or not. I'll keep the issue open until we decide :)
There is no technical challenge, mostly just bandwidth to implement and test it!

@LaurentMazare
Member

LaurentMazare commented Sep 22, 2024

PR #102 should make the MLX conversations ~2x longer (the culprit was that rustymimi only had a max-seq-len of 4096, but the mimi transformer runs at twice the frame rate of the main moshi transformer).
Longer term, we'll be adding a rotating KV cache (huggingface/candle#2493); this should make infinite conversations possible, though as Alex mentioned, the model is only trained for ~5 minutes of conversation with the current training pipeline.
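For intuition, here is a minimal ring-buffer sketch of the rotating-cache idea; the struct is illustrative, not candle's actual API:

```rust
// Instead of erroring at capacity, the write index wraps around and the
// oldest entry is overwritten in place.
struct RotatingCache {
    buf: Vec<f32>,   // stand-in for a K or V tensor, one value per step
    capacity: usize, // e.g. 4096
    written: usize,  // total number of steps written so far
}

impl RotatingCache {
    fn append(&mut self, frame: f32) {
        let slot = self.written % self.capacity; // wrap instead of failing
        if self.buf.len() < self.capacity {
            self.buf.push(frame); // still filling up: plain append
        } else {
            self.buf[slot] = frame; // full: overwrite the oldest entry
        }
        self.written += 1;
    }
}
```

Attention over a rotated buffer then needs masks and positions that respect the wrap-around, so queries still see entries in the right temporal order.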

@alkeryn

alkeryn commented Oct 9, 2024

@LaurentMazare now that the PR is merged, would it be possible to trim all tokens older than n minutes?
That way, even if it loses context, it can at least stay somewhat coherent.
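A small sketch of that suggestion, assuming ~12.5 temporal steps per second (Moshi's frame rate); the constant and helper are illustrative, not part of the moshi codebase:

```rust
const STEPS_PER_SECOND: f64 = 12.5; // Moshi's frame rate (assumed here)

/// Keep only cache entries newer than `minutes`, dropping everything older.
fn trim_older_than(cache: &mut Vec<f32>, minutes: f64) {
    let keep = (minutes * 60.0 * STEPS_PER_SECOND) as usize; // e.g. 4 min -> 3000 steps
    if cache.len() > keep {
        cache.drain(..cache.len() - keep);
    }
}
```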
