Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simple-tts example #12261

Open
wants to merge 24 commits into
base: master
Choose a base branch
from
Open

Add simple-tts example #12261

wants to merge 24 commits into from

Conversation

danemadsen
Copy link
Contributor

@danemadsen danemadsen commented Mar 8, 2025

This differs from the existing tts example by not relying on common.h and instead only using llama.h.

@danemadsen
Copy link
Contributor Author

I have no idea why that last one is failing?

Copy link
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if we merge this version, we can remove the existing common-based tts example.

Comment on lines +617 to +622

std::vector<llama_seq_id> seq_ids(n_parallel, 0);
for (int32_t i = 0; i < n_parallel; ++i) {
seq_ids[i] = i;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the parallel functionality is incomplete. We can either remove it to make the example simpler, or we can extend the example to support it. The latter is relatively easy to do - just store multiple sets of codes - one for each parallel sequence. And after that, generate multiple audio files - one for each set of codes.

But this can be done later to keep this PR simple - just making a note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Also n_predict and n_parallel are constant values so they can probably be removed but I kept them in to align with the original example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to get rid of parallel processing like you suggested but it breaks the example and i cant get it to work without it. I added a not to the top of main for future devs.

Didn't want to waste too much time on it anyway as @ngxson suggests the example might not be needed at all.

@ngxson
Copy link
Collaborator

ngxson commented Mar 8, 2025

Tbh I think this is not needed, the tts support is very experimental for the moment, and having 2 examples for tts only adds more complexity to support for more tts models to come in the future. (Also please note that, the whole tts logic is currently built around OuteTTS. Many other tts models have different mechanism)

We should focus on faster R&D for now by relying on common, then later decide whether to move tts into a fully functional example (or even a shared library)

@danemadsen danemadsen requested a review from ggerganov March 11, 2025 04:22
@danemadsen
Copy link
Contributor Author

Though the experimental nature of TTS is a valid reason to keep using common.h, i feel that this example benefits downstream developers who are trying to build api's for this feature. I'm currently trying to implement this in dart and its very hard to do so when the only example supplied uses C++ code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants