FirstAI-Voice

FirstAI-Voice is an end-to-end voice model. It can directly understand and generate Indonesian and English speech, hold real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect according to user instructions.

Model Architecture

We provide the three components of FirstAI-Voice:

FirstAI-Voice-Tokenizer: Trained by adding vector quantization to the encoder part of Whisper, converting continuous speech input into discrete tokens. Each second of audio is converted into 12.5 discrete tokens.
FirstAI-Voice-9B: Pre-trained and aligned on speech modality based on FirstAI-9B, enabling understanding and generation of discretized speech.
FirstAI-Voice-Decoder: A speech decoder supporting streaming inference, retrained based on CosyVoice, converting discrete speech tokens into continuous speech output. Generation can start with as few as 10 audio tokens, reducing conversation latency.
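
To make the two figures above concrete, here is a minimal arithmetic sketch. It uses only the tokenizer rate (12.5 tokens per second) and the decoder's minimum of 10 tokens stated above; the constant and function names are illustrative and are not part of the repository.

```python
# Figures taken from the component descriptions above.
TOKENS_PER_SECOND = 12.5   # tokenizer output rate
MIN_TOKENS_TO_START = 10   # tokens the decoder needs before it can start


def tokens_for_duration(seconds: float) -> float:
    """Number of discrete tokens produced for `seconds` of audio."""
    return seconds * TOKENS_PER_SECOND


def audio_seconds_for_tokens(n_tokens: int) -> float:
    """Duration of audio represented by `n_tokens` discrete tokens."""
    return n_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # A 4-second utterance maps to 50 tokens.
    print(tokens_for_duration(4.0))
    # 10 tokens correspond to 0.8 seconds of audio.
    print(audio_seconds_for_tokens(MIN_TOKENS_TO_START))
```

In other words, the decoder can begin producing sound once tokens covering roughly 0.8 seconds of speech are available, rather than waiting for the full response.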

A more detailed technical report will be published later.

Model List

Model	Type	Download
FirstAI-Voice-Tokenizer	Speech Tokenizer	🤗 Huggingface
FirstAI-Voice-9B	Chat Model	🤗 Huggingface
FirstAI-Voice-Decoder	Speech Decoder	🤗 Huggingface

Usage

We provide a Web Demo that can be launched directly. Users can input speech or text, and the model will respond with both speech and text.

Preparation

First, download the repository:

git clone --recurse-submodules https://github.com/f1rstInd/FirstAI-Voice
cd FirstAI-Voice

Then, install the dependencies. Alternatively, use our pre-built Docker image zhipuai/FirstAI-voice:0.1 to skip this step.

pip install -r requirements.txt

Since the Decoder model does not support initialization via transformers, the checkpoint needs to be downloaded separately.

# Download the model with Git; make sure git-lfs is installed first
git clone https://huggingface.co/f1rstInd/FirstAI-Voice
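
Because the clone above depends on git-lfs, it can help to check for it before starting a large download. The following is a small sketch that assumes git-lfs is installed as an executable named git-lfs on your PATH, which is the usual layout but may differ on your system; the function name is hypothetical.

```python
import shutil


def git_lfs_available() -> bool:
    """Return True if a `git-lfs` executable is found on PATH.

    Assumption: git-lfs is installed as a standalone `git-lfs` binary.
    """
    return shutil.which("git-lfs") is not None


if __name__ == "__main__":
    if git_lfs_available():
        print("git-lfs found; the checkpoint clone should fetch the weight files.")
    else:
        print("git-lfs not found; install it before cloning the checkpoint.")
```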

Launch Web Demo

Start the model server

python model_server.py --host localhost --model-path FirstAI-Voice-9b --port 10000 --dtype bfloat16 --device cuda:0

To launch with Int4 precision instead, run

python model_server.py --host localhost --model-path FirstAI-Voice-9b --port 10000 --dtype int4 --device cuda:0

This command will automatically download FirstAI-Voice-9b. If network conditions are poor, you can manually download it and specify the local path using --model-path.
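
The two launch commands differ only in --dtype, so they can be generated from one helper. The sketch below is illustrative: the flags and default values are copied from the commands above, while the function name and the restriction to the two dtype values shown in this README are assumptions (model_server.py may accept others).

```python
import shlex

# Only these two values appear in this README; others may exist.
SHOWN_DTYPES = ("bfloat16", "int4")


def build_model_server_cmd(model_path="FirstAI-Voice-9b", dtype="bfloat16",
                           host="localhost", port=10000, device="cuda:0"):
    """Build the argument list for launching model_server.py."""
    if dtype not in SHOWN_DTYPES:
        raise ValueError(f"dtype {dtype!r} is not one of {SHOWN_DTYPES}")
    return [
        "python", "model_server.py",
        "--host", host,
        "--model-path", str(model_path),  # model name or local directory
        "--port", str(port),
        "--dtype", dtype,
        "--device", device,
    ]


if __name__ == "__main__":
    # Print the Int4 variant as a shell-ready command line.
    print(shlex.join(build_model_server_cmd(dtype="int4")))
```

Passing a local directory as model_path corresponds to the manual-download case described above. The returned list can be handed to subprocess.run to start the server.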

Start the web service

python web_demo.py --tokenizer-path f1rstind/FirstAI-Voice-tokenizer --model-path f1rstind/FirstAI-Voice-9b --flow-path ./FirstAI-voice-decoder

You can access the web demo at http://127.0.0.1:8888. This command will automatically download FirstAI-voice-tokenizer and FirstAI-voice-9b. Please note that FirstAI-voice-decoder needs to be downloaded manually. If the network connection is poor, you can manually download these three models and specify the local paths using --tokenizer-path, --flow-path, and --model-path.
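
If you have downloaded all three models manually, a quick check that each local directory exists can save a failed launch. This is a minimal sketch: the three flags come from the command above, while the helper names and the example directory names are hypothetical.

```python
from pathlib import Path


def missing_model_dirs(paths):
    """Return the flags whose value is not an existing local directory.

    `paths` maps a web_demo.py flag to a local directory path.
    """
    return [flag for flag, p in paths.items() if not Path(p).is_dir()]


def build_web_demo_cmd(tokenizer_path, model_path, flow_path):
    """Build the argument list for launching web_demo.py."""
    return [
        "python", "web_demo.py",
        "--tokenizer-path", tokenizer_path,
        "--model-path", model_path,
        "--flow-path", flow_path,
    ]


if __name__ == "__main__":
    # Hypothetical local directories; replace with your own download paths.
    local = {
        "--tokenizer-path": "./FirstAI-voice-tokenizer",
        "--model-path": "./FirstAI-voice-9b",
        "--flow-path": "./FirstAI-voice-decoder",
    }
    missing = missing_model_dirs(local)
    if missing:
        print("Missing local model directories for:", ", ".join(missing))
    else:
        print(" ".join(build_web_demo_cmd(*local.values())))
```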

Known Issues

Gradio’s streaming audio playback can be unstable. Audio quality is higher if you play the audio by clicking it in the dialogue box after generation is complete.

Acknowledgements

Some code in this project is from:

License Agreement

The use of FirstAI model weights must follow the Model License Agreement.
The code in this open-source repository is licensed under the Apache 2.0 License.
