Canary-ComfyUI

NVIDIA’s Canary is a family of state-of-the-art multilingual speech-to-text (ASR) and speech-translation (AST) models that output punctuated, capitalized text. This ComfyUI custom node supports:

canary-1b-v2,
canary-1b-flash,
canary-180m-flash.

Installation

Follow these steps to install and configure the nodes.

1. Clone the Repository

Navigate to your ComfyUI custom_nodes directory and clone this repository:

# Example path: ComfyUI/custom_nodes/
cd /path/to/your/ComfyUI/custom_nodes/
git clone https://github.com/Juste-Leo2/Canary-ComfyUI.git
cd Canary-ComfyUI

2. Install Dependencies

Python environments within ComfyUI can be tricky. The recommended way to install the required nemo_toolkit is by using uv, which is included with recent versions of ComfyUI.

Open a terminal or command prompt and run the following command. You must replace path/to/your/python.exe with the actual path to the Python executable used by ComfyUI.

For the portable version of ComfyUI, this is typically ComfyUI/python_embeded/python.exe.
If you use a virtual environment (venv), activate it and use python.

# Command to run from the root of the Canary-ComfyUI folder
# (ComfyUI/custom_nodes/Canary-ComfyUI)
/path/to/your/python.exe -m uv pip install -r requirements.txt --no-deps --force-reinstall --index-strategy unsafe-best-match

This command uses uv to install NeMo in a way that is less likely to cause conflicts with ComfyUI's existing packages.
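To confirm that the install landed in the interpreter ComfyUI actually uses, a small check like the one below can help. It is only a sketch: it assumes the nemo_toolkit package is importable under the name `nemo`, and the helper names `is_importable` and `report` are illustrative, not part of this repository.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


def report(modules=("nemo",)) -> dict:
    """Map each top-level module name to whether it can be located.

    The default assumes nemo_toolkit is imported as `nemo`.
    """
    return {name: is_importable(name) for name in modules}
```

Save it to a file, add a line such as `print(report())`, and run it with the same Python executable used in the install command. A value of `False` for `nemo` suggests the packages went into a different environment.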

3. Download the Model

Go to the model's files page on Hugging Face: nvidia/canary-1b-v2
Download the model file, which is named canary-1b-v2.nemo.
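If you prefer to script the download rather than use the browser, the following sketch shows one possible approach. It assumes the `huggingface_hub` package is installed in the environment you run it from, which is not something this repository's instructions guarantee, and `download_canary` is a hypothetical helper name.

```python
from pathlib import Path

REPO_ID = "nvidia/canary-1b-v2"   # repository named in this step
FILENAME = "canary-1b-v2.nemo"    # model file named in this step


def download_canary(target_dir: str) -> Path:
    """Download the .nemo file into target_dir and return its local path."""
    # Imported lazily so this file still loads when huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=target_dir,
    )
    return Path(local_path)


# Example call (downloads a large file, so it is left commented out):
# download_canary("/path/to/your/download/folder")
```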

4. Place the Model

Place the downloaded canary-1b-v2.nemo file inside the ComfyUI/models/canary/ directory.
You may need to create the canary folder yourself if it doesn't exist.

The final path should look like this: ComfyUI/models/canary/canary-1b-v2.nemo.
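A quick way to double-check the placement is to list the .nemo files in that folder. The sketch below assumes only the directory layout described above; `find_canary_models` is an illustrative helper, not a function from this repository.

```python
from pathlib import Path


def find_canary_models(comfyui_root: str) -> list[str]:
    """Return the names of .nemo files found in <comfyui_root>/models/canary."""
    model_dir = Path(comfyui_root) / "models" / "canary"
    if not model_dir.is_dir():
        # The canary folder has not been created yet.
        return []
    return sorted(p.name for p in model_dir.glob("*.nemo"))
```

Calling `find_canary_models("/path/to/your/ComfyUI")` should return a list containing `canary-1b-v2.nemo` once the file is in place. An empty list means the folder is missing or contains no .nemo files.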

5. Restart ComfyUI

Restart ComfyUI completely. The new nodes should appear in the "Add Node" menu under the Canary-ComfyUI category.

Usage

Add the Load Canary Model node and select canary-1b-v2.nemo.
Add an audio loading node (e.g., Load Audio).
Connect the CANARY_MODEL and AUDIO outputs to one of the three task nodes (Canary Transcription, Canary Translate to English, or Canary Translate from English).
Select the desired languages and queue the prompt. The resulting text will be available as an output.
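Conceptually, the three task nodes differ only in which source and target language they hand to the model, since Canary is driven by a source/target language pair. The sketch below illustrates that mapping. It is an assumption about how the tasks relate, not the code in nodes.py, and the task keys and the `language_pair` function are hypothetical names.

```python
def language_pair(task: str, language: str) -> tuple[str, str]:
    """Return (source_lang, target_lang) for a task and a chosen language code.

    Task keys are illustrative stand-ins for the three task nodes.
    """
    if task == "transcribe":
        # Transcription: the output language equals the spoken language.
        return language, language
    if task == "translate_to_english":
        return language, "en"
    if task == "translate_from_english":
        return "en", language
    raise ValueError(f"unknown task: {task!r}")
```

For example, `language_pair("translate_to_english", "fr")` yields `("fr", "en")`, while `language_pair("transcribe", "fr")` yields `("fr", "fr")`.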

Roadmap

Here are some of the features and improvements planned for the future of this project:

Timestamp Support
Support for main SOTA Canary Models
Support canary-1b
Node fusion for simplified use

License

The Python code in this repository is released under the Apache 2.0 License.
The NVIDIA Canary-1b-v2 model is subject to its own license, the Creative Commons Attribution-NonCommercial 4.0 International. Please review its terms before use, especially regarding commercial applications.

Acknowledgements

A big thank you to NVIDIA for creating and open-sourcing the Canary model.
Thanks to the entire ComfyUI team for building such a flexible and powerful tool for the community.
