⚡ FocalCodec

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation.

📜 Preprint: https://arxiv.org/abs/2502.04465
🌐 Project Page: https://lucadellalib.github.io/focalcodec-web/
🔊 Downstream Tasks: https://github.com/lucadellalib/audiocodecs

📌 Available Checkpoints

Checkpoint	Token Rate (Hz)	Bitrate (kbps)	Dataset
lucadellalib/focalcodec_50hz	50.0	0.65	LibriTTS960
lucadellalib/focalcodec_25hz	25.0	0.33	LibriTTS960
lucadellalib/focalcodec_12_5hz	12.5	0.16	LibriTTS960
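
The bitrates in the table follow from the token rate multiplied by the number of bits per token. The table does not state the bits per token; the sketch below assumes 13 bits (a single codebook of 2^13 = 8192 entries), a value inferred from 0.65 kbps / 50 Hz and not stated in this README. The helper name `bitrate_kbps` is hypothetical.

```python
# Assumption: 13 bits per token, inferred from the table (650 bps / 50 Hz = 13).
BITS_PER_TOKEN = 13


def bitrate_kbps(token_rate_hz, bits_per_token=BITS_PER_TOKEN):
    """Return the bitrate in kbps for a given token rate in Hz."""
    return token_rate_hz * bits_per_token / 1000.0


# Token rates copied from the checkpoint table
checkpoints = {
    "lucadellalib/focalcodec_50hz": 50.0,
    "lucadellalib/focalcodec_25hz": 25.0,
    "lucadellalib/focalcodec_12_5hz": 12.5,
}

for name, rate in checkpoints.items():
    print(f"{name}: {bitrate_kbps(rate):.4f} kbps")
```

Under this assumption the three checkpoints come out at 0.6500, 0.3250 and 0.1625 kbps, which matches the table values after rounding to two decimals.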

🛠️️ Installation

Install Python 3.8 or later, then open a terminal and run:

pip install huggingface-hub safetensors soundfile torch torchaudio

▶️ Quickstart

NOTE: the audio-samples directory contains audio samples that you can download and use to test the codec.

You can load the model with torch.hub without cloning the repository:

import torch
import torchaudio

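# Load FocalCodec model
# NOTE: force_reload=True re-downloads the repository code on every call;
# set it to False to reuse the cached copy
config = "lucadellalib/focalcodec_50hz"
codec = torch.hub.load(
    "lucadellalib/focalcodec", "focalcodec", config=config, force_reload=True
)
# Switch to inference mode and disable gradient tracking
codec.eval().requires_grad_(False)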

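# Load the input audio and resample it to the codec's sample rate
# (torchaudio.load returns a (channels, time) tensor and the file's sample rate)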
audio_file = "audio-samples/librispeech-dev-clean/251-118436-0003.wav"
sig, sample_rate = torchaudio.load(audio_file)
sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate)

# Encode audio into tokens
toks = codec.sig_to_toks(sig)  # Shape: (batch, time)
print(toks.shape)
print(toks)

# Convert tokens to their corresponding binary spherical codes
codes = codec.toks_to_codes(toks)  # Shape: (batch, time, log2(codebook_size))
print(codes.shape)
print(codes)

# Decode tokens back into a waveform
rec_sig = codec.toks_to_sig(toks)

# Save the reconstructed audio
rec_sig = torchaudio.functional.resample(rec_sig, codec.sample_rate, sample_rate)
torchaudio.save("reconstruction.wav", rec_sig, sample_rate)
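
To store or transmit tokens at the nominal bitrate, the integer tokens need to be packed tightly and not saved as 64-bit integers. The following is a minimal sketch of such bit packing in pure Python. It is not part of FocalCodec: the function names `pack_tokens` and `unpack_tokens` are hypothetical, and the default of 13 bits per token is an assumption inferred from the checkpoint table (0.65 kbps at 50 Hz).

```python
def pack_tokens(tokens, bits_per_token=13):
    """Pack non-negative integer tokens into bytes, bits_per_token bits each."""
    acc = 0
    for tok in tokens:
        if not 0 <= tok < (1 << bits_per_token):
            raise ValueError(f"token {tok} does not fit in {bits_per_token} bits")
        # Append the token to the low end of the accumulator
        acc = (acc << bits_per_token) | tok
    total_bits = bits_per_token * len(tokens)
    n_bytes = (total_bits + 7) // 8
    # Left-align the payload so that the padding bits sit at the end
    acc <<= n_bytes * 8 - total_bits
    return acc.to_bytes(n_bytes, "big")


def unpack_tokens(data, n_tokens, bits_per_token=13):
    """Inverse of pack_tokens; n_tokens must be stored alongside the bytes."""
    acc = int.from_bytes(data, "big")
    # Drop the padding bits added by pack_tokens
    acc >>= len(data) * 8 - n_tokens * bits_per_token
    mask = (1 << bits_per_token) - 1
    # Tokens are read from the low end, so the list comes out reversed
    reversed_toks = [(acc >> (bits_per_token * i)) & mask for i in range(n_tokens)]
    return reversed_toks[::-1]


# One second of tokens at 50 Hz (synthetic values, for illustration only)
example = [(37 * i) % 8192 for i in range(50)]
packed = pack_tokens(example)
print(len(packed), "bytes")
assert unpack_tokens(packed, len(example)) == example
```

With the quickstart above, the token list for the first batch element could be obtained with `toks[0].tolist()`. Fifty 13-bit tokens occupy 650 bits, which rounds up to 82 bytes.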

Alternatively, you can install FocalCodec as a standard Python package using pip:

pip install focalcodec@git+https://github.com/lucadellalib/focalcodec.git@main#egg=focalcodec

Once installed, you can import it in your scripts:

import focalcodec

config = "lucadellalib/focalcodec_50hz"
codec = focalcodec.FocalCodec.from_pretrained(config)

Check the code documentation for more details on model usage and available configurations.

🎤 Running the Demo Script

Clone the repository (or download and extract it), open a terminal in <path-to-repository>, and run one of the following commands:

Speech Resynthesis

python demo.py \
--input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \
--output_file reconstruction.wav

Voice Conversion

python demo.py \
--input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \
--output_file reconstruction.wav \
--reference_files audio-samples/librispeech-dev-clean/84
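
To process several files, the demo script can be driven from Python. The sketch below only uses the `--input_file`, `--output_file` and `--reference_files` flags shown above. The helper names `build_demo_command` and `resynthesize_folder`, as well as the `_rec.wav` output naming, are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(input_file, output_file, reference_files=None, script="demo.py"):
    """Build the argument list for one demo.py invocation."""
    cmd = [
        sys.executable, script,
        "--input_file", str(input_file),
        "--output_file", str(output_file),
    ]
    if reference_files is not None:
        # Passing a reference path switches from resynthesis to voice conversion
        cmd += ["--reference_files", str(reference_files)]
    return cmd


def resynthesize_folder(input_dir, output_dir, run=False):
    """Build (and optionally run) one demo.py command per .wav file in input_dir."""
    output_dir = Path(output_dir)
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        cmd = build_demo_command(wav, output_dir / f"{wav.stem}_rec.wav")
        commands.append(cmd)
        if run:
            output_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if demo.py exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

Calling `resynthesize_folder("audio-samples/librispeech-dev-clean", "out", run=True)` from the repository root would run the script once per file. With the default `run=False`, the function only returns the commands so they can be inspected first.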

@ Citing

@article{dellalibera2025focalcodec,
    title   = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
    author  = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2502.04465},
    year    = {2025},
}

📧 Contact

[email protected]
