This is the development branch of the upcoming pyannote.audio 2.0, for which almost everything is being rewritten from scratch. Highlights of this upcoming release will be:
- a much smaller and cleaner codebase
- Python-first API (the good old pyannote-audio CLI will still be available, though)
- multi-GPU and TPU training thanks to pytorch-lightning
- data augmentation with torch-audiomentations
- huggingface model hosting
- prodigy recipes for audio annotations
- online demo based on streamlit
conda create -n pyannote python=3.8.5
conda activate pyannote
# pyannote.audio relies on torchaudio's soundfile backend, itself relying
# on libsndfile, sometimes tricky to install. This seems to work fine but
# is provided with no guarantee of success:
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
# until a proper release of pyannote.audio 2.x is available on PyPI,
# install from the `develop` branch:
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
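If the installation went through, importing the package should work. This is an optional sanity check only; it assumes pyannote.audio exposes the usual __version__ attribute:
import pyannote.audio
print(pyannote.audio.__version__)  # print the installed development version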
For now, this is the closest you can get to actual documentation.
The experimental protocol is made reproducible thanks to pyannote.database. Here, we use the AMI "only_words" speaker diarization protocol.
from pyannote.database import get_protocol
ami = get_protocol('AMI.SpeakerDiarization.only_words')
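As a quick sanity check (a sketch rather than part of the walkthrough, and it assumes the AMI corpus is properly configured in your pyannote.database setup), speaker diarization protocols expose train/development/test iterators over annotated files:
# sketch: iterate over the training set of the protocol
for file in ami.train():
    print(file["uri"])         # file identifier
    print(file["annotation"])  # reference speaker annotation
    break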
Data augmentation is supported via torch-audiomentations.
from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise
augmentation = Compose(transforms=[ApplyImpulseResponse(...),
                                   AddBackgroundNoise(...)])
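In case the elided arguments above look opaque, here is a hedged sketch of how these transforms are typically instantiated; the paths are placeholders for your own noise and impulse response collections, and the keyword names (ir_paths, background_paths, p) are assumptions about the torch-audiomentations API rather than something taken from this project:
# sketch only: placeholder paths and assumed keyword arguments
augmentation = Compose(
    transforms=[
        ApplyImpulseResponse(ir_paths="/path/to/impulse_responses", p=0.5),
        AddBackgroundNoise(background_paths="/path/to/background_noises", p=0.5),
    ]
)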
A growing collection of tasks can be addressed. Here, we address speaker segmentation.
from pyannote.audio.tasks import Segmentation
seg = Segmentation(ami, augmentation=augmentation)
A growing collection of model architectures can be used. Here, we use the PyanNet (SincNet + LSTM) architecture.
from pyannote.audio.models.segmentation import PyanNet
model = PyanNet(task=seg)
We benefit from all the nice things that pytorch-lightning
has to offer: distributed (GPU & TPU) training, model checkpointing, logging, etc.
In this example, we don't really use any of this...
from pytorch_lightning import Trainer
trainer = Trainer()
trainer.fit(model)
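To actually use those features, one would pass the corresponding options to the Trainer. The sketch below assumes pytorch-lightning 1.x argument names (gpus, max_epochs) and its ModelCheckpoint callback; adapt it to your installed version:
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
# sketch: two-GPU training with periodic model checkpointing
trainer = Trainer(gpus=2, max_epochs=10, callbacks=[ModelCheckpoint()])
trainer.fit(model)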
Predictions are obtained by wrapping the model into the Inference
engine.
from pyannote.audio import Inference
inference = Inference(model)
predictions = inference('audio.wav')
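The returned predictions are frame-wise scores, assumed here to be a pyannote.core SlidingWindowFeature, which can be inspected as a numpy array:
# sketch: frame-wise scores with shape (num_frames, num_classes)
print(predictions.data.shape)
print(predictions.sliding_window.step)  # temporal resolution, in seconds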
Pretrained models can be shared on the Huggingface.co model hub. Here, we download and use a pretrained segmentation model.
inference = Inference('pyannote/segmentation')
predictions = inference('audio.wav')
Fine-tuning is as easy as setting the task attribute, freezing early layers, and training.
Here, we fine-tune the pretrained segmentation model on the AMI dataset.
from pyannote.audio import Model
model = Model.from_pretrained('pyannote/segmentation')
model.task = Segmentation(ami)
model.freeze_up_to('sincnet')
trainer.fit(model)
Transfer learning is also supported out of the box. Here, we do transfer learning from segmentation to overlapped speech detection.
from pyannote.audio.tasks import OverlappedSpeechDetection
osd = OverlappedSpeechDetection(ami)
model.task = osd
trainer.fit(model)
The default optimizer (Adam with default parameters) is automatically set up for you. Customizing the optimizer (and scheduler) requires overriding the model.configure_optimizers method:
from types import MethodType
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR
def configure_optimizers(self):
    # bind the optimizer to a variable so the scheduler can reference it
    optimizer = SGD(self.parameters(), lr=1e-3)  # learning rate chosen for illustration
    return {"optimizer": optimizer,
            "lr_scheduler": ExponentialLR(optimizer, 0.9)}
model.configure_optimizers = MethodType(configure_optimizers, model)
trainer.fit(model)
The commands below will set up pre-commit hooks and install the packages needed for developing the pyannote.audio library.
pip install -e .[dev,testing]
pre-commit install
Tests rely on a set of debugging files available in the tests/data directory.
Set the PYANNOTE_DATABASE_CONFIG environment variable to tests/data/database.yml before running tests:
PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest