This is the development branch of the upcoming pyannote.audio 2.0, for which almost everything is being rewritten from scratch. Highlights of this upcoming release will be:
- a much smaller and cleaner codebase
- Python-first API (the good old pyannote-audio CLI will still be available, though)
- multi-GPU and TPU training thanks to pytorch-lightning
- data augmentation with torch-audiomentations
- huggingface model hosting
- prodigy recipes for audio annotations
- online demo based on streamlit
conda create -n pyannote python=3.8.5
conda activate pyannote
# pyannote.audio relies on torchaudio's soundfile backend, itself relying
# on libsndfile, sometimes tricky to install. This seems to work fine but
# is provided with no guarantee of success:
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
# until a proper release of pyannote.audio 2.x is available on PyPI,
# install from the `develop` branch:
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
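If the installation went through, importing the package should work. This is an optional sanity check only; it assumes pyannote.audio exposes the usual __version__ attribute:
import pyannote.audio
print(pyannote.audio.__version__)  # print the installed development version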
For now, this is the closest you can get to actual documentation.
The experimental protocol is made reproducible thanks to pyannote.database. Here, we use the AMI "only_words" speaker diarization protocol.
from pyannote.database import get_protocol
ami = get_protocol('AMI.SpeakerDiarization.only_words')
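As a quick sanity check (a sketch rather than part of the walkthrough, and it assumes the AMI corpus is properly configured in your pyannote.database setup), speaker diarization protocols expose train/development/test iterators over annotated files:
# sketch: iterate over the training set of the protocol
for file in ami.train():
    print(file["uri"])         # file identifier
    print(file["annotation"])  # reference speaker annotation
    break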
Data augmentation is supported via torch-audiomentations.
from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise
augmentation = Compose(transforms=[ApplyImpulseResponse(...),
                                   AddBackgroundNoise(...)])
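In case the elided arguments above look opaque, here is a hedged sketch of how these transforms are typically instantiated; the paths are placeholders for your own noise and impulse response collections, and the keyword names (ir_paths, background_paths, p) are assumptions about the torch-audiomentations API rather than something taken from this project:
# sketch only: placeholder paths and assumed keyword arguments
augmentation = Compose(
    transforms=[
        ApplyImpulseResponse(ir_paths="/path/to/impulse_responses", p=0.5),
        AddBackgroundNoise(background_paths="/path/to/background_noises", p=0.5),
    ]
)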
A growing collection of tasks can be addressed. Here, we address speaker segmentation.
from pyannote.audio.tasks import Segmentation
seg = Segmentation(ami, augmentation=augmentation)
A growing collection of model architectures can be used. Here, we use the PyanNet (SincNet + LSTM) architecture.
from pyannote.audio.models.segmentation import PyanNet
model = PyanNet(task=seg)
We benefit from all the nice things that pytorch-lightning
has to offer: distributed (GPU & TPU) training, model checkpointing, logging, etc.
In this example, we don't really use any of this...
from pytorch_lightning import Trainer
trainer = Trainer()
trainer.fit(model)
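To actually use those features, one would pass the corresponding options to the Trainer. The sketch below assumes pytorch-lightning 1.x argument names (gpus, max_epochs) and its ModelCheckpoint callback; adapt it to your installed version:
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
# sketch: two-GPU training with periodic model checkpointing
trainer = Trainer(gpus=2, max_epochs=10, callbacks=[ModelCheckpoint()])
trainer.fit(model)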
Predictions are obtained by wrapping the model into the Inference
engine.
from pyannote.audio import Inference
inference = Inference(model)
predictions = inference('audio.wav')
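The returned predictions are frame-wise scores, assumed here to be a pyannote.core SlidingWindowFeature, which can be inspected as a numpy array:
# sketch: frame-wise scores with shape (num_frames, num_classes)
print(predictions.data.shape)
print(predictions.sliding_window.step)  # temporal resolution, in seconds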
Pretrained models can be shared on the Huggingface.co model hub. Here, we download and use a pretrained segmentation model.
inference = Inference('pyannote/segmentation')
predictions = inference('audio.wav')
Fine-tuning is as easy as setting the task attribute, freezing early layers, and training.
Here, we fine-tune the pretrained segmentation model on the AMI dataset.
from pyannote.audio import Model
model = Model.from_pretrained('pyannote/segmentation')
model.task = Segmentation(ami)
model.freeze_up_to('sincnet')
trainer.fit(model)
Transfer learning is also supported out of the box. Here, we do transfer learning from segmentation to overlapped speech detection.
from pyannote.audio.tasks import OverlappedSpeechDetection
osd = OverlappedSpeechDetection(ami)
model.task = osd
trainer.fit(model)
The default optimizer (Adam with default parameters) is automatically set up for you. Customizing the optimizer (and scheduler) requires overriding the model.configure_optimizers method:
from types import MethodType
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR
def configure_optimizers(self):
    # bind the optimizer to a variable so the scheduler can reference it
    optimizer = SGD(self.parameters(), lr=1e-3)  # learning rate chosen for illustration
    return {"optimizer": optimizer,
            "lr_scheduler": ExponentialLR(optimizer, 0.9)}
model.configure_optimizers = MethodType(configure_optimizers, model)
trainer.fit(model)
The commands below will set up pre-commit hooks and install the packages needed for developing the pyannote.audio library.
pip install -e .[dev,testing]
pre-commit install
Tests rely on a set of debugging files available in the tests/data directory.
Set the PYANNOTE_DATABASE_CONFIG environment variable to tests/data/database.yml before running tests:
PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest