
Neural speaker diarization with pyannote.audio

This is the development branch of the upcoming pyannote.audio 2.0, for which it was decided to rewrite almost everything from scratch. The walkthrough below highlights the main features of this upcoming release.

Installation

conda create -n pyannote python=3.8.5
conda activate pyannote

# pyannote.audio relies on torchaudio's soundfile backend, itself relying
# on libsndfile, sometimes tricky to install. This seems to work fine but
# is provided with no guarantee of success:
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge

# until a proper release of pyannote.audio 2.x is available on PyPI,
# install from the `develop` branch:
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
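As a quick sanity check (this verification step is not part of the original instructions), make sure the package imports:

# optional: verify the installation
python -c "import pyannote.audio; print(pyannote.audio.__version__)"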

pyannote.audio 101

For now, this is the closest you can get to actual documentation.

Experimental protocols are made reproducible by pyannote.database. Here, we use the AMI "only_words" speaker diarization protocol.

from pyannote.database import get_protocol
ami = get_protocol('AMI.SpeakerDiarization.only_words')
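Protocols expose iterators over annotated files; here is a minimal sketch, assuming the usual pyannote.database file keys:

# each file yielded by the protocol behaves like a dict providing,
# among other things, a unique identifier and a reference annotation
for file in ami.train():
    print(file["uri"])         # unique file identifier
    print(file["annotation"])  # reference speaker diarization
    break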

Data augmentation is supported via torch-audiomentations.

from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise
augmentation = Compose(transforms=[ApplyImpulseResponse(...),
                                   AddBackgroundNoise(...)])
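For reference, a fleshed-out call might look like the sketch below. The paths are placeholders and the parameter names are taken from torch-audiomentations, so double-check them against that library's documentation:

from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise
augmentation = Compose(transforms=[
    # convolve training chunks with room impulse responses
    ApplyImpulseResponse(ir_paths="/path/to/rir/files", p=0.5),
    # mix in background noise at a random signal-to-noise ratio
    AddBackgroundNoise(background_paths="/path/to/noise/files",
                       min_snr_in_db=5.0, max_snr_in_db=15.0, p=0.5),
])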

A growing collection of tasks can be addressed. Here, we address speaker segmentation.

from pyannote.audio.tasks import Segmentation
seg = Segmentation(ami, augmentation=augmentation)
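Tasks also control how training chunks are sampled. A hedged sketch (duration and batch_size are generic task parameters; the values below are arbitrary):

# train on 5-second chunks, 32 chunks per batch
seg = Segmentation(ami, duration=5.0, batch_size=32,
                   augmentation=augmentation)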

A growing collection of model architectures can be used. Here, we use the PyanNet (SincNet + LSTM) architecture.

from pyannote.audio.models.segmentation import PyanNet
model = PyanNet(task=seg)

We benefit from all the nice things that pytorch-lightning has to offer: distributed (GPU & TPU) training, model checkpointing, logging, etc. In this example, we don't really use any of this...

from pytorch_lightning import Trainer
trainer = Trainer()
trainer.fit(model)
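To actually take advantage of those features, pass the corresponding options to the Trainer. For instance, assuming a single-GPU machine (these are standard pytorch-lightning arguments):

# train on one GPU for at most 10 epochs,
# with automatic checkpointing and logging
trainer = Trainer(gpus=1, max_epochs=10)
trainer.fit(model)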

Predictions are obtained by wrapping the model into the Inference engine.

from pyannote.audio import Inference
inference = Inference(model)
predictions = inference('audio.wav')
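Inference can also be restricted to an excerpt of the file; a short sketch using pyannote.core's Segment and the crop method of Inference:

from pyannote.core import Segment
# only process the first 30 seconds of the file
predictions = inference.crop('audio.wav', Segment(0, 30))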

Pretrained models can be shared on the Huggingface.co model hub. Here, we download and use a pretrained segmentation model.

inference = Inference('pyannote/segmentation')
predictions = inference('audio.wav')

Fine-tuning is as easy as setting the task attribute, freezing early layers, and training. Here, we fine-tune the pretrained segmentation model on the AMI dataset.

from pyannote.audio import Model
model = Model.from_pretrained('pyannote/segmentation')
model.task = Segmentation(ami)
model.freeze_up_to('sincnet')
trainer.fit(model)

Transfer learning is also supported out of the box. Here, we do transfer learning from segmentation to overlapped speech detection.

from pyannote.audio.tasks import OverlappedSpeechDetection
osd = OverlappedSpeechDetection(ami)
model.task = osd
trainer.fit(model)

The default optimizer (Adam with default parameters) is set up automatically. Customizing the optimizer (and scheduler) requires overriding the model.configure_optimizers method:

from types import MethodType
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR
def configure_optimizers(self):
    # SGD requires an explicit learning rate; 0.01 is just an example
    optimizer = SGD(self.parameters(), lr=0.01)
    return {"optimizer": optimizer,
            "lr_scheduler": ExponentialLR(optimizer, 0.9)}
model.configure_optimizers = MethodType(configure_optimizers, model)
trainer.fit(model)
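Patching the bound method works, but when writing your own training script it is often cleaner to subclass the architecture and override configure_optimizers there, following standard pytorch-lightning practice (the learning rate below is arbitrary):

from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR
from pyannote.audio.models.segmentation import PyanNet

class MySegmentationModel(PyanNet):
    def configure_optimizers(self):
        # plain SGD with an exponentially decaying learning rate
        optimizer = SGD(self.parameters(), lr=0.01)
        return {"optimizer": optimizer,
                "lr_scheduler": ExponentialLR(optimizer, 0.9)}

model = MySegmentationModel(task=seg)
trainer.fit(model)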

Contributing

The commands below will set up pre-commit hooks and install the packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Testing

Tests rely on a set of debugging files available in the tests/data directory. Set the PYANNOTE_DATABASE_CONFIG environment variable to tests/data/database.yml before running tests:

PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest
