Commit

Merge branch 'v3' of https://github.com/m-bain/whisperX into v3
Conflicts:
	setup.py
m-bain committed May 13, 2023
2 parents fd8f100 + 7ad554c commit 9ffb7e7
Showing 2 changed files with 12 additions and 2 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,2 +1,3 @@
whisperx.egg-info/
**/__pycache__/
**/__pycache__/
.ipynb_checkpoints
11 changes: 10 additions & 1 deletion README.md
@@ -52,6 +52,12 @@ This repository provides fast automatic speech recognition (70x realtime with la

**Speaker Diarization** is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker.

- v3 pre-release [this branch](https://github.com/m-bain/whisperX/tree/v3) *70x speed-up open-sourced. Using batched whisper with faster-whisper backend*!
- v2 released, code cleanup, imports whisper library. VAD filtering is now turned on by default, as in the paper.
- Paper drop🎓👨‍🏫! Please see our [ArxiV preprint](https://arxiv.org/abs/2303.00747) for benchmarking and details of WhisperX. We also introduce more efficient batch inference resulting in large-v2 with *60-70x REAL TIME speed (not provided in this repo).
- VAD filtering: Voice Activity Detection (VAD) from [Pyannote.audio](https://huggingface.co/pyannote/voice-activity-detection) is used as a preprocessing step to remove reliance on whisper timestamps and only transcribe audio segments containing speech. add `--vad_filter True` flag, increases timestamp accuracy and robustness (requires more GPU mem due to 30s inputs in wav2vec2)
- Character level timestamps (see `*.char.ass` file output)
- Diarization (still in beta, add `--diarize`)

<h2 align="left", id="highlights">New🚨</h2>

@@ -247,6 +253,7 @@ Bug finding and pull requests are also highly appreciated to keep this project g

<h2 align="left" id="contact">Contact/Support 📇</h2>


Contact [email protected] for queries. WhisperX v4 development is underway with with siginificantly improved diarization. To support v4 and get early access, get in touch.

<a href="https://www.buymeacoffee.com/maxhbain" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/default-orange.png" alt="Buy Me A Coffee" height="41" width="174"></a>
@@ -257,7 +264,9 @@ Contact [email protected] for queries. WhisperX v4 development is underway with
This work, and my PhD, is supported by the [VGG (Visual Geometry Group)](https://www.robots.ox.ac.uk/~vgg/) and the University of Oxford.

Of course, this is builds on [openAI's whisper](https://github.com/openai/whisper).
And borrows important alignment code from [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html)
Borrows important alignment code from [PyTorch tutorial on forced alignment](https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html)
And uses the wonderful pyannote VAD / Diarization https://github.com/pyannote/pyannote-audio


Valuable VAD & Diarization Models from [pyannote audio][https://github.com/pyannote/pyannote-audio]

