Skip to content

Latest commit

 

History

History
75 lines (47 loc) · 2.6 KB

README.md

File metadata and controls

75 lines (47 loc) · 2.6 KB

Realtime onscreen transcription with Whisper

Transcribe your speech or the audio playing on your computer with Whisper in realtime, and show the captions on your screen.

demo.mp4

Installation

Install the following packages:

$ pip install -r requirement.txt

Usage

Transcribe your speech

Run the following command to start transcribing your speech:

$ python transcribe.py --input-provider speech-recognition --model base --no-faster-whisper

A list of audio input device will be displayed. Choose your microphone to start transcribing.

Some options:

  1. You can choose the --input-provider from "speech-recognition" and "pyaudio". The difference is speech-recognition will surpress silence input.

  2. To run with faster whisper, omit the --no-faster-whisper option. Note for Cuda 12.x, you need to update your LD_LIBRARY_PATH, see Troubleshoot - 1.

  3. For better percision, use the --language option to specify the input language (in ISO 639-1 codes).

Transcribe the audio output on your computer

  1. (Optional) Setup monitor device for audio output.

    Setup a loopback device for your audio output. Skip this step if you already setup a monitor device in other way.

    First list available devices for monitoring (this will also list your microphone).

    $ pactl list sources short

    Example output:

    2   alsa_input.microphone   module-alsa-card.c    s16le 1ch 44100Hz   SUSPENDED
    30   alsa_output.hdmi-stereo.monitor   module-alsa-card.c    s16le 2ch 44100Hz RUNNING
    

    Then set the pulse source to your chosen device, for example "alsa_output.hdmi-stereo.monitor".

    $ export PULSE_SOURCE=alsa_output.hdmi-stereo.monitor
  2. Start transcribing.

    For transcribing from the device chosen in step 1, use "pulse" as input.

    $ python transcribe.py --input pulse --input-provider speech-recognition --model base --no-faster-whisper

Troubleshoot

  1. To run with faster whisper with Cuda 12.x, udpate your LD_LIBRARY_PATH as follow.

    $ export pyvenv_path=YOUR_VENV_PATH # e.g. $HOME/.pyvenv/onscreen-transcription
    $ export pyvenv_py_version=YOUR_PYTHON_VERSION # python3.10
    $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$pyvenv_path/lib64/$pyvenv_py_version/site-packages/nvidia/cublas/lib:$pyvenv_path/lib64/$pyvenv_py_version/site-packages/nvidia/cudnn/lib