# Voice Transcribe

Effortless speech-to-text transcription for the Ubuntu desktop.

Voice Transcribe is a simple, keyboard-driven tool for turning spoken words into text. Powered by OpenAI's Whisper model for accurate speech recognition, it listens for a global hotkey and automatically inserts the transcribed text into the active application.
## Features

- Global hotkey: trigger transcription with a customizable hotkey combination (Shift + Windows key by default).
- Accurate speech recognition: powered by the Whisper model for high-quality transcription.
- Automatic text insertion: transcribed text is automatically inserted into the active application at the current cursor position.
- Ubuntu desktop integration: works seamlessly with the Ubuntu desktop environment.
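The hotkey-driven flow above boils down to three steps: record audio, transcribe it, and type the result into the focused window. A minimal sketch of that pipeline, with hypothetical function names (the real `hotkey_listener.py` may structure this differently):

```python
# Illustrative record -> transcribe -> insert pipeline. The three steps are
# injected as callables so the flow itself stays testable without audio or X.
def run_pipeline(record, transcribe, insert):
    """Run one hotkey-triggered transcription cycle."""
    audio = record()          # e.g. capture microphone audio
    text = transcribe(audio)  # e.g. a Whisper model call
    insert(text)              # e.g. typing via xdotool into the focused window
    return text
```

In the real application, `record` would capture microphone audio, `transcribe` would call the Whisper model, and `insert` would shell out to `xdotool`.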
## Requirements

- Ubuntu Linux (tested on Ubuntu Desktop)
- Python 3.9+
- Conda (for virtual environment management)
- Docker (optional, for containerized deployment)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/voice-transcribe.git
   cd voice-transcribe
   ```
2. Set up the Conda environment:

   Install dependencies using the provided Conda environment:

   ```bash
   conda env create -f environment.yml
   conda activate voice_transcribe
   ```
3. Install system dependencies:

   Ensure `xdotool` is installed to allow text insertion:

   ```bash
   sudo apt-get install xdotool
   ```
4. Run the application:

   ```bash
   python hotkey_listener.py
   ```

   This starts the application, which listens for the hotkey combination and performs transcription.
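The text-insertion step relies on `xdotool` typing into whichever window has focus. A minimal sketch of how that call can be made from Python; `build_type_command` and `insert_text` are hypothetical helpers, not the project's actual code:

```python
import subprocess

def build_type_command(text: str) -> list[str]:
    """xdotool invocation that types `text` at the current cursor position."""
    # --clearmodifiers releases held modifier keys (e.g. the hotkey itself)
    # before typing; `--` ends option parsing so text beginning with '-' is
    # typed literally instead of being read as a flag.
    return ["xdotool", "type", "--clearmodifiers", "--", text]

def insert_text(text: str) -> None:
    """Type `text` into the focused window (requires xdotool on PATH)."""
    subprocess.run(build_type_command(text), check=True)
```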
## Docker deployment

You can also run the application in a Docker container:
1. Build the Docker image:

   ```bash
   docker build -t voice_transcribe:latest .
   ```
2. Run the Docker container:

   ```bash
   docker run -d \
     --name voice_transcribe \
     --device /dev/snd \
     -v /tmp/.X11-unix:/tmp/.X11-unix \
     -e DISPLAY=$DISPLAY \
     -v $XAUTHORITY:/root/.Xauthority \
     --network host \
     --privileged \
     voice_transcribe:latest
   ```
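The container needs access to the host's sound device (`/dev/snd`) and X display for recording and text insertion to work. A quick way to confirm those flags took effect is a small preflight check run inside the container; this is a hypothetical helper, not part of the project:

```python
import os

def missing_requirements(env=os.environ, snd_path="/dev/snd"):
    """Return a list of container requirements that are not satisfied."""
    missing = []
    if not os.path.exists(snd_path):
        # Provided by the `--device /dev/snd` flag
        missing.append("ALSA device /dev/snd (pass --device /dev/snd)")
    if not env.get("DISPLAY"):
        # Provided by the `-e DISPLAY=$DISPLAY` flag
        missing.append("DISPLAY variable (pass -e DISPLAY=$DISPLAY)")
    return missing
```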
## Development

- Conda virtual environment: dependencies are managed with Conda for consistent development environments (see `environment.yml`).
- Docker containerization: the application can be containerized and deployed using Docker (see `Dockerfile`).
## Contributing

Contributions are welcome! Please see the `CONTRIBUTING.md` file for guidelines on how to contribute to this project.
## License

[Insert license information, e.g., MIT License]
## Acknowledgments

- Whisper model: this project uses the Whisper model for speech recognition, developed by OpenAI.
Note: replace `your-username` with your GitHub username, and update the script name or license information accordingly.