Vocalize is an interactive tool designed to facilitate audio transcription and text refinement through an AI-driven workflow. It supports recording audio, transcribing it into text, and enriching the transcription with more concise and coherent summaries using artificial intelligence.
Vocalize is a shell-script application that combines tools such as whisper, tgpt, and whiptail into a single audio-to-text workflow. It is particularly useful for users who need to transcribe audio, refine the transcribed text with AI, and manage the results without leaving the terminal.
The tool offers two main modes of operation:
- Interactive Mode: This mode presents a menu-driven interface where users can record audio, transcribe it, refine it with AI, and manage saved files easily.
- Minimal Mode: This mode hides most of the UI elements, only showing the audio recording screen, allowing for a more focused experience.
- Audio Recording: Users can record audio directly from their microphone using the arecord command. Once recorded, the audio is converted to the appropriate format using ffmpeg and saved in a temporary location.
- Transcription: Using Whisper, the recorded audio is transcribed into text in Portuguese. The transcription can be viewed, copied to the clipboard, or saved as a file.
- Text Enrichment: The transcribed text can be improved with the help of tgpt, which rewrites the transcription to be more concise and clear, adhering to the user-defined prompt for summarization.
- Clipboard Integration: Both the original transcription and the AI-enhanced text can be copied to the clipboard for easy pasting into other applications.
- File Management: The application allows saving the recorded audio and transcriptions, or removing them from the system when no longer needed. It also supports playback of the recorded audio files directly from the terminal using mpv.
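The recording, conversion, transcription, and clipboard steps described above can be sketched roughly as follows. This is an illustration, not Vocalize's actual code: the file names, the 10-second recording length, the helper names run and transcribe_pipeline, and the assumption that the whisper.cpp binary is called main and is on your PATH are all hypothetical. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch of a record -> convert -> transcribe -> copy pipeline.
# File names, the `main` binary name, and the 10 s duration are assumptions.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: transcribe_pipeline WORKDIR MODEL_FILE
transcribe_pipeline() {
    workdir=$1
    model=$2

    # 1. Record 10 seconds from the default microphone as a WAV file.
    run arecord -d 10 -f cd -t wav "$workdir/raw.wav"

    # 2. Convert to 16 kHz mono 16-bit PCM, the input format whisper.cpp expects.
    run ffmpeg -y -i "$workdir/raw.wav" -ar 16000 -ac 1 -c:a pcm_s16le "$workdir/input.wav"

    # 3. Transcribe in Portuguese; -otxt writes the text to input.wav.txt.
    run main -m "$model" -l pt -otxt -f "$workdir/input.wav"

    # 4. Copy the transcription to the system clipboard.
    run xclip -selection clipboard "$workdir/input.wav.txt"
}
```

For example, `DRY_RUN=1; transcribe_pipeline /tmp/vocalize /path/to/whisper.cpp/models/ggml-base.bin` lists the four commands without touching the microphone.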
- Audio Recording: Record audio directly from the microphone with a simplified interface.
- Transcription: Convert audio to text using Whisper with support for Portuguese (Brazil).
- Text Enrichment: Improve the quality of the transcribed text with an AI-based prompt.
- Clipboard Support: Easily copy the transcribed or AI-enhanced text to the system clipboard.
- File Management: Save, delete, or open transcriptions and audio recordings.
- Minimal Mode: Run the application with only the audio recording interface visible for a distraction-free experience.
- Interactive Mode: Use an easy-to-navigate terminal menu to access all features of the tool.
- Whisper.cpp: A speech-to-text engine for audio transcription.
- tgpt: A command-line interface to interact with GPT-based AI models.
- whiptail: A utility for creating dialog boxes in shell scripts.
- mpv: A media player to playback recorded audio.
- arecord: For recording audio from the microphone.
- ffmpeg: To convert audio files into the appropriate format for transcription.
- xclip: A clipboard manager that allows copying text to the system clipboard.
Before installing Vocalize, ensure that you have the necessary dependencies installed on your system. For a Debian-based OS, you can install them using the following commands:
sudo apt update
sudo apt install -y alsa-utils ffmpeg whiptail mpv xclip
- arecord: A command-line sound recorder for capturing audio from your microphone (provided by the alsa-utils package).
- ffmpeg: A powerful tool to process audio and video files.
- whiptail: A tool to create dialog boxes in shell scripts.
- mpv: A media player used to play back the recorded audio.
- xclip: A clipboard manager that allows copying text to the system clipboard.
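To confirm that the tools listed above are actually available, a small check like the one below may help. It is a sketch only: the function name missing_deps is made up for this example, and main is assumed to be the name of the whisper.cpp binary on your PATH.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# `missing_deps` is a hypothetical helper, not part of Vocalize.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Check the tools this README lists (`main` is an assumed binary name).
missing=$(missing_deps arecord ffmpeg whiptail mpv xclip tgpt main)
if [ -n "$missing" ]; then
    printf 'Missing dependencies:\n%s\n' "$missing" >&2
fi
```

Running it prints nothing when every tool is found, and otherwise lists the missing commands on standard error.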
To install tgpt, run the following command:
curl -sSL https://raw.githubusercontent.com/aandrew-me/tgpt/main/install | bash -s /usr/local/bin
To install whisper.cpp, first clone the repository:
git clone https://github.com/ggerganov/whisper.cpp.git
Navigate into the directory:
cd whisper.cpp
Then download one of the Whisper models converted to ggml format. For example:
sh ./models/download-ggml-model.sh base.en
Now build the main example:
# build the main example
make
Now add the whisper.cpp directory to your $PATH environment variable:
export PATH=$PATH:/path/to/whisper.cpp
Then create a WHISPER_MODELS_DIR variable by exporting the models path:
export WHISPER_MODELS_DIR=/path/to/whisper.cpp/models
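Before running Vocalize, it can be worth checking that WHISPER_MODELS_DIR points at a directory that really contains a downloaded model. The sketch below assumes the model files follow the ggml-*.bin naming used by the whisper.cpp download script; the function name has_ggml_model is hypothetical.

```shell
#!/bin/sh
# Succeed if the given directory exists and holds at least one ggml model file.
# `has_ggml_model` is a hypothetical helper, not part of Vocalize or whisper.cpp.
has_ggml_model() {
    dir=$1
    [ -d "$dir" ] || return 1
    for f in "$dir"/ggml-*.bin; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

if has_ggml_model "${WHISPER_MODELS_DIR:-}"; then
    echo "Model found in $WHISPER_MODELS_DIR"
else
    echo "No ggml model found; check WHISPER_MODELS_DIR" >&2
fi
```

To keep both exports across sessions, you may want to add them to your shell startup file.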
- Clone the repository or download the source files for Vocalize.
- Navigate to the Vocalize directory and run the following command to install the tool:
sudo make install
This will install Vocalize into /usr/local/bin by default. You can specify a different installation path by setting the prefix variable during installation, like this:
sudo make install prefix=/custom/path
After installation, the tool will be available to use from the terminal as vocalize.
To remove Vocalize from your system, you can use the following command:
sudo make uninstall
This will remove the vocalize executable from the installation directory.
(To be added...)
All software is covered by the GNU General Public License v3.0.