A desktop application for macOS and Windows that runs AI models locally, using whisper.cpp for speech-to-text and llama.cpp for text translation. The app processes both audio and video files, transcribes the speech, and translates the resulting text into multiple languages without requiring cloud services.
- Audio/Video to Text: Transcribes speech from audio and video files using whisper.cpp, a C/C++ port of OpenAI's Whisper that runs locally.
- Text Translation: Translates the extracted text into various languages using llama.cpp and a local translation model.
- Completely Offline: All processing is done locally, so no internet connection is needed once the models are downloaded.
- Cross-Platform: Runs on both macOS and Windows.
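
As a rough illustration of the transcription step, here is a minimal sketch of how arguments for the whisper.cpp command-line tool could be assembled. It assumes the app shells out to the whisper.cpp CLI, which may not match how this project actually integrates it. The flags `-m`, `-f`, `-l`, `-otxt`, `-osrt` and `-ovtt` come from the whisper.cpp CLI; `TranscribeOptions` and `buildWhisperArgs` are hypothetical names used only for this example.

```typescript
// Hypothetical helper, not taken from this project's source.
type OutputFormat = "txt" | "srt" | "vtt";

interface TranscribeOptions {
  modelPath: string;     // path to a ggml Whisper model file
  inputPath: string;     // path to the audio file to transcribe
  language?: string;     // spoken language code, e.g. "en"
  format?: OutputFormat; // output file type, defaults to plain text
}

// Maps each output format to the corresponding whisper.cpp CLI flag.
const OUTPUT_FLAGS: Record<OutputFormat, string> = {
  txt: "-otxt",
  srt: "-osrt",
  vtt: "-ovtt",
};

function buildWhisperArgs(opts: TranscribeOptions): string[] {
  const args: string[] = ["-m", opts.modelPath, "-f", opts.inputPath];
  if (opts.language) {
    args.push("-l", opts.language);
  }
  args.push(OUTPUT_FLAGS[opts.format ?? "txt"]);
  return args;
}
```

The resulting array can be passed to whatever process-spawning API the app uses. Keeping the argument building in a pure function makes it easy to test without running a model.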
https://ggml.ggerganov.com/ggml-model-whisper-large-q5_0.bin
https://ggml.ggerganov.com/ggml-model-whisper-medium-q5_0.bin
https://huggingface.co/notjjustnumbers/madlad400-3b-mt-Q4_K_M-GGUF/resolve/main/madlad400-3b-mt-q4_k_m.gguf?download=true
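
To my knowledge, MADLAD-400 translation models expect the target language as a `<2xx>` prefix on the input text (for example `<2de>` for German). The sketch below builds such a prompt before it is handed to llama.cpp. `buildMadladPrompt` is a hypothetical helper, and the prefix convention should be checked against the model card for the exact model file in use.

```typescript
// Hypothetical helper, not taken from this project's source.
// Assumption: the model selects the target language via a "<2xx>" prefix.
function buildMadladPrompt(targetLang: string, text: string): string {
  const lang = targetLang.trim();
  // Reject empty codes and codes that would break the prefix token.
  if (lang.length === 0 || /[\s<>]/.test(lang)) {
    throw new Error("Invalid target language code: " + JSON.stringify(targetLang));
  }
  return "<2" + lang + "> " + text.trim();
}
```

For example, `buildMadladPrompt("de", "Hello")` produces the string `<2de> Hello`.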
- transcript screen auto-scroll
- file list
- mp3 format
- save transcripts to tmp files with time tag
- save file to srt or vtt
- translate with gpt2 or llama
- setting modal
- adjust llama.cpp to handle file as input
- move translate to app.tsx
- refactoring app.tsx
- add stop button to transcription and translation
- add license
- finish settings (model downloads, etc.)
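
For the SRT/VTT export item in the list above, here is a minimal sketch of turning timed transcript segments into subtitle text. The `Segment` shape and the function names are assumptions made for this example and are not taken from the app's code. SRT uses numbered cues with a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot.

```typescript
// Hypothetical segment shape; the app's real transcript type may differ.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats milliseconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT).
function formatTimestamp(ms: number, sep: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(total % 1000, 3);
}

// Renders one cue body shared by both formats.
function cueLines(seg: Segment, sep: "," | "."): string {
  return formatTimestamp(seg.startMs, sep) + " --> " +
    formatTimestamp(seg.endMs, sep) + "\n" + seg.text.trim() + "\n";
}

function toSrt(segments: Segment[]): string {
  // SRT cues are numbered starting at 1 and separated by blank lines.
  return segments.map((seg, i) => (i + 1) + "\n" + cueLines(seg, ",")).join("\n");
}

function toVtt(segments: Segment[]): string {
  // WebVTT files begin with a "WEBVTT" header followed by a blank line.
  return "WEBVTT\n\n" + segments.map((seg) => cueLines(seg, ".")).join("\n");
}
```

The returned strings can be written to `.srt` or `.vtt` files as-is.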
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.