This project is a proof-of-concept minimum viable product (MVP) inspired by the capabilities of the OMNI model from ChatGPT. Its main advantage is local deployment without restrictions, so users can apply it to a range of purposes, including:
- Translation across languages
- Language learning: practice writing, reading, and listening skills
- Customization for tailored use cases
- Language classification: detects whether speech is UA or EN for automatic mode. Uses "lang-id-voxlingua107-ecapa" by SpeechBrain (supports 100+ languages).
- Google legacy recognizer: it uses a generic key that works out of the box. It's fast and works well.
- Wav2Vec2-Bert: the best Ukrainian speech-to-text model for now.
- Edge-TTS: best (not generated) voices I can get for free.
- Ollama-python: library for downloading and running the most popular LLMs.
- Streamlit: for GUI.
- Dialogue history: saved as JSON in HISTORY.json (main.py only; app.py keeps only short-term context-window memory).
- Config.py: holds the prompt tuned for the best user experience (modify it for your own purposes).
- WSL (Ubuntu 22.04.3)
- GeForce GTX 1050 Ti (mobile, 4 GB)
- RAM (32GB)
- Python 3.9+
- Virtual environment (Conda, Python 3.9+)
- CUDA (optional)
- Clone the repository
- Create a Conda environment (Python 3.9)
- Install the PortAudio development files: sudo apt install portaudio19-dev
- Install the required packages: pip install -r requirements.txt
After installing the required libraries, run main.py for the console experience or app.py for the Streamlit GUI.
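The two memory modes mentioned in the component list (persistent history in HISTORY.json for main.py, short-term context-window memory for app.py) can be illustrated with a minimal sketch. This is not the project's actual code: the helper names, the `MAX_CONTEXT_MESSAGES` value, and the `{"role", "content"}` message shape (the shape Ollama chat messages use) are assumptions made for illustration.

```python
import json
from pathlib import Path

# File name taken from the README; its location is an assumption.
HISTORY_PATH = Path("HISTORY.json")
# Hypothetical window size for short-term memory.
MAX_CONTEXT_MESSAGES = 6


def load_history(path=HISTORY_PATH):
    """Return the saved messages, or an empty list if no file exists yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def save_history(messages, path=HISTORY_PATH):
    """Write all messages to disk; ensure_ascii=False keeps Ukrainian text readable."""
    with path.open("w", encoding="utf-8") as f:
        json.dump(messages, f, ensure_ascii=False, indent=2)


def append_message(messages, role, content):
    """Return a new list with one more message in the assumed role/content shape."""
    return messages + [{"role": role, "content": content}]


def context_window(messages, limit=MAX_CONTEXT_MESSAGES):
    """Keep only the most recent `limit` messages (short-term memory)."""
    return messages[-limit:] if limit > 0 else []
```

In this sketch, a console loop would call `load_history()` at startup and `save_history()` after each turn, while a GUI session would pass only `context_window(messages)` to the model and never write to disk.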