Demo Python script that interacts with a llama.cpp server using the Whisper API, a microphone, and a webcam.
Clone the llama.cpp repository from GitHub:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Build with make:
make
Or, if you prefer cmake:
mkdir build && cd build
cmake ..
cmake --build . --config Release
You need to install these dependencies on your computer: ffmpeg and portaudio.
brew install ffmpeg portaudio
On macOS, also be sure to grant the terminal microphone and camera permissions under Security & Privacy > Privacy.
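To confirm that the terminal can actually reach both devices, a quick check can help. The snippet below is only a sketch, not part of the demo app, and assumes the pyaudio and opencv-python packages are installed:

```python
# Optional sketch: verify that a microphone and the default webcam are
# accessible. Assumes `pip install pyaudio opencv-python`.
import pyaudio
import cv2

# List audio input devices exposed by PortAudio.
pa = pyaudio.PyAudio()
inputs = [
    pa.get_device_info_by_index(i)["name"]
    for i in range(pa.get_device_count())
    if pa.get_device_info_by_index(i)["maxInputChannels"] > 0
]
pa.terminate()
print("Microphone devices:", inputs or "none found")

# Try to open the default webcam (device 0).
cap = cv2.VideoCapture(0)
print("Webcam accessible:", cap.isOpened())
cap.release()
```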
- Download these two files from mys/ggml_bakllava-1 on Hugging Face (a scripted alternative is sketched after this list):
  - ggml-model-q4_k.gguf (or any other quantized model) - only one is required!
  - mmproj-model-f16.gguf
- Copy the paths of those two files.
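If you prefer to script the download, the same files can be fetched with the huggingface_hub package; this is just a sketch (assuming pip install huggingface_hub), and downloading from the website works equally well:

```python
# Sketch: fetch the two BakLLaVA files programmatically.
# Assumes `pip install huggingface_hub`; the printed paths are what you
# substitute into the server command below.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="mys/ggml_bakllava-1",
    filename="ggml-model-q4_k.gguf",
)
mmproj_path = hf_hub_download(
    repo_id="mys/ggml_bakllava-1",
    filename="mmproj-model-f16.gguf",
)
print(model_path)
print(mmproj_path)
```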
Run this in the llama.cpp repository (replace YOUR_PATH with the paths to the two files you downloaded):

On Linux/macOS:

./server -m YOUR_PATH/ggml-model-q4_k.gguf --mmproj YOUR_PATH/mmproj-model-f16.gguf -ngl 1

On Windows:

server.exe -m YOUR_PATH\ggml-model-q4_k.gguf --mmproj YOUR_PATH\mmproj-model-f16.gguf -ngl 1
The llama server is now up and running!
⚠️ NOTE: Keep the server running in the background.
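If you want to sanity-check the server independently of the demo app, you can send it a single image and a prompt. The snippet below is a rough sketch of the server's /completion endpoint for LLaVA-style models; the default port 8080, the image_data field layout, and the frame.jpg file are assumptions that may differ for your llama.cpp version:

```python
# Sketch: send one image plus a prompt to the running llama.cpp server.
# Assumes the server listens on 127.0.0.1:8080 and that this build supports
# `image_data` on /completion (LLaVA support); field names may vary by version.
import base64
import requests

with open("frame.jpg", "rb") as f:  # any local image works here
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "USER: [img-10] Describe what you see.\nASSISTANT:",
    "image_data": [{"data": image_b64, "id": 10}],
    "n_predict": 128,
    "temperature": 0.2,
}
resp = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=120)
print(resp.json()["content"])
```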
Now let's run the script that uses the webcam and microphone.
Open a new terminal window and clone the demo app:
git clone https://github.com/herrera-luis/vision-core-ai.git
cd vision-core-ai
pip install -r requirements.txt
python main.py
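If main.py cannot reach the model, first make sure the llama.cpp server from the previous step is still running. A minimal reachability check (assuming the default 127.0.0.1:8080) looks like this:

```python
# Optional sketch: confirm the llama.cpp server is listening before starting
# the demo. Assumes the default host/port 127.0.0.1:8080.
import socket

try:
    with socket.create_connection(("127.0.0.1", 8080), timeout=2):
        print("llama.cpp server is reachable")
except OSError:
    print("Cannot reach the server; is it still running in the other terminal?")
```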
When the application is running, press the `i` or `c` key to start recording, and press the same key again to stop it:
- `i` will use your webcam
- `c` will use chat
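The voice path works by recording microphone audio while the key is toggled on and then transcribing it with the Whisper API. As an illustration only (this is not the actual main.py), here is a minimal sketch that records a few seconds with pyaudio and transcribes the resulting WAV file; the fixed 5-second duration and the OPENAI_API_KEY environment variable are assumptions:

```python
# Illustration only: record ~5 seconds from the default microphone and
# transcribe it with OpenAI's Whisper API.
# Assumes `pip install pyaudio openai` and an OPENAI_API_KEY environment variable.
import wave
import pyaudio
from openai import OpenAI

RATE, CHUNK, SECONDS = 16000, 1024, 5

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()

# Write the captured frames to a WAV file.
with wave.open("clip.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
pa.terminate()

# Send the recording to the Whisper API for transcription.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("clip.wav", "rb") as audio:
    result = client.audio.transcriptions.create(model="whisper-1", file=audio)
print(result.text)
```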