- Storage >= 16GB
- RAM >= 8GB
- CPU cores >= 4
- Core Frequency >= 3GHz
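The thresholds above can be checked before installing anything. The following is a minimal sketch using only the Python standard library, not part of the project itself. It assumes "GB" can be treated as GiB, counts logical CPUs rather than physical cores, reads RAM through `os.sysconf` (Linux and most Unix systems), and does not check core frequency because the standard library has no portable way to read it. The function and constant names are illustrative.

```python
import os
import shutil

# Thresholds copied from the requirements list above (assumed to mean GiB).
MIN_STORAGE_GB = 16
MIN_RAM_GB = 8
MIN_CPU_CORES = 4


def total_ram_gb():
    """Return total physical RAM in GiB, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(path="."):
    """Compare this machine against the thresholds; None means 'could not determine'."""
    free_gb = shutil.disk_usage(path).free / 2**30
    ram_gb = total_ram_gb()
    cores = os.cpu_count() or 0  # logical CPUs, not physical cores
    return {
        "storage": free_gb >= MIN_STORAGE_GB,
        "ram": None if ram_gb is None else ram_gb >= MIN_RAM_GB,
        "cpu_cores": cores >= MIN_CPU_CORES,
    }
```

Calling `check_requirements()` returns a dictionary such as `{"storage": ..., "ram": ..., "cpu_cores": ...}`. A machine sold as "8GB" may report slightly under 8 GiB, so a `False` for RAM deserves a second look rather than being taken as a hard failure.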
python3 -m venv agents_env
source agents_env/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-agents.txt
huggingface-cli download ojjsaw/reranking_model
huggingface-cli download ojjsaw/embedding_model
huggingface-cli download OpenVINO/Phi-3-mini-128k-instruct-int4-ov
- Ideal setup: run the LLM on the GPU and the other two models (embedding and reranking) on the CPU. By default, all models run on the CPU.
- To switch to the GPU, set LLM_DEVICE to "GPU" in llamaindex-minimal.py
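Instead of editing the value by hand, the device could be chosen from what OpenVINO reports as available. This is a sketch only: it assumes the `openvino` package is installed in the virtual environment and that `LLM_DEVICE` is a plain string in llamaindex-minimal.py. The helper names `pick_llm_device` and `detect_devices` are hypothetical and do not come from the project.

```python
def pick_llm_device(available_devices):
    """Return "GPU" if any GPU device is listed, otherwise "CPU"."""
    has_gpu = any(
        name == "GPU" or name.startswith("GPU.") for name in available_devices
    )
    return "GPU" if has_gpu else "CPU"


def detect_devices():
    """List OpenVINO device names, falling back to CPU only if OpenVINO is missing."""
    try:
        import openvino as ov  # assumption: installed via requirements-agents.txt
    except ImportError:
        return ["CPU"]
    return list(ov.Core().available_devices)
```

With these helpers, `LLM_DEVICE = pick_llm_device(detect_devices())` would select the GPU only when the driver stack exposes one, and fall back to the CPU default otherwise.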
uvicorn llamaindex-minimal:app
- To run via Swagger (non-streaming responses), navigate to http://127.0.0.1:8000/docs
- To use the upload API from a terminal with curl:
curl -X 'POST' \
'http://127.0.0.1:8000/upload-pdf-vector-index' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F '[email protected];type=application/pdf'
- To call the ask-me-anything API from a terminal (preferred for streaming responses):
Note: Use curl's -N (--no-buffer) option to disable output buffering and see the chunked response data as it arrives
curl -N -X 'POST' \
'http://127.0.0.1:8000/ask-me-anything' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"question": "What is the range of Thermal Design Power (TDP) for Intel Xeon 6 processors with E-cores?"
}'
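The same streaming request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, headers, and JSON body). It assumes the server is running locally on port 8000 and returns UTF-8 text in chunks. The function names are illustrative.

```python
import codecs
import json
import urllib.request


def build_question_request(question, base_url="http://127.0.0.1:8000"):
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask-me-anything",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def stream_answer(question, base_url="http://127.0.0.1:8000"):
    """Yield pieces of the answer as the server sends them."""
    # Incremental decoding keeps multi-byte characters intact across chunk boundaries.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    with urllib.request.urlopen(build_question_request(question, base_url)) as resp:
        while True:
            chunk = resp.read1(1024)  # returns as soon as some data is available
            if not chunk:
                break
            yield decoder.decode(chunk)


# Usage (requires the uvicorn server to be running):
# for piece in stream_answer("What is the TDP range for Intel Xeon 6 with E-cores?"):
#     print(piece, end="", flush=True)
```

Reading with `read1` rather than `read` plays the same role as curl's `-N`: data is handed over as soon as it arrives instead of after the full response has been buffered.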
- Repo Signing
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key |
sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
- APT Repo Setup
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
sudo apt update
- Install the minimal GPU dependencies for AI workloads
sudo apt-get install -y ocl-icd-libopencl1 intel-opencl-icd intel-level-zero-gpu level-zero
Troubleshooting/alternate OS Steps: