The main goal of this repository is to evaluate the performance of Intel's 5th Generation Xeon "Emerald Rapids" processors in CPU-based multimodal Retrieval-Augmented Generation (RAG) scenarios. Specifically, the benchmarks focus on the three models that form the multimodal pipeline:
- Embeddings (BAAI/bge-large-en-v1.5): For generating high-quality semantic text representations.
- Large Language Model (Llama-3.2-1B-Instruct): A compact instruction-following LLM.
- Vision Language Model (Phi-3.5-vision-instruct): Handles tasks that combine visual and textual data.
This repository provides scripts and instructions to measure inference times for these models in both CPU and GPU environments.
To use the Llama-3.2-1B-Instruct model, you first need to obtain access via Hugging Face. Follow these steps:
- Request access to the model from meta-llama/Llama-3.2-1B-Instruct.
- Once access is granted, generate a user access token to authorize model downloads. Refer to the Hugging Face documentation for detailed instructions.
Create a .env file in the llm folder with the following content:
HF_TOKEN=REPLACE_TOKEN
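As an illustration of how the token can be consumed, the minimal sketch below loads HF_TOKEN from llm/.env and authenticates with Hugging Face. It assumes the python-dotenv and huggingface_hub packages are available and may differ from how llm/main.py actually reads the token:

```python
# Minimal sketch (assumes python-dotenv and huggingface_hub are installed):
# load HF_TOKEN from llm/.env and authenticate against Hugging Face.
import os

from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv("llm/.env")               # puts HF_TOKEN into the process environment
login(token=os.environ["HF_TOKEN"])   # authorizes downloads of gated models such as Llama-3.2-1B-Instruct
```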
Create a Python 3.11 environment using your preferred environment manager and ensure pip is updated to version 24.2 or later:
python -m pip install --upgrade pip
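A quick, optional sanity check (a hypothetical helper, not part of the repository) that the interpreter and pip versions match the requirements above:

```python
# Hypothetical sanity check: confirm Python 3.11 and pip >= 24.2.
import sys
from importlib import metadata

assert sys.version_info[:2] == (3, 11), f"Expected Python 3.11, got {sys.version.split()[0]}"
major, minor = (int(x) for x in metadata.version("pip").split(".")[:2])
assert (major, minor) >= (24, 2), f"Expected pip >= 24.2, got {metadata.version('pip')}"
print("Environment OK")
```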
Install the required dependencies for CPU-based inference of the embeddings model:
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install -r requirements_embeddings_cpu.txt
Install the required dependencies for GPU-based inference of the embeddings model:
pip install -r requirements_embeddings_gpu.txt
To measure inference time for embeddings, run the following script:
python embeddings/main.py
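For reference, a benchmark of this kind can be as simple as the hypothetical sketch below, which times BAAI/bge-large-en-v1.5 on CPU with sentence-transformers; the sentences, batch size, and output format are illustrative, and embeddings/main.py remains the authoritative implementation:

```python
# Hypothetical sketch: time BAAI/bge-large-en-v1.5 embedding generation on CPU.
import time

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cpu")
sentences = ["Retrieval-augmented generation combines search with LLMs."] * 32  # illustrative input

start = time.perf_counter()
embeddings = model.encode(sentences, batch_size=32, normalize_embeddings=True)
elapsed = time.perf_counter() - start

print(f"Encoded {len(sentences)} sentences in {elapsed:.3f} s "
      f"({1000 * elapsed / len(sentences):.1f} ms/sentence), dim={embeddings.shape[1]}")
```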
Install the required dependencies for CPU-based inference of the LLM and VLM:
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install -r requirements_cpu.txt
Install the required dependencies for GPU-based inference of the LLM and VLM:
pip install -r requirements_gpu.txt
To measure inference time for the large language model, execute the script:
python llm/main.py
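As a rough picture of what is being measured, here is a hypothetical CPU sketch that loads meta-llama/Llama-3.2-1B-Instruct with ipex-llm low-bit weights and times a single generation; the prompt and generation parameters are illustrative, and llm/main.py is the authoritative script:

```python
# Hypothetical sketch: time Llama-3.2-1B-Instruct generation on CPU with ipex-llm INT4 weights.
import time

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # gated model, requires the HF_TOKEN set up above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "Summarize the benefits of retrieval-augmented generation in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.inference_mode():
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=128)
    elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.3f} s ({new_tokens / elapsed:.1f} tok/s)")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```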
To measure inference time for the vision language model, execute the script:
python vlm/main.py
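For orientation, the sketch below shows one plausible way to time microsoft/Phi-3.5-vision-instruct on CPU with plain transformers (trust_remote_code is needed for the model's custom code); the image path, prompt, and generation settings are illustrative assumptions, and vlm/main.py remains the authoritative script:

```python
# Hypothetical sketch: time Phi-3.5-vision-instruct on CPU for a single image-plus-text prompt.
import time

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", _attn_implementation="eager"
)

image = Image.open("sample.jpg")  # illustrative path, replace with a real image
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image in one sentence."}]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(prompt, [image], return_tensors="pt")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=64, eos_token_id=processor.tokenizer.eos_token_id)
elapsed = time.perf_counter() - start

answer = processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(f"Generated a response in {elapsed:.3f} s")
print(answer)
```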
Note: In Linux environments, run the following commands before executing a benchmarking script, where NUM_PROCESSORS is the index of the last CPU core to bind and SCRIPT_PATH is the path to the benchmarking script (for example, llm/main.py):
> source ipex-llm-init
> numactl -C 0-NUM_PROCESSORS -m 0 python SCRIPT_PATH
The benchmarking scripts will output inference time metrics for each model. These metrics can be used to compare CPU and GPU performance under different configurations.
For further assistance, please create an issue in this repository.
This project is licensed under the MIT License. See the LICENSE file for details.