Commit

[Community] Support llama-index-embeddings-ipex-llm for Intel GPUs (#…
Oscilloscope98 authored Apr 25, 2024
1 parent 5cb907d commit c80e56a
Showing 7 changed files with 207 additions and 16 deletions.
6 changes: 5 additions & 1 deletion docs/docs/examples/embeddings/ipex_llm.ipynb
@@ -4,12 +4,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local Embeddings with IPEX-LLM\n",
"# Local Embeddings with IPEX-LLM on Intel CPU\n",
"\n",
"> [IPEX-LLM](https://github.com/intel-analytics/ipex-llm/) is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.\n",
"\n",
"This example goes over how to use LlamaIndex to conduct embedding tasks with `ipex-llm` optimizations on Intel CPU. This would be helpful in applications such as RAG, document QA, etc.\n",
"\n",
"> **Note**\n",
">\n",
"> You could refer to [here](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/embeddings/llama-index-embeddings-ipex-llm/examples) for full examples of `IpexLLMEmbedding`. Please note that for running on Intel CPU, please specify `-d 'cpu'` in command argument when running the examples.\n",
"\n",
"## Install `llama-index-embeddings-ipex-llm`\n",
"\n",
"This will also install `ipex-llm` and its dependencies."
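As a compact illustration of the CPU workflow this notebook walks through, here is a minimal sketch using the `IpexLLMEmbedding` API added in this PR (the model name and prompts mirror the defaults used elsewhere in the commit):

```python
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

# Load a BGE embedding model on Intel CPU with IPEX-LLM optimizations.
embedding_model = IpexLLMEmbedding(
    model_name="BAAI/bge-large-en-v1.5", device="cpu"
)

# Embed a sentence and a query; both return plain Python lists of floats.
text_embedding = embedding_model.get_text_embedding(
    "IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU."
)
query_embedding = embedding_model.get_query_embedding("What is IPEX-LLM?")
print(text_embedding[:10], query_embedding[:10])
```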
112 changes: 112 additions & 0 deletions docs/docs/examples/embeddings/ipex_llm_gpu.ipynb
@@ -0,0 +1,112 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local Embeddings with IPEX-LLM on Intel GPU\n",
"\n",
"> [IPEX-LLM](https://github.com/intel-analytics/ipex-llm/) is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.\n",
"\n",
"This example goes over how to use LlamaIndex to conduct embedding tasks with `ipex-llm` optimizations on Intel GPU. This would be helpful in applications such as RAG, document QA, etc.\n",
"\n",
"> **Note**\n",
">\n",
"> You could refer to [here](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/embeddings/llama-index-embeddings-ipex-llm/examples) for full examples of `IpexLLMEmbedding`. Please note that for running on Intel GPU, please specify `-d 'xpu'` in command argument when running the examples.\n",
"\n",
"## Install Prerequisites\n",
"To benefit from IPEX-LLM on Intel GPUs, there are several prerequisite steps for tools installation and environment preparation.\n",
"\n",
"If you are a Windows user, visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [**Install Prerequisites**](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install Visual Studio 2022, GPU driver, Conda, and Intel® oneAPI Base Toolkit 2024.0.\n",
"\n",
"If you are a Linux user, visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), and follow [**Install Prerequisites**](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-prerequisites) to install GPU driver, Intel® oneAPI Base Toolkit 2024.0, and Conda.\n",
"\n",
"## Install `llama-index-embeddings-ipex-llm`\n",
"\n",
"After the prerequisites installation, you should have created a conda environment with all prerequisites installed, activate your conda environment and install `llama-index-embeddings-ipex-llm` as follows:\n",
"\n",
"```bash\n",
"conda activate <your-conda-env-name>\n",
"\n",
"pip install llama-index-embeddings-ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/\n",
"```\n",
"This step will also install `ipex-llm` and its dependencies.\n",
"\n",
"> **Note**\n",
">\n",
"> You can also use `https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/` as the `extra-indel-url`.\n",
"\n",
"\n",
"## Runtime Configuration\n",
"\n",
"For optimal performance, it is recommended to set several environment variables based on your device:\n",
"\n",
"### For Windows Users with Intel Core Ultra integrated GPU\n",
"\n",
"In Anaconda Prompt:\n",
"\n",
"```\n",
"set SYCL_CACHE_PERSISTENT=1\n",
"set BIGDL_LLM_XMX_DISABLED=1\n",
"```\n",
"\n",
"### For Linux Users with Intel Arc A-Series GPU\n",
"\n",
"```bash\n",
"# Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.\n",
"# Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.\n",
"source /opt/intel/oneapi/setvars.sh\n",
"\n",
"# Recommended Environment Variables for optimal performance\n",
"export USE_XETLA=OFF\n",
"export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1\n",
"export SYCL_CACHE_PERSISTENT=1\n",
"```\n",
"\n",
"> **Note**\n",
">\n",
"> For the first time that each model runs on Intel iGPU/Intel Arc A300-Series or Pro A60, it may take several minutes to compile.\n",
">\n",
"> For other GPU type, please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) for Windows users, and [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id5) for Linux users.\n",
"\n",
"## `IpexLLMEmbedding`\n",
"\n",
"Setting `device=\"xpu\"` when initializing `IpexLLMEmbedding` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:\n",
"\n",
"```python\n",
"from llama_index.embeddings.ipex_llm import IpexLLMEmbedding\n",
"\n",
"embedding_model = IpexLLMEmbedding(\n",
" model_name=\"BAAI/bge-large-en-v1.5\", device=\"xpu\"\n",
")\n",
"```\n",
"\n",
"> Please note that `IpexLLMEmbedding` currently only provides optimization for Hugging Face Bge models.\n",
"\n",
"You could then conduct the embedding tasks as normal:\n",
"\n",
"```python\n",
"sentence = \"IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.\"\n",
"query = \"What is IPEX-LLM?\"\n",
"\n",
"text_embedding = embedding_model.get_text_embedding(sentence)\n",
"print(f\"embedding[:10]: {text_embedding[:10]}\")\n",
"\n",
"text_embeddings = embedding_model.get_text_embedding_batch([sentence, query])\n",
"print(f\"text_embeddings[0][:10]: {text_embeddings[0][:10]}\")\n",
"print(f\"text_embeddings[1][:10]: {text_embeddings[1][:10]}\")\n",
"\n",
"query_embedding = embedding_model.get_query_embedding(query)\n",
"print(f\"query_embedding[:10]: {query_embedding[:10]}\")\n",
"```"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
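Pulling the pieces of the GPU notebook above together, a rough end-to-end sketch in pure Python might look like the following. The runtime variables are set via `os.environ` here purely to keep the sketch self-contained; the guides above set them in the shell, which is the documented route.

```python
import os

# Recommended runtime configuration (Linux + Arc A-Series values from above).
# The official guides export these in the shell before launching Python;
# setting them in-process like this is an assumption for convenience.
os.environ["USE_XETLA"] = "OFF"
os.environ["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"
os.environ["SYCL_CACHE_PERSISTENT"] = "1"

from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

# device="xpu" places the BGE embedding model on the Intel GPU.
embedding_model = IpexLLMEmbedding(
    model_name="BAAI/bge-large-en-v1.5", device="xpu"
)

query_embedding = embedding_model.get_query_embedding("What is IPEX-LLM?")
print(f"query_embedding[:10]: {query_embedding[:10]}")
```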
@@ -0,0 +1 @@
python_sources()
@@ -0,0 +1,23 @@
# IpexLLMEmbedding Examples

This folder contains examples showcasing how to use LlamaIndex with the `ipex-llm` embeddings integration `llama_index.embeddings.ipex_llm.IpexLLMEmbedding` on Intel CPU and GPU.

## Installation

### On Intel CPU

Please refer to [here](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm/#install-llama-index-embeddings-ipex-llm) for installation details.

### On Intel GPU

Please refer to [here](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm_gpu/) for install prerequisites, `llama-index-embeddings-ipex-llm` installation, and runtime configuration.

## List of Examples

### Basic Usage Example

The example [basic.py](./basic.py) shows how to run `IpexLLMEmbedding` on Intel CPU or GPU and conduct embedding tasks such as text and query embedding. Run the example as follows:

```bash
python basic.py -m <path_to_model> -d <cpu_or_xpu> -t <text_to_embed> -q <query_to_embed>
```
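
For example, a concrete run on an Intel GPU could look like this (the flag names come from `basic.py` below; the argument values are illustrative):

```bash
python basic.py -m BAAI/bge-large-en-v1.5 -d xpu \
    -t "IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU." \
    -q "What is IPEX-LLM?"
```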
@@ -0,0 +1,54 @@
import argparse
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="IpexLLMEmbedding Basic Usage Example"
    )
    parser.add_argument(
        "--model-name",
        "-m",
        type=str,
        default="BAAI/bge-large-en-v1.5",
        help="The Hugging Face repo id of the embedding model to be downloaded"
        ", or the path to the Hugging Face checkpoint folder",
    )
    parser.add_argument(
        "--device",
        "-d",
        type=str,
        default="cpu",
        choices=["cpu", "xpu"],
        help="The device (Intel CPU or Intel GPU) the embedding model runs on",
    )
    parser.add_argument(
        "--text",
        "-t",
        type=str,
        default="IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.",
        help="The sentence to use for text embedding",
    )
    parser.add_argument(
        "--query",
        "-q",
        type=str,
        default="What is IPEX-LLM?",
        help="The sentence to use for query embedding",
    )

    args = parser.parse_args()
    model_name = args.model_name
    device = args.device
    text = args.text
    query = args.query

    # Load the embedding model on Intel CPU or GPU with IPEX-LLM optimizations.
    embedding_model = IpexLLMEmbedding(model_name=model_name, device=device)

    text_embedding = embedding_model.get_text_embedding(text)
    print(f"embedding[:10]: {text_embedding[:10]}")

    text_embeddings = embedding_model.get_text_embedding_batch([text, query])
    print(f"text_embeddings[0][:10]: {text_embeddings[0][:10]}")
    print(f"text_embeddings[1][:10]: {text_embeddings[1][:10]}")

    query_embedding = embedding_model.get_query_embedding(query)
    print(f"query_embedding[:10]: {query_embedding[:10]}")
Expand Up @@ -3,6 +3,7 @@

import logging
from typing import Any, List, Optional
from ipex_llm.transformers.convert import _optimize_pre, _optimize_post

from llama_index.core.base.embeddings.base import (
DEFAULT_EMBED_BATCH_SIZE,
@@ -83,11 +84,10 @@ def __init__(
            **model_kwargs,
        )

        from ipex_llm.transformers.convert import _optimize_pre, _optimize_post

        if self._device == "cpu":
            self._model = _optimize_pre(self._model)
            self._model = _optimize_post(self._model)
            # TODO: optimize using ipex-llm optimize_model
        elif self._device == "xpu":
            self._model = _optimize_pre(self._model)
            self._model = _optimize_post(self._model)
@@ -30,23 +30,20 @@ license = "MIT"
name = "llama-index-embeddings-ipex-llm"
packages = [{include = "llama_index/"}]
readme = "README.md"
version = "0.1.0"
version = "0.1.1"

[tool.poetry.dependencies]
python = ">=3.9,<4.0"
llama-index-core = "^0.10.0"
ipex-llm = {allow-prereleases = true, version = ">=2.1.0b20240409"}
py-cpuinfo = "*"
protobuf = "*"
intel-openmp = {markers = "platform_machine=='x86_64' or platform_machine == 'AMD64'", version = "*"}
mpmath = "<=1.3.0"
numpy = "<=1.26.4"
torch = "<2.2.0"
transformers = ">=4.34.0,<4.39.0"
sentencepiece = "*"
accelerate = "0.21.0"
tabulate = "*"
sentence-transformers = "^2.6.1"
ipex-llm = {allow-prereleases = true, extras = ["llama-index"], version = ">=2.1.0b20240423"}
torch = {optional = true, version = "2.1.0a0"}
torchvision = {optional = true, version = "0.16.0a0"}
intel_extension_for_pytorch = {optional = true, version = "2.1.10+xpu"}
bigdl-core-xe-21 = {optional = true, version = "*"}
bigdl-core-xe-esimd-21 = {optional = true, version = "*"}

[tool.poetry.extras]
xpu = ["bigdl-core-xe-21", "bigdl-core-xe-esimd-21", "intel_extension_for_pytorch", "torch", "torchvision"]

[tool.poetry.group.dev.dependencies]
black = {extras = ["jupyter"], version = "<=23.9.1,>=23.7.0"}
