|
| 1 | +# Install BigDL-LLM on Windows for Intel GPU |
| 2 | + |
| 3 | +This guide applies to the integrated GPUs of Intel Core Ultra and 12th to 14th Gen Intel Core processors, as well as Intel Arc Series GPUs. |
| 4 | + |
| 5 | +## Install GPU driver |
| 6 | + |
| 7 | +* Download and install Visual Studio 2022 Community Edition from the [official Microsoft Visual Studio website](https://visualstudio.microsoft.com/downloads/). Make sure to select the **Desktop development with C++** workload during installation. |
| 8 | + |
| 9 | + > Note: The installation could take around 15 minutes and requires at least 7 GB of free disk space. |
| 10 | + > If you skipped the **Desktop development with C++** workload during the initial setup, you can add it afterward by navigating to **Tools > Get Tools and Features...**. Follow the instructions in [this Microsoft guide](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170#step-4---choose-workloads) to update your installation. |
| 11 | + > |
| 12 | + > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_1.png" alt="image-20240221102252560" width=100%; /> |
| 13 | +
|
| 14 | +* Download and install the latest GPU driver from the [official Intel download page](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). A system reboot is necessary to apply the changes after the installation is complete. |
| 15 | + |
| 16 | + > Note: The process could take around 10 minutes. After the reboot, look for the **Intel Arc Control** application to verify that the driver was installed correctly. If the installation was successful, you should see the **Arc Control** interface, similar to the figure below. |
| 17 | +
|
| 18 | + > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_3.png" width=80%; /> |
| 19 | +
|
| 20 | +* To monitor your GPU's performance and status, you can use either the **Windows Task Manager** (see the left side of the figure below) or the **Arc Control** application (see the right side of the figure below): |
| 21 | + > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png" width=70%; /> |
| 22 | +
|
| 23 | +## Set up Python Environment |
| 24 | + |
| 25 | +* Visit the [Miniconda installation page](https://docs.anaconda.com/free/miniconda/), download the **Miniconda installer for Windows**, and follow the instructions to complete the installation. |
| 26 | + |
| 27 | + > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_5.png" width=50%; /> |
| 28 | +
|
| 29 | +* After installation, open the **Anaconda Prompt** and create a new Python environment named `llm`: |
| 30 | + ```bash |
| 31 | + conda create -n llm python=3.9 libuv |
| 32 | + ``` |
| 33 | +* Activate the newly created environment `llm`: |
| 34 | + ```bash |
| 35 | + conda activate llm |
| 36 | + ``` |
| 37 | + |
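Before moving on, it can help to confirm that the `llm` environment is really the active one. The following is a minimal sketch, not part of the original guide: it assumes conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable, and the helper name `env_report` is hypothetical.

```python
import os
import sys


def env_report(expected_env="llm", expected_python=(3, 9)):
    """Summarize whether the active conda env and Python version match the guide."""
    # Assumption: conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    python_version = tuple(sys.version_info[:2])
    return {
        "env": active_env,
        "env_ok": active_env == expected_env,
        "python": python_version,
        "python_ok": python_version == tuple(expected_python),
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

If `env_ok` is `False`, run `conda activate llm` again in the same Anaconda Prompt window before installing anything.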
| 38 | +## Install oneAPI |
| 39 | + |
| 40 | +* With the `llm` environment active, use `pip` to install the **oneAPI Base Toolkit**: |
| 41 | + ```bash |
| 42 | + pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 |
| 43 | + ``` |
| 44 | + |
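To double-check that the pinned oneAPI packages landed in the environment, a short script along these lines may be useful. It is only a sketch: it relies on the standard-library `importlib.metadata` module, takes the expected versions from the `pip` command above, and the helper name `check_packages` is hypothetical.

```python
from importlib import metadata

# Versions pinned by the pip command in this guide.
EXPECTED = {
    "dpcpp-cpp-rt": "2024.0.2",
    "mkl-dpcpp": "2024.0.0",
    "onednn": "2024.0.0",
}


def check_packages(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_packages().items():
        status = "OK" if ok else "missing or different version"
        print(f"{name}: installed={installed}, expected={EXPECTED[name]} -> {status}")
```

A package reported as missing usually means the `pip install` ran in a different environment than the one currently active.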
| 45 | +## Install `bigdl-llm` |
| 46 | + |
| 47 | +* With the `llm` environment active, use `pip` to install `bigdl-llm` for GPU: |
| 48 | + ```bash |
| 49 | + pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ |
| 50 | + ``` |
| 51 | + > Note: If you encounter network issues when installing IPEX (Intel Extension for PyTorch), refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#install-bigdl-llm-from-wheel) for more details. |
| 52 | +
|
| 53 | +* You can verify that `bigdl-llm` is installed successfully by importing a few classes from the library. For example, in the Python interactive shell, run the following import: |
| 54 | + ```python |
| 55 | + from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM |
| 56 | + ``` |
| 57 | + |
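The import check above can be extended slightly to also report whether a GPU is visible. This is a hedged sketch rather than an official check: it assumes that importing `intel_extension_for_pytorch` (installed with the `xpu` extra) exposes the `torch.xpu` namespace with `is_available()` and `get_device_name()`, and the helper name `check_bigdl_llm` is hypothetical.

```python
def check_bigdl_llm():
    """Return a short status string describing the bigdl-llm GPU setup."""
    try:
        import torch
        # Assumption: importing IPEX is what makes the `torch.xpu` namespace usable.
        import intel_extension_for_pytorch  # noqa: F401
        from bigdl.llm.transformers import AutoModelForCausalLM  # noqa: F401
    except (ImportError, OSError) as exc:
        # ImportError: a package is missing; OSError: a native library failed to load.
        return f"import failed: {exc}"

    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return "imports OK, but no XPU device is visible"
    return f"imports OK, XPU device: {xpu.get_device_name(0)}"


print(check_bigdl_llm())
```

If the imports succeed but no XPU device is visible, the GPU driver installation is the first thing to re-check.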
| 58 | +## A quick example |
| 59 | +* Next, you can try running a real LLM. We use [phi-1.5](https://huggingface.co/microsoft/phi-1_5) (a 1.3B model) for demonstration. Copy the following code into a Python script and run it. |
| 60 | +> Note: to use phi-1.5, you may need to update your `transformers` version to 4.37.0. |
| 61 | +> ``` |
| 62 | +> pip install -U transformers==4.37.0 |
| 63 | +> ``` |
| 64 | +> Note: when running LLMs on Intel iGPUs on Windows, we recommend setting `cpu_embedding=True` in the `from_pretrained` function. |
| 65 | +> This allows the memory-intensive embedding layer to run on the CPU instead of the iGPU. |
| 66 | +
|
| 67 | + ```python |
| 68 | + import torch |
| 69 | + from bigdl.llm.transformers import AutoModelForCausalLM |
| 70 | + from transformers import AutoTokenizer, GenerationConfig |
| 71 | + generation_config = GenerationConfig(use_cache=True) |
| 72 | + |
| 73 | + tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True) |
| 74 | + # Load the model with bigdl-llm in 4-bit precision, then move it to the GPU |
| 75 | + model = AutoModelForCausalLM.from_pretrained( |
| 76 | +     "microsoft/phi-1_5", load_in_4bit=True, cpu_embedding=True, trust_remote_code=True) |
| 77 | + model = model.to('xpu') |
| 78 | +
|
| 79 | + # Format the prompt |
| 80 | + question = "What is AI?" |
| 81 | + prompt = " Question:{prompt}\n\n Answer:".format(prompt=question) |
| 82 | + # Generate predicted tokens |
| 83 | + with torch.inference_mode(): |
| 84 | +     input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu') |
| 85 | +     output = model.generate(input_ids, do_sample=False, max_new_tokens=32, generation_config=generation_config).cpu() |
| 86 | +     output_str = tokenizer.decode(output[0], skip_special_tokens=True) |
| 87 | +     print(output_str) |
| 88 | + ``` |
| 89 | +
|
| 90 | +* Example output on a laptop equipped with an 11th Gen Intel Core i7 CPU and Iris Xe Graphics iGPU looks like the following: |
| 91 | + |
| 92 | +``` |
| 93 | +Question:What is AI? |
| 94 | +Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines. |
| 95 | +``` |
| 96 | + |
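The example script builds its prompt from a fixed `Question:`/`Answer:` template, and the decoded output echoes that template back. A small pair of helpers, sketched below, can keep the template in one place and pull out only the answer text. The names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_answer` are hypothetical and not part of `bigdl-llm`; the template string is copied from the script above.

```python
# Template copied from the example script in this guide.
PROMPT_TEMPLATE = " Question:{prompt}\n\n Answer:"


def build_prompt(question):
    """Fill the question into the Question/Answer template."""
    return PROMPT_TEMPLATE.format(prompt=question)


def extract_answer(output_str):
    """Return the text after the first 'Answer:' marker, or the whole string if absent."""
    _, found, tail = output_str.partition("Answer:")
    return tail.strip() if found else output_str.strip()


if __name__ == "__main__":
    print(repr(build_prompt("What is AI?")))
    print(extract_answer("Question:What is AI?\nAnswer: AI stands for Artificial Intelligence."))
```

In the example script, `build_prompt(question)` would replace the inline `.format(...)` call, and `extract_answer(output_str)` would be applied to the decoded output before printing.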