docs/tutorials/known_issues.md

Troubleshooting
===============
Optimization for Horovod\* at the end of the execution and triggers this error.

**Solution**: Do `import intel_extension_for_pytorch` before `import horovod.torch as hvd`.
**Problem**: Number of dpcpp devices should be greater than zero.

**Cause**: If you use Intel® Extension for PyTorch\* in a conda environment, you might encounter this error. Conda also ships the `libstdc++.so` dynamic library file, which may conflict with the one shipped in the OS.

**Solution**: Export the path of the `libstdc++.so` file shipped in the OS to the `LD_PRELOAD` environment variable.
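For example (the library path below is typical for Ubuntu x86-64 and is an assumption; locate the actual path on your system first, e.g. with `find /usr/lib -name 'libstdc++.so.6'`):

```shell
# Preload the OS copy of libstdc++ so it takes precedence over the copy
# shipped inside the conda environment. The exact path varies by distro.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
```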
**Problem**: Symbol undefined caused by `_GLIBCXX_USE_CXX11_ABI`.

**Cause**: Intel® Extension for PyTorch\* is compiled with `_GLIBCXX_USE_CXX11_ABI=1`. This symbol undefined issue appears when PyTorch\* is compiled with `_GLIBCXX_USE_CXX11_ABI=0`.

**Solution**: Pass `export GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with a compiler that supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using the prebuilt wheels from the [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/us/) to avoid this issue.
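You can check how your installed PyTorch\* was built before recompiling, using the flag PyTorch itself exposes (a quick diagnostic, not part of the fix):

```python
import torch

# True means PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1, which is what
# Intel® Extension for PyTorch* prebuilt binaries require.
print(torch.compiled_with_cxx11_abi())
```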
**Problem**: `-997 runtime error` when running some AI models on Intel® Arc™ Graphics family.

**Cause**: Some of the `-997 runtime error`s are actually out-of-memory errors. As Intel® Arc™ Graphics GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU Max Series, running some AI models on them may trigger out-of-memory errors, which are most likely reported as a `-997 runtime error`. This is expected. Memory usage optimization is a work in progress to allow Intel® Arc™ Graphics GPUs to support more AI models.
**Problem**: Some workloads terminate with the error `CL_DEVICE_NOT_FOUND` after some time on WSL2.

**Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.

**Solution**: Try increasing `TDRDelay` in your Windows Registry to a large value, such as 20 (the default is 2 seconds), and reboot.
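On Windows this can be done with `reg add` (a sketch; the `TdrDelay` value name and path come from the TDR registry keys page linked above, and the registry should be edited with care):

```shell
# Windows-only: from an elevated prompt, raise TdrDelay to 20 seconds
# (the default is 2), then reboot. The guard skips on non-Windows shells.
command -v reg >/dev/null 2>&1 \
  && reg add "HKLM\System\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 20 /f \
  || echo "reg.exe not available; run this on Windows"
```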
**Problem**: `RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE)`.

**Cause**: This error can occur if you run Intel® Extension for PyTorch\* in a Windows environment where an Intel® discrete GPU and an integrated GPU co-exist, and the integrated GPU, which is not supported by Intel® Extension for PyTorch\*, is wrongly identified as the first GPU platform.

**Solution**: Disable the integrated GPU in your environment as a workaround. In the long term, the Intel® Graphics Driver will always enumerate the discrete GPU as the first device, so that Intel® Extension for PyTorch\* can provide the fastest device to framework users in such co-existence scenarios.
## Library Dependencies
```
- **Problem**: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

  ```
  torch 2.6.0+xpu requires intel-cmplr-lib-rt==2025.0.2, but you have intel-cmplr-lib-rt 2025.0.4 which is incompatible.
  torch 2.6.0+xpu requires intel-cmplr-lib-ur==2025.0.2, but you have intel-cmplr-lib-ur 2025.0.4 which is incompatible.
  torch 2.6.0+xpu requires intel-cmplr-lic-rt==2025.0.2, but you have intel-cmplr-lic-rt 2025.0.4 which is incompatible.
  torch 2.6.0+xpu requires intel-sycl-rt==2025.0.2, but you have intel-sycl-rt 2025.0.4 which is incompatible.
  ```

- **Cause**: intel-extension-for-pytorch v2.6.10+xpu uses the Intel DPC++ Compiler 2025.0.4 to get a crucial bug fix in the unified runtime, while torch v2.6.0+xpu is pinned to 2025.0.2.

- **Solution**: Ignore the error, since torch v2.6.0+xpu is in fact compatible with Intel DPC++ Compiler 2025.0.4.
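To confirm which runtime versions are actually installed before deciding to ignore the warning (a quick check; the package names are the ones from the error message above):

```shell
# List the installed Intel compiler runtime packages; falls back to a message
# when none are present (e.g. on a machine without the xpu wheels).
pip list 2>/dev/null | grep -E 'intel-cmplr|intel-sycl' \
  || echo "no Intel compiler runtime packages found"
```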
docs/tutorials/releases.md

Releases
=============
## 2.6.10+xpu

Intel® Extension for PyTorch\* v2.6.10+xpu is the new release which supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Flex Series) based on PyTorch\* 2.6.0.

### Highlights

- Intel® oneDNN v3.7 integration

- Official PyTorch 2.6 prebuilt binaries support

  Starting with this release, Intel® Extension for PyTorch\* supports official PyTorch\* prebuilt binaries: since PyTorch\* 2.6 they are built with `_GLIBCXX_USE_CXX11_ABI=1` and are hence ABI-compatible with Intel® Extension for PyTorch\* prebuilt binaries, which are always built with `_GLIBCXX_USE_CXX11_ABI=1`.

- Large Language Model (LLM) optimization

  Intel® Extension for PyTorch\* provides a variety of custom kernels, including commonly used kernel fusions such as `rms_norm` and `rotary_embedding`, attention-related kernels like `paged_attention` and `chunked_prefill`, and the `punica` kernel for serving multiple LoRA-finetuned LLMs. It also provides MoE (Mixture of Experts) custom kernels, including `topk_softmax`, `moe_gemm`, `moe_scatter`, `moe_gather`, etc. These optimizations enhance the functionality and efficiency of the ecosystem on Intel® GPU platforms by improving the execution of key operations.

  Besides that, Intel® Extension for PyTorch\* optimizes more LLM models for inference and finetuning, such as Phi3-vision-128k, phi3-small-128k, llama3.2-11B-vision, etc. A full list of optimized models can be found in the [LLM Optimizations Overview](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/llm.html).

- Serving framework support

  Intel® Extension for PyTorch\* offers extensive support for various ecosystems, including [vLLM](https://github.com/vllm-project/vllm) and [TGI](https://github.com/huggingface/text-generation-inference), with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). vLLM/TGI features like chunked prefill and MoE (Mixture of Experts) are supported by the backend kernels provided in Intel® Extension for PyTorch\*. Support for low precision such as Weight Only Quantization (WOQ) INT4 is also enhanced in this release:

  - The performance of the INT4 GEMM kernel based on the Generalized Post-Training Quantization (GPTQ) algorithm has been improved by approximately 1.3× compared with the previous release. During the prefill stage it achieves performance similar to FP16, while in the decode stage it outperforms FP16 by approximately 1.5×.
  - Support for the Activation-aware Weight Quantization (AWQ) algorithm is added, and its performance is on par with GPTQ without `g_idx`.

- [Prototype] NF4 QLoRA finetuning using BitsAndBytes

  Intel® Extension for PyTorch\* now supports QLoRA finetuning with BitsAndBytes on Intel® GPU platforms. It enables efficient adaptation of LLMs using NF4 4-bit quantization with LoRA, reducing memory usage while maintaining accuracy.
- [Beta] Intel® Core™ Ultra Series 2 Mobile Processors support on Windows

  Intel® Extension for PyTorch\* provides beta-quality support for Intel® Core™ Ultra Series 2 Mobile Processors (codename Arrow Lake-H) on Windows in this release, based on redistributed PyTorch 2.6 prebuilt binaries with an additional AOT compilation target for Arrow Lake-H on the [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/us/).

- Hybrid ATen operator implementation

  Intel® Extension for PyTorch\* uses the ATen operators available in [Torch XPU Operators](https://github.com/intel/torch-xpu-ops) as much as possible and overrides only a very limited set of operators for better performance and broad data type support.
### Breaking Changes

- Intel® Data Center GPU Flex Series support is being deprecated and will no longer be available starting from the release after v2.6.10+xpu.

### Known Issues

Please refer to the [Known Issues webpage](./known_issues.md).

## 2.5.10+xpu

Intel® Extension for PyTorch\* v2.5.10+xpu is the new release which supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics and Intel® Data Center GPU Flex Series) based on PyTorch\* 2.5.1.