Commit 85136ca

[Doc] Update Release note and Known issues (#5394)

1 parent b95ea42 commit 85136ca

2 files changed (+56 −16 lines)

docs/tutorials/known_issues.md

Lines changed: 13 additions & 16 deletions
````diff
@@ -11,17 +11,8 @@ Troubleshooting
 Optimization for Horovod\* at the end of the execution and triggers this error.
 **Solution**: Do `import intel_extension_for_pytorch` before `import horovod.torch as hvd`.
 - **Problem**: Number of dpcpp devices should be greater than zero.
-- **Cause**: If you use Intel® Extension for PyTorch\* in a conda environment, you might encounter this error. Conda also ships the libstdc++.so dynamic library file that may conflict with the one shipped
-in the OS.
+- **Cause**: If you use Intel® Extension for PyTorch\* in a conda environment, you might encounter this error. Conda also ships the libstdc++.so dynamic library file that may conflict with the one shipped in the OS.
 - **Solution**: Export the `libstdc++.so` file path in the OS to an environment variable `LD_PRELOAD`.
-- **Problem**: Symbol undefined caused by `_GLIBCXX_USE_CXX11_ABI`.
-  ```bash
-  ImportError: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
-  ```
-- **Cause**: Intel® Extension for PyTorch\* is compiled with `_GLIBCXX_USE_CXX11_ABI=1`. This symbol undefined issue appears when PyTorch\* is
-compiled with `_GLIBCXX_USE_CXX11_ABI=0`.
-- **Solution**: Pass `export GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with particular compiler which supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using prebuilt wheels
-in [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/us/) to avoid this issue.
 - **Problem**: `-997 runtime error` when running some AI models on Intel® Arc™ Graphics family.
 - **Cause**: Some of the `-997 runtime error` are actually out-of-memory errors. As Intel® Arc™ Graphics GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU
 Max Series, running some AI models on them may trigger out-of-memory errors and cause them to report failure such as `-997 runtime error` most likely. This is expected. Memory usage optimization is working in progress to allow Intel® Arc™ Graphics GPUs to support more AI models.
````
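The `LD_PRELOAD` workaround for the conda `libstdc++.so` conflict above can be scripted; a minimal sketch assuming typical Linux library locations (the helper name and candidate paths are illustrative assumptions, not part of the product docs):

```python
# Hedged sketch: locate the OS copy of libstdc++.so.6 and print the
# LD_PRELOAD export line suggested by the known issue. The search paths
# are assumptions for common Linux distributions; adjust for yours.
import glob

def find_system_libstdcxx():
    """Return candidate system libstdc++.so.6 paths (possibly empty)."""
    candidates = []
    for pattern in ("/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
                    "/usr/lib64/libstdc++.so.6"):
        candidates.extend(glob.glob(pattern))
    return candidates

libs = find_system_libstdcxx()
if libs:
    print(f'export LD_PRELOAD="{libs[0]}"')
else:
    print("No system libstdc++.so.6 found in the assumed locations.")
```

Setting `LD_PRELOAD` this way makes the dynamic loader resolve `libstdc++` symbols from the OS copy before conda's copy is consulted.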
````diff
@@ -32,6 +23,9 @@ Troubleshooting
 - **Problem**: Some workloads terminate with an error `CL_DEVICE_NOT_FOUND` after some time on WSL2.
 - **Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.
 - **Solution**: Try increasing TDRDelay in your Windows Registry to a large value, such as 20 (it is 2 seconds, by default), and reboot.
+- **Problem**: RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE).
+- **Cause**: If you run Intel® Extension for PyTorch\* in a Windows environment where Intel® discrete GPU and integrated GPU co-exist, and the integrated GPU is not supported by Intel® Extension for PyTorch\* but is wrongly identified as the first GPU platform.
+- **Solution**: Disable the integrated GPU in your environment to work around. For long term, Intel® Graphics Driver will always enumerate the discrete GPU as the first device so that Intel® Extension for PyTorch\* could provide the fastest device to end framework users in such co-exist scenario based on that.

 ## Library Dependencies

````
````diff
@@ -118,13 +112,16 @@ Troubleshooting
 ```

 - **Problem**: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
-torch 2.6.0+xpu requires intel-cmplr-lib-rt==2025.0.2, but you have intel-cmplr-lib-rt 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-cmplr-lib-ur==2025.0.2, but you have intel-cmplr-lib-ur 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-cmplr-lic-rt==2025.0.2, but you have intel-cmplr-lic-rt 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-sycl-rt==2025.0.2, but you have intel-sycl-rt 2025.0.4 which is incompatible.
-- **Cause**: The intel-extension-for-pytorch v2.6.10+xpu uses Intel Compiler 2025.0.4 for a distributed feature fix, while torch v2.6.0+xpu is pinned with 2025.0.2.
-- **Solution**: Ignore the Error since actually torch v2.6.0+xpu is compatible with Intel Compiler 2025.0.4.

+```
+torch 2.6.0+xpu requires intel-cmplr-lib-rt==2025.0.2, but you have intel-cmplr-lib-rt 2025.0.4 which is incompatible.
+torch 2.6.0+xpu requires intel-cmplr-lib-ur==2025.0.2, but you have intel-cmplr-lib-ur 2025.0.4 which is incompatible.
+torch 2.6.0+xpu requires intel-cmplr-lic-rt==2025.0.2, but you have intel-cmplr-lic-rt 2025.0.4 which is incompatible.
+torch 2.6.0+xpu requires intel-sycl-rt==2025.0.2, but you have intel-sycl-rt 2025.0.4 which is incompatible.
+```
+
+- **Cause**: The intel-extension-for-pytorch v2.6.10+xpu uses Intel DPC++ Compiler 2025.0.4 to get a crucial bug fix in unified runtime, while torch v2.6.0+xpu is pinned with 2025.0.2.
+- **Solution**: Ignore the Error since actually torch v2.6.0+xpu is compatible with Intel Compiler 2025.0.4.

 ## Performance Issue

````
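The pip complaint above reduces to a version-pin mismatch; a toy sketch recomputing the conflict list from the versions quoted in the error text (the `conflicts` helper is illustrative, not a pip or torch API):

```python
# Hedged sketch: reproduce pip's dependency complaint as a plain pin check.
# Package names and versions are copied from the error text above.
pinned = {  # what torch 2.6.0+xpu declares
    "intel-cmplr-lib-rt": "2025.0.2",
    "intel-cmplr-lib-ur": "2025.0.2",
    "intel-cmplr-lic-rt": "2025.0.2",
    "intel-sycl-rt": "2025.0.2",
}
installed = {name: "2025.0.4" for name in pinned}  # shipped with IPEX v2.6.10+xpu

def conflicts(pins, have):
    """Return package names whose installed version differs from the pin."""
    return [n for n, v in pins.items() if have.get(n) != v]

print(conflicts(pinned, installed))  # all four packages miss the ==2025.0.2 pin
```

As the Solution says, the mismatch is benign here: torch v2.6.0+xpu works with the 2025.0.4 runtime despite the stricter pin.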
docs/tutorials/releases.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -1,6 +1,49 @@
 Releases
 =============

+## 2.6.10+xpu
+
+Intel® Extension for PyTorch\* v2.6.10+xpu is the new release which supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Flex Series) based on PyTorch* 2.6.0.
+
+### Highlights
+
+- Intel® oneDNN v3.7 integration
+- Official PyTorch 2.6 prebuilt binaries support
+
+  Starting this release, Intel® Extension for PyTorch\* supports official PyTorch prebuilt binaries, as they are built with `_GLIBCXX_USE_CXX11_ABI=1` since PyTorch\* 2.6 and hence ABI compatible with Intel® Extension for PyTorch\* prebuilt binaries which are always built with `_GLIBCXX_USE_CXX11_ABI=1`.
+
+- Large Language Model (LLM) optimization
+
+  Intel® Extension for PyTorch\* provides support for a variety of custom kernels, which include commonly used kernel fusion techniques, such as `rms_norm` and `rotary_embedding`, as well as attention-related kernels like `paged_attention` and `chunked_prefill`, and `punica` kernel for serving multiple LoRA finetuned LLM. It also provides the MoE (Mixture of Experts) custom kernels including `topk_softmax`, `moe_gemm`, `moe_scatter`, `moe_gather`, etc. These optimizations enhance the functionality and efficiency of the ecosystem on Intel® GPU platform by improving the execution of key operations.
+
+  Besides that, Intel® Extension for PyTorch\* optimizes more LLM models for inference and finetuning, such as Phi3-vision-128k, phi3-small-128k, llama3.2-11B-vision, etc. A full list of optimized models can be found at [LLM Optimizations Overview](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/llm.html).
+
+- Serving framework support
+
+  Intel® Extension for PyTorch\* offers extensive support for various ecosystems, including [vLLM](https://github.com/vllm-project/vllm) and [TGI](https://github.com/huggingface/text-generation-inference), with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). The vLLM/TGI features like chunked prefill, MoE (Mixture of Experts) etc. are supported by the backend kernels provided in Intel® Extension for PyTorch*. The support to low precision such as Weight Only Quantization (WOQ) INT4 is also enhanced in this release:
+  - The performance of INT4 GEMM kernel based on Generalized Post-Training Quantization (GPTQ) algorithm has been improved by approximately 1.3× compared with previous release. During the prefill stage, it achieves similar performance to FP16, while in the decode stage, it outperforms FP16 by approximately 1.5×.
+  - The support of Activation-aware Weight Quantization (AWQ) algorithm is added and the performance is on par with GPTQ without g_idx.
+
+- [Prototype] NF4 QLoRA finetuning using BitsAndBytes
+
+  Intel® Extension for PyTorch\* now supports QLoRA finetuning with BitsAndBytes on Intel® GPU platforms. It enables efficient adaptation of LLMs using NF4 4-bit quantization with LoRA, reducing memory usage while maintaining accuracy.
+
+- [Beta] Intel® Core™ Ultra Series 2 Mobile Processors support on Windows
+
+  Intel® Extension for PyTorch\* provides beta quality support of Intel® Core™ Ultra Series 2 Mobile Processors (codename Arrow Lake-H) on Windows in this release, based on redistributed PyTorch 2.6 prebuilt binaries with additional AOT compilation target for Arrow Lake-H in the [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/us/).
+
+- Hybrid ATen operator implementation
+
+  Intel® Extension for PyTorch\* uses ATen operators available in [Torch XPU Operators](https://github.com/intel/torch-xpu-ops) as much as possible and overrides very limited operators for better performance and broad data type support.
+
+### Breaking Changes
+
+- Intel® Data Center GPU Flex Series support is being deprecated and will no longer be available starting from the release after v2.6.10+xpu.
+
+### Known Issues
+
+Please refer to [Known Issues webpage](./known_issues.md).
+
 ## 2.5.10+xpu

 Intel® Extension for PyTorch\* v2.5.10+xpu is the new release which supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics and Intel® Data Center GPU Flex Series) based on PyTorch* 2.5.1.
````
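The ABI note in the 2.6.10+xpu highlights can be checked locally with PyTorch's public `torch.compiled_with_cxx11_abi()`; a small guarded sketch, since torch may not be installed in every environment (the wrapper function name is illustrative):

```python
# Hedged sketch: report whether the installed torch build uses the CXX11
# ABI. The release notes above state that _GLIBCXX_USE_CXX11_ABI=1 is
# required to match the prebuilt Intel Extension for PyTorch binaries.
def torch_cxx11_abi():
    """True/False for the installed torch build's CXX11 ABI, None if absent."""
    try:
        import torch
    except ImportError:
        return None  # torch not installed; nothing to check
    return bool(torch.compiled_with_cxx11_abi())

print(torch_cxx11_abi())
```

A result of `True` means the build is ABI compatible with the prebuilt extension binaries described in the release notes.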

0 commit comments