Commit ecb3791

add release notes to 2.5.0 (#3360)
1 parent a7f9cef commit ecb3791


47 files changed (+417 −314 lines)

cpu/2.5.0+cpu/_sources/tutorials/examples.md.txt

+1 −1

@@ -502,7 +502,7 @@ print("Execution finished")
 import torch
 from transformers import BertModel

-model = BertModel.from_pretrained("bert-base-uncased")
+model = BertModel.from_pretrained("bert-base-uncased", attn_implementation="eager")
 model.eval()

 vocab_size = model.config.vocab_size
cpu/2.5.0+cpu/_sources/tutorials/features/fast_bert.md.txt

+2 −2

@@ -9,7 +9,7 @@ Currently `ipex.fast_bert` API is only well optimized for training. For inferenc

 ### Prerequisite

-- Transformers 4.6.0 ~ 4.43.2
+- Transformers 4.6.0 ~ 4.45.0

 ### Usage Example

@@ -20,7 +20,7 @@ An API `ipex.fast_bert` is provided for a simple usage. Usage of this API follow
 import torch
 from transformers import BertModel

-model = BertModel.from_pretrained("bert-base-uncased")
+model = BertModel.from_pretrained("bert-base-uncased", attn_implementation="eager")
 model.eval()

 vocab_size = model.config.vocab_size
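For reference, a minimal sketch of the `ipex.fast_bert` call this file documents, assuming the usual setup around it (the dtype choice and dummy input are illustrative assumptions, not part of this diff):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased", attn_implementation="eager")
model.eval()

vocab_size = model.config.vocab_size
data = torch.randint(vocab_size, size=[1, 512])

# Swap in the fast_bert optimization; bfloat16 is an assumed precision choice.
model = ipex.fast_bert(model, dtype=torch.bfloat16)

with torch.no_grad():
    model(data)
```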

cpu/2.5.0+cpu/_sources/tutorials/getting_started.md.txt

+2 −3

@@ -1,6 +1,6 @@
 # Quick Start

-The following instructions assume you have installed the Intel® Extension for PyTorch\*. For installation instructions, refer to [Installation](../../../index.html#installation?platform=cpu&version=main).
+The following instructions assume you have installed the Intel® Extension for PyTorch\*. For installation instructions, refer to [Installation](../../../index.html#installation?platform=cpu&version=v2.5.0%2Bcpu).

 To start using the Intel® Extension for PyTorch\* in your code, you need to make the following changes:

@@ -64,7 +64,6 @@ In [Cheat Sheet](./cheat_sheet.md), you can find more commands that can help you

 `ipex.llm.optimize` is used for Large Language Models (LLM).

-
 ```python
 import torch
 #################### code changes ####################
@@ -157,4 +156,4 @@
 print(gen_text, total_new_tokens, flush=True)
 ```

-More LLM examples, including usage of low precision data types are available in the [LLM Examples](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm) section.
+More LLM examples, including usage of low precision data types are available in the [LLM Examples](https://github.com/intel/intel-extension-for-pytorch/tree/release/2.5/examples/cpu/llm) section.
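Only fragments of the `ipex.llm.optimize` example survive in this diff, so here is a condensed sketch of the flow it refers to; the model id and generation arguments are placeholders:

```python
import torch
#################### code changes ####################
import intel_extension_for_pytorch as ipex
######################################################
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

#################### code changes ####################
model = ipex.llm.optimize(model, dtype=torch.bfloat16)
######################################################

input_ids = tokenizer("Once upon a time,", return_tensors="pt").input_ids
with torch.inference_mode(), torch.cpu.amp.autocast(enabled=True):
    gen_ids = model.generate(input_ids, max_new_tokens=32)
gen_text = tokenizer.batch_decode(gen_ids, skip_special_tokens=True)
print(gen_text, flush=True)
```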
@@ -1,8 +1,8 @@
 Installation
 ============

-Select your preferences and follow the installation instructions provided on the [Installation page](../../../index.html#installation?platform=cpu&version=v2.4.0%2Bcpu).
+Select your preferences and follow the installation instructions provided on the [Installation page](../../../index.html#installation?platform=cpu&version=v2.5.0%2Bcpu).

 After successful installation, refer to the [Quick Start](getting_started.md) and [Examples](examples.md) sections to start using the extension in your code.

-**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm).
+**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.5.0%2Bcpu/examples/cpu/llm).

cpu/2.5.0+cpu/_sources/tutorials/introduction.rst.txt

+1 −1

@@ -16,7 +16,7 @@ the `Large Language Models (LLM) <llm.html>`_ section.

 Get Started
 -----------
-- `Installation <../../../index.html#installation?platform=cpu&version=v2.4.0%2Bcpu>`_
+- `Installation <../../../index.html#installation?platform=cpu&version=v2.5.0%2Bcpu>`_
 - `Quick Start <getting_started.md>`_
 - `Examples <examples.md>`_


cpu/2.5.0+cpu/_sources/tutorials/releases.md.txt

+27

@@ -1,6 +1,33 @@
 Releases
 ========

+## 2.5.0
+
+We are excited to announce the release of Intel® Extension for PyTorch* 2.5.0+cpu, which accompanies PyTorch 2.5. This release mainly brings you support for Llama 3.2, optimizations for the newly launched Intel® Xeon® 6 P-core platform, GPTQ/AWQ format support, and the latest optimizations for better performance of LLM models. It also includes a set of bug fixes and small optimizations. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try this release and provide feedback to help us further improve the product.
+
+### Highlights
+
+* Llama 3.2 support
+
+  Meta has newly released [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B). Intel® Extension for PyTorch* has provided [support for Llama 3.2](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-the-new-llama-3-2-model.html) since its launch date via an early release version, and now supports it in this official release.
+
+* Optimization for Intel® Xeon® 6
+  Intel® Xeon® 6 processors deliver new degrees of performance with more cores, a choice of microarchitecture, additional memory bandwidth, and exceptional input/output (I/O) across a range of workloads. Intel® Extension for PyTorch* provides dedicated optimizations for this new processor family, covering features like Multiplexed Rank DIMM (MRDIMM), the SNC=3 scenario, etc.
+
+* Large Language Model (LLM) optimization
+  Intel® Extension for PyTorch* extends its weight-only quantization support with GPTQ/AWQ format support, symmetric quantization of activation and weight, and chunked prefill/prefix prefill support in the LLM module API. These features enable better adoption of community model weights and provide better performance for low-precision scenarios. This release also extends the optimized models to include the newly published Llama 3.2 vision models. A full list of optimized models can be found at [LLM optimization](https://github.com/intel/intel-extension-for-pytorch/tree/v2.5.0+cpu/examples/cpu/llm/inference).
+
+* Bug fixing and other optimizations
+  - Optimized the performance of the IndirectAccessKVCacheAttention kernel
+    [#3185](https://github.com/intel/intel-extension-for-pytorch/commit/8572e1faf97998783ea2a7fc6ee3094090feebc4) [#3209](https://github.com/intel/intel-extension-for-pytorch/commit/65e96630a2e17f7b762c5c765f10264ad08db098) [#3214](https://github.com/intel/intel-extension-for-pytorch/commit/a04214f7ab4e43648d75abdcf0fae53e5076be2b) [#3218](https://github.com/intel/intel-extension-for-pytorch/commit/f219012ab1babbc67c9b545fa7251cd981a2a3a2) [#3248](https://github.com/intel/intel-extension-for-pytorch/commit/9f6178eb028d36b3ed1f5985e57b7cf160acf38a)
+  - Fixed the segmentation fault in the IndirectAccessKVCacheAttention kernel [#3246](https://github.com/intel/intel-extension-for-pytorch/commit/bee5ab644086c9b25eb61916c6773932c74667d3)
+  - Fixed the correctness issue in the PagedAttention kernel for Llama-68M-Chat-v1 [#3307](https://github.com/intel/intel-extension-for-pytorch/commit/638a7d26acb33af450ea9869b5b43ccdbe0e962b)
+  - Fixed the support in `ipex.llm.optimize` to ensure `model.generate` returns the correct output type when `return_dict_in_generate` is set to `True` [#3333](https://github.com/intel/intel-extension-for-pytorch/commit/584a4e2e2c6193b926554f951d2608489cac5d7a)
+  - Optimized the performance of the Flash Attention kernel [#3291](https://github.com/intel/intel-extension-for-pytorch/commit/8fb43ec45ed93b62efef07f4b2e8dcd7dd502b8b)
+  - Upgraded oneDNN to v3.6 [#3305](https://github.com/intel/intel-extension-for-pytorch/commit/91639fa0812ee3c12c672002c2bf5cf1cac4bc0a)
+
+**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v2.4.0+cpu...v2.5.0+cpu
+
 ## 2.4.0

 We are excited to announce the release of Intel® Extension for PyTorch\* 2.4.0+cpu which accompanies PyTorch 2.4. This release mainly brings you support for Llama 3.1, basic support for LLM serving frameworks like vLLM/TGI, and a set of optimizations to push better performance for LLM models. This release also extends the list of optimized LLM models to a broader level and includes a set of bug fixes and small optimizations. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try this release and provide feedback to help us further improve the product.
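The `return_dict_in_generate` fix called out in the 2.5.0 notes above is easiest to see in a short sketch; the model id is a placeholder and the surrounding setup is assumed:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

model = ipex.llm.optimize(model, dtype=torch.bfloat16)

input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
with torch.inference_mode():
    # With return_dict_in_generate=True, generate returns an output object
    # (with .sequences, .scores, ...) instead of a bare tensor; the 2.5.0 fix
    # ensures the optimized model preserves this output type.
    out = model.generate(input_ids, max_new_tokens=8,
                         return_dict_in_generate=True, output_scores=True)
print(type(out).__name__, out.sequences.shape)
```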

cpu/2.5.0+cpu/_static/htmls/tbl_deepspeed.html

+18 −10

@@ -44,6 +44,18 @@
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
+<tr class="row-even">
+<td><p>LLAMA</p></td>
+<td><p>meta-llama/Llama-3.2-3B-Instruct</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
+</tr>
+<tr class="row-odd">
+<td><p>LLAMA</p></td>
+<td><p>meta-llama/Llama-3.2-11B-Vision-Instruct</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
+</tr>
 <tr class="row-even">
 <td><p>GPT-J</p></td>
 <td><p>EleutherAI/gpt-j-6b</p></td>
@@ -53,13 +65,13 @@
 <tr class="row-odd">
 <td><p>GPT-NEOX</p></td>
 <td><p>EleutherAI/gpt-neox-20b</p></td>
-<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 <tr class="row-even">
 <td><p>DOLLY</p></td>
 <td><p>databricks/dolly-v2-12b</p></td>
-<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 <tr class="row-odd">
@@ -77,7 +89,7 @@
 <tr class="row-odd">
 <td><p>OPT</p></td>
 <td><p>facebook/opt-30b</p></td>
-<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 <tr class="row-even">
@@ -89,7 +101,7 @@
 <tr class="row-odd">
 <td><p>Bloom</p></td>
 <td><p>bigscience/bloom-1b7</p></td>
-<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 <tr class="row-even">
@@ -113,7 +125,7 @@
 <tr class="row-odd">
 <td><p>Baichuan</p></td>
 <td><p>baichuan-inc/Baichuan-13B-Chat</p></td>
-<td><p style="text-align: center; vertical-align: middle;">🟨</p></td>
+<td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 <tr class="row-even">
@@ -207,8 +219,4 @@
 <td><p style="text-align: center; vertical-align: middle;">🟩</p></td>
 </tr>
 </tbody>
-</table>
-<ul class="simple">
-<li><p>🟩 signifies that the model can perform well and with good accuracy (&lt;1% difference as compared with FP32).</p></li>
-<li><p>🟨 signifies that the model can perform well while accuracy may not been in a perfect state (&gt;1% difference as compared with FP32).</p></li>
-</ul>
+</table>
