ZenDNN Release v5.0

@kiriti-pendyala released this 15 Nov 11:39

The focus of the ZenDNN 5.0 release is support for Zen5 AMD EPYC™ architectures and performance enhancements for generative LLM models through the PyTorch plug-in. Supported model architectures include Llama2, Llama3, Phi2, Phi3, Qwen, ChatGLM, and GPT. The release also delivers performance improvements for non-generative language models such as BERT.

The ZenDNN library can be used in the following frameworks through a plug-in:

  • TensorFlow v2.16 and later.
  • PyTorch v2.0 and later.
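
For PyTorch, the plug-in is distributed as zentorch and hooks into torch.compile. The following is a minimal sketch, assuming the package is installed (e.g., via pip install zentorch) and that the backend is registered under the name "zentorch" as in the plug-in documentation:

```python
# A minimal sketch of running a model through the ZenDNN PyTorch plug-in
# (zentorch). The "zentorch" torch.compile backend name is an assumption
# based on the plug-in's documentation.
import torch
import zentorch  # noqa: F401  # importing registers the zentorch backend

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
).eval()

# Route graph compilation through the ZenDNN backend.
compiled_model = torch.compile(model, backend="zentorch")

with torch.no_grad():
    output = compiled_model(torch.randn(8, 1024))
```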

The ZenDNN library is integrated with ONNX Runtime v1.19.2.
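
In a ZenDNN-enabled ONNX Runtime build, the library is selected through an execution provider. A minimal sketch follows; the provider name "ZendnnExecutionProvider", the model path, and the input tensor name are assumptions for illustration:

```python
# A hedged sketch of selecting ZenDNN in a ZenDNN-enabled ONNX Runtime
# v1.19.2 build; the provider name is an assumption and may differ in
# your build.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["ZendnnExecutionProvider", "CPUExecutionProvider"],
)
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": input_array})
```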

The highlights of this release are as follows:

  • Support for the Zen5 family of AMD EPYC™ processors, codenamed Turin.
  • Compatibility with AOCL BLIS 5.0.
  • AMD EPYC™-specific enhancements to matmul operators and related fusions, particularly for BF16 precision.
  • An auto-tuning algorithm, BF16:0, specifically targeting generative LLM models (see the sketch after this list).
  • Support for weight-only quantization (WOQ) with INT4 weights and BF16 activations for LLMs; ZenDNN 5.0 natively supports models optimized and exported using the AMD Quark quantizer.
  • AMD EPYC™-specific enhancements for WOQ matmul operators and related fusions.
  • Performance enhancements targeted at generative LLM models through the function zentorch.llm.optimize() available in the ZenDNN PyTorch plug-in; this function layers additive AMD EPYC™-specific optimizations on top of the x86 optimizations available in ipex.llm.optimize(). Usage is sketched after this list.
  • An optimized Scaled Dot Product Attention (SDPA) operator in the PyTorch plug-in, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures.
  • Support for BF16 precision for Recommender System models in the PyTorch plug-in.
  • Graph optimization and pattern matching improvements in the PyTorch plug-in.
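
The following sketch combines two of the highlights above: selecting the BF16:0 auto-tuning algorithm and applying zentorch.llm.optimize() to a generative model. The ZENDNN_MATMUL_ALGO environment variable name, the keyword arguments of zentorch.llm.optimize(), and the Hugging Face model ID are assumptions for illustration; consult the ZenDNN user guide for the supported options:

```python
# A hedged sketch, not a definitive recipe: the environment variable name,
# the zentorch.llm.optimize() keyword arguments (modeled on
# ipex.llm.optimize()), and the model ID are assumptions.
import os

# Select the auto-tuning algorithm for the BF16 matmul path.
os.environ["ZENDNN_MATMUL_ALGO"] = "BF16:0"

import torch
import zentorch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model name
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply the AMD EPYC-specific LLM optimizations to the model.
model = zentorch.llm.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("ZenDNN accelerates inference on", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```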