ZenDNN Release v5.0

@kiriti-pendyala released this 15 Nov 11:39

The focus of the ZenDNN 5.0 release is support for Zen5 AMD EPYC™ architectures and performance enhancements for generative LLM models through the PyTorch plug-in. Supported model architectures include Llama2, Llama3, Phi2, Phi3, Qwen, ChatGLM, and GPT. The release also delivers performance improvements for non-generative language models such as BERT.

The ZenDNN library can be used in the following frameworks through a plug-in:

  • TensorFlow v2.16 and later.
  • PyTorch v2.0 and later.
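
For PyTorch, the plug-in is distributed as zentorch and hooks into torch.compile. The following is a minimal sketch, assuming the package is installed (e.g., via pip install zentorch) and that the backend is registered under the name "zentorch" as in the plug-in documentation:

```python
# A minimal sketch of running a model through the ZenDNN PyTorch plug-in
# (zentorch). The "zentorch" torch.compile backend name is an assumption
# based on the plug-in's documentation.
import torch
import zentorch  # noqa: F401  # importing registers the zentorch backend

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
).eval()

# Route graph compilation through the ZenDNN backend.
compiled_model = torch.compile(model, backend="zentorch")

with torch.no_grad():
    output = compiled_model(torch.randn(8, 1024))
```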

The ZenDNN library is integrated with ONNX Runtime v1.19.2.
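
In a ZenDNN-enabled ONNX Runtime build, the library is selected through an execution provider. A minimal sketch follows; the provider name "ZendnnExecutionProvider", the model path, and the input tensor name are assumptions for illustration:

```python
# A hedged sketch of selecting ZenDNN in a ZenDNN-enabled ONNX Runtime
# v1.19.2 build; the provider name is an assumption and may differ in
# your build.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["ZendnnExecutionProvider", "CPUExecutionProvider"],
)
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": input_array})
```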

The highlights of this release are as follows:

  • Support for the Zen5 family of AMD EPYC™ processors, codenamed Turin.
  • Compatibility with AOCL BLIS 5.0.
  • AMD EPYC™-specific enhancements to matmul operators and related fusions, particularly for BF16 precision.
  • An auto-tuning algorithm, BF16:0, specifically targeting generative LLM models (see the sketch after this list).
  • Support for weight-only quantization (WOQ) with INT4 weights and BF16 activations for LLMs; ZenDNN 5.0 natively supports models optimized and exported using the AMD Quark quantizer.
  • AMD EPYC™-specific enhancements for WOQ matmul operators and related fusions.
  • Performance enhancements targeted at generative LLM models through the function zentorch.llm.optimize() available in the ZenDNN PyTorch plug-in; this function layers additive AMD EPYC™-specific optimizations on top of the x86 optimizations available in ipex.llm.optimize(). Usage is sketched after this list.
  • An optimized Scaled Dot Product Attention (SDPA) operator in the PyTorch plug-in, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures.
  • Support for BF16 precision for Recommender System models in the PyTorch plug-in.
  • Graph optimization and pattern matching improvements in the PyTorch plug-in.
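
The following sketch combines two of the highlights above: selecting the BF16:0 auto-tuning algorithm and applying zentorch.llm.optimize() to a generative model. The ZENDNN_MATMUL_ALGO environment variable name, the keyword arguments of zentorch.llm.optimize(), and the Hugging Face model ID are assumptions for illustration; consult the ZenDNN user guide for the supported options:

```python
# A hedged sketch, not a definitive recipe: the environment variable name,
# the zentorch.llm.optimize() keyword arguments (modeled on
# ipex.llm.optimize()), and the model ID are assumptions.
import os

# Select the auto-tuning algorithm for the BF16 matmul path.
os.environ["ZENDNN_MATMUL_ALGO"] = "BF16:0"

import torch
import zentorch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model name
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply the AMD EPYC-specific LLM optimizations to the model.
model = zentorch.llm.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("ZenDNN accelerates inference on", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```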