Releases: intel/intel-extension-for-tensorflow
Intel® Extension for TensorFlow* 2.15.0.2
Features and Improvements
Intel® Extension for TensorFlow* extends the official TensorFlow capabilities, allowing TensorFlow workloads to run on Intel® Data Center GPU Max Series. This release includes the following features and improvements:
- Toolkit Support: Supports Intel® oneAPI Base Toolkit (version 2025.0.1).
- Updated Support: The Intel® Extension for TensorFlow* has been upgraded to support oneDNN v3.6.2.
Bug Fixes
- Fixes page fault issues in TruncatedMod and UniqueOp discovered by stricter checks for memory access.
- Fixes missing env_check.py under the Intel® Extension for TensorFlow* library path when installing previously released xpu wheels.
- Fixes build issues with GCC14.
- Fixes build issues related to oneDNN upgrade.
- Fixes issues related to oneAPI DPC++ Compiler upgrade.
- Fixes build and linking issues originating from crosstool_wrapper_driver.tpl.
What's Changed
- XeTLA patch is updated to support the latest Intel® oneAPI DPC++ Compiler
- SYCL Shuffle API has been updated
- traceme_encode.h is updated to avoid potential compile warnings.
- Code is refactored to avoid potential nested queue submit issues.
Documentation
Intel® Extension for TensorFlow* 2.15.0.1
Major Features and Improvements
Intel® Extension for TensorFlow* extends the official TensorFlow capabilities, allowing TensorFlow workloads to run on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release includes the following major features and improvements:
- New Install Channel: A new install channel is provided to work around the package size limitation of PyPI:
  pip install --upgrade intel-extension-for-tensorflow[xpu] -f https://developer.intel.com/itex-whl-weekly
- Toolkit Support: Supports Intel® oneAPI Base Toolkit 2024.2.
- Updated Support: The Intel® Extension for TensorFlow* has been upgraded to support oneDNN 3.4.3.
- Experimental Support: Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
Bug Fixes
- Fixes device memory leak issues exposed by ZeroLike, SetOneDnnLayout, GetDeviceInfo and SegmentReduce.
- Fixes a potential host memory leak issue.
- Fixes an accuracy issue exposed by Softmax.
- Fixes a performance regression issue exposed by AddV2WithSoftmax.
- Fixes the "SYCL ESIMD feature not support on host" issue.
Known Issues
- TensorList limitation: TensorList is not supported with NextPluggableDevice by TensorFlow 2.15.
- Allocation limitation of WSL: The Windows Subsystem for Linux (WSL2) sets a maximum size for a single allocation on a single device, which may cause an Out-of-Memory error. Users can remove the limitation with the environment variable UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1.
- FP64 support: FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as "'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform."
- GLIBC++ mismatch: A GLIBC++ version mismatch may cause a workload to exit with the exception "Can not find any devices. To check runtime environment on your host, please run itex/tools/python/env_check.py." Try running the env_check.py script to confirm.
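The WSL2 workaround above can be applied from Python as well as from the shell. The following is a minimal sketch, assuming the variable has to be present in the process environment before TensorFlow and the extension are imported; the helper name `enable_relaxed_allocation_limits` is hypothetical and not part of the extension.

```python
import os

# Variable name taken from the known issue above.
RELAXED_ALLOC_VAR = "UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS"


def enable_relaxed_allocation_limits(env=os.environ):
    """Set the variable to "1" unless the user already chose a value.

    Hypothetical helper for illustration only.
    """
    env.setdefault(RELAXED_ALLOC_VAR, "1")
    return env[RELAXED_ALLOC_VAR]


# Assumption: call this before importing TensorFlow so the runtime
# sees the variable during device initialization.
enable_relaxed_allocation_limits()
# import tensorflow as tf  # import only after the environment is prepared
```

Using `setdefault` keeps any value the user exported in the shell, so the snippet does not silently override an explicit choice.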
Documentation
Intel® Extension for TensorFlow* 2.15.0.0
Major Features and Improvements
Intel® Extension for TensorFlow* extends the official TensorFlow capabilities, allowing TensorFlow workloads to run on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release includes the following major features and improvements:
- Updated Support: The Intel® Extension for TensorFlow* has been upgraded to support TensorFlow 2.15, the version released by Google and required for this release.
- Toolkit Support: Supports Intel® oneAPI Base Toolkit 2024.1.
- NextPluggableDevice integration: Integrates NextPluggableDevice (an advanced generation of the PluggableDevice mechanism) as a new device type to enable seamless integration of new accelerator plugins. For more details, see the NextPluggableDevice Overview.
- Experimental support: Provides experimental support for Intel GPU backend for OpenXLA, enabling OpenXLA GPU backend in Intel® Extension for TensorFlow* via PJRT plugin. For more details, see OpenXLA.
- Compiler enablement: Enables Clang compiler to build Intel® Extension for TensorFlow* CPU wheels starting with this release. The currently supported version is LLVM/clang 17. The official wheels published on PyPI will be based on Clang; however, users can choose to build wheels using the GCC compiler by following the steps in the Configure For CPU guide.
- Performance optimization: Enables weight pre-pack support for Intel® Extension for TensorFlow* CPU to provide better performance and reduce the memory footprint of _ITEXMatMul and _ITEXFusedMatMul. For more details, see Weight Pre-Pack.
- Package redefinition: Re-defines XPU package to support GPU backend only starting with this release. The official XPU wheels published on PyPI will support only the GPU backend, and the GPU wheels will be deprecated.
- New Operations: Supports new OPs to cover the majority of TensorFlow 2.15 OPs.
- Experimental Support: Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
Known Issues
- TensorList limitation: TensorList is not supported with NextPluggableDevice by TensorFlow 2.15.
- Allocation limitation of WSL: The Windows Subsystem for Linux (WSL2) sets a maximum size for a single allocation on a single device, which may cause an Out-of-Memory error. Users can remove the limitation with the environment variable UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1.
- FP64 support: FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as "'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform."
- GLIBC++ mismatch: A GLIBC++ version mismatch may cause a workload to exit with the exception "Can not find any devices. To check runtime environment on your host, please run itex/tools/python/env_check.py." Try running the env_check.py script to confirm.
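Before reaching for env_check.py, it can help to ask TensorFlow which devices it actually sees. The sketch below uses the standard `tf.config.list_physical_devices` call and assumes the extension registers its devices under the "XPU" device type; the function name `list_xpu_devices` is hypothetical, and this is an illustration rather than a replacement for env_check.py.

```python
def list_xpu_devices():
    """Return visible XPU device names, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumption: devices provided by the extension use the "XPU" device type.
    return [device.name for device in tf.config.list_physical_devices("XPU")]


devices = list_xpu_devices()
if devices is None:
    print("TensorFlow is not installed in this environment.")
elif not devices:
    print("No XPU device is visible; consider running env_check.py.")
else:
    print("Visible XPU devices:", devices)
```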
Other Information
- Performance Data: Provides a Performance Data document to demonstrate the training and inference performance as well as accuracy results on several popular AI workloads with Intel® Extension for TensorFlow* benchmarked on Intel GPUs.
Documentation
Intel® Extension for TensorFlow* 2.14.0.1
Major Features and Improvements
Intel® Extension for TensorFlow* extends official TensorFlow capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:
- The TensorFlow version supported by Intel® Extension for TensorFlow* is upgraded to TensorFlow 2.14, released by Google, which is the required TensorFlow version for this release.
- Supports Intel® oneAPI Base Toolkit 2024.0.
- Provides experimental support for selecting CPU thread pools using either OpenMP thread pool (default) or Eigen thread pool. You can select the more efficient thread pool based on the workload and hardware configuration. Refer to Selecting Thread Pool in Intel® Extension for TensorFlow* CPU for more details.
- Enables FP8 functionality support for Transformer-like training models. Refer to FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU for more details.
- Provides experimental support for quantization front-end Python API, based on Intel® Neural Compressor.
- Adds OPs performance optimizations:
  - Optimizes GroupNorm/Unique operators.
  - Optimizes Einsum/ScaledDotProductAttention with XeTLA enabled.
- Supports new OPs to cover the majority of TensorFlow 2.14 OPs.
- Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
- Moves the experimental support for Intel GPU backend for OpenXLA from the Intel® Extension for TensorFlow* repository to the Intel® Extension for OpenXLA* repository. Refer to Intel® Extension for OpenXLA* for more details.
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as "'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform."
- A GLIBC++ version mismatch may cause a workload to exit with the exception "Can not find any devices. To check runtime environment on your host, please run itex/tools/env_check.sh." Try running the env_check.sh script to confirm.
Documents
Intel® Extension for TensorFlow* 2.13.0.0
Major Features and Improvements
Intel® Extension for TensorFlow* extended official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:
- The TensorFlow version supported by Intel® Extension for TensorFlow* was upgraded to TensorFlow 2.13, the latest version released by Google, which is the only supported TensorFlow version in this release.
- Refined Intel® Extension for TensorFlow* version to four digits version format v2.13.0.0 based on the three digits from stock TensorFlow v2.13.0 with the last digit incrementing per extension release. This will make it easier for users to understand Intel® Extension for TensorFlow* and stock TensorFlow version mapping relationship.
- Unified a single XPU package to support both CPU and GPU backends, providing flexibility for users on different CPU or GPU hardware platforms.
- Supported TensorFlow Serving running on top of Intel® Extension for TensorFlow* to provide serving service in a production environment. Learn more in the TensorFlow Serving Installation Guide.
- Enabled INT8 quantization by oneDNN Graph API as default solution on GPU in Intel® Extension for TensorFlow* to provide better INT8 user experience together with Intel® Neural Compressor >= 2.2.
- Added OPs performance optimizations:
  - Enabled SYCL native BFloat16 data type support.
  - SpaceToBatchND/BatchToSpaceND: 1.1x ~ 1.8x improvement compared with the last release.
  - SelectOP: 1.3x ~ 1.7x improvement compared with the last release.
  - LstmEltwiseKernel: 1.28x ~ 1.7x improvement compared with the last release.
  - BucketizeOp: 4x improvement compared with the last release.
- Supported new Ops to cover the majority of TensorFlow 2.13.0 Ops.
- Dynamically loaded Intel® Advanced Vector Extensions (AVX2 and AVX512) instructions by adapting to the user's hardware to maximize CPU performance.
- Supported FP16 data type with AMX simulation on 4th Gen Intel® Xeon® Scalable processors (code name Sapphire Rapids).
- This release started to provide product support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids).
- This release continued to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
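The four-digit version scheme described in the list above (three digits from stock TensorFlow plus one digit for the extension release) can be illustrated with a short sketch. The `ItexVersion` type and `split_itex_version` function are hypothetical names for this example and are not provided by the extension.

```python
from typing import NamedTuple


class ItexVersion(NamedTuple):
    """Hypothetical container for the two parts of a four-digit version."""
    tensorflow_version: str   # first three digits, e.g. "2.13.0"
    extension_release: int    # last digit, incremented per extension release


def split_itex_version(version: str) -> ItexVersion:
    """Split a version such as "v2.13.0.0" into its TensorFlow and extension parts."""
    parts = version.lstrip("v").split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a four-digit version, got {version!r}")
    return ItexVersion(".".join(parts[:3]), int(parts[3]))


print(split_itex_version("v2.13.0.0"))
```

Versions that predate this scheme, such as 1.2.0, have only three digits and are rejected by this sketch.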
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as "'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform."
- A GLIBC++ version mismatch may cause a workload to exit with the exception "Can not found any devices. To check runtime environment on your host, please run itex/tools/env_check.sh." Try running env_check.sh to confirm.
Documents
- Welcome to Intel® Extension for TensorFlow* documentation
- Provided guide docs to users for TensorFlow Serving Installation Guide
- Distributed supported by Intel® Optimization for Horovod*
- Intel® Extension for TensorFlow* Installation guide
- Frequently Asked Questions
Intel® Extension for TensorFlow* 1.2.0
Major Features and Improvements
Intel® Extension for TensorFlow* extended official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:
- The TensorFlow version supported by Intel® Extension for TensorFlow* v1.2.0 was upgraded to TensorFlow 2.12, the latest version released by Google. Due to a breaking change in protobuf in TensorFlow 2.12, Intel® Extension for TensorFlow* is binary compatible only with TensorFlow 2.12 in this release.
- Adopted a uniform Device API PJRT as the supported device plugin mechanism to implement Intel GPU backend for OpenXLA experimental support. Users can build Intel® Extension for TensorFlow* from source and run JAX front end APIs with OpenXLA. Refer to OpenXLA Support GPU for more details.
- Updated oneDNN version to v3.1, which includes multiple functional and performance improvements for CPU and GPU implementations.
- Supported the generative AI model Stable Diffusion and optimized the model for better performance. Get started in Stable Diffusion Inference for Text2Image on Intel GPU.
- Supported XPUAutoShard in Intel® Extension for TensorFlow* as an experimental feature. Given a set of homogeneous XPU devices (e.g. 2 GPU tiles), XPUAutoShard automatically shards the input data and the TensorFlow graph and places these data/graph shards on different GPU devices to maximize hardware usage. Refer to XPUAutoShard on GPU for more details.
- Provided the Python API itex.experimental_ops_override() to automatically override some TensorFlow operators with Customized Operators under the itex.ops namespace while remaining compatible with existing trained parameters. More in usage details.
- Added operators performance optimization:
  - Optimized ResizeNearestNeighborGrad/All/Any/Slice/SpaceToBatchND/BatchToSpaceND/BiasAddGrad operators.
  - Optimized math functions (e.g. tanh, rsqrt) with small shapes (e.g. size=8192) on Intel® Data Center GPU Flex Series by vectorization optimization.
  - Optimized reduction series ops by improving thread and memory utilization for Col/Row reduction separately.
- Supported AOT (Ahead-of-time compilation) on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series and Intel® Arc™ A-Series GPUs in the Intel® Extension for TensorFlow* package in the PyPI channel. You can also specify the hardware platform type when configuring your system for a source code build.
- This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
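As an illustration of the itex.experimental_ops_override() API mentioned in the list above, the following minimal sketch enables the override only when the extension is importable. It assumes the package is imported as `intel_extension_for_tensorflow`; the wrapper name `enable_itex_ops_override` is hypothetical.

```python
def enable_itex_ops_override():
    """Turn on the operator override if the extension is available.

    Returns True when the override was requested, False when the
    extension could not be imported (hypothetical helper).
    """
    try:
        import intel_extension_for_tensorflow as itex
    except ImportError:
        return False
    # API named in the release notes; it takes no arguments here.
    itex.experimental_ops_override()
    return True


overridden = enable_itex_ops_override()
print("itex.ops override enabled:", overridden)
```

Calling the helper early, before the model is built, is the natural place for it, since the override is meant to affect the operators used when layers are constructed.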
Bug Fixes and Other Changes
- Upgraded pybind11 version to support Python 3.11 source build.
- Initialized environment variables for Intel® oneAPI Base Toolkit in docker container by default.
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as "'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform."
- TensorBoard cannot co-work with stock TensorFlow 2.12 due to two issues: tensorflow/tensorflow#60262 and tensorflow/profiler#602.
- A GLIBC++ version mismatch may cause a workload to exit with the exception "Can not found any devices. To check runtime environment on your host, please run itex/tools/env_check.sh." Try env_check.sh for assistance.
Documents
- Provided new guide documentation to developers for How to write custom op.
- Distributed supported by Intel® Optimization for Horovod*.
Intel® Extension for TensorFlow* 1.1.0
Major Features and Improvements
Intel® Extension for TensorFlow* has already extended official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:
- The TensorFlow version supported by Intel® Extension for TensorFlow* was upgraded to TensorFlow 2.11, the latest version released by Google. In this release, Intel® Extension for TensorFlow* is binary compatible with both TensorFlow 2.11 and TensorFlow 2.10.
- Added Intel® Optimization for Horovod* to the Intel® Extension for TensorFlow* docker container for Intel® Data Center GPU Max Series. Users only need to install the GPU driver on the host machine and launch the docker container directly to run TensorFlow + Horovod distributed workloads. To get started, see the Docker Container Guide and the Horovod ResNet50 example.
- Enhanced unit tests to cover majority of TensorFlow Ops.
- Added new OPs support and performance optimization:
  - Added double data type support for MatMul/BatchMatMul/BatchMatMulV2.
  - Enabled Eigen vectorized RNE conversion between packed BF16 and FP32 for element-wise ops.
  - Enabled vectorization pass for Sigmoid OP.
  - Optimized ItexLSTM/NMS/ResizeNearestNeighbor OP.
  - Added more fusion pattern support (Conv+BiasAdd+Relu+Add fusion, Conv+Mish fusion).
- Enabled INT8 quantization by oneDNN Graph API as default solution on CPU in Intel® Extension for TensorFlow* to provide better INT8 user experience together with Intel® Neural Compressor >= 2.0.
- Added environment check script for users to check software stack installation status, including OS version, GPU driver, TensorFlow and other dependencies version in Intel® oneAPI Base Toolkit.
- This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
Bug Fixes and Other Changes
- Fixed several kernel bugs, including a NaN issue in the LogSoftmax OP and segmentation faults in the Unique/ParallelConcat OPs.
- Added cast from INT64 to BF16.
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with FP64 kernel on that platform, the workload will exit with exception as
'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
Documents
Intel® Extension for TensorFlow* 1.0.0
Major Features
Intel® Extension for TensorFlow* is an Intel-optimized Python package that extends official TensorFlow capability to run TensorFlow workloads on Intel GPUs, and brings the first Intel GPU product, Intel® Data Center GPU Flex Series 170, into the TensorFlow open source community for AI workload acceleration. It is based on the TensorFlow PluggableDevice interface and provides full support starting from TensorFlow 2.10.
This release contains the following major features:
- AOT (Ahead-of-time compilation)
  AOT compilation is a performance feature that aims to remove just-in-time (JIT) overhead during application launch. It can be enabled when configuring your system for a source code build. The Intel® Extension for TensorFlow* package in the PyPI channel is built with AOT enabled.
- Graph Optimization
  Advanced Automatic Mixed Precision: Advanced Automatic Mixed Precision implements low-precision data types (float16 or bfloat16) with further boosted performance and less memory consumption. To get started, see how to enable it.
  Graph fusion: Intel® Extension for TensorFlow* provides graph optimization to fuse specified operator patterns into a new single operator for better performance, such as Conv2D+ReLU and Linear+ReLU. Refer to the supported fusion list in Graph fusion.
- Python API
  Public APIs that extend XPU operators for better performance are provided in the itex.ops namespace, including AdamWithWeightDecayOptimizer/gelu/LayerNormalization/ItexLSTM. Find more details in Intel® Extension for TensorFlow* ops.
- Intel® Extension for TensorFlow* Profiler
  Intel® Extension for TensorFlow* provides support for TensorFlow* Profiler to trace the performance of TensorFlow* models on Intel GPU. Refer to how to enable profiler for more details.
- Docker Container Support
  The Intel® Extension for TensorFlow* Docker container includes Intel® oneAPI Base Toolkit and the rest of the software stack except Intel GPU drivers. Users only need to install the GPU driver on the host machine before pulling and launching the docker container directly. To get started, see the Docker Container Guide.
- FP32 Math Mode
  The ITEX_FP32_MATH_MODE setting lets float32 computation be executed at reduced TensorFloat-32 precision. Users can enable this feature by setting ITEX_FP32_MATH_MODE (default FP32) to one of the supported values (GPU: TF32/FP32). More details in ITEX_FP32_MATH_MODE.
- Intel® Extension for TensorFlow* Verbose
  ITEX_VERBOSE is designed to help users get more Intel® Extension for TensorFlow* log messages at different log levels. More details in ITEX_VERBOSE level introduction.
- INT8 Quantization
  Intel® Extension for TensorFlow* co-works with Intel® Neural Compressor >= 1.14.1 to provide a compatible TensorFlow INT8 quantization solution with the same user experience.
- Experimental Support
  This release provides experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux, and second generation Intel® Xeon® Scalable Processors and newer, such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids.
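The ITEX_FP32_MATH_MODE and ITEX_VERBOSE settings described above are environment variables, so they can be prepared from Python before TensorFlow is imported. This is a minimal sketch under stated assumptions: the accepted math modes are the GPU values listed above (TF32/FP32), ITEX_VERBOSE is treated as an integer level, and the helper `configure_itex` is a hypothetical name.

```python
import os

# GPU values listed in the release notes.
VALID_GPU_FP32_MATH_MODES = ("FP32", "TF32")


def configure_itex(fp32_math_mode="FP32", verbose_level=None, env=os.environ):
    """Write the two settings into the given environment mapping.

    Hypothetical helper; the integer form of ITEX_VERBOSE is an assumption.
    """
    mode = fp32_math_mode.upper()
    if mode not in VALID_GPU_FP32_MATH_MODES:
        raise ValueError(f"unsupported ITEX_FP32_MATH_MODE: {fp32_math_mode!r}")
    env["ITEX_FP32_MATH_MODE"] = mode
    if verbose_level is not None:
        env["ITEX_VERBOSE"] = str(int(verbose_level))
    return env


# Keep the default FP32 mode; set the variables before importing TensorFlow.
configure_itex("FP32")
```

Passing a plain dictionary as `env` makes it easy to inspect the resulting settings without touching the real process environment.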
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload on that platform and receive an error message such as "[CRITICAL ERROR] Kernel 'XXX' removed due to usage of FP64 instructions unsupported by the targeted hardware", it means that a kernel requiring FP64 instructions was removed and not executed, so the accuracy of the whole workload is incorrect.