v24.09 Public Major Release
Provide a wrapper class to expose cpu::CpuSoftmaxGeneric
Detect the number of cores on Windows®
Add optimized SME kernel for QASYMM8_SIGNED elementwise addition operation
Fix LogSoftmax Int8/UInt8 mismatches on Cpu
Fix rounding of negative integers in pooling 2d/3d gpu kernels
Fix OpenMP® linker error on Windows®
Fix rounding of negative integers in pooling 2d/3d kernels
Patch linker failure for cpu::CpuSoftmaxGeneric in partial builds
Cpu/Gpu Reverse data type support
Fix QSYMM16 broadcast subtraction failures
Fix CpuMulKernel validation when there is x-broadcasting for some types
Fix data type validation in the depthwise op on Cpu
Update macOS® build instructions
Validation tests compute reference and target on each iteration
Reset permuted input and weights on configure in NEDepthwiseConvolutionLayer
Selectively enable CL job chaining
Generate only one shared library when building with CMake
Add BF16 LUT for Softmax Layer with tests
Move heuristic logic of activation kernel into a separate class
Remove unused CommandBuffer
Allocate Persistent and Prepare tensors at the start of prepare()
Use mws in OMPScheduler for better thread throttling
Enable FP16 Winograd in CpuConv2d for v8a multi_isa builds
Documentation (API, build guide, contribution guide, errata, etc.) is available here: