v24.09 Public Major Release
Feat
- Provide a wrapper class to expose cpu::CpuSoftmaxGeneric
- Detect number of cores in Windows®
- Add optimized SME kernel for QASYMM8_SIGNED elementwise addition operation
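These notes do not say how core detection is implemented on Windows®. As general background only, the sketch below shows the portable standard-library way to query the number of hardware threads. The helper name detect_core_count is hypothetical and is not part of the library's API.

```cpp
#include <thread>

// Hypothetical helper, not the library's implementation.
// std::thread::hardware_concurrency() may return 0 when the value
// cannot be determined, so fall back to a single core in that case.
inline unsigned int detect_core_count()
{
    const unsigned int n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

A scheduler could use such a value as the upper bound on its worker thread count.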
Fix
- LogSoftmax Int8/UInt8 mismatches in Cpu
- Rounding of negative integers in pooling 2d/3d GPU kernels
- OpenMP® linker error on Windows®
- Rounding of negative integers in pooling 2d/3d kernels
- Linker failure for cpu::CpuSoftmaxGeneric in partial builds
- Cpu/Gpu Reverse data type support
- QSYMM16 broadcasted subtraction failures
- CpuMulKernel validation when there is x-broadcasting for some types
- Data type validation in depthwise op in Cpu
- Update macOS® build instructions
- Validation tests compute reference and target on each iteration
- Reset permuted input and weights on configure in NEDepthwiseConvolutionLayer
- Selectively enable CL job chaining
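The two pooling entries above mention rounding of negative integers but do not describe the defect. As background, the sketch below shows one common way such rounding goes wrong in integer averaging: C++ integer division truncates toward zero, so adding half the divisor before dividing rounds correctly only for non-negative sums. Both function names are hypothetical, and this is not the library's kernel code.

```cpp
#include <cstdint>

// Adds +count/2 and then truncates. Correct for sum >= 0, but biased
// toward zero for negative sums because '/' truncates toward zero.
inline int32_t avg_biased_trunc(int32_t sum, int32_t count)
{
    return (sum + count / 2) / count;
}

// Rounds half away from zero by giving the bias the same sign as the sum.
inline int32_t avg_round_half_away(int32_t sum, int32_t count)
{
    const int32_t half = count / 2;
    return sum >= 0 ? (sum + half) / count : (sum - half) / count;
}
```

For a pooling window of 4 elements summing to -7 (true mean -1.75), the first variant yields -1 and the second yields -2.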
Refactor
- Generate only one shared library when building with CMake
- Add BF16 LUT for Softmax Layer with tests
- Move heuristic logic of activation kernel into separate class
- Remove unused CommandBuffer
Perf
- Allocate Persistent and Prepare tensors at start of prepare()
- Use mws in OMPScheduler for better thread throttling
- Enable FP16 winograd in CpuConv2d for v8a multi_isa builds
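The OMPScheduler entry above can be illustrated with a small sketch, assuming "mws" stands for a kernel's minimum workload size, that is, the smallest amount of work worth handing to one thread. The function name and parameters below are hypothetical and do not reflect the scheduler's actual code. The idea is to cap the thread count so that no thread receives less than mws units of work.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: choose a thread count such that each thread
// gets at least `mws` work items, without exceeding `max_threads`.
inline std::size_t throttled_thread_count(std::size_t total_work,
                                          std::size_t mws,
                                          std::size_t max_threads)
{
    const std::size_t safe_mws    = std::max<std::size_t>(1, mws);
    const std::size_t by_workload = std::max<std::size_t>(1, total_work / safe_mws);
    return std::min(by_workload, std::max<std::size_t>(1, max_threads));
}
```

With 1000 work items, an mws of 256 and 8 available threads, this picks 3 threads instead of 8, which avoids waking threads that would have too little to do.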
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.09/index.xhtml