Releases: ARM-software/ComputeLibrary
v24.11
v24.11 Public Major Release
Feat
- Add SVE SoftmaxLayer kernel for BF16
- Provide stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer
- Extend static quantization interface for both matmul and convolution operations
Fix
- Clarify Third-Party IP licenses
- Check if CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
- Add BF16 support for CpuGemmAssemblyDispatchWrapper
- Detect SVE support on Windows® to run the available kernels
- Add missing cstdint include, which is required when building with GCC 15
- Disable -O2 when building for Windows® as this crashes when certain compiler versions are used
- Make cast on CPU truncate float to int instead of round to be consistent with other ML frameworks
- Return error in validate() for CpuGemmLowpMatrixMultiplyCore if pretransposed A or B are true as this is not supported
- Avoid implicit conversion from __fp16 to arm_compute::bfloat16, which triggers illegal instructions on hardware with FP16 but no BF16 support
- Softmax SME2 kernel selection now correctly detects if SME2 is supported
- Requantization rounding issues in CPU/GPU Quantize
- Scale normalising coefficient in GPU LogSoftmax
- Apply consistent rounding policy in NEReduceMean
- Revert default memory manager for NEQLSTMLayer
- Create default memory manager when none is provided
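The cast fix listed above changes float-to-int conversion from rounding to truncation. The difference can be sketched in plain C++ as follows. This is not Compute Library code: the helper names cast_truncate and cast_round are hypothetical, and std::lround (round to nearest, halves away from zero) only stands in for the earlier rounding behaviour, whose exact mode is not stated in these notes.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: float -> int32 conversion that truncates toward zero,
// i.e. the fractional part is simply discarded.
inline std::int32_t cast_truncate(float v)
{
    return static_cast<std::int32_t>(v);
}

// Hypothetical helper: float -> int32 conversion that rounds to the nearest
// integer (halves away from zero). Assumed stand-in for the old behaviour.
inline std::int32_t cast_round(float v)
{
    return static_cast<std::int32_t>(std::lround(v));
}
```

For an input of 2.7f, cast_truncate yields 2 while cast_round yields 3; for -2.7f they yield -2 and -3 respectively.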
Refactor
- Turn duplicated code in the elementwise_binary kernel into templates to reduce code size
- Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
Perf
- Use SME instead of SVE for subtractions in SoftmaxLayer for Q8 relating to LUT address calculation
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.11/index.xhtml
v24.09
v24.09 Public Major Release
Feat
- Provide a wrapper class to expose cpu::CpuSoftmaxGeneric
- Detect number of cores in Windows®
- Add Optimized SME kernel for QASYMM8_SIGNED elementwise addition operation
Fix
- LogSoftmax Int8/UInt8 mismatches in Cpu
- Rounding of negative integers in pooling 2d/3d gpu kernels
- OpenMP® linker error on Windows®
- Rounding of negative integers in pooling 2d/3d kernels
- Patches linker failure for cpu::CpuSoftmaxGeneric in partial builds
- Cpu/Gpu Reverse data type support
- QSYMM16 broadcasted subtraction failures
- CpuMulKernel validation when there is x-broadcasting for some types
- Data type validation in depthwise op in Cpu
- Update macOS® build instructions
- Validation tests compute reference and target on each iteration
- Reset permuted input and weights on configure in NEDepthwiseConvolutionLayer
- Selectively enable CL job chaining
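The two pooling fixes above concern rounding of negative integers. One common way such mismatches arise in integer averaging is sketched below in plain C++. It is an illustration only and not necessarily the defect fixed in the library; both function names are hypothetical, and count is assumed to be positive.

```cpp
#include <cstdint>

// Hypothetical helper: the usual "add half the divisor, then divide" trick.
// Because C++ integer division truncates toward zero, this rounds correctly
// only for non-negative sums.
inline std::int32_t naive_rounded_div(std::int32_t sum, std::int32_t count)
{
    return (sum + count / 2) / count;
}

// Hypothetical helper: round to nearest, halves away from zero, for either
// sign. The half-divisor offset is applied in the direction of the sign.
inline std::int32_t rounded_div(std::int32_t sum, std::int32_t count)
{
    const std::int32_t half = count / 2;
    return (sum >= 0 ? sum + half : sum - half) / count;
}
```

With a sum of -7 over 2 elements (true mean -3.5), naive_rounded_div returns -3 whereas rounded_div returns -4; for a sum of 7 both return 4.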
Refactor
- Generate only one shared library when building with CMake
- Add BF16 LUT for Softmax Layer with tests
- Move heuristic logic of activation kernel into separate class
- Removed unused CommandBuffer.
Perf
- Allocate Persistent and Prepare tensors at start of prepare()
- Use mws in OMPScheduler for better thread throttling
- Enable FP16 winograd in CpuConv2d for v8a multi_isa builds.
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.09/index.xhtml
v24.08.1
v24.08.1 Public Patch Release
Fix
- Change inheritance qualifiers of experimental Cpu operator interface classes to public for cpu-wrappers.
- Mismatches in static quantization updated after configure tests
- CpuSoftmax configure ignores is_log on validation
- Linker errors in armv8.2a Windows® builds
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.08.1/index.xhtml
v24.08
v24.08 Public Major Release
Feature
- Expose CpuAdd functionality using the experimental operators api
- Expose CpuDepthwiseConv2d functionality using the experimental operators api
- Expose CpuElementwiseDivision functionality using the experimental operators api
- Expose CpuElementwiseMax functionality using the experimental operators api
- Expose CpuElementwiseMin functionality using the experimental operators api
- Expose CpuGemmAssemblyDispatch functionality using the experimental operators low-level api
- Expose CpuMul functionality using the experimental operators api
- Expose CpuSub functionality using the experimental operators api
Performance
- Solve performance issue on Arm® Mali™-G78
Fix
- Illegal instruction in multi_isa armv8a
- Set num_threads in ThreadInfo correctly in OMPScheduler
- Fix Alexnet graph example giving incorrect results
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.08/index.xhtml
v24.07
Public major release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.07
v24.06
Public minor release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.06
v24.05
Public major release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.05
v24.04
Public major release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.04
v24.02.1
Public patch release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.02.1/
v24.02
Public major release
Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here:
https://arm-software.github.io/ComputeLibrary/v24.02