This page summarizes the major functional and performance changes in each release of the 3.x series.
All performance data on this page is measured on an Intel Core i5-9600K clocked at 4.2 GHz, running astcenc using AVX2 and 6 threads.
Status: April 2022
The 3.7 release contains another round of performance optimizations, including significant improvements to the command line front-end (faster PNG loader) and the arm64 build of the codec (faster NEON implementation).
- General:
- Feature: The command line tool PNG loader has been switched to use the Wuffs library, which is robust and significantly faster than the current stb_image implementation.
- Feature: Support for non-invariant builds returns. Opt in to slightly faster, but not bit-exact, builds by setting -DNO_INVARIANCE=ON for the CMake configuration. This improves performance by around 2%.
- Optimization: Changed SIMD select() so that it matches the default NEON behavior (bitwise select), rather than the default x86-64 behavior (lane select on MSB). Specialization select_msb() added for the one case we want to select on a sign-bit, where NEON needs a different implementation. This provides a significant (>25%) performance uplift on NEON implementations.
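The two select behaviors mentioned above can be illustrated with a scalar sketch on a single 32-bit lane. The helper names `select_bitwise` and `select_msb_lane` are hypothetical and are not the codec's SIMD wrapper functions; real implementations operate on whole vectors.

```cpp
#include <cstdint>

// Bitwise select: each result bit is taken from b where the mask bit is 1,
// otherwise from a. This matches how a NEON bit-select behaves.
static uint32_t select_bitwise(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

// MSB lane select: the whole lane is taken from b if the top bit of the
// mask is set, otherwise from a. This matches x86-64 variable blends.
static uint32_t select_msb_lane(uint32_t a, uint32_t b, uint32_t mask)
{
    return (mask & 0x80000000u) ? b : a;
}
```

When the mask comes from a comparison (all ones or all zeros per lane) both functions return the same value. They only diverge when the mask is something like a raw sign bit, which is why a separate specialization is needed for that case.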
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 3.5 release:
Status: April 2022
The 3.6 release contains another round of performance optimizations.
There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated astcenc.h
header.
- General:
- Feature: Data tables are now optimized for contexts without the SELF_DECOMPRESS_ONLY flag set. The flag therefore no longer improves compression performance, but still reduces context creation time and context data table memory footprint.
- Feature: Image quality for 4x4 -fastest configuration has been improved.
- Optimization: Decimation modes are reliably excluded from processing when they are only partially selected in the compressor configuration (e.g. if used for single plane, but not dual plane modes). This is a significant performance optimization for all quality levels.
- Optimization: Fast-path block load function variant added for 2D LDR images with no swizzle. This is a moderate performance optimization for the fast and fastest quality levels.
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 3.5 release:
Status: March 2022
The 3.5 release contains another round of performance optimizations.
There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated astcenc.h
header.
- General:
- Feature: Compressor configurations using SELF_DECOMPRESS_ONLY mode store compacted partition tables, which significantly improves both context create time and runtime performance.
- Feature: Bilinear infill for decimated weight grids supports a new variant for half-decimated grids which are only decimated in one axis.
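To illustrate why a grid decimated in only one axis is cheaper to infill, the sketch below expands one row of stored weights using two taps per texel instead of the four taps a full bilinear infill needs. It assumes evenly spaced stored weights and floating-point arithmetic; the function name `infill_row_half_decimated` is hypothetical, and the codec's actual tables and fixed-point math are not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Expand one row of stored weights (decimated in X only) to texel_count
// texel weights by linear interpolation between the two nearest stored
// weights. Assumes stored.size() >= 2 and texel_count >= 2.
static std::vector<float> infill_row_half_decimated(
    const std::vector<float>& stored, std::size_t texel_count)
{
    std::vector<float> out(texel_count);
    const float scale = static_cast<float>(stored.size() - 1)
                      / static_cast<float>(texel_count - 1);

    for (std::size_t x = 0; x < texel_count; x++)
    {
        float pos = static_cast<float>(x) * scale;
        std::size_t i0 = static_cast<std::size_t>(pos);
        std::size_t i1 = (i0 + 1 < stored.size()) ? i0 + 1 : i0;
        float frac = pos - static_cast<float>(i0);

        // Only two taps are needed because the Y axis is not decimated
        out[x] = stored[i0] * (1.0f - frac) + stored[i1] * frac;
    }

    return out;
}
```

Each row of the block can be expanded independently in this scheme, since no interpolation across rows is required.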
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 3.4 release:
Status: February 2022
The 3.4 release introduces another round of optimizations, removing a number of power-user configuration options to simplify the core compressor data path.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated astcenc.h
header.
- General:
- Feature: Many memory allocations have been moved off the stack into dynamically allocated working memory. This significantly reduces the peak stack usage, allowing the compressor to run in systems with 128KB stack limits.
- Feature: Builds now support -DBLOCK_MAX_TEXELS=<count> to allow a compressor to support a subset of block sizes. This can reduce binary size and runtime memory footprint, and improve performance.
- Feature: The -v and -va options to set a per-texel error weight function are no longer supported.
- Feature: The -b option to set a per-texel error weight boost for block border texels is no longer supported.
- Feature: The -a option to set a per-texel error weight based on texel alpha value is no longer supported as an error weighting tool, but is still supported for providing sprite-sheet RDO.
- Feature: The -mask option to set an error metric for mask map textures is still supported, but is currently a no-op in the compressor.
- Feature: The -perceptual option to set a perceptual error metric is still supported, but is currently a no-op in the compressor for mask map and normal map textures.
- Bug-fix: Corrected decompression of error blocks in some cases, which now return the expected error color (magenta for LDR, NaN for HDR). Note that astcenc determines the error color to use based on the output image data type, not the decoder profile.
- Binary releases:
- Improvement: Windows binaries changed to use ClangCL 12.0, which gives up to 10% performance improvement.
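The effect of a compile-time texel limit such as the one set by -DBLOCK_MAX_TEXELS can be sketched as follows. The macro name `SKETCH_BLOCK_MAX_TEXELS`, the `block_workspace` structure, and the helper function are hypothetical names for this illustration only and do not reflect the codec's internal layout.

```cpp
#include <cstddef>

// Hypothetical macro for this sketch; defaults to the largest ASTC block
// (6x6x6 = 216 texels) when the build does not override it.
#ifndef SKETCH_BLOCK_MAX_TEXELS
    #define SKETCH_BLOCK_MAX_TEXELS 216
#endif

// Per-block working storage is sized by the compile-time limit, so a
// smaller limit shrinks every structure that embeds an array like this.
struct block_workspace
{
    float texel_weights[SKETCH_BLOCK_MAX_TEXELS];
};

// A block size is usable only if its texel count fits within the limit.
static bool is_block_size_supported(
    unsigned int x, unsigned int y, unsigned int z)
{
    std::size_t texels = static_cast<std::size_t>(x) * y * z;
    return texels <= static_cast<std::size_t>(SKETCH_BLOCK_MAX_TEXELS);
}
```

In this sketch, building with a limit of 64 would still allow 2D block sizes up to 8x8 while reducing the size of the working storage.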
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 3.3 release:
Status: November 2021
The 3.3 release improves image quality for normal maps and two component textures. Normal maps are expected to compress 25% slower than in the 3.2 release, although they are still faster to compress in 3.3 than with the 2.5 series. This release also fixes one reported stability issue.
- General:
- Feature: Normal map image quality has been improved.
- Feature: Two component image quality has been improved, provided that unused components are correctly zero-weighted using e.g. -cw on the command line.
- Bug-fix: Improved stability when trying to compress complex blocks that could not beat even the starting quality threshold. These will now always compress into a constant color block.
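Why zero-weighting unused components matters can be seen in a simple weighted squared-error sketch. The function `weighted_error` is a hypothetical illustration; the compressor's real error metric is more involved.

```cpp
#include <array>
#include <cstddef>

// Sum of per-channel squared differences, each scaled by a channel weight.
// A weight of zero removes that channel from the error entirely.
static float weighted_error(
    const std::array<float, 4>& original,
    const std::array<float, 4>& encoded,
    const std::array<float, 4>& channel_weight)
{
    float error = 0.0f;
    for (std::size_t i = 0; i < 4; i++)
    {
        float diff = original[i] - encoded[i];
        error += channel_weight[i] * diff * diff;
    }

    return error;
}
```

Assuming a texture where only the first two components carry data, weights of 1, 1, 0, 0 mean that differences in the last two components contribute nothing to the error, so the search is free to spend its bits on the components that matter.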
Status: August 2021
The 3.2 release is a bugfix release; no significant image quality or performance differences are expected.
- General:
- Bug-fix: Improved stability when new contexts were created while other contexts were compressing or decompressing an image.
- Bug-fix: Improved stability when decompressing blocks with invalid block encodings.
Status: July 2021
The 3.1 release gives another performance boost, typically between 5 and 20% faster than the 3.0 release, as well as further incremental improvements to image quality. A number of build system improvements make astcenc easier and faster to integrate into other projects as a library, including support for building universal binaries on macOS. Full change list is shown below.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated astcenc.h
header.
- General:
- Feature: RGB color data now supports -perceptual operation. The current implementation is simple, weighting color channel errors by their contribution to perceived luminance. This mimics the behavior of the human visual system, which is most sensitive to green, then red, then blue.
- Feature: Codec supports a new low weight search mode, which is a simpler weight assignment for encodings with a low number of weights in the weight grid. The weight threshold can be overridden using the new -lowweightmodelimit command line option.
- Feature: All platform builds now support building a native binary. Native binaries automatically select the SIMD level based on the default configuration of the compiler in use. Native binaries built on one machine may use different SIMD options than native binaries built on another.
- Feature: macOS platform builds now support building universal binaries containing both x86_64 and arm64 target support.
- Feature: Building the command line can be disabled when using as a library in another project. Set -DCLI=OFF during the CMake configure step.
- Feature: A standalone minimal example of the core codec API usage has been added in the ./Utils/Example/ directory.
- Core API:
- Feature: Config flag ASTCENC_FLG_USE_PERCEPTUAL works for color data.
- Feature: Config option tune_low_weight_count_limit added.
- Feature: New heuristic added which prunes dual weight plane searches if they are unlikely to help. This heuristic is not user controllable.
- Feature: Image quality has been improved. In general we see significant improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a smaller improvement (up to 0.1dB) for lower bitrate encodings.
- Bug fix: Arm "none" SIMD builds could be invariant with other builds. This fix has also been back-ported to the 2.x LTS branch.
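The luminance-based weighting described for perceptual RGB operation can be sketched as below. The Rec. 709 luma coefficients are used here as assumed example weights; the values used by the codec may differ, and the names `rgb_weights`, `LUMA_WEIGHTS`, and `perceptual_error` are hypothetical.

```cpp
// Per-channel weights for the error calculation
struct rgb_weights
{
    float r;
    float g;
    float b;
};

// Rec. 709 luma coefficients, used as assumed example weights only
static constexpr rgb_weights LUMA_WEIGHTS { 0.2126f, 0.7152f, 0.0722f };

// Squared error with each channel difference scaled by its assumed
// contribution to perceived luminance.
static float perceptual_error(float dr, float dg, float db)
{
    return LUMA_WEIGHTS.r * dr * dr
         + LUMA_WEIGHTS.g * dg * dg
         + LUMA_WEIGHTS.b * db * db;
}
```

With these weights an error of the same magnitude costs most in green, less in red, and least in blue, which matches the ordering of sensitivity described in the feature note.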
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 3.0 release:
Status: June 2021
The 3.0 release is the first in a series of updates to the compressor that are making more radical changes than we felt we could make with the 2.x series. The primary goals of the 3.x series are to keep the image quality ~static or better compared to the 2.5 release, but continue to improve performance.
Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated astcenc.h
header.
- General:
- Feature: The code has been significantly cleaned up, with improved comments, API documentation, function naming, and variable naming.
- Core API:
- API Change: The core APIs for astcenc_compress_image() and for astcenc_decompress_image() now accept swizzle structures by const pointer, instead of pass-by-value.
- API Change: Calling the astcenc_compress_reset() and the astcenc_decompress_reset() functions between images is no longer required if the context was created for use by a single thread.
- Feature: New heuristics have been added for controlling when to search beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and 1 plane. The previous tune_partition_early_out_limit config option has been removed, and replaced with two new options tune_2_partition_early_out_limit_factor and tune_3_partition_early_out_limit_factor. See command line help for more detailed documentation.
- Feature: New heuristics have been added for controlling when to use dual weight planes. The previous tune_two_plane_early_out_limit has been renamed to tune_2_plane_early_out_limit_correlation. See command line help for more detailed documentation.
- Feature: Support for using dual weight planes has been restricted to single partition blocks; it rarely helps blocks with 2 or more partitions and takes considerable compression search time.
Key for charts:
- Color = block size (see legend).
- Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).
Relative performance vs 2.5 release:
Copyright © 2021-2022, Arm Limited and contributors. All rights reserved.