Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport mod msac improvements from dav1d 1.4.2 #1224

Merged
merged 5 commits into from
Jun 18, 2024

Conversation

fbossen
Copy link
Collaborator

@fbossen fbossen commented Jun 17, 2024

Various improvements to msac functions. fn {d,r}av1d_msac_decode_symbol_adapt4 cannot be used for 5 possible symbol values anymore.

Kyle Siefring and others added 5 commits June 17, 2024 17:42
Before:
msac_decode_hi_tok_c:               259.5 ( 1.00x)
msac_decode_hi_tok_neon:            220.7 ( 1.18x)
msac_decode_symbol_adapt4_c:        105.7 ( 1.00x)
msac_decode_symbol_adapt4_neon:      63.3 ( 1.67x)

After:
msac_decode_hi_tok_c:               260.9 ( 1.00x)
msac_decode_hi_tok_neon:            197.9 ( 1.32x)
msac_decode_symbol_adapt4_c:        105.7 ( 1.00x)
msac_decode_symbol_adapt4_neon:      63.3 ( 1.67x)

decode_symbol_adapt4 is not changed, but is included for reference since
decode_hi_tok calls it.
Port improvements from the hi token functions to the rest of the symbol
adaption functions. These weren't originally ported since they didn't
work with arbitrary padding. In practice, zero padding is already used
and only the tests need to be updated.

Results - Neoverse N1

Old:
msac_decode_symbol_adapt4_c:         41.4 ( 1.00x)
msac_decode_symbol_adapt4_neon:      31.0 ( 1.34x)
msac_decode_symbol_adapt8_c:         54.5 ( 1.00x)
msac_decode_symbol_adapt8_neon:      32.2 ( 1.69x)
msac_decode_symbol_adapt16_c:        85.6 ( 1.00x)
msac_decode_symbol_adapt16_neon:     37.5 ( 2.28x)

New:
msac_decode_symbol_adapt4_c:         41.5 ( 1.00x)
msac_decode_symbol_adapt4_neon:      27.7 ( 1.50x)
msac_decode_symbol_adapt8_c:         55.7 ( 1.00x)
msac_decode_symbol_adapt8_neon:      30.1 ( 1.85x)
msac_decode_symbol_adapt16_c:        82.4 ( 1.00x)
msac_decode_symbol_adapt16_neon:     35.2 ( 2.34x)
One addressing optimization and fix some missing changes to a previous
commit that ported improvements from hi tok to other decode tok
functions.
Changes stem from redesigning the reduction stage of the multisymbol
decode function.
* No longer use adapt4 for 5 possible symbol values
* Specialize reduction for 4/8/16 decode functions
* Modify control flow

+------------------------+--------------+--------------+---------------+
|                        |  Neoverse V1 |  Neoverse N1 |   Cortex A72  |
|                        | (Graviton 3) | (Graviton 2) |  (Graviton 1) |
+------------------------+-------+------+-------+------+-------+-------+
|                        |  Old  |  New |  Old  |  New |  Old  |  New  |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_neon       |  13.0 | 12.9 |  14.9 | 14.0 |  39.3 |  29.0 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_adapt_neon |  15.4 | 15.6 |  17.5 | 16.8 |  41.6 |  33.5 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_equi_neon  |  11.3 | 12.0 |  14.0 | 12.2 |  35.0 |  26.3 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_hi_tok_c        |  73.7 | 57.8 |  73.4 | 60.5 | 130.1 | 103.9 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_hi_tok_neon     |  63.3 | 48.2 |  65.2 | 51.2 | 119.0 | 105.3 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  28.6 | 22.5 |  28.4 | 23.5 |  67.8 |  55.1 |
| adapt4_neon            |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  29.5 | 26.6 |  29.0 | 28.8 |  76.6 |  74.0 |
| adapt8_neon            |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  31.6 | 31.2 |  33.3 | 33.0 |  77.5 |  68.1 |
| adapt16_neon           |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+
@kkysen kkysen self-assigned this Jun 17, 2024
@@ -77,7 +77,7 @@ unsigned dav1d_msac_decode_bool_equi(MsacContext *s);
unsigned dav1d_msac_decode_bool(MsacContext *s, unsigned f);
unsigned dav1d_msac_decode_hi_tok(MsacContext *s, uint16_t *cdf);

/* Supported n_symbols ranges: adapt4: 1-4, adapt8: 1-7, adapt16: 3-15 */
/* Supported n_symbols ranges: adapt4: 1-3, adapt8: 1-7, adapt16: 3-15 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, this makes a lot of sense. I was wondering why it was 1-4 when all of the others were 1-N - 1 for adaptN.

@fbossen fbossen merged commit 3c238c2 into memorysafety:main Jun 18, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants