Replace memcmp to s2n_constant_time_equals #4709

boquan-fang · 2024-08-15T17:57:51Z

Resolved issues:

Solving issue #3062

Description of changes:

Changes most memcmp usages to s2n_constant_time_equals.

Two call sites were not changed: s2n_cipher_suites.c (line 1110) and s2n_config.c (line 323). Both use the integer returned by memcmp (negative, zero, or positive, which encodes ordering). s2n_constant_time_equals cannot replace them because it only reports whether two buffers are equal.
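The sketch below shows why an ordering-based call site cannot use an equality-only helper. It is a three-way comparator of the kind qsort/bsearch require, not the actual s2n-tls code. `struct suite_entry`, `suite_entry_cmp`, and `find_suite` are hypothetical names. The comparator needs the sign of memcmp's result to decide which way to search. A function that returns only true/false cannot supply that.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record keyed by a 2-byte IANA value (e.g. a cipher suite ID).
 * Illustration only; not s2n-tls code. */
struct suite_entry {
    uint8_t iana_value[2];
    const char *name;
};

/* Hypothetical three-way comparator for qsort()/bsearch(). The sign of
 * memcmp's result says whether a sorts before, equal to, or after b.
 * A boolean equality check cannot provide this ordering. */
static int suite_entry_cmp(const void *a, const void *b)
{
    const struct suite_entry *x = a;
    const struct suite_entry *y = b;
    return memcmp(x->iana_value, y->iana_value, sizeof(x->iana_value));
}

/* Hypothetical lookup: binary search over a table sorted by iana_value.
 * Returns NULL when the value is not present. */
static const struct suite_entry *find_suite(const struct suite_entry *table, size_t count,
                                            const uint8_t iana_value[2])
{
    struct suite_entry key = { { iana_value[0], iana_value[1] }, NULL };
    return bsearch(&key, table, count, sizeof(table[0]), suite_entry_cmp);
}
```

Binary search only works because every comparison narrows the range in one direction. That directional information is exactly what an equality-only API discards.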

Regression Tests Results

We recently added new regression tests that let us see the instruction count differences over some common code paths.

--------------------------------------------------------------------------------
-- Summary
--------------------------------------------------------------------------------
Ir____________ 

1,707 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir____________________  file:function

< 2,190 (128.3%, 128.3%)  /home/ubuntu/workspace/s2n-tls/bindings/rust/s2n-tls-sys/lib/utils/s2n_safety.c:s2n_constant_time_equals

<  -384 (-22.5%, 105.8%)  ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe

<   -62  (-3.6%, 102.2%)  /rust/deps/hashbrown-0.14.5/src/raw/mod.rs:
    -58  (-3.4%)            hashbrown::map::HashMap<K,V,S,A>::insert
     -4  (-0.2%)            core::iter::adapters::try_process

<   -34  (-2.0%, 100.2%)  /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:
    -30  (-1.8%)            hashbrown::map::HashMap<K,V,S,A>::insert
     -2  (-0.1%)            hashbrown::raw::RawTable<T,A>::reserve_rehash
     -2  (-0.1%)            core::iter::adapters::try_process

This shows that switching to s2n_constant_time_equals costs about 1,707 instructions in total. The main differences are the additional 2,190 instructions in s2n_constant_time_equals and the 384-instruction reduction in memcmp (__memcmp_avx2_movbe). Relative to the 79,292,794 instructions for a complete RSA handshake, this is a 0.00215% regression in handshake performance.
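The 0.00215% figure is the net instruction delta from the summary above divided by the instruction count of one complete RSA handshake:

```latex
\frac{\Delta \mathrm{Ir}}{\mathrm{Ir}_{\text{handshake}}}
  = \frac{1{,}707}{79{,}292{,}794}
  \approx 2.15 \times 10^{-5}
  = 0.00215\%
```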

Note: The hashbrown instruction count differences are due to randomness between runs.

Handshake Benchmarks

We also have benchmarks covering handshake and throughput performance. Comparing these benchmarks between this PR and mainline, we see the following differences:


handshake-rsa2048/s2n-tls
                        time:   [1.0697 ms 1.0702 ms 1.0707 ms]
                        change: [-0.8251% +0.1691% +1.2108%] (p = 0.78 > 0.05)
                        No change in performance detected.

handshake-rsa3072/s2n-tls
                        time:   [2.9331 ms 2.9364 ms 2.9396 ms]
                        change: [-0.1189% +0.0360% +0.1889%] (p = 0.65 > 0.05)
                        No change in performance detected.
                        
handshake-rsa4096/s2n-tls
                        time:   [6.2232 ms 6.2324 ms 6.2420 ms]
                        change: [-0.1699% +0.0467% +0.2659%] (p = 0.68 > 0.05)
                        No change in performance detected.

handshake-ecdsa384/s2n-tls
                        time:   [1.1160 ms 1.1162 ms 1.1164 ms]
                        change: [-0.1847% +0.0291% +0.2434%] (p = 0.81 > 0.05)
                        No change in performance detected.

handshake-ecdsa256/s2n-tls
                        time:   [398.52 µs 398.62 µs 398.74 µs]
                        change: [+0.3657% +0.7706% +1.1642%] (p = 0.00 < 0.05)
                        Change within noise threshold.

throughput-AES_128_GCM_SHA256/s2n-tls
                        time:   [108.28 µs 108.70 µs 109.14 µs]
                        thrpt:  [873.83 MiB/s 877.37 MiB/s 880.71 MiB/s]
                 change:
                        time:   [+0.8096% +1.3883% +1.9335%] (p = 0.00 < 0.05)
                        thrpt:  [-1.8968% -1.3693% -0.8031%]
                        Change within noise threshold.

throughput-AES_256_GCM_SHA384/s2n-tls
                        time:   [118.10 µs 118.55 µs 119.01 µs]
                        thrpt:  [801.35 MiB/s 804.47 MiB/s 807.49 MiB/s]
                 change:
                        time:   [-1.3449% -0.9600% -0.5646%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5678% +0.9693% +1.3632%]

As expected, we do not see any significant changes. While there are some slight differences reported, these are below the accuracy threshold of the benchmarks.

s2n_constant_time_equals vs memcmp Benchmarks

We ran Criterion benchmarks comparing the performance of memcmp against s2n_constant_time_equals.

small data, not equal: how long does it take to compare an unequal blob of 255 bytes?
small data, equal: how long does it take to compare an equal blob of 255 bytes?

These cases give numbers for the common case of comparing small pieces of data.

many small blobs, not equal: how long does it take to compare 16 kB of 255-byte blobs?

This represents a pathological case that is possible in places like s2n_protocol_preferences.c. Generally there are only a few protocols, but a client can send up to 16 kB of preferences, with each protocol at most 255 bytes long. This case shows the performance impact in that worst-case scenario.
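The sketch below illustrates why the worst case grows with the size of the client's list. It assumes an ALPN-style format in which each protocol name is preceded by a 1-byte length, which is where the 255-byte cap comes from. It is not the s2n_protocol_preferences.c implementation. `ct_equals` and `list_contains_protocol` are hypothetical helpers. Each entry whose length matches the candidate triggers a full comparison. A 16 kB list of maximum-length entries therefore triggers dozens of 255-byte comparisons.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant-time equality helper (not an s2n-tls API).
 * It never returns early, so every byte is examined. */
static bool ct_equals(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0;
}

/* Hypothetical walker over a list of <1-byte length><name> entries.
 * Returns true if `want` appears in the list, false if it does not or
 * the list is malformed. The work done grows with list_len. */
static bool list_contains_protocol(const uint8_t *list, size_t list_len,
                                   const uint8_t *want, uint8_t want_len)
{
    size_t offset = 0;
    while (offset < list_len) {
        uint8_t entry_len = list[offset++];
        if (entry_len > list_len - offset) {
            return false; /* entry runs past the end of the list */
        }
        if (entry_len == want_len && ct_equals(&list[offset], want, want_len)) {
            return true;
        }
        offset += entry_len;
    }
    return false;
}
```

With a handful of short protocols the loop runs only a few times. An adversarial 16 kB list of maximum-length entries forces the same loop to run dozens of times.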

# 386 ps vs 130,280 ps
low-level-comparison - small data, not equal/memcmp
                        time:   [386.08 ps 386.10 ps 386.12 ps]

low-level-comparison - small data, not equal/s2n_constant_time_equals
                        time:   [130.28 ns 130.28 ns 130.29 ns]


# 386 ps vs 130,300 ps
low-level-comparison - small data, equal/memcmp
                        time:   [386.07 ps 386.09 ps 386.11 ps]

low-level-comparison - small data, equal/s2n_constant_time_equals_c
                        time:   [130.29 ns 130.30 ns 130.31 ns]


# 519 ns vs 30,593 ns
low-level-comparison - many small blobs, not equal/memcmp
                        time:   [519.23 ns 519.37 ns 519.50 ns]

low-level-comparison - many small blobs, not equal/s2n_constant_time_equals
                        time:   [30.592 µs 30.593 µs 30.595 µs]

We see that although s2n_constant_time_equals is significantly slower than memcmp, it remains a very fast function, and its cost is small relative to the cost of an entire handshake (~130 ns / ~1,000,000 ns ≈ 0.013%).

For pathological cases (comparing an entire 16 kB extension), s2n_constant_time_equals would represent a noticeable portion of the handshake (~30 µs / ~1,000 µs ≈ 3%). Therefore we avoid using s2n_constant_time_equals wherever relatively large data (> 1 kB) could be compared.

Call-outs:

The two call sites that were not changed (s2n_cipher_suites.c and s2n_config.c) need a note explaining why.

Testing:

This change is a refactor. The code was built and passed all tests via CMake.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jmayclin

Let's also update grep_simple_mistakes as part of this PR.

colmmacc · 2024-08-16T22:00:32Z

What's the performance impact of this change? libc's memcmp and memcpy are highly optimized to do word-wise comparisons, pipeline prefetching, etc.

jmayclin · 2024-08-20T17:05:56Z

Hi Colm! For context, the goal of this PR is to reduce the number of places where we use memcmp. This helps reduce the risk of side channels by removing data-dependent comparisons.
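The side-channel concern can be made concrete with a small sketch. The first function below exits early, as a naive memcmp-style loop does. Optimized libc implementations differ in detail but can also stop at the first difference. The second function accumulates differences across every byte. Both function names are hypothetical and neither is the s2n_constant_time_equals source. The count of bytes examined stands in for elapsed time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical early-exit comparison: it stops at the first mismatch, so the
 * number of bytes examined (and roughly the time taken) reveals how long
 * the matching prefix is. `*examined` reports that count. */
static bool early_exit_equals(const uint8_t *a, const uint8_t *b, size_t len, size_t *examined)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            *examined = i + 1;
            return false;
        }
    }
    *examined = len;
    return true;
}

/* Hypothetical constant-time style comparison: it ORs together the XOR of
 * every byte pair, so all `len` bytes are examined wherever a mismatch is. */
static bool constant_time_style_equals(const uint8_t *a, const uint8_t *b, size_t len, size_t *examined)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    *examined = len;
    return diff == 0;
}
```

With the early-exit version, a guess that is wrong in the first byte finishes sooner than one that is wrong in the last byte. The accumulating version does the same work in both cases. A production implementation also has to keep the compiler from reintroducing an early exit, which this sketch does not attempt.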

In-depth benchmarks were added to the PR description. For the specific s2n-tls configuration that we use for handshake benchmarks:

- No regression was detected using wall-clock benchmarks.
- There was a 0.00215% regression in instruction count.

This is in line with expectations, because most of the places that memcmp was replaced were comparing small, statically sized amounts of data. The places that aren’t comparing statically sized data are comparing small, bounded amounts of data. For these bounded comparisons, s2n_constant_time_equals adds ~130 ns. For more information see s2n_constant_time_equals vs memcmp Benchmarks in the PR description.

Non-static, bounded data

s2n_early_data.c:
- app_protocol_size is the size of the user supplied application protocol preferences
- This is guaranteed to be less than 256 bytes, because the buffer that we store it in has a capacity of 256 bytes.
- This happens at most once per handshake when validating the early data presented.

s2n-tls/tls/s2n_connection.h

Lines 335 to 340 in 87f4a05

    /* The application protocol decided upon during the client hello.
     * If ALPN is being used, then:
     * In server mode, this will be set by the time client_hello_cb is invoked.
     * In client mode, this will be set after is_handshake_complete(connection) is true.
     */
    char application_protocol[256];

s2n_server_hello.c
- session_id_len is the length of the session id
- This is guaranteed to be at most 32 bytes (S2N_TLS_SESSION_ID_MAX_LEN), because we fail with an error if the length exceeds 32 bytes.
- This happens at most once per handshake when the server_hello is parsed.

s2n-tls/tls/s2n_server_hello.c

Line 139 in fcc3184

S2N_ERROR_IF(session_id_len > S2N_TLS_SESSION_ID_MAX_LEN, S2N_ERR_BAD_MESSAGE);
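Both bounded cases follow the same pattern. First the attacker-supplied length is checked against a fixed maximum. Only then does the constant-time comparison run, so it never touches more than a small, fixed number of bytes. The sketch below assumes a 32-byte maximum to mirror S2N_TLS_SESSION_ID_MAX_LEN. `SESSION_ID_MAX_LEN`, `ct_equals`, and `session_id_matches` are hypothetical stand-ins, not s2n-tls APIs. Error handling is reduced to a return code instead of S2N_ERROR_IF/POSIX_ENSURE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant; mirrors S2N_TLS_SESSION_ID_MAX_LEN (32). */
#define SESSION_ID_MAX_LEN 32

/* Hypothetical constant-time equality helper (not an s2n-tls API). */
static bool ct_equals(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0;
}

/* Hypothetical check of a received session ID against a cached one.
 * Returns -1 if the received length is out of bounds (malformed message),
 * 1 if the IDs match, and 0 otherwise. */
static int session_id_matches(const uint8_t *received, size_t received_len,
                              const uint8_t *cached, size_t cached_len)
{
    if (received_len > SESSION_ID_MAX_LEN) {
        return -1; /* reject before comparing any bytes */
    }
    if (received_len != cached_len) {
        return 0; /* the received length is already visible in the message */
    }
    return ct_equals(received, cached, received_len) ? 1 : 0;
}
```

Because the bound check runs first, the most the comparison can ever cost is one 32-byte pass, regardless of what the peer sends.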

Note: This PR originally also changed some "generally small" cases, but those allowed pathological inputs, such as a client sending 16 kB of application protocols. #4717 showed that this would have cost ~30 µs (about 3% of a handshake), so those cases were reverted.

Boquan Fang added 2 commits August 15, 2024 01:53

refactor: replacing memcmp by s2n_constant_time_equals

cdb3cbe

refactor: modify conditional statments for s2n_constant_time_equals

571f2ba

boquan-fang requested review from goatgoose and jmayclin August 15, 2024 17:57

github-actions bot added the s2n-core team label Aug 15, 2024

jmayclin reviewed Aug 15, 2024

goatgoose reviewed Aug 15, 2024

crypto/s2n_rsa.c (resolved)

crypto/s2n_rsa.c (outdated, resolved)

tls/s2n_connection.c (resolved)

tls/s2n_early_data.c (outdated, resolved)

refactor: respond to PR comments and attempt to fix CI problem

ac6f580

boquan-fang requested a review from dougch as a code owner August 15, 2024 22:17

dougch approved these changes Aug 15, 2024

boquan-fang force-pushed the replace-memcmp branch from 12f4ff8 to b95a1fd August 15, 2024 23:37

Address PR and CI concerns

359e689

* Let s2n_stuffer_read_expected_str use memcmp to avoid CBMC problem.

boquan-fang force-pushed the replace-memcmp branch from b95a1fd to 359e689 August 15, 2024 23:40

Address PR and CI concerns:

c676feb

* Adding one more known memcmp location to grep_simple_mistakes.sh.

boquan-fang enabled auto-merge (squash) August 16, 2024 00:07

Boquan Fang added 2 commits August 16, 2024 00:32

Address CR comments:

e7049af

* Replace S2N_ERROR_IF to POSIX_ENSURE.

Address CR concern:

12295ca

* Reorder the .sh order to fit the directory order.

boquan-fang disabled auto-merge August 16, 2024 21:39

boquan-fang requested review from goatgoose and jmayclin August 16, 2024 21:39

jmayclin approved these changes Aug 16, 2024

jmayclin mentioned this pull request Aug 20, 2024

bench: s2n_constant_time_equals #4717

Open

Remove cases with possibility of pathological inputs

0feaa68

Remove the usage of s2n_constant_time_equals from any functions where it is possible to compare relatively large amounts of data (~ > 1 kB) even if that scenario is unlikely.

fix: fix grep_simple_mistakes.sh to include additional memcmp calls

19f1016

goatgoose approved these changes Aug 26, 2024

lrstewart added the do_not_merge PR might needs something before merging, even if approved and passing label Aug 27, 2024

boquan-fang added 2 commits September 4, 2024 13:51

Merge branch 'main' into replace-memcmp

c541742

Merge branch 'main' into replace-memcmp

c0d426c

boquan-fang merged commit 08d413a into aws:main Sep 5, 2024
36 checks passed

boquan-fang removed the do_not_merge PR might needs something before merging, even if approved and passing label Sep 5, 2024

BrewTestBot mentioned this pull request Sep 6, 2024

s2n 1.5.2 Homebrew/homebrew-core#183760

Merged

boquan-fang mentioned this pull request Sep 26, 2024

Audit and replace memcmp usage with s2n_constant_time_equals #3062

Closed
