From a1192d141636b3a7ebeb0a97dbd09ad9c5b802a6 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 15:40:44 +0200
Subject: [PATCH 1/7] add collection of deliberate cases of UB

---
 resources/deliberate-ub.md | 41 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 resources/deliberate-ub.md

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
new file mode 100644
index 00000000..ac7cabc6
--- /dev/null
+++ b/resources/deliberate-ub.md
@@ -0,0 +1,41 @@
+When a crate explicitly acknowledges that what it does is UB, but prefers keeping that code over UB-free alternatives (or there are no UB-free alternatives), that is always a concerning sign.
+We should evaluate whether there truly is some use-case here that is not currently achievable in well-defined Rust, and work with crate authors on providing a UB-free solution.
+
+## Known cases of deliberate UB
+
+### Cases related to concurrency
+
+* crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
+  Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
+  This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
+* crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
+  However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
+  (Also, pointer provenance is lost.)
+  To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
+* Similar to the previous case, the `atomic-memcpy` crate uses the [standard `Atomic*` types to load potential padding or pointer bytes](https://github.com/taiki-e/atomic-memcpy/blob/b7b82bface0f24c5a9171f2d0626871c61302152/src/lib.rs#L323).
+  This is a user-space attempt to implement bytewise atomic memcpy, so [adding that as a native operation](https://github.com/rust-lang/rfcs/pull/3301) should fix this.
+* `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
+  because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
+  (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
+  The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
+  However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
+  Code that works with concurrency can't just expect not to have to pay a cost for that.
+* `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
+  To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
+  That is UB since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
+  The best fix is probably an inline assembly block.
+
+### Cases related to aliasing
+
+* `yoke` and similar crates relying on "stable deref" properties cause various forms of aliasing trouble (such as [having `Box`es that alias with things](https://github.com/unicode-org/icu4x/issues/2095), or [having references in function arguments that get deallocated while the function runs](https://github.com/unicode-org/icu4x/issues/3696)).
+  This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
+* The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
+  This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).
+
+## Former cases of deliberate UB that have at least a work-in-progress solution to them
+
+* Various `offset_of` implementations caused UB by using `mem::uninitialized()`, or they used `&(*base).field` or `addr_of!((*base).field)` to project a dummy pointer to the field, which is UB due to out-of-bounds pointer arithmetic.
+  The `memoffset` crate has a sound implementation that, however, causes stack allocations which the compiler must optimize away.
+  This will be fixed properly by the native [`offset_of!` macro](https://github.com/rust-lang/rfcs/pull/3308), which is [currently in nightly](https://github.com/rust-lang/rust/issues/106655).
+* It used to be common to unwind out of `extern "C"` functions, which is UB; see [this discussion](https://internals.rust-lang.org/t/unwinding-through-ffi-after-rust-1-33/9521).
+  This is fixed by `extern "C-unwind"`, which is stable since Rust 1.71.

From 2efd24e87c79dd4d15f37deeeaec48c0efdd86c4 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 16:53:43 +0200
Subject: [PATCH 2/7] remove incorrect entry

---
 resources/deliberate-ub.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index ac7cabc6..9f42cf29 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -12,8 +12,6 @@ We should evaluate whether there truly is some use-case here that is not current
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
-* Similar to the previous case, the `atomic-memcpy` crate uses the [standard `Atomic*` types to load potential padding or pointer bytes](https://github.com/taiki-e/atomic-memcpy/blob/b7b82bface0f24c5a9171f2d0626871c61302152/src/lib.rs#L323).
-  This is a user-space attempt to implement bytewise atomic memcpy, so [adding that as a native operation](https://github.com/rust-lang/rfcs/pull/3301) should fix this.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)

From 64979ed6c687197ddebbb69e365139d09b872acc Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 17:03:10 +0200
Subject: [PATCH 3/7] talk about whether the LLVM IR we generate has UB

---
 resources/deliberate-ub.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 9f42cf29..8bd83151 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -8,24 +8,27 @@ We should evaluate whether there truly is some use-case here that is not current
 * crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
   Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
   This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
+  This is currently *not* UB in the LLVM IR we generate, due to LLVM's different handling of read-write races and because the code carefully uses `MaybeUninit`.
 * crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
+  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
-  (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
+  (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
   The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
   However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
   Code that works with concurrency can't just expect not to have to pay a cost for that.
 * `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
   To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
-  That is UB since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
+  That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
   The best fix is probably an inline assembly block.
 
 ### Cases related to aliasing
 
 * `yoke` and similar crates relying on "stable deref" properties cause various forms of aliasing trouble (such as [having `Box`es that alias with things](https://github.com/unicode-org/icu4x/issues/2095), or [having references in function arguments that get deallocated while the function runs](https://github.com/unicode-org/icu4x/issues/3696)).
+  This also violates the LLVM assumptions that come with `noalias` and `dereferenceable`.
   This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
 * The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
   This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).

From b2694e7f547eccc659cfa0d76074a68110d62288 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 17:21:45 +0200
Subject: [PATCH 4/7] compare_exchange poses extra problems

---
 resources/deliberate-ub.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 8bd83151..a419990c 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -13,7 +13,9 @@ We should evaluate whether there truly is some use-case here that is not current
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
-  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
+  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
+  Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
+  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)

From 8d6c4e6961c4f18365fb31c9fd746b27ad4f5869 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 21:21:16 +0200
Subject: [PATCH 5/7] BytesMut no longer uses atomics; for smol there is a PR

---
 resources/deliberate-ub.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index a419990c..722cf96c 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -16,16 +16,17 @@ We should evaluate whether there truly is some use-case here that is not current
   The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
   Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
   It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
-* `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
+* `bytes` used to do [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
   The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
   However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
   Code that works with concurrency can't just expect not to have to pay a cost for that.
+  And anyway current `bytes` does not seem to have any atomics in `BytesMut` any more, so possibly this is resolved.
 * `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
   To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
   That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
-  The best fix is probably an inline assembly block.
+  The best fix is probably an [inline assembly block](https://github.com/smol-rs/event-listener/pull/71).
 
 ### Cases related to aliasing
 

From 00fd712b172e193d1af1e2d30f0c29b97159dc15 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 22:09:42 +0200
Subject: [PATCH 6/7] add some general Stacked Borrows concerns

---
 resources/deliberate-ub.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 722cf96c..11f8739a 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -35,6 +35,12 @@ We should evaluate whether there truly is some use-case here that is not current
   This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
 * The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
   This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).
+* Stacked Borrows forbids a bunch of things that might be considered too restrictive (and that go well beyond LLVM `noalias`):
+  strict subobject provenance [rules out the `&Header` pattern](https://github.com/rust-lang/unsafe-code-guidelines/issues/256) and also affects [raw pointers derived from references](https://github.com/rust-lang/unsafe-code-guidelines/issues/134);
+  eager assertion of uniqueness makes [even read-only functions such as `as_mut_ptr` dangerous when they take `&mut`](https://github.com/rust-lang/unsafe-code-guidelines/issues/133);
+  `&UnsafeCell` surprisingly [requires read-write memory even when it is never written](https://github.com/rust-lang/unsafe-code-guidelines/issues/303).
+  There is a bunch of code out there that violates these rules one way or another.
+  All of these are resolved by [Tree Borrows](https://perso.crans.org/vanille/treebor/), though [some subtleties around `as_mut_ptr` do remain](https://github.com/rust-lang/unsafe-code-guidelines/issues/450).
 
 ## Former cases of deliberate UB that have at least a work-in-progress solution to them

From 2782bcaead0a8796982db0401538adcc5ab8729f Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sat, 19 Aug 2023 23:12:02 +0200
Subject: [PATCH 7/7] remove resolved cases of deliberate UB: bytes,
 smol-rs/event-listener

---
 resources/deliberate-ub.md | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 11f8739a..1b53928f 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -5,28 +5,18 @@ We should evaluate whether there truly is some use-case here that is not current
 
 ### Cases related to concurrency
 
-* crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
+* crossbeam's `AtomicCell` "fallback path" uses a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
   Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
   This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
   This is currently *not* UB in the LLVM IR we generate, due to LLVM's different handling of read-write races and because the code carefully uses `MaybeUninit`.
-* crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
-  However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
-  (Also, pointer provenance is lost.)
+* crossbeam's `AtomicCell` "fast path" uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
+  However, doing an `AtomicU32` read (returning a `u32`) on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
+  (Also, pointer provenance is lost, so `AtomicCell<*const T>` does not behave the way people expect.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
   The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
   Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
-  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
+  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes;
+  see the [discussion here](https://github.com/rust-lang/unsafe-code-guidelines/issues/449).
-* `bytes` used to do [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
-  because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
-  (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
-  The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
-  However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
-  Code that works with concurrency can't just expect not to have to pay a cost for that.
-  And anyway current `bytes` does not seem to have any atomics in `BytesMut` any more, so possibly this is resolved.
-* `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
-  To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
-  That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
-  The best fix is probably an [inline assembly block](https://github.com/smol-rs/event-listener/pull/71).
 
 ### Cases related to aliasing
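
To make the "fallback path" problem from the patches above concrete, here is a minimal sketch of a SeqLock read path (illustrative only, assuming a simple sequence-number protocol; this is not crossbeam's actual code, and the names are made up):

```rust
// Sketch of a SeqLock reader; the UB is in the racy data read.
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

pub struct SeqLock<T> {
    seq: AtomicUsize, // odd while a writer is in its critical section
    data: UnsafeCell<T>,
}

unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}

impl<T: Copy> SeqLock<T> {
    pub fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 % 2 == 1 {
                // A write is in progress; retry.
                std::hint::spin_loop();
                continue;
            }
            // The problematic step: a plain (volatile) read that can race
            // with a concurrent writer. Under the C++/Rust memory model such
            // a race is UB, even though a torn result is discarded below.
            let value = unsafe { ptr::read_volatile(self.data.get()) };
            fence(Ordering::Acquire);
            // If the sequence number did not change, no writer intervened,
            // so `value` is a consistent snapshot.
            if self.seq.load(Ordering::Relaxed) == s1 {
                return value;
            }
        }
    }
}
```

Bytewise atomic memcpy would replace the `read_volatile` with per-byte relaxed atomic loads, turning the race into a well-defined (if possibly torn) read.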
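The "fast path" padding problem can likewise be shown in a few lines (a hedged sketch; `PaddedPair` and `racy_read` are made-up names, and crossbeam's real code performs its own size and alignment checks):

```rust
// Sketch of why an `AtomicU32`-typed load of an arbitrary 4-byte type is UB.
use std::sync::atomic::{AtomicU32, Ordering};

#[repr(C, align(4))]
#[derive(Clone, Copy)]
pub struct PaddedPair {
    pub a: u16,
    pub b: u8,
    // One byte of padding sits at the end; it may be uninitialized.
}

pub fn racy_read(src: &PaddedPair) -> u32 {
    // Size and alignment match `AtomicU32`, so the cast itself is fine...
    let atomic: &AtomicU32 = unsafe { &*(src as *const PaddedPair).cast::<AtomicU32>() };
    // ...but this load claims to produce a fully initialized `u32` (hence the
    // `noundef` in the generated LLVM IR), while byte 3 is padding: UB.
    atomic.load(Ordering::SeqCst)
}
```

An atomic load at type `MaybeUninit<u32>`, as the text proposes, would carry the padding byte through as-is instead of asserting that it is initialized.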
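For the `offset_of` entry under "former cases", the sound `memoffset`-style computation looks roughly like this (a sketch; the `Example` type is made up). The point is that the base pointer refers to a real stack allocation, so the field projection stays in bounds:

```rust
// Sketch of a sound offset_of computation via a real (uninitialized) value.
use std::mem::MaybeUninit;
use std::ptr::addr_of;

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
}

fn offset_of_b() -> usize {
    // A real stack allocation -- this is what the text means by "stack
    // allocations which the compiler must optimize away".
    let uninit = MaybeUninit::<Example>::uninit();
    let base = uninit.as_ptr();
    // `addr_of!` takes the field's address without creating a reference,
    // so the uninitialized contents are never asserted to be initialized.
    let field = unsafe { addr_of!((*base).b) };
    field as usize - base as usize
}

fn main() {
    // With `#[repr(C)]`, `b` sits after `a` plus 3 bytes of padding.
    assert_eq!(offset_of_b(), 4);
}
```

The native `offset_of!` macro computes the same value without any runtime allocation, which is why the text calls it the proper fix.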