From a1192d141636b3a7ebeb0a97dbd09ad9c5b802a6 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 15:40:44 +0200
Subject: [PATCH 1/7] add collection of deliberate cases of UB

---
 resources/deliberate-ub.md | 41 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 resources/deliberate-ub.md

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
new file mode 100644
index 00000000..ac7cabc6
--- /dev/null
+++ b/resources/deliberate-ub.md
@@ -0,0 +1,41 @@
+When a crate explicitly acknowledges that what it does is UB, but prefers keeping that code over UB-free alternatives (or there are no UB-free alternatives), that is always a concerning sign.
+We should evaluate whether there truly is some use-case here that is not currently achievable in well-defined Rust, and work with crate authors on providing a UB-free solution.
+
+## Known cases of deliberate UB
+
+### Cases related to concurrency
+
+* crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
+  Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
+  This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
+* crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
+  However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
+  (Also, pointer provenance is lost.)
+  To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
+* Similar to the previous case, the `atomic-memcpy` crate uses the [standard `Atomic*` types to load potential padding or pointer bytes](https://github.com/taiki-e/atomic-memcpy/blob/b7b82bface0f24c5a9171f2d0626871c61302152/src/lib.rs#L323).
+  This is a user-space attempt to implement bytewise atomic memcpy, so [adding that as a native operation](https://github.com/rust-lang/rfcs/pull/3301) should fix this.
+* `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
+  because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
+  (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
+  The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
+  However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
+  Code that works with concurrency can't just expect not to have to pay a cost for that.
+* `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
+  To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
+  That is UB since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
+  The best fix is probably an inline assembly block.
+
+### Cases related to aliasing
+
+* `yoke` and similar crates relying on "stable deref" properties cause various forms of aliasing trouble (such as [having `Box`es that alias with things](https://github.com/unicode-org/icu4x/issues/2095), or [having references in function arguments that get deallocated while the function runs](https://github.com/unicode-org/icu4x/issues/3696)).
+  This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
+* The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
+  This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).
+
+## Former cases of deliberate UB that have at least a work-in-progress solution to them
+
+* Various `offset_of` implementations caused UB by using `mem::uninitialized()`, or they used `&(*base).field` or `addr_of!((*base).field)` to project a dummy pointer to the field, which is UB due to out-of-bounds pointer arithmetic.
+  The `memoffset` crate has a sound implementation that, however, causes stack allocations which the compiler must optimize away.
+  This will be fixed properly by the native [`offset_of!` macro](https://github.com/rust-lang/rfcs/pull/3308), which is [currently in nightly](https://github.com/rust-lang/rust/issues/106655).
+* It used to be common to unwind out of `extern "C"` functions, which is UB; see [this discussion](https://internals.rust-lang.org/t/unwinding-through-ffi-after-rust-1-33/9521).
+  This is fixed by `extern "C-unwind"`, which is stable since Rust 1.71.

From 2efd24e87c79dd4d15f37deeeaec48c0efdd86c4 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 16:53:43 +0200
Subject: [PATCH 2/7] remove incorrect entry

---
 resources/deliberate-ub.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index ac7cabc6..9f42cf29 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -12,8 +12,6 @@ We should evaluate whether there truly is some use-case here that is not current
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
-* Similar to the previous case, the `atomic-memcpy` crate uses the [standard `Atomic*` types to load potential padding or pointer bytes](https://github.com/taiki-e/atomic-memcpy/blob/b7b82bface0f24c5a9171f2d0626871c61302152/src/lib.rs#L323).
-  This is a user-space attempt to implement bytewise atomic memcpy, so [adding that as a native operation](https://github.com/rust-lang/rfcs/pull/3301) should fix this.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)

From 64979ed6c687197ddebbb69e365139d09b872acc Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 17:03:10 +0200
Subject: [PATCH 3/7] talk about whether the LLVM IR we generate has UB

---
 resources/deliberate-ub.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 9f42cf29..8bd83151 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -8,24 +8,27 @@ We should evaluate whether there truly is some use-case here that is not current
 * crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
   Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
   This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
+  This is currently *not* UB in the LLVM IR we generate, due to LLVM's different handling of read-write races and because the code carefully uses `MaybeUninit`.
 * crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
+  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
-  (Note that LLVM's handling of data races is not enough here, data races still return garbage data. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
+  (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
   The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
   However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
   Code that works with concurrency can't just expect not to have to pay a cost for that.
 * `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
   To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
-  That is UB since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
+  That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
   The best fix is probably an inline assembly block.
 
 ### Cases related to aliasing
 
 * `yoke` and similar crates relying on "stable deref" properties cause various forms of aliasing trouble (such as [having `Box`es that alias with things](https://github.com/unicode-org/icu4x/issues/2095), or [having references in function arguments that get deallocated while the function runs](https://github.com/unicode-org/icu4x/issues/3696)).
+  This also violates the LLVM assumptions that come with `noalias` and `dereferenceable`.
   This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
 * The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
   This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).

From b2694e7f547eccc659cfa0d76074a68110d62288 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 17:21:45 +0200
Subject: [PATCH 4/7] compare_exchange poses extra problems

---
 resources/deliberate-ub.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 8bd83151..a419990c 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -13,7 +13,9 @@ We should evaluate whether there truly is some use-case here that is not current
   However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
   (Also, pointer provenance is lost.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
-  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
+  The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
+  Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
+  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
 * `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)

From 8d6c4e6961c4f18365fb31c9fd746b27ad4f5869 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 21:21:16 +0200
Subject: [PATCH 5/7] BytesMut no longer uses atomics; for smol there is a PR

---
 resources/deliberate-ub.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index a419990c..722cf96c 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -16,16 +16,17 @@ We should evaluate whether there truly is some use-case here that is not current
   The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
   Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
   It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
-* `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
+* `bytes` used to do [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
   The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
   However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
   Code that works with concurrency can't just expect not to have to pay a cost for that.
+  And anyway current `bytes` does not seem to have any atomics in `BytesMut` any more, so possibly this is resolved.
 * `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
   To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
   That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
-  The best fix is probably an inline assembly block.
+  The best fix is probably an [inline assembly block](https://github.com/smol-rs/event-listener/pull/71).
 
 ### Cases related to aliasing
 

From 00fd712b172e193d1af1e2d30f0c29b97159dc15 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sun, 13 Aug 2023 22:09:42 +0200
Subject: [PATCH 6/7] add some general Stacked Borrows concerns

---
 resources/deliberate-ub.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 722cf96c..11f8739a 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -35,6 +35,12 @@ We should evaluate whether there truly is some use-case here that is not current
   This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
 * The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future.
   This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).
+* Stacked Borrows forbids a bunch of things that might be considered too restrictive (and that go well beyond LLVM `noalias`):
+  strict subobject provenance [rules out the `&Header` pattern](https://github.com/rust-lang/unsafe-code-guidelines/issues/256) and also affects [raw pointers derived from references](https://github.com/rust-lang/unsafe-code-guidelines/issues/134);
+  eager assertion of uniqueness makes [even read-only functions such as `as_mut_ptr` dangerous when they take `&mut`](https://github.com/rust-lang/unsafe-code-guidelines/issues/133);
+  `&UnsafeCell` surprisingly [requires read-write memory even when it is never written](https://github.com/rust-lang/unsafe-code-guidelines/issues/303).
+  There is a bunch of code out there that violates these rules one way or another.
+  All of these are resolved by [Tree Borrows](https://perso.crans.org/vanille/treebor/), though [some subtleties around `as_mut_ptr` do remain](https://github.com/rust-lang/unsafe-code-guidelines/issues/450).
 
 ## Former cases of deliberate UB that have at least a work-in-progress solution to them

From 2782bcaead0a8796982db0401538adcc5ab8729f Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sat, 19 Aug 2023 23:12:02 +0200
Subject: [PATCH 7/7] remove resolved cases of deliberate UB: bytes,
 smol-rs/event-listener

---
 resources/deliberate-ub.md | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
index 11f8739a..1b53928f 100644
--- a/resources/deliberate-ub.md
+++ b/resources/deliberate-ub.md
@@ -5,28 +5,18 @@ We should evaluate whether there truly is some use-case here that is not current
 
 ### Cases related to concurrency
 
-* crossbeam's `AtomicCell` implements a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
+* crossbeam's `AtomicCell` "fallback path" uses a SeqLock, which [is well-known to not be compatible with the C++ memory model](http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf).
   Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB.
   This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load.
   This is currently *not* UB in the LLVM IR we generate, due to LLVM's different handling of read-write races and because the code carefully uses `MaybeUninit`.
-* crossbeam's `AtomicCell` also uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
-  However, doing an `AtomicU32` atomic read on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
-  (Also, pointer provenance is lost.)
+* crossbeam's `AtomicCell` "fast path" uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932).
+  However, doing an `AtomicU32` read (returning a `u32`) on a `(u16, u8)` is unsound because the padding byte can be uninitialized.
+  (Also, pointer provenance is lost, so `AtomicCell<*const T>` does not behave the way people expect.)
   To fix this we need to be able to perform atomic loads at type `MaybeUninit`.
   The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
   Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
-  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
+  It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes;
+  see the [discussion here](https://github.com/rust-lang/unsafe-code-guidelines/issues/449).
-* `bytes` used to do [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
-  because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
-  (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
-  The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
-  However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
-  Code that works with concurrency can't just expect not to have to pay a cost for that.
-  And anyway current `bytes` does not seem to have any atomics in `BytesMut` any more, so possibly this is resolved.
-* `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
-  To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
-  That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
-  The best fix is probably an [inline assembly block](https://github.com/smol-rs/event-listener/pull/71).
 
 ### Cases related to aliasing
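
To make the "fallback path" problem from the patches above concrete, here is a minimal sketch of a SeqLock read path (illustrative only, assuming a simple sequence-number protocol; this is not crossbeam's actual code, and the names are made up):

```rust
// Sketch of a SeqLock reader; the UB is in the racy data read.
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

pub struct SeqLock<T> {
    seq: AtomicUsize, // odd while a writer is in its critical section
    data: UnsafeCell<T>,
}

unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}

impl<T: Copy> SeqLock<T> {
    pub fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 % 2 == 1 {
                // A write is in progress; retry.
                std::hint::spin_loop();
                continue;
            }
            // The problematic step: a plain (volatile) read that can race
            // with a concurrent writer. Under the C++/Rust memory model such
            // a race is UB, even though a torn result is discarded below.
            let value = unsafe { ptr::read_volatile(self.data.get()) };
            fence(Ordering::Acquire);
            // If the sequence number did not change, no writer intervened,
            // so `value` is a consistent snapshot.
            if self.seq.load(Ordering::Relaxed) == s1 {
                return value;
            }
        }
    }
}
```

Bytewise atomic memcpy would replace the `read_volatile` with per-byte relaxed atomic loads, turning the race into a well-defined (if possibly torn) read.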
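The "fast path" padding problem can likewise be shown in a few lines (a hedged sketch; `PaddedPair` and `racy_read` are made-up names, and crossbeam's real code performs its own size and alignment checks):

```rust
// Sketch of why an `AtomicU32`-typed load of an arbitrary 4-byte type is UB.
use std::sync::atomic::{AtomicU32, Ordering};

#[repr(C, align(4))]
#[derive(Clone, Copy)]
pub struct PaddedPair {
    pub a: u16,
    pub b: u8,
    // One byte of padding sits at the end; it may be uninitialized.
}

pub fn racy_read(src: &PaddedPair) -> u32 {
    // Size and alignment match `AtomicU32`, so the cast itself is fine...
    let atomic: &AtomicU32 = unsafe { &*(src as *const PaddedPair).cast::<AtomicU32>() };
    // ...but this load claims to produce a fully initialized `u32` (hence the
    // `noundef` in the generated LLVM IR), while byte 3 is padding: UB.
    atomic.load(Ordering::SeqCst)
}
```

An atomic load at type `MaybeUninit<u32>`, as the text proposes, would carry the padding byte through as-is instead of asserting that it is initialized.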
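For the `offset_of` entry under "former cases", the sound `memoffset`-style computation looks roughly like this (a sketch; the `Example` type is made up). The point is that the base pointer refers to a real stack allocation, so the field projection stays in bounds:

```rust
// Sketch of a sound offset_of computation via a real (uninitialized) value.
use std::mem::MaybeUninit;
use std::ptr::addr_of;

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
}

fn offset_of_b() -> usize {
    // A real stack allocation -- this is what the text means by "stack
    // allocations which the compiler must optimize away".
    let uninit = MaybeUninit::<Example>::uninit();
    let base = uninit.as_ptr();
    // `addr_of!` takes the field's address without creating a reference,
    // so the uninitialized contents are never asserted to be initialized.
    let field = unsafe { addr_of!((*base).b) };
    field as usize - base as usize
}

fn main() {
    // With `#[repr(C)]`, `b` sits after `a` plus 3 bytes of padding.
    assert_eq!(offset_of_b(), 4);
}
```

The native `offset_of!` macro computes the same value without any runtime allocation, which is why the text calls it the proper fix.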