Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnsafePinned: allow aliasing of pinned mutable references #3467

Merged
merged 35 commits into from
Jun 18, 2024
Merged
Changes from 1 commit
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
51bed0f
UnsafeAliased: port text from hackmd
RalfJung Aug 1, 2023
d599c1e
typos and nits
RalfJung Aug 1, 2023
c2bbec7
polyfill; Send/Sync question
RalfJung Aug 1, 2023
60ca0fa
remove some wrong cells
RalfJung Aug 2, 2023
6910494
detail example
RalfJung Aug 4, 2023
a1a05fb
Unpin hack migration plan
RalfJung Aug 4, 2023
a10d70c
remove stable feature gate
RalfJung Aug 5, 2023
9535a37
add alternative: drop aliasing rules for mutable references
RalfJung Aug 7, 2023
06b52df
link to real-world intrusive linked list example
RalfJung Aug 9, 2023
0369352
add get_mut_pinned method
RalfJung Aug 10, 2023
3e97a7e
fix typo
RalfJung Aug 15, 2023
a8f2516
add !Unpin impl to UnsafeAliased
RalfJung Aug 15, 2023
a55adde
add some links to prior discussions
RalfJung Sep 13, 2023
5f03b25
rename UnsafeAliased → UnsafePinned
RalfJung Nov 4, 2023
ce937b8
explain the naming choice
RalfJung Nov 4, 2023
60c255f
update UnsafeUnpin trait notes
RalfJung Nov 4, 2023
9881c94
explain why this is so much more awkward than UnsafeCell
RalfJung Nov 11, 2023
36b694d
be more clear about the soundness issues around '&mut UnsafePinned'
RalfJung Nov 25, 2023
7158ce2
also explain a non-alternative
RalfJung Nov 25, 2023
5ff0df8
fix a typo: mutation -> aliasing
RalfJung Nov 26, 2023
086412d
we should block niches
RalfJung Nov 26, 2023
7190a78
drawback: losing too much noalias
RalfJung Nov 28, 2023
d793712
elaborate on `Unpin + !UnsafeUnpin`
RalfJung Nov 28, 2023
9327537
fix typo
RalfJung Dec 22, 2023
86e0188
add example for why this is tied to pinning
RalfJung Feb 10, 2024
14b9019
update UnsafePinned docs regarding public API exposure
RalfJung Feb 11, 2024
5b2d690
Send and Sync
RalfJung Mar 10, 2024
1f31c7b
fix code example
RalfJung May 3, 2024
76a4035
make UnsafePinned derive Copy, Send, Sync
RalfJung May 3, 2024
bfb8c4b
expand back-compat note
RalfJung May 3, 2024
e98f367
add fixed code example
RalfJung May 29, 2024
474389f
add open question around naming
RalfJung May 29, 2024
3efb695
update filename and RFC number
RalfJung Jun 17, 2024
519aeb6
Clean up trailing whitespace
traviscross Jun 18, 2024
b950de5
Add tracking issue for RFC 3467
traviscross Jun 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
337 changes: 337 additions & 0 deletions text/0000-unsafe-aliased.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,337 @@
# `unsafe_aliased`

- Feature Name: `unsafe_aliased`
- Start Date: 2022-11-05
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a type `UnsafeAliased` that acts on `&mut` in a similar way to how `UnsafeCell` acts on `&`: it opts-out of the aliasing guarantees.
However, `&mut UnsafeAliased` can still be `mem::swap`ed, so this is not a free ticket for arbitrary aliasing of mutable references.
Abstractions built on top of `UnsafeAliased` must ensure proper encapsulation when handing such aliases references to clients (e.g. via pinning).

This type is then used in generator lowering, finally fixing [#63818](https://github.com/rust-lang/rust/issues/63818).

# Motivation
[motivation]: #motivation

Let's say you want to write a type with a self-referential pointer:

```rust
#![feature(pin_macro)]
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};

pub struct S {
data: i32,
ptr_to_data: *mut i32,
}

impl !Unpin for S {}
RalfJung marked this conversation as resolved.
Show resolved Hide resolved

impl S {
pub fn new() -> Self {
S { data: 42, ptr_to_data: ptr::null_mut() }
}

pub fn get_data(self: Pin<&mut Self>) -> i32 {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.read() }
}

pub fn set_data(self: Pin<&mut Self>, data: i32) {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.write(data) }
}
}

fn main() {
let mut s = pin!(S::new());
s.as_mut().set_data(42);
println!("{}", s.as_mut().get_data());
}
```

This kind of code is implicitly generated by rustc all the time when an `async fn` has a local variable of reference type that is live across a yield point.
The problem is that this code has UB under our aliasing rules: the `&mut S` inside the `self` argument of `get_data` aliases with `ptr_to_data`!

This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what `async fn` generates -- also have UB under the LLVM aliasing rules.

<details>

<summary>A more complex variant</summary>

```rust
#![feature(pin_macro)]
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};
use std::task::Poll;

pub struct S {
state: i32,
data: i32,
ptr_to_data: *mut i32,
}

impl !Unpin for S {}

impl S {
pub fn new() -> Self {
S { state: 0, data: 42, ptr_to_data: ptr::null_mut() }
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
}

fn poll(self: Pin<&mut Self>) -> Poll<()> {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
match this.state {
0 => {
// The first time, set up the pointer.
this.ptr_to_data = ptr::addr_of_mut!(this.data);
// Now yield.
this.state += 1;
Poll::Pending
}
1 => {
// After coming back from the yield, write to the pointer.
unsafe { this.ptr_to_data.write(42) };
// And read our local variable `data`.
// THIS IS UB! `this` is derived from the `noalias` pointer
// `self` but we did a write to `this.data` in the previous
// line when writing to `ptr_to_data`. The compiler is allowed
// to reorder this and the previous line and then the output
// would change.
println!("{}", this.data);
// Done!
Poll::Ready(())
}
_ => unreachable!(),
}
}
}

fn main() {
let mut s = pin!(S::new());
while let Poll::Pending = s.as_mut().poll() {}
}
```

</details>

<br>

Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements.
When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements.
`Pin` is used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers), but there still is the problem that an `&mut Node` is actually not a unique pointer due to these aliases -- so we need a way for the to opt-out of the aliasing rules.

<br>

The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB.
We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

To write this code in a UB-free way, wrap the fields that are *targets* of self-referential pointers in an `UnsafeAliased`:

```rust
pub struct S {
data: UnsafeAliased<i32>, // <!---- here
ptr_to_data: *mut i32,
}
```

This lets the compiler know that mutable references to `data` might still have aliases, and so optimizations cannot assume that no aliases exist. That's entirely analogous to how `UnsafeCell` lets the compiler know that *shared* references to this field might undergo mutation.

However, what is *not* analogous is that `&mut S`, when handed to safe code *you do not control*, must still be unique pointers!
This is because of methods like `mem::swap` that can still assume that their two arguments are non-overlapping.
(`mem::swap` on two `&mut UnsafeAliased<i32>` may soundly assume that they do not alias.)
In other words, the safety invariant of `&mut S` still requires full ownership of the entire memory range `S` is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory.
But again, this is a *safety invariant*; it only applies to safe code you do not control. You can write your own code handling `&mut S` and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.
You can also hand out a `Pin<&mut S>` to external safe code; the `Pin` wrapper imposes exactly the restrictions needed to ensure that this remains sound (e.g., it prevents `mem::swap`).


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

API sketch:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should UnsafeAliased be !Unpin? This (negative) impl listed isn't in the sketch, but it must be for the earlier guide-level line about Pin<&mut S> soundness to be accurate.

By analogy to how the other autotraits are treated, I argue that yes, it should be not-Unpin. The user can always implement Unpin if desired.


```rust
/// The type `UnsafeAliased<T>` lets unsafe code violate
/// the rule that `&mut UnsafeAliased<T>` may never alias anything else.
///
/// However, it is still very risky to have two `&mut UnsafeAliased<T>` that alias
/// each other. For instance, passing those two references to `mem::swap`
/// would cause UB. The general advice is to still have a single mutable
/// reference, and then a bunch of raw pointers that alias it.
/// If you want to pass such a reference to external safe code,
/// you need to use techniques such as pinning (`Pin<&mut TypeWithUnsafeAliased>`)
/// which ensure that both the mutable reference and its aliases remain valid.
///
/// Further note that this does *not* lift the requirement that shared references
/// must be read-only! Use `UnsafeCell` for that.
#[lang = "unsafe_cell_mut"]
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
#[repr(transparent)]
struct UnsafeAliased<T: ?Sized> {
value: T,
}

impl<T: ?Sized> !Send for UnsafeAliased<T> {}

impl<T> UnsafeCell<T> {
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// Constructs a new instance of `UnsafeCell` which will wrap the specified
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// value.
pub fn new(value: T) -> UnsafeAliased<T> {
UnsafeAliased { value }
}

pub fn into_inner(self) -> T {
self.value
}

/// Get read-write access to the contents of an `UnsafeAliased`.
pub fn get_mut(&mut self) -> *mut T {
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
self as *mut _ as *mut T
}

/// Get read-only access to the contents of a shared `UnsafeAliased`.
/// Note that `&UnsafeAliased<T>` is read-only if `&T` is read-only.
/// This means that if there is mutation of the `T`, future reads from the
/// `*const T` returned here are UB!
///
/// ```rust
/// unsafe {
/// let mut x = UnsafeAliased::new(0);
/// let ref1 = &mut *addr_of_mut!(x);
/// let ref2 = &mut *addr_of_mut!(x);
/// let ptr = ref1.get(); // read-only pointer, assumes immutability
/// ref2.get_mut.write(1);
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// ptr.read(); // UB!
/// }
/// ```
pub fn get(&self) -> *const T {
self as *const _ as *const T
}

pub fn raw_get_mut(this: *mut Self) -> *mut T {
this as *mut T
}

pub fn raw_get(this: *const Self) -> *const T {
this as *const T
}
}
```

The comment about aliasing `&mut` being "risky" refers to the fact that their safety invariant still asserts exclusive ownership.
This implies that `duplicate` in the following example, while not causing immediate UB, is still unsound:

```rust
pub struct S {
data: UnsafeAliased<i32>,
}

impl S {
fn new(x: i32) -> Self {
S { data: UnsafeAliased::new(x) }
}

fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
let s1 = unsafe { (&s).transmute_copy() };
let s2 = s;
(s1, s2)
}
}
```

The unsoundness is easily demonstrated by using safe code to cause UB:

```rust
let mut s = S::new(42);
let (s1, s2) = s.duplicate(); // no UB
mem::swap(s1, s2); // UB
```

We could even soundly make `get_mut` return an `&mut T`, given that the safety invariant is not affected by `UnsafeAliased`.
But that would probably not be useful and only cause confusion.

Reference diff:

```diff
* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
- [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
+ [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
+ the `&mut T` contains an [`UnsafeAliased<U>`].
```

Async generator lowering changes:
- Fields that represent local variables whose address is taken across a yield point must be wrapped in `UnsafeAliased`.

Codegen and Miri changes:

- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`.
- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.)
- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clear that &UnsafeAliased<T> carries a Shared tag, and that mutating through it is UB (assuming no UnsafeCell is present). What's not clear is how retagging impacts extant sibling write-capable provenance.

This very much gets into some undecided specifics to specify precisely, but some specifics are needed. Namely, if I do

let mut s = pin!(S::new());
s.as_mut().get_data(); // init ptr
black_box(s.as_ref()); // retag &S
s.as_mut().set_data(42); // write through ptr

is this guaranteed to be permitted? If it is, then IIUC, &UnsafeAliased<T> can protect the bytes such that writing to them would be UB, but must not retag them in such a way as to completely invalidate sibling write-capable provenance.

Copy link
Member Author

@RalfJung RalfJung Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea where existing siblings matter in your example.

In SB, each as_mut creates a new SRW child of s. Each as_ref creates a new SRO child. Both of these actions preserve existing SRW and SRO children of s.

In TB, as_mut will just return the original provenance -- no retagging for !Unique mutable references. as_ref will create a read-only child that lives as long as there is no write.

Your example doesn't even exploit any of that though. It would be allowed even if creating a new SRW or SRO child first killed all previously created children, since you are creating new pointer with calls to as_mut/as_ref all the time. Did you mean to keep the first as_mut pointer around and use it again after the as_ref?


### Comparison with some other types that affect aliasing

- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special.
- `UnsafeAliased`: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special.
- [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336): disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDanling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`.
RalfJung marked this conversation as resolved.
Show resolved Hide resolved

# Drawbacks
[drawbacks]: #drawbacks

It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with `UnsafeCell` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
Furthermore, it is an extremely subtle wrapper type, as the `duplicate` example shows.
`Unique` is a very strange trait, since it does not come with much of a safety requirement -- unlike `Freeze`, which quite clearly expresses some statement about the invariant of a type when shared, it is much less clear what the corresponding statement would be for `Unique`. (More work on RustBelt is needed to determine the details here.)

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The proposal in this RFC matches [what was discussed with the lang team a long time ago](https://github.com/rust-lang/rust/issues/63818#issuecomment-526579977).

However, of course one could imagine alternatives:

- Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is `Unpin`. This is unsatisfying: `Unpin` was never meant to have this job. A consequence is that a stray `impl Unpin` on a `Wrapper<T>`-style type can [lead to subtle miscompilations](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484/15) since it re-adds aliasing requirements for the inner `T`. Contrast this with the `UnsafeCell` situation, where it is not possible for (stable) code to just `impl Freeze for T` in the wrong way -- `UnsafeCell` is *always* recognized by the compiler.

On the other hand, `UnsafeAliased` is rather quirky in its behavior and having two marker traits (`Unique` and `Unpin`) might be too confusing, so sticking with `Unpin` might not be too bad in comparison.

If we do that, however, it seems preferrable to transition `Unpin` to an unsafe trait. There *is* a clear statement about the types' invariants associated with `Unpin`, so an `impl Unpin` already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence and related to the fact that we don't have safe field projections on `Pin`. That said, solving this also requires solving the trouble around `Drop` and `Pin`, where effectively an `impl Drop` does an implicit `get_mut_unchecked`, i.e., implicitly assumes the type is `Unpin`.
- `UnsafeAliased` could affect aliasing guarantees *both* on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing `&mut UnsafeAliased<T>` is cast or coerced to `&UnsafeAliased<T>`: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than `UnsafeCell` in the sense that replacing `UnsafeCell` by `UnsafeAliased` would always be correct. (Under the RFC as written, `UnsafeCell` and `UnsafeAliased` can be nested to remove aliasing requirements from both shared and mutable references.)

If we don't do this, we could consider removing `get` since since it seems too much like a foot-gun. But that makes shared references to `UnsafeAliased` fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
- Instead of introducing a new type, we could say that `UnsafeCell` affects *both* shared and mutable references. That would lose some optimization potential on types like `&mut Cell<T>`, but would avoid the footgun of coercing an `&mut UnsafeAliased<T>` to `&UnsafeAliased<T>`. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is [able to do detect such issues](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aab417b535f7dbd266fbfe470ea208c7)).


# Prior art
[prior-art]: #prior-art

This is somewhat like `UnsafeCell`, but for mutable instead of shared references.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- How do we transition code that relies on `Unpin` opting-out of aliasing guarantees, to the new type? Futures and generators just need a compiler patch, but there is probably other code that needs adjusting (e.g., Rust-for-Linux uses pinning to handle all sorts of self-referntial things in the Linux Kernel). Note that all such code is explicitly unsupported right now; the `Unpin` loophole has always explicitly been declared as temporary, unstable, and not something that we promise will actually work.
- The name of the type needs to be bikeshed. `UnsafeAliased` might be too close to `UnsafeCell`, but that is a deliberate choice to indicate that this type has an effect when it appears in the *pointee*, unlike types like `MaybeUninit` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336) that have an effect when wrapped around the *pointer*. Like `UnsafeCell`, the aliasing allowed here is "interior". Other possible names: `UnsafeSelfReferential`, `UnsafePinned`, ...
- Relatedly, in which module should this type live?
- Should this type `derive(Copy)`? `UnsafeCell` does not, which is unfortunate because it now means some people might use `T: Copy` as indication that there is no `UnsafeCell` in `T`.
- `Unpin` [also affects the `dereferenceable` attribute](https://github.com/rust-lang/rust/pull/106180), so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back `dereferenceable` when better semantics for it materialize on the LLVM side?

# Future possibilities
[future-possibilities]: #future-possibilities

- Similar to how we might want the ability to project from `&UnsafeCell<Struct>` to `&UnsafeCell<Field>`, the same kind of projection could also be interesting for `UnsafeAliased`.