Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Never patterns #3719

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
250 changes: 250 additions & 0 deletions text/0000-never-patterns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,250 @@
- Feature Name: `never_patterns`
- Start Date: 2024-10-27
- RFC PR: [rust-lang/rfcs#3719](https://github.com/rust-lang/rfcs/pull/3719)
- Rust Issue: [rust-lang/rust#118155](https://github.com/rust-lang/rust/issues/118155)

# Summary
[summary]: #summary

A `!` pattern indicates a type with no valid values. It is used to indicate an impossible case when matching an empty type in unsafe code.

```rust
enum Void {}
unsafe fn empty_tup<T>(tup: *const (T, Void)) -> ! {
unsafe {
match *tup { ! }
}
}
```

Note: this RFC is purely about a new pattern syntax. It does not propose changes to exhaustiveness checking or to the operational semantics of uninhabited types.


# Motivation
[motivation]: #motivation

Rust's unsafe semantics are access-based: the validity of data only matters when reading from/writing to a place. When pattern-matching, patterns only access the data (discriminants/values) they need to choose the right arm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's unsafe semantics are actually mostly not access-based. Constructing an invalid value is UB.

Only behind raw pointers and inside union fields do we use access-based semantics. So the text here is quite misleading as-is.

Copy link

@golddranks golddranks Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Niko's blog post from year 2018, linked in the RFC text, says that

While the details of our model around unsafe code are still being worked out (in part by this post!), there is a general consensus that we want an “access-based” model

...mentioning about &! being possibly a valid value. Just to clarify, I presume this is now stale information and having a type &! is considered UB? Also, the same article presents following code:

unsafe {
  let x: Uninit<(u32, T)> = Uninit { uninit: () };
  x.value.0 = 22; // initialize first part of the tuple
  ...
  match x.value {
    (v, _) => {
      // access only first part of the tuple
    }
  }
  ...
}

Is that accepted by modern standards? Seems a bit sketchy that the union field is already accessed in the match expression, so the fields are not matched against inside the match. By extension, one should be able to do the following, which, by modern standards, is surely UB, right? (To be sure, the RFC presents code that matches against the union fields inside the match.)

unsafe {
  let x: Uninit<(u32, T)> = Uninit { uninit: () };
  x.value.0 = 22; // initialize first part of the tuple
  ...
  let tmp = x.value; // move from x.value!
  match tmp {
    (v, _) => {
      // access only first part of the tuple
    }
  }
  ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A match expression works on places, not values. So the two examples you show are not equivalent! The first never turns x.value into a value, it only ever accesses its first field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But values don't exist in memory, they only exist while reading or writing, right? And matches don't access compound values wholesale, they only inspect memory piecewise. In particular I can mess up some bytes of a place and still match on it if the match only inspects the non-messed-up bytes.

Is that not "access-based"? Am I incorrect in the above understanding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an access-based semantics, it is generally expected that e.g.

let b: bool = transmute(13u8);

is fine if we never "access" b. So I think it is misleading to call Rust's semantics access-based.

But what you say about places is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is there a term of art for the particular flavor of "validity only matters when reading/writing" that rust has?

Otherwise I don't can do without this general statement. Would this be ok: "The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of a specific term. I would summarize it as "enforcing language invariants on every typed copy".

"The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"

"unsafe value" isn't a standard term. I've used "unsafe places" in the past but it is not standard either; the RFC should define it if it wants to use it.

I would say something like: Rust's pattern match semantics are access-based. Only the parts of the underlying value representation that actually matter for the pattern have to be valid.


Empty types are funny because the natural way of matching them is to write no arms at all:

```rust
fn empty_tup<T>(tup: (T, !)) -> ! {
match tup {}
}
```

Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.
Here the absence of an arm plays the role of a "read discriminant" kind of operation.
(If, instead, there was an arm with pattern `_`, then there would be no "read discriminant" operation.)
This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.


Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm:
Today, when matching empty types inside places that may contain uninitialized data, rust requires a match arm that explicitly reads the empty type:

```rust
enum Void {}
unsafe fn empty_tup<T>(tup: *const (T, Void)) -> ! {
unsafe {
match *tup {
_ => unreachable_unchecked(),
// or
(_, x) => match x {}
Comment on lines +44 to +46
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => unreachable_unchecked(),
// or
(_, x) => match x {}
(_, x) => match x {}

The first example seems odd and orthogonal to the RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _ => unreachable!() is better? That's just what people commonly write in this case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be what people commonly write, but the intent of this PR is better described by match x {} so IMO we should have only that.

}
}
}

union MyUnion<T: Copy> {
uninit: (),
value: (T, !),
}
impl<T: Copy> MyUnion<T> {
unsafe fn assume_init(self) -> ! {
match self {
MyUnion { value: (_, x) } => x,
}
}
}
```

This RFC proposes a new `!` pattern that works as an explicit "access data" or "assert validity" operation for uninhabited types. The examples above become:
Copy link

@ijackson ijackson Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is the key point. It defines what the programmer means when they write !.

However, in safe code (eg, when the scrutinee is a safe reference), this fact - "this is actually sane" - is something that should always be true and shouldn't need to be explained by the programmer.

ISTM that this is also a pedagogical problem. If this is needed in safe Rust, auithors of that safe Rust will need an explanation what this thing is for and what it means, but we definitley don't want to burden them thoughts about validity/safety invariants etc. Nor do we want to suiggest to them in discussion of match that values might not be valid. But there is no way to explain this ! without supposing that &T might not reference a proper T. Authors of safe Rust are entitled to assume that &T always does reference a valid T.

Therefore, ! should not be required in safe Rust. What does that mean precisely? One possible answer:

  • ! is not required when dereferencing a safe reference, field, etc. (recursively)
  • Except, within an unsafe block, we emit a lint saying "something something uninhabited types validity something something requirement something"

Copy link
Member

@RalfJung RalfJung Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to make syntax depend on whether something is syntactically inside or outside an unsafe block. That kind of context-dependence can become quite surprising.

So if we do anything like this, IMO it should be fully based on the type of the scrutinee place (including the types of its subexpressions, e.g. whether a union field projection is involved).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But just to check, would you consider a possible path forward looking something like:

  • Never patterns match types with a certain property, independent of location (this property could be "safety-based uninhabitness").
  • When creating a match statement on a type with a different property, the compiler inserts a never pattern at the bottom of the match, independent of location (this property could be, e.g. "doesn't contain pointers or union field projections or partially uninhabited tuples" - roughly as per the auto-never blog post)).
  • We add a lint for when the auto-never might cause issues - which could be dependent on location?

I think such a lint being location-sensitive would be justifiable in that it doesn't change syntax, but just adds extra checks? Or at least Niko was happy with that in 2018.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the path forward I intend to propose is simply to allow omitting arms for &! and similar types (more generally, if the type is safely uninhabited). My understanding is that making unsafe references is a massive footgun anyway (because of e.g. auto(de)ref).

Unsafe code authors that do that anyway can write never patterns for such cases (they will not be marked as unreachable), and we may add a lint that requires them if users find this useful.

I have moved away from explaining things in terms of "auto-never" out of personal preference, but you could explain this by saying that we always auto-never except inside pointers and unions.

Never patterns match types with a certain property, independent of location

This is indeed precisely the intent of this here RFC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given ptr: *const &!, I don't think match *ptr {} should ever build. That should require match *ptr { ! }.

IOW, this should depend on whether the matched-on place is a safe place or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I include this in the exception that "we don't auto-never inside pointers". By "inside pointers" I mean in terms of places.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given ptr: *const &!, I don't think match *ptr {} should ever build. That should require match *ptr { ! }.

IOW, this should depend on whether the matched-on place is a safe place or not.

Okay, understood - so in your view auto-never behaviour (if/when we have it or an equivalent) would need to vary be place, and shouldn't just be a static property of the matched type.

But never patterns, as syntax, should be place invariant. This makes sense.

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may now stop discussion of auto-never and similar considerations: I'd like to keep the discussions on-topic of the never pattern syntax proposal, and discuss changes to which arms can be omitted etc to some other place. For example rust-lang/rust#131452.


```rust
enum Void {}
unsafe fn empty_tup<T>(tup: *const (T, Void)) -> ! {
unsafe {
match *tup { ! }
}
}

union MyUnion<T: Copy> {
uninit: (),
value: (T, !),
}
impl<T: Copy> MyUnion<T> {
unsafe fn assume_init(self) -> ! {
match self { MyUnion { value: (_, !) } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match self { MyUnion { value: (_, !) } }
match self { MyUnion { value: ! } }

}
}
```

<!--

```rust
enum E { A(i32), B(i32, !) }
let result: *const E = some_function();
unsafe {
let x = match *result {
A(x) => x,
// An arm is required (https://github.com/rust-lang/unsafe-code-guidelines/issues/443) but `!` asserts validity of the `!` data so the arm doesn't need a body.
B(_, !),
};
// Alternatively:
let (A(x) | B(_, !)) = *result;
}
```

```rust
enum E { A(i32), B(i32, !) }
let val: *mut E = Box::leak(Box::new(E::A(42)));
// It may be possible to partially initialize this to the `B` variant
// in the future.
unsafe { set_discriminant_to_b(val) };
unsafe {
match *val {
A(x) => { ... }
// This branch would then be reached without UB.
B(..) => println!("Reachable!"),
}
}

// It may be possible to construct a `&!` pointing to
// uninitialized data (unsafe, but valid).
let never_ref: &! = ...;
let result: Result<u8, &!> = Err(never_ref);
match result {
Ok(x) => { ... }
// This branch would then be reached without UB.
Err(_) => println!("Reachable!"),
}
```

```rust
enum Void {}
let result: &Result<T, Void> = ...;
let x = match result {
Ok(x) => x,
Err(!), // No need for an arm body
};
// Or even
let (Ok(x) | Err(!)) = result;
```

-->


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc.
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a by-value binding requires a full read, a wildcard `_` accesses nothing etc.

There are also ref bindings, after all...


For uninhabited types (types with no valid values), normal patterns cannot be used to indicate an access since there is no data to be accessed. For this purpose, you can use the special `!` pattern.

The `!` pattern is allowed on any uninhabited type (such as `enum Void {}` or `(Result<!, !>, u32)`) and does an access to the underlying data. Since there can be no such data (on pain of UB), the corresponding arm is known to be unreachable and does not need a body:
```rust
enum Void {}
unsafe fn empty_tup<T>(tup: *const (T, Void)) -> ! {
unsafe {
match *tup { ! }
}
}
```


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

We add `!` to the syntax of patterns. A `!` pattern is accepted for any type that is visibly uninhabited (i.e. uninhabited taking into account private fields and `#[non_exhaustive]` annotations) from the item containing the match expression.
Copy link
Member

@RalfJung RalfJung Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many possible definitions of "uninhabited" so it'd be good to spell this out. E.g., is &! uninhabited?

What I would expect is that ! is allowed on all types whose library invariant ("safety invariant") cannot be satisfied (it is an explicit operation after all), so we can treat &! as uninhabited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I had not considered the safety version of uninhabitedness. This does feel like the appropriate notion to use, I'll change that

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, if we use this wider definition of "uninhabited" for when the ! pattern can apply (rather than a definition such as that of "empty" in rust-lang/rust#119612 (comment)), does this mean the following would be acceptable?

Such syntax was used in this comment from Ralf so I assume this might be the intention?

fn abc(value: Option<&!>) {
   match value {
        None => {},
        !
   }
}

I rather like this, but just to flag that it is a change of behaviour from:

  • The current never_patterns feature which would require a more explicit Some(!) pattern (playground link)
  • The syntax in the auto-never blog post which I believe would have required a Some(&!) pattern to draw attention that it accesses the &! (and so that it asserts validity - if we replace references with pointers)

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, would the ! pattern be able to match multiple uninhabited types in a wildcard manner?

To create an explicit example, would the following compile, on the justification that:

  • MultiNever::A(&MultiNever::B(Empty)), MultiNever::A(&MultiNever::C(Empty)), MultiNever::B(Empty) and MultiNever::C(Empty) are all uninhabited based on the above rules.
  • We may allow ! to wildcard over all uninhabited types.
enum Empty {}
enum MultiNever<T> {
   A(T),
   B(Empty),
   C(Empty),
}
fn abc(value: MultiNever<&MultiNever<u32>>) -> u32 {
    match value {
        MultiNever::A(MultiNever::A(value)) => *value,
        !
    }
}

If this does compile, I'm not sure if this necessarily fits with the motivation that "accesses should be explicitly visible in the pattern." - but possibly I've got the wrong end of the stick?

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revisiting my musing in the last comment, I missed that the motivation in the RFC actually says more fully that "around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern" - which seems right to me. The motivation only cares about accesses which could cause UB being explicitly visible in the pattern, or slightly more loosely that "accesses which could cause UB without breaking a safety invariant should be explicit".

And so Ralf's justification is that, references have a safety invariant to point at valid data, and as such have a recursively-safe safety invariant. Therefore it should be okay to effectively view a never pattern as:

A never pattern acts as an explicit compile-time assertion that all unmatched types have an unsatisfiable safety invariant; and also as an explicit compile-time assertion that all those types are invalid at this position.

By contrast, e.g. pointers or union fields do not have such a safety invariant, so would not satisfy such a recursive safety-based definition of uninhabitedness, and would need to be handled explicitly.

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that the user write Some(!) explicitly because I find it more explicit and easier to specify/implement. This is open for discussion of course, that's the aim of the RFC.

Good point! Well, in the spirit of discussion, let me propose for debate the "wildcard !" or more precisely the "safety-uninhabitable binding" definition of the never pattern. Such a pattern would:

Apply against all unmatched branches (similarly to _ or a variable binding a), and, both:

  • Assert (on pain of a compile error) that the "leaf type" of each of these branches is safety-uninhabited. (Where by leaf type I'm being quite wooly, I mean the type you'd get if you re-interpret the leaf enum variant of the branch as a struct type)
  • Assert (on pain of UB) that all such branches are not reachable (i.e. all such leaf types are validity-uninhabited in this position)

If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.

I was also wondering it it were too powerful, but I think on balance that it's likely OK. A ! at the root of a nested match is a declaration that "all unspecified branches are not possible (on pain of UB)".

It has benefits that it can mostly solve the ergonomics issues in this issue without auto-never, by only requiring a user to add a ! to their matches. Macros could do this in a blanket fashion, and pair this with something like an #[allow(unneeded_never_pattern)].

Users in unsafe land could be more careful, and use ! more explicitly, in place of individual types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify, I believe that under this RFC, the first 3 compile, but the fourth one does not:

correct!

If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.

I believe this is well-defined, and agree that this would be allowed in strictly more places than the RFC currently proposes.

It has benefits that it can mostly solve the ergonomics issues in rust-lang/rust#131452 without auto-never

If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add ! patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of & on the path to an empty type).

I tried to separate this RFC from auto-never discussions but maybe they're too entangled for that 🤔.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add ! patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of & on the path to an empty type).

I agree that we need an auto-never equivalent... As you've said in the comment below, maybe we need to explore that first.

But I think a more flexible never pattern could theoretically be beneficial to people writing unsafe too. For example, in these quite hypothetical scenarios:

  • Maybe in their current circumstance they know that all branches that are unsafe are invalid, and want to avoid writing out lots of distinct branches. (e.g. they're using something like an unsafe Trees That Grow pattern, if such a thing would even make sense... in which case, they could use ! as a wildcard, like Ralf spelled out here. Although I'm not sure if that was intentional on Ralf's part, or accidental 🤷)
  • Or they know that some sub-enum should always be never, and want to assert that at compile time.

Whilst I'm naturally drawn to neat generalisations, I accept that the use case argument isn't super strong.

That said, (ignoring the implementation difficulty argument for now) I feel a good argument against a more general / flexible pattern should be motivated in why a restriction makes sense. For example:

  • Do we feel that allowing wildcard ! is a foot gun for a user to write the wrong thing? That they put ! as an easy fix because they don't understand the implications about it matching more variants than expected, or future new variants? (I'm not sure to what extent unsafe developers would benefit from such hand-holding)
  • Do we feel that allowing wildcard ! might be "less easy to understand" or less readable at code review time? Does a lone ! look weird or eat up too much strangeness budget?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't realize that we'd need more than just this RFC for that case.

My immediate gut feeling is that the conservative choice here is to accept less code. It's east to accept more code in the future, it's impossible to take back accepting wildcard ! if we add it now. So I'd be inclined to first require the ! pattern to be used in a position where the type is actually uninhabited, and possibly change the "automatic never pattern" rules for matches on entirely safe places, and then revisit whether we really need wildcard !.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good argument. Start conservative, without precluding expanding in future if it's still useful 👍


A pattern that contains a `!` is called a _never pattern_. For or-patterns, all alternatives must be never patterns for the whole pattern to be too. For example `Ok(x) | Err(!)` is not a never pattern but `(_, !)` and `Ok((_, !)) | Err(!)` are.

A never pattern is accepted as an arm in a match expression, and that arm takes no body nor guards.

```rust
let x: *const Option<(u32, (u32, !))> = ...;
match *x {
None => { ... }
Some((_, !)),
}
```

A never pattern is also accepted in other places a pattern would be (`if let`, destructuring `let`, etc.), and makes code that corresponds to that branch unreachable.

```rust
enum Void {}
impl Void {
fn ex_falso<T>(!: Self) -> T {}
}
```

In terms of both semantics and exhaustiveness checking, a `!` pattern behaves like a binding. E.g. a `Some(!),` arm (absent match ergonomics) is semantically equivalent to `Some(x) => match x {}`. Indeed, they both indicate a load at a place of an uninhabited type, followed by an unreachable body.

Never patterns interact with match ergonomics as expected: a `!` pattern is allowed on e.g. `&(T, !)`.
Copy link
Member

@RalfJung RalfJung Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected &(T, !) to be an uninhabited type (it is library-invariant-uninhabited, after all), so this sentence is odd.

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do go with the "safety invariant" version of inhabitedness then indeed this sentence isn't useful anymore. Will update when I get a moment.


A never pattern may be linted as "unreachable" if there is no well-behaved execution that can reach it. For example, if it was guaranteed that reading the discriminant of `Result<T, !>` could never return `Err`, the `Err(!)` pattern would be linted as unreachable. Never patterns are not otherwise linted as unreachable even when removing them would be accepted.
Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we view this as necessary? Under the psuedo-definition of ! in this comment, a never pattern against 0 types is trivially okay.

Some benefits of permitting an uneeded !:

  • It would be a win for macros (as per this comment), they could always include it without punishment.
  • It would make auto-never a trivial sugaring where we choose to accept it (just stick ! at the bottom of every match, which effectively operates as both a check that all remaining unmatched types have unsatisfiable safety invariants, and an assertion that they are all invalid)

A drawback could be that in this case, if we see !, we can't assume that there exists at least 1 leftover variant/type in the match with an unsatisfiable safety invariant. But I'm not sure to what extent that is actually useful information, worthy of linting for?

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is purely a convenient lint for users and helps learnability, exactly like today's "unreachable arm" lint. Macros can (and often do) silence lints to get more flexibility, so there'd be no win here. Similarly, the auto-never desugaring you have in mind would work the same; we wouldn't emit a lint there since that would make no sense.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

My comment was mostly applying to a world where we had wildcard !... Unfortunately without wildcard !, it's not valid to place a ! next to other variants, so a macro which outputs a match &self { /* ... */ } still needs to handle/branch on there being 0 variants, to decide whether to insert a ! or not. This is rather unergonomic - particularly in declarative macros.

All that said, I think it's fine for this RFC to not cover that, and to need to wait for something equivalent to auto-never for this to be a "nice" story for macro writers.


# Drawbacks
[drawbacks]: #drawbacks

The main drawback is that this new pattern is unlike other patterns (e.g. it doesn't require an arm body) which adds some maintenance burden. It's otherwise a simple feature.
Copy link
Member

@kennytm kennytm Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad would it be to this RFC if the "feature" of skipping the arm body be removed 🤷 Save some major changes to the syntax that messes with macros and syn etc.

Just type-check that the arm body always has type const ! to replace that "a never pattern is always unreachable" error.

match x {
    // note: using `const { unreachable!() }` instead of only `unreachable!()`
    // to tell the compile actually enforce that the arm is never invoked
    Some(!) => const { unreachable!() },
    None => "ok",
}

TBH it is very strange that only the match syntax has this omission feature. You can't do it in an if let for instance. Given how #![feature(never_patterns)] works today you need to write:

if let Some(!) = x {
    // you still need a body here! otherwise you get "E0308 mismatched types"!
    const { unreachable!() }
} else {
    "ok"
}

(I don't think the "empty block" proposal in #3719 (comment) works because an empty block does not indicate the block is actually unreachable. The ! expression is cute but I think it may conflict with unary-not !(false).)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other syntax that has an equivalent is or-patterns: Ok(x) | Err(!) is a valid pattern because the never case "doesn't count" for the purpose of ensuring that all alternatives have the same binding. Otherwise you're right, match arms are special.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't want the "bodiless arm" syntax, I'd propose that we allow this:

match x {
    Some(!) => {}
    None => "ok",
}

this would consider the whole arm body to diverge for the purposes of type-checking, similar to how we typecheck => { return; () } or => { panic!(); () }. In fact I already implemented something like this for function arguments:

enum Void {}
fn foo(!: Void) -> u32 {} // typechecks because the body diverges

There's one case where the "bodiless arm" is particularly nice though: the empty match. Today match *foo {} is allowed for foo: *const !, which violates the intent that matching on empty types inside pointers should be explicit. My intent after this RFC is to deprecate this case and eventually force users to write match *foo { ! } instead. This is less nice if they have to write match *foo { ! => {} }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specialty of Ok(x) | Err(!) manifests only after parsing which you have built the AST which then you can check for "E0408 variable is not bound in all patterns". The impact is not at the same level as the proposed syntax match expr { pat(!) } that changes the grammar and also the resulting AST/HIR structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ! expression is cute but it think it may conflict with unary-not !(false).

how so? if you're trying to have a ! expression, you can just say it can't be followed by a function call or other ambiguous tokens, if you really want to call it, you'd have to write (!)(false) (not that that would type check...)

Copy link
Member

@programmerjake programmerjake Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this rather obscure feature is not a place we should spend the weirdness budget on.

ok, since {} can be made to type check, I think using ! => {} as the match arm is good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the need for never-patterns is common enough to justify changing the match syntax just to save that => {} 😅

Fair. I was hoping we could allow it just for this special case but idk if that's worth it.

I suppose that means the if let example not working for {} is a bug?

Yep

Copy link

@dhedey dhedey Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My slight concern with a syntax like ! => {} is that it reads to me like "types which match ! are valid in this position, please ignore them and return ()" rather than "types matching ! are unreachable/UB".

I haven't thought too much about this, but would something like ! => ! type check? (if we temporarily ignore a dependency on the never type).

And would it possible to reject alternative arms somehow at compile time? e.g. require ! => ! or ! => { ! }, but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); } which I would assume also has a type of !?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhedey => ! can certainly type-check but then we need to introduce the new "!" expression in additional to the pattern and type.

but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); } which I would assume also has a type of !?

A non-empty block (or non-! value) should raise the unreachable_code lint like now:

// warning: unreachable statement
fn f(ptr: &mut u8, !: &!) {
//                 - any code following a never pattern is unreachable
    *ptr += 1;
//  ^^^^^^^^^^ unreachable statement
}

Copy link
Member

@RalfJung RalfJung Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to have an arm, IMO => ! makes most sense. It does still look quite funny, though...

EDIT: Oh, and it was already said above that it causes some parser issues. Hm.



# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Status quo alternative: we could do nothing, and consider that writing `Err(_) => unreachable!()` or `Err(x) => match x {}` is good enough. The drawback is more noticeable for or-patterns, e.g. `let (Ok(x) | Err(!)) = ...;` would have to become a `let .. else` or a full `match`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time the RFC mentions that let (Ok(x) | Err(!)) = ... is even possible. If that is part of the RFC it should be spelled out above. I guess never patters are exempt from the usual rule that all variants of an or-pattern need to bind the same variables?


Alternatively, we could allow omitting all empty arms; this is what the unstable `exhaustive_patterns` feature does today. A match like
```rust
match ... {
Ok(_) => ...,
}
```
would implicitly contain a `Err(!)` arm, i.e. would trigger UB if the discriminant is `Err`. This was deemed a footgun and was the original motivation for this feature.


# Prior art
[prior-art]: #prior-art

This proposal is entirely based on @nikomatsakis and @RalfJung's original idea, presented in Niko's [blog post](https://smallcultfollowing.com/babysteps/blog/2018/08/13/never-patterns-exhaustive-matching-and-uninhabited-types-oh-my/).

The only other language I (Nadrieril) am aware of that has the intersection of features that would make this necessary is Zig. They are in the process of clarifying the semantics of their empty types (https://github.com/ziglang/zig/issues/15909); they may or may not end up encountering the same explicitness problem as us.

Ocaml and Adga have respectively [refutation cases](https://ocaml.org/manual/5.2/gadts-tutorial.html#s:gadt-refutation-cases) and [absurd patterns](https://agda.readthedocs.io/en/latest/language/function-definitions.html#absurd-patterns), both of which give a way to say "this pattern is impossible and I shouldn't have to write an arm for it". Their flavor of impossibility is related to types however, not runtime validity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their flavor of impossibility is related to types however, not runtime validity.

I don't understand this remark. Ours is also all about types?

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes this is badly phrased. I'm trying to express something like "in their case, the absurd pattern is purely a type-system hint, whereas in our case changing Err(_) to Err(!) can change the runtime semantics".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is that Adga doesn't have unsafe code and OCaml doesn't talk about the unsafe code it has.


Something that's suprisingly semantically (and syntactically!) close is Haskell's [bang patterns](https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/strict.html#bang-patterns-informal). Haskell is a lazy language and a pattern only evaluates the underlying value if needed to decide which branch to take. A `!x` instead pattern forces the evaluation of the underlying value. Rust's patterns are a bit lazy too: they lazily assert validity of the data they access. In Haskell one must be careful to evaluate the right things in the right order else you may diverge; in unsafe Rust similar care is needed to avoid UB.

Never patterns could then be described as playing the role of both a bang pattern ("please evaluate this") and an absurd pattern ("this is an impossible case") at the same time.

From [the lazy patterns literature](https://dl.acm.org/doi/abs/10.1145/3408989) we can also take the distinction between a "redundant" pattern (can be removed without changing semantics), and an "inaccessible" pattern (will never execute but cannot be removed without changing semantics). E.g., in rust:
```rust
let ptr: *const ! = ...;
match *ptr {
// This causes a read of the value, hence UB if reached. Removing it would remove UB. Therefore this is inaccessible and not redundant.
_x => {}
_ => {}
}
```
Never patterns are our way to notate inaccessible patterns.


# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Does this cause problems with macros-by-example?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It also causes problems for derive macros. As I write,

[It] sets macro authors up for failure. Every macro-generated match statement needs to have a spurious ! arm, or the macro will fail to compile, sometimes, with an empty enum.

This isn't just a problem for macro_rules macros. Derive macros can suffer from it too. (Users of derive-deftly or other approaches for making derive macros more easily are quite likely to run into this.)

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two separate issues here: rust-lang/rust#131452 is entirely independent from this RFC, since this RFC is purely about adding a new pattern syntax (and not about changing anything else in the language). The other issue with macros is that a ! pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other issue with macros is that a ! pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.

I think it would be a breaking change for libraries like syn, in the sense that all procedural macro crates would need to update to the latest syn with a new AST to handle match patterns without arms - and this may require a major version bump of syn...

Given that syn is only on major version 2, it looks like such changes aren't very frequent - but maybe this isn't that much of a problem in practice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, they'd have to majorly change their Arm struct :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already causing problems like:

fn h(a: Option<&!>) -> bool {
    matches!(a, Some(!))
    // error: a never pattern is always unreachable
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syn has to have some way of dealing with the Rust syntax evolving, I don't think its API choices should dictate our grammar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches!(a, Some(!)) is kind of silly so arguably we should at the very least lint against it. Why not just write false instead?

- Should the MIR-lowering of a never pattern actually include a place read or is a branch to `Unreachable` good enough?
- Out of scope: opsem and exhaustiveness decisions.


# Future possibilities
[future-possibilities]: #future-possibilities

This is a pretty self-contained feature.

This RFC is part of the [Patterns of Empty Types](https://rust-lang.github.io/rust-project-goals/2024h2/Patterns-of-empty-types.html) project goal. In parallel with this, I plan to propose that we allow omitting empty arms behind references (e.g. `let Ok(_) = expr;` should be allowed for `expr: Result<T, &!>`). That way we'd get: for safe places (places that can be accessed in safe code), you can omit an empty arm; for unsafe places, you must write an arm and never patterns make this convenient.

This feature is closely related to the [`never_type`](https://github.com/rust-lang/rust/issues/35121) initiative, which aims to provide `!` as the default empty type in the language. These initiatives are independent and complement each other.