Edition 2024: don't special-case diverging blocks as much #123590

Closed

Conversation

WaffleLapkin
Member

@WaffleLapkin WaffleLapkin commented Apr 7, 2024

Normally, when a block has no tail-expression its type is ():

let a /*: ()*/ = {};
let a /*: ()*/ = {
    statement(); // note the `;`!
};

However, this is not the case when the block is known to diverge, in which case its type becomes ! [1]:

let a /*: !*/ = {
    return; // note the `;`!
};

let a /*: !*/ = {
    returns_never();
    blah(); // note the `;`!
};
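
As a rough, compilable sketch of the two cases above: the names `early_exit` and `call_then_dead_code` are made up for illustration, and `returns_never` and `blah` are stand-ins for the functions named in the example. The `u8` annotations are only there to show that the blocks are not typed as `()`.

```rust
// Hypothetical stand-ins for the functions named in the example above.
fn returns_never() -> ! {
    panic!("this function never returns")
}

fn blah() {}

// The inner block has no tail expression (note the `;` after `return 1`).
// It still type-checks against `u8` because the block is known to diverge,
// so its type is `!`, which coerces to any type.
#[allow(unreachable_code, unused_variables)]
fn early_exit(flag: bool) -> u8 {
    if flag {
        let a: u8 = {
            return 1; // note the `;`!
        };
    }
    2
}

// Same effect with a call to a `-> !` function followed by dead code.
#[allow(unreachable_code)]
fn call_then_dead_code() -> u8 {
    let a: u8 = {
        returns_never();
        blah(); // note the `;`!
    };
    a
}
```

Replacing `return 1;` with an ordinary statement such as `blah();` should make the first function fail to compile with a mismatched-types error, since the block would then have type `()`.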

I think that this is a useless special case that unnecessarily complicates the language. I propose that we remove it in the next edition.

Note that you can always fix code that relied on this special case by removing a ; (and removing dead code, if there is any). We already have a machine-applicable fix (as can be seen in the test added in this PR).

// from the above example, changes you'd need to do in the next edition

let a /*: !*/ = {
    return // no `;`!
};

let a /*: !*/ = {
    returns_never() // no `;`!
                    // removed `blah();` which was dead code
};
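
A compilable sketch of the migrated form, under the same assumptions as before (`early_exit` and `call_as_tail` are illustrative names, `returns_never` is a stand-in). Here the diverging expression is the tail expression, so the block gets type `!` through the ordinary tail-expression rule and no special case is involved.

```rust
// Hypothetical stand-in for the function named in the example above.
fn returns_never() -> ! {
    panic!("this function never returns")
}

#[allow(unreachable_code, unused_variables)]
fn early_exit(flag: bool) -> u8 {
    if flag {
        let a: u8 = {
            return 1 // no `;`: `return 1` is the tail expression, of type `!`
        };
    }
    2
}

#[allow(unreachable_code)]
fn call_as_tail() -> u8 {
    let a: u8 = {
        returns_never() // no `;`, and the dead `blah();` call is gone
    };
    a
}
```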

Also note that while this is related to the never type, this change is independent of the work on its stabilization. I personally think that if we are going to stabilize !, we should make it less weird and this is one of the ways to do that; but, this is not required for ! stabilization and is a completely separate cleanup.

The only hard part of this change is that rustfmt currently adds ; to returns which are at the end of blocks. We'll have to change rustfmt style in this regard in order to support the next edition. Our options are:

  1. Make rustfmt keep ;, but not add ; (this won't break any existing users)
  2. Make rustfmt add ; in the edition<=2021 and remove ; in the edition>=2024 (this will automatically fix code when porting to the new edition) (does rustfmt even support editions?...)
  3. Make rustfmt remove ; (this will break all existing formatting, I don't think that's desirable)

Tracking:

r? compiler-errors

Footnotes

  1. it then immediately decays to () due to current never type fallback, but it does not matter -- you can still specify any type and the block will be coerced to that

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 7, 2024
@WaffleLapkin WaffleLapkin added T-lang Relevant to the language team, which will review and decide on the PR/issue. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. F-never_type `#![feature(never_type)]` A-rustfmt Area: Rustfmt I-lang-nominated Nominated for discussion during a lang team meeting. A-edition-2024 Area: The 2024 edition labels Apr 7, 2024
@traviscross traviscross added the T-style Relevant to the style team, which will review and decide on the PR/issue. label Apr 7, 2024
@scottmcm
Member

I wonder about maybe special-casing keyword control flow here. Trying to push a "no, you can't write return; any more" might not be worth it, but let Some(x) = foo() else { bar(); }; working with the semicolon for a function that happens to return ! feels much more weird to me.

Given that we could change panic!'s expansion to have a return internally, for example, that might help resolve the "wait, that's not how types work" concerns with less impact on relatively normal code? I don't know that I want to insist on changing

let Some(x) = foo else {
    return;
};

everywhere, especially since we've been pushing people to that.

@nikomatsakis

This comment was marked as outdated.

@rfcbot

This comment was marked as outdated.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Apr 10, 2024
@nikomatsakis

This comment was marked as outdated.

@rfcbot

This comment was marked as outdated.

@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Apr 10, 2024
@nikomatsakis

This comment was marked as outdated.

@rustbot rustbot removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 10, 2024
@nikomatsakis

This comment was marked as outdated.

@rfcbot

This comment was marked as outdated.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Apr 10, 2024
@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label May 14, 2024
@rfcbot

rfcbot commented May 14, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@traviscross traviscross removed I-lang-nominated Nominated for discussion during a lang team meeting. I-style-nominated Nominated for discussion during a style team meeting. labels May 15, 2024
@riking

riking commented May 23, 2024

I think I want to be the procedural wet blanket here and say that this is moving too fast and has too many unknowns to plausibly land in 2024. I don't remember seeing it on the concept tracking boards, and this is ultimately a very large style change to Rust. I'd prefer if this was targeting e2027 instead to have more time to get the lints correct and to let people experiment with the new style on nightly.

@JakobDegen
Contributor

Had someone reach out to me about a similar point to what @riking has brought up, and so I just want to clarify: The original FCP comment included this phrasing:

Immediate next step being merged here: land this on nightly so that we can begin experimenting and make sure there aren't cases we've overlooked where this rule is important.

I want to be sure that this means that what is being approved here is general agreement that something along these lines is desirable, and experimentation in nightly. However, there still needs to be a follow-up FCP in the future for T-Lang to actually commit to all the details of how this will be landed in its final form for the new edition.

I'm specifically not filing a concern here under the assumption that what I wrote above is correct. If it is not, then I do think we need to clarify the process of how the details will be worked out later as a blocker to landing this FCP.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 24, 2024
@rfcbot

rfcbot commented May 24, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@theemathas
Contributor

Here is an artificial use case that can no longer work (and has no fix) if the change in this PR goes through. I don't know if any real-world use-cases look like this, but I think that the existence of this potential use case is concerning.

macro_rules! my_match {
    ($result:expr, $ok_fn:expr, $err_fn:expr) => {
        match $result {
            Ok(x) => {
                $ok_fn(x);
            }
            Err(e) => {
                $err_fn(e);
            }
        }
    };
}

fn diverge<T>(_x: T) -> ! {
    panic!()
}

fn id<T>(x: T) -> T {
    x
}

fn different_type_arms(res: Result<i32, String>) {
    my_match!(res, id, id);
}

fn diverging_arms(res: Result<i32, String>) -> ! {
    my_match!(res, diverge, diverge);
}

The my_match!() macro applies one of two functions to a Result, depending on which variant it is. The return values of the functions are discarded, which means that the two functions can return different types without any issues. And due to the current special-casing, if both arms diverge, then the entire macro diverges. I believe that if the change in the PR were to go through, then creating a macro or function that does the above would be impossible.

Are there real world use cases that have this problem?

@WaffleLapkin
Member Author

@theemathas I don't think this is a real problem, seems like a very niche use case to want to ignore all return values and diverge in different calls to the same macro. It's also trivial to change the macro to support specifically the diverging case...
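
For illustration only, one way such a diverging-friendly variant might look. The macro name `my_match_tail` and the helpers `double` and `length` are hypothetical; this is a sketch, not something taken from the PR. It puts the callback calls in tail position, so each arm has the callback's return type, at the price of requiring both callbacks to return the same type (or to diverge).

```rust
// Hypothetical variant of `my_match!`: the calls are tail expressions,
// so the arms carry the callbacks' return type instead of discarding it.
macro_rules! my_match_tail {
    ($result:expr, $ok_fn:expr, $err_fn:expr) => {
        match $result {
            Ok(x) => $ok_fn(x),
            Err(e) => $err_fn(e),
        }
    };
}

fn diverge<T>(_x: T) -> ! {
    panic!()
}

// Both arms have type `!`, so the whole `match` diverges without relying
// on the special-casing of blocks that end in `;`.
fn diverging_arms(res: Result<i32, String>) -> ! {
    my_match_tail!(res, diverge, diverge)
}

// Illustrative helpers with a shared return type.
fn double(x: i32) -> i32 {
    x * 2
}

fn length(e: String) -> i32 {
    e.len() as i32
}

fn same_type_arms(res: Result<i32, String>) -> i32 {
    my_match_tail!(res, double, length)
}
```

This variant does not cover the original `different_type_arms` case, where the two callbacks return different types; that case would keep using the statement form.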

@traviscross
Contributor

traviscross commented May 25, 2024

On the procedural question of what lang agreed to here, including the context from our meetings, the final FCP represents agreement to Option 2:

  2. In Rust 2024, narrow the scope of the special casing from all diverging expressions to just those where return, break, or continue are used syntactically and no dead code (that doesn't itself diverge) follows. Do this after macro expansion at the time of type checking (so that, e.g. let _: ! = { panic!(); } also works).
    • This would promote local reasoning, making the behavior less weird.
    • It would take most of the common cases off the table for the moment.
    • We could at any later time warn about these cases also.
    • Then, if we so choose, we could then take the next step in a later edition of disallowing these.

Since Rust 2024 is still in nightly, landing this PR represents landing this in nightly (and with a presumption that, barring surprises, this will become part of stable Rust 2024). Landing this in nightly Rust 2024 will allow testing this to be part of the testing and validation process for Rust 2024.

Given that the FCP has completed, assuming this PR has been updated to be compliant with Option 2, it can be merged as far as lang is concerned.


Switching hats, and putting on the one for edition management, to facilitate edition testing, we would prefer to see the machine-applicable migration lints land at approximately the same time as this PR does.

@traviscross traviscross removed the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label May 25, 2024
@WaffleLapkin WaffleLapkin added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 30, 2024
@WaffleLapkin WaffleLapkin changed the title Edition 2024: don't special-case diverging blocks Edition 2024: don't special-case diverging blocks as much May 30, 2024
@traviscross
Contributor

@WaffleLapkin and I discussed the question of what set of macros in std/core should be updated to include return. Since this PR and FCP were focused on the language mechanism, the plan here is to land this PR with just these language changes, then to make a second PR with the macro adjustments. That PR will propose a set (or maybe the empty set) of macros to adjust, and that question will be nominated.

@traviscross
Contributor

traviscross commented May 31, 2024

@rustbot labels +I-lang-nominated

We FCPed Option 2, which contained this language:

  2. ...Do this after macro expansion at the time of type checking (so that, e.g. let _: ! = { panic!(); } also works).

@workingjubilee has now identified that, while we can do this at the time of type checking, there's no way to change the expansion of panic!() such that let _: ! = { panic!(); } can work under this rule. This is due to const, e.g.:

macro_rules! my_panic {
    () => { return std::process::abort() };
}

const _: () = {
    my_panic!();
    //~^ ERROR return statement outside of function body
};

(Great catch.)

Let's renominate to discuss what we want to do. At a minimum, we would need to amend our Option 2 consensus to acknowledge that let _: ! = { panic!(); } cannot work.

@rustbot rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label May 31, 2024
@WaffleLapkin
Member Author

WaffleLapkin commented Jun 3, 2024

I should not be writing this at this hour, but I need to get it out of my system, so I can sleep soundly.

I do not think this proposal, as currently understood and accepted by T-lang, is worth doing. As such, I'm closing this PR, as I do not want to work on something I do not believe in. I would, however, want to work on other things (as I describe later) related to the core idea underlying this proposal (given enough rest, as those are not "edition-2024" time sensitive and I do need rest).

Below is my reasoning for why the "current T-lang proposal" is not worthwhile.

For full context, here are the current rules for typing blocks (as far as I know these are not documented anywhere; neither the Rust Reference, nor the Rust Book, nor the Ferrocene specification mentions them in full) (this ignores break 'block_label, because it behaves mostly like the tail expression and is largely irrelevant to the discussion at hand):

  • If a block has a tail expression, its type is the type of the tail expression.
  • If the block always diverges (loosely defined as "all possible branches have an expression of type ! somewhere in them" [1][2]) its type is !.
  • Otherwise its type is ().

Now, let's outline what the "current T-lang proposal" is [3][4]:

  • In Rust edition {next}, change the rule for typing blocks to the following:

    • If a block has a tail expression, its type is the type of the tail expression.
    • If the block's last statement is a semicolon-statement and its expression is return, continue, or break, its type is ! (additionally panic-like macros expand to a return expression, and can fulfill this rule).
    • Otherwise its type is ().

Note that "a block" does not necessarily imply a literal block expression. It can be a part of another expression or statement kind, such as an if (let-else is the other example that comes to mind):

// this currently compiles
let a: u8 = if true {
    return;
} else {
    return;
};

As a side note: it is quite annoying to test what type a block has, because just printing the return type of a closure wrapping the block does not work: the current never type fallback behavior makes ! coerce to (), so the experiment does not show anything. Instead, the way to check is to assert that the type is something other than () (I tend to use u8 as one of the shortest stable types other than ()).
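
A small sketch of that probing technique applied to the if/else example, with a made-up function name (`both_branches_return`) and `u8` return values added so the result can be observed. The `u8` annotation on the binding is the probe: it only type-checks if both branch blocks have a type that coerces to `u8`, which for a block without a tail expression means `!`.

```rust
#[allow(unreachable_code, unused_variables)]
fn both_branches_return(flag: bool) -> u8 {
    // Each branch block ends in `return ...;`, so it has no tail expression.
    // The annotation compiles only because both blocks are typed as `!`.
    let probe: u8 = if flag {
        return 1;
    } else {
        return 2;
    };
    // Never reached: both branches above have already returned.
    probe
}
```

Swapping either `return` for an ordinary statement should turn that branch into a `()` block and produce a mismatched-types error, which is the signal the probe relies on.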

For comparison here is my original proposal:

  • In Rust edition {next}, change the rule for typing blocks to the following:

    • If a block has a tail expression, its type is the type of the tail expression.
    • Otherwise its type is ().
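
To compare the three rule sets side by side, here is a deliberately simplified toy model. It is not how rustc implements any of this: all type and function names are invented, "always diverges" is approximated as "some statement directly in the block has type `!`" (ignoring branches and `break 'label`), and the panic-like-macro clause of the T-lang proposal is left out.

```rust
/// Toy type universe: only what the three rule sets distinguish.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Unit,
    Never,
    Other,
}

/// A semicolon-terminated expression statement, reduced to two facts.
#[derive(Debug, Clone, Copy)]
struct Stmt {
    /// The expression has type `!` (e.g. `return`, or a call to a `-> !` fn).
    diverges: bool,
    /// The expression is syntactically `return`, `break`, or `continue`.
    is_keyword_jump: bool,
}

struct Block {
    stmts: Vec<Stmt>,
    tail: Option<Ty>,
}

/// Current rules: tail type, else `!` if the block diverges, else `()`.
fn current_rules(block: &Block) -> Ty {
    if let Some(ty) = block.tail {
        return ty;
    }
    if block.stmts.iter().any(|s| s.diverges) {
        Ty::Never
    } else {
        Ty::Unit
    }
}

/// "Current T-lang proposal": only the last statement matters, and only
/// if it is a keyword jump.
fn t_lang_rules(block: &Block) -> Ty {
    if let Some(ty) = block.tail {
        return ty;
    }
    match block.stmts.last() {
        Some(s) if s.is_keyword_jump => Ty::Never,
        _ => Ty::Unit,
    }
}

/// "My original proposal": tail type, else `()`.
fn original_proposal(block: &Block) -> Ty {
    block.tail.unwrap_or(Ty::Unit)
}
```

In this model, `{ return; }` is `!` under the first two rule sets and `()` under the third, while `{ exhausted(); }` is `!` only under the current rules.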

Next, let's see my interpretation of why T-lang decided on the "current T-lang proposal" [5]. T-lang generally agreed that these "always diverges" rules do not feel like "Rust"; in other words, they are not consistent with the rest of the language (as Rust generally does not use control-flow analysis for type checking [6]), and the current rules can be confusing and surprising. In particular, T-lang did not like let-else examples with function calls:

// this currently compiles
let Some(id) = last.checked_add(1) else {
    exhausted();
};
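
A self-contained version of this example, for anyone who wants to try it. The function names `next_id` and `next_id_tail` and the body of `exhausted` are assumptions added for illustration. The second function shows the form that does not depend on the special case.

```rust
// Hypothetical diverging helper; only its `-> !` signature matters here.
fn exhausted() -> ! {
    panic!("ran out of ids")
}

// `let ... else` requires the else block to diverge. With the `;`, the
// block has no tail expression and is accepted only because it is known
// to diverge.
fn next_id(last: u32) -> u32 {
    let Some(id) = last.checked_add(1) else {
        exhausted();
    };
    id
}

// Without the `;`, the call is the tail expression and the block has
// type `!` through the ordinary tail-expression rule.
fn next_id_tail(last: u32) -> u32 {
    let Some(id) = last.checked_add(1) else {
        exhausted()
    };
    id
}
```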

However, there are two main concerns. The first is "too much churn", especially given that rustfmt currently adds a ; after return, which would be a semantic change with "my original proposal". The second is that this would break "inline function at the end" [7].

Based on these concerns (primarily the churn one) T-lang decided to go with a "middle ground" proposal (aka "current T-lang proposal").

As was identified by @workingjubilee and mentioned by @traviscross, the proposal is not actually implementable -- panic! can't expand to a return because it can be used in consts, where return can't be used [8].

But ignoring that, at first glance the "current T-lang proposal" achieves some of the simplification and makes way for future improvements, all while not doing too much breakage. I want to argue that this is not the case.

Firstly, I would argue that "current T-lang proposal" does not actually do that much simplification. As you might notice by the description above, it is only a little bit "simpler", in that the reasoning is very local (you only need to check the last statement, rather than all of them, recursively), however it is still a mile behind "my original proposal". Moreover, I would say that it's less intuitive, as "current T-lang proposal" depends on a special-case (return/break/continue) rather than a more fundamental property ("always diverges").

Secondly, I do not think it makes way for future improvements. The problem with "my original proposal" is that most of the breakage [9] is return; and it's too much breakage. This problem does not go away after accepting the "current T-lang proposal". The difference in breakage is so big that, after fixing the breakage from the "current T-lang proposal", the breakage from "my original proposal" stays mostly the same.

Thirdly, I want to remind the reader that any breakage has an inherent cost. That is, the "cost" of a breaking change has both a component depending on the amount of breakage (say f(x), where x is the amount [10]) and a constant component (say C), where C represents the cost of implementing/maintaining/documenting/learning/keeping in mind the change. So breaking the same thing twice can be worse than breaking it in a worse way once.

Lastly, I want to summarize this with a graph [11]:
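
As a rough worked comparison of the two paths, assuming f is additive (so that f(a + b) = f(a) + f(b), which the footnote on linearity only tentatively suggests) and writing x_1 and x_2 (names introduced here for illustration) for the breakage of a first and a second step:

```latex
\begin{align*}
\text{cost}_{\text{one step}}  &= f(x_1 + x_2) + C \\
\text{cost}_{\text{two steps}} &= \bigl(f(x_1) + C\bigr) + \bigl(f(x_2) + C\bigr)
                                = f(x_1 + x_2) + 2C
\end{align*}
```

Under that assumption the two-step path costs exactly one extra C. If f is not additive, the comparison could shift in either direction.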

[Image: a graph roughly drawn by a computer. The horizontal axis is labeled 'how much breakage we need to get there'; the vertical axis is labeled 'how good this part of the language is, in abstruct'. The curve grows exponentially-ish (starts slow, but then shoots up). There are three marked points: a red rhombus titled 'we are here' at the origin, a blue square titled 'accepted proposal' a bit later, and a green circle titled 'ideal state' quite a bit later (where later is bigger on both axes).]

Yes, if we had a time machine, we would start the language in the green circle/"ideal state"/"my original proposal". Yes, blue square/"accepted proposal"/"current T-lang proposal" is technically better than the current state [12].

But it does not mean that going from where we are to "current T-lang proposal" is a good thing. If anything it burdens everyone with the constant cost of a breaking change without enough justification for it, causing us to forever have at least 2 different "bad" semantics for blocks. That is, I do not think that it on its own gives us a noticeably better language to justify the breakage. It only makes sense in the context of future bigger breakage, which, as established before [13], it doesn't help to do.

In conclusion: I do not think "current T-lang proposal" is a good direction for the language. It especially does not make sense for the 2024 edition. If we want to get saner semantics for block typing, we need to find ways to get to the "ideal state" in one breaking change.


Now, if we don't do anything, nothing will ever change and in the {next+1} edition we'll have the same exact situation. This comment wouldn't be complete without things that we can do in my opinion. Here are some ideas of things that might help us change the status quo.

  1. Changing rustfmt default to preserve ; rather than add it on returns and similar expressions
    • I think this is a good idea even if we don't end up doing anything else -- adding ; seems like a semantic change, we shouldn't do it even if it's technically not a semantic change because of current edge cases
  2. Documenting the current behavior
    • At least in the Rust Reference
    • But ideally also in other places
  3. An (allow-by-default) lint. This would allow two things
    • An ability for crates to say "I don't want to depend on the weird block semantics"
    • An ability for crates to port their code at their pace
  4. Better machine-applicable-fix support
    • What is currently implemented would cover most of the cases, but not some of the more edge-y ones (like having inline functions at the end or having dead code)
  5. An opt-in to change the block behavior independently of an edition
    • Not sure if it's actually a good thing, but...
    • It would allow people to use the sane semantics without requiring us to break everyone who wants to switch to a new edition
  6. Blog post(s) to notify people (at least the ones reading blog posts and places which will share information from them) that we are planning to do this in the first place

Additionally I want to highlight that not doing anything is also fine. Yes, the current state is weird and somewhat annoying, but it is fine. The semantics are not actively bad, they are merely a bit surprising (and even that more in "why this compiles" and not "how does the code behave").

Finally I want to apologize if this comment sounds too angry, harsh, verbose, or negative. I'm very exhausted from talking and arguing about this, from thinking about everything described here (but I'm not mad at anyone who's participated in this discussion, this is an outcome of my actions and the system/reality as a whole, rather than the fault of anyone in particular). I'm ashamed of wasting everyone's time on this proposal and then retracting it (even though I understand that such is life -- the only way to learn is by doing, and doing necessarily implies failing at least sometimes). So I've tried being as verbose and clear as possible, so I can remove this topic from my working memory and focus on other things. Additionally, as I'm writing this very line, it's 6:57 in my timezone and I haven't slept yet, but I couldn't sleep with all of this on my mind.

With that, thanks everyone for reading this "comment on 'GitHub' social network" and participating with these proposals. [Loud "Close with comment" button click].

Footnotes

  1. "all possible branches" is an important distinction, there are cases where a block does not contain any semicolon-statements which wrap an expression of type !, but the block does have a type !: example

  2. The proper definition would define an "always diverges" property and then define rules of how it propagates for all expression and statement kinds

  3. I'm calling it "current" and "T-lang" to distinguish from previous or future proposals and from current ideas of other entities. This will of course become slightly ambiguous if T-lang changes its mind, but I did not figure out a better name

  4. I've removed the edition from this and other proposals, because it is not substance of the proposal. {next} signifies some future edition, which has not been published yet

  5. My interpretation is based on the two triage meetings where this was discussed, 2024-05-01 and, less importantly, 2024-05-08 triage meetings

  6. Although it is used for borrow checking (which is to be expected, lifetimes are much more closely related to control flow)

  7. There was a proposal to solve this by allowing items after the trailing expression, which I would describe (pardon my language, not trying to insult anyone) as completely bonkers. This is the opposite of trying to be less confusing, as this makes the "tail expression" not visually the tail of the block. I really struggle with trying to comprehend why this is a good change

  8. And I'm not counting either "make panic expand to return or not depending on if it's in a const" or "make panic_2024 which does not work in consts" as worthwhile workarounds to this. Additionally expanding to return feels icky, it calls for unexpected consequences

  9. I really want to say >95%, but I don't actually have the numbers, so this would be misleading

  10. I want to say that it should be linear, but is it?

  11. Small notes: the graph is normalized at the current state, i.e. it is "0", this does not mean that it's the worst possible thing. It's also very eyeballed, obviously neither breakage nor goodness is an integer value, this just describes my feelings in a visual form

  12. Although this is not a definite thing, see my point above about "always diverges" being more fundamental. To me these states feel about the same tbh. One is more surprising, while the other is more arbitrary, what a great choice

  13. As a reminder "my original proposal" requires so much more breakage, that the "current T-lang proposal" basically does not help at all. Changing 3 things in 2024 and then 20 things in 2027 is worse than changing 23 things in 2027 (numbers arbitrary)

@traviscross traviscross removed I-lang-nominated Nominated for discussion during a lang team meeting. to-announce Announce this issue on triage meeting S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 3, 2024
@nikomatsakis
Contributor

nikomatsakis commented Jun 4, 2024 via email

Labels
A-edition-2024 Area: The 2024 edition A-rustfmt Area: Rustfmt disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-never_type `#![feature(never_type)]` finished-final-comment-period The final comment period is finished for this PR / Issue. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. T-lang Relevant to the language team, which will review and decide on the PR/issue.