Skip to content

Subtree sync for rustc_codegen_cranelift #141557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 96 commits into from
May 26, 2025

Conversation

bjorn3
Copy link
Member

@bjorn3 bjorn3 commented May 25, 2025

The main highlights this time are a Cranelift update and (thanks for beetrees) f16/f128 support.

r? @ghost

@rustbot label +A-codegen +A-cranelift +T-compiler

madsmtm and others added 30 commits March 25, 2025 21:53
…Lapkin

Rename `is_like_osx` to `is_like_darwin`

Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).

``@rustbot`` label O-apple
r? compiler
- src\doc\nomicon\src\ffi.md should also have its ABI list updated
This will show a backtrace. Also added a reference to
rust-lang/rustc_codegen_cranelift#171 in the unimplemented intrinsic
error message.
…ntrinsic_error

Replace trap_unimplemented calls with codegen_panic_nounwind
In preparation for future unwinding support.

Part of rust-lang/rustc_codegen_cranelift#1567
Once writing the LSDA, it will need access to the Module to get a
reference to the personality function and to define a data object for
the LSDA.

Part of rust-lang/rustc_codegen_cranelift#1567
It bugs me when variables of type `Ident` are called `name`. It leads to
silly things like `name.name`. `Ident` variables should be called
`ident`, and `name` should be used for variables of type `Symbol`.

This commit improves things by by doing `s/name/ident/` on a bunch of
`Ident` variables. Not all of them, but a decent chunk.
Remove the use of Rayon iterators

This removes the use of Rayon iterators and the use of the `rustc-rayon` crate.  `rustc-rayon-core` is still used however.

In parallel loops, instead of a Rayon iterator a serial iterator are used to collect items into a `Vec` and we use a parallel loop over its elements using the new `par_slice` function which is built on `rustc-rayon-core`'s `join`.

This change makes it easier to bring `rustc-rayon-core` in-tree.

Tests using 7 threads:
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th><td align="right">Physical Memory</td><td align="right">Physical Memory</td><td align="right">%</th><td align="right">Committed Memory</td><td align="right">Committed Memory</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">0.4827s</td><td align="right">0.4828s</td><td align="right"> 0.02%</td><td align="right">201.23 MiB</td><td align="right">201.31 MiB</td><td align="right"> 0.04%</td><td align="right">279.03 MiB</td><td align="right">279.46 MiB</td><td align="right"> 0.15%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.1443s</td><td align="right">0.1401s</td><td align="right">💚  -2.91%</td><td align="right">126.42 MiB</td><td align="right">126.70 MiB</td><td align="right"> 0.22%</td><td align="right">199.79 MiB</td><td align="right">199.99 MiB</td><td align="right"> 0.10%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.3252s</td><td align="right">0.3065s</td><td align="right">💚  -5.78%</td><td align="right">161.87 MiB</td><td align="right">161.78 MiB</td><td align="right"> -0.05%</td><td align="right">229.59 MiB</td><td align="right">230.23 MiB</td><td align="right"> 0.28%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">0.5845s</td><td align="right">0.5876s</td><td align="right"> 0.53%</td><td align="right">197.01 MiB</td><td align="right">196.89 MiB</td><td align="right"> -0.06%</td><td align="right">267.62 MiB</td><td align="right">267.47 MiB</td><td align="right"> -0.06%</td></tr><tr><td>Total</td><td align="right">1.5367s</td><td align="right">1.5169s</td><td align="right">💚  -1.29%</td><td align="right">686.53 MiB</td><td align="right">686.68 MiB</td><td align="right"> 0.02%</td><td align="right">976.04 MiB</td><td align="right">977.14 MiB</td><td align="right"> 0.11%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9796s</td><td align="right">💚  -2.04%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> 0.04%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> 0.12%</td></tr></table>

<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th><td align="right">Physical Memory</td><td align="right">Physical Memory</td><td align="right">%</th><td align="right">Committed Memory</td><td align="right">Committed Memory</td><td align="right">%</th></tr><tr><td>🟠 <b>clap</b>:debug</td><td align="right">1.6371s</td><td align="right">1.6529s</td><td align="right"> 0.96%</td><td align="right">395.58 MiB</td><td align="right">396.21 MiB</td><td align="right"> 0.16%</td><td align="right">460.98 MiB</td><td align="right">461.52 MiB</td><td align="right"> 0.12%</td></tr><tr><td>🟠 <b>hyper</b>:debug</td><td align="right">0.3248s</td><td align="right">0.3210s</td><td align="right">💚  -1.16%</td><td align="right">155.16 MiB</td><td align="right">155.19 MiB</td><td align="right"> 0.02%</td><td align="right">219.21 MiB</td><td align="right">219.30 MiB</td><td align="right"> 0.04%</td></tr><tr><td>🟠 <b>regex</b>:debug</td><td align="right">1.0148s</td><td align="right">0.9929s</td><td align="right">💚  -2.16%</td><td align="right">297.96 MiB</td><td align="right">295.07 MiB</td><td align="right"> -0.97%</td><td align="right">354.53 MiB</td><td align="right">351.58 MiB</td><td align="right"> -0.83%</td></tr><tr><td>🟠 <b>syn</b>:debug</td><td align="right">1.3614s</td><td align="right">1.3717s</td><td align="right"> 0.76%</td><td align="right">319.10 MiB</td><td align="right">321.19 MiB</td><td align="right"> 0.65%</td><td align="right">378.90 MiB</td><td align="right">381.27 MiB</td><td align="right"> 0.62%</td></tr><tr><td>Total</td><td align="right">4.3381s</td><td align="right">4.3386s</td><td align="right"> 0.01%</td><td align="right">1.14 GiB</td><td align="right">1.14 GiB</td><td align="right"> -0.01%</td><td align="right">1.38 GiB</td><td align="right">1.38 GiB</td><td align="right"> 0.00%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9960s</td><td align="right"> -0.40%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> -0.03%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> -0.01%</td></tr></table>
Prepend temp files with per-invocation random string to avoid temp filename conflicts

rust-lang#139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management.

Specifically, imagine we're building a single file 3 times, all with `-Csave-temps -Cincremental=...`. Let's call the object file we're building for the codegen unit for `main` "`XXX.o`" just for clarity since it's probably some gigantic hash name:

```
#[inline(never)]
#[cfg(any(rpass1, rpass3))]
fn a() -> i32 {
    0
}

#[cfg(any(cfail2))]
fn a() -> i32 {
    1
}

fn main() {
    evil::evil();
    assert_eq!(a(), 0);
}

mod evil {
    #[cfg(any(rpass1, rpass3))]
    pub fn evil() {
        unsafe {
            std::arch::asm!("/*  */");
        }
    }

    #[cfg(any(cfail2))]
    pub fn evil() {
        unsafe {
            std::arch::asm!("missing");
        }
    }
}
```

Session 1 (`rpass1`):
* Type-check, borrow-check, etc.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o` which is spit out in the cwd.
* Hard-link[^1] `XXX.rcgu.o` to the incremental working directory `.../s-...-working/XXX.o`.
* Save-temps option means we don't delete `XXX.rgcu.o`.
* Link the binary and stuff.
* Finalize[^2] the working incremental session by renaming `.../s-...-working` to ` s-...-asjkdhsjakd` (some other finalized incr comp session dir name).

Session 2 (`cfail2`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph.
* Type-check, borrow-check, etc. since the file has changed, so most dep graph nodes are red.
* Serialize the dep graph to the incremental working directory `.../s-...-working/`.
* Codegen object file to a temp file `XXX.rcgu.o`. **HERE IS THE PROBLEM**: The hard-link is still set up to point to the inode from `XXX.o` from the first session, so this also modifies the `XXX.o` in the previous finalized session directory.
* Codegen emits an error b/c `missing` is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the *previous* session is the last finalized session.

Session 3 (`rpass3`):
* Load artifacts from the previous *finalized* incremental session, namely the dep graph. NOTE that this is from session 1.
* All the dep graph nodes are green since we are basically replaying session 1.
* codegen object file `XXX.o`, which is detected as *reused* from session 1 since dep nodes were green. That means we **reuse** `XXX.o` which had been dirtied from session 2.
* Link the binary and stuff.

This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1.

At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst **unsound**.

This isn't limited to `-C save-temps`, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like `-C debuginfo=1 -C split-debuginfo=unpacked` on darwin, for example).

---

This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not *deterministic*, but temporary files are transient anyways, so I don't believe this is a problem.

That means that temp files are now something like... `{crate-name}.{cgu}.{invocation_temp}.rcgu.o`, where `{invocation_temp}` is the new temporary string we generate per invocation of rustc.

Fixes rust-lang#139407

[^1]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_fs_util/src/lib.rs#L60
[^2]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_incremental/src/persist/fs.rs#L1-L40
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-codegen Area: Code generation A-cranelift Things relevant to the [future] cranelift backend labels May 25, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 25, 2025

⚠️ Warning ⚠️

@rustbot rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 25, 2025
@rustbot rustbot added A-tidy Area: The tidy tool T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels May 25, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 25, 2025

The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging.

cc @davidtwco, @wesleywiser

@bjorn3
Copy link
Member Author

bjorn3 commented May 25, 2025

@bors r+ p=1 subtree sync

@bors
Copy link
Collaborator

bors commented May 25, 2025

📌 Commit 4aed799 has been approved by bjorn3

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 25, 2025
@bors
Copy link
Collaborator

bors commented May 25, 2025

⌛ Testing commit 4aed799 with merge 9f8929f...

@bors
Copy link
Collaborator

bors commented May 26, 2025

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 9f8929f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 26, 2025
@bors bors merged commit 9f8929f into rust-lang:master May 26, 2025
7 checks passed
@rustbot rustbot added this to the 1.89.0 milestone May 26, 2025
Copy link

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 283db70 (parent) -> 9f8929f (this PR)

Test differences

No test diffs found

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 9f8929fbeca4b5c2302b326606ae800156915840 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. aarch64-gnu-debug: 3950.7s -> 4935.0s (24.9%)
  2. dist-armv7-linux: 5313.3s -> 6550.8s (23.3%)
  3. x86_64-apple-1: 6342.2s -> 7805.2s (23.1%)
  4. aarch64-gnu: 6562.6s -> 8013.0s (22.1%)
  5. x86_64-apple-2: 4195.3s -> 4918.9s (17.2%)
  6. aarch64-apple: 4309.1s -> 3989.5s (-7.4%)
  7. x86_64-msvc-ext3: 7957.2s -> 7547.8s (-5.1%)
  8. dist-various-2: 3345.2s -> 3189.1s (-4.7%)
  9. x86_64-gnu: 6510.6s -> 6811.9s (4.6%)
  10. dist-loongarch64-musl: 5617.9s -> 5370.7s (-4.4%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9f8929f): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 9.5%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
9.5% [9.1%, 9.8%] 2
Regressions ❌
(secondary)
2.2% [2.2%, 2.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 9.5% [9.1%, 9.8%] 2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 775.708s -> 776.208s (0.06%)
Artifact size: 366.25 MiB -> 366.34 MiB (0.03%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-cranelift Things relevant to the [future] cranelift backend A-tidy Area: The tidy tool has-merge-commits PR has merge commits, merge with caution. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.