Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nondeterminism from codegen-units #128675

Closed
bmwiedemann opened this issue Aug 5, 2024 · 11 comments
Closed

nondeterminism from codegen-units #128675

bmwiedemann opened this issue Aug 5, 2024 · 11 comments
Labels
A-codegen Area: Code generation A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bmwiedemann
Copy link

While working on reproducible builds for openSUSE (sponsored by the NLnet NGI0 fund), I found that
compiling various rust applications with the default of codegen-units=16, it produced nondeterministic variations in resulting binaries. e.g.
pop-os/launcher#230

IMHO, the default settings should be made to behave deterministically. Either make codegen-units=1 the default or ensure that codegen-units=16 can produce deterministic results.

In the unlikely event that this is not possible, it should clearly be documented how to achieve reproducible builds, not only in
https://doc.rust-lang.org/rustc/codegen-options/index.html#codegen-units

This is about rust1.79 but started much earlier. At least with 1.73

@bmwiedemann bmwiedemann added the C-bug Category: This is a bug. label Aug 5, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 5, 2024
@kpcyrd
Copy link

kpcyrd commented Aug 5, 2024

Arch Linux is currently also having issues with the rustc and rustdoc binaries being unreproducible:

https://web.archive.org/web/20240805092434/https://reproducible.archlinux.org/api/v0/builds/649842/diffoscope

bmwiedemann added a commit to bmwiedemann/cargo-packaging that referenced this issue Aug 5, 2024
This produces binaries that are better optimized and deterministic.

This may become obsolete when
rust-lang/rust#128675
is resolved.
@jieyouxu jieyouxu added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-reproducibility Area: Reproducible / deterministic builds labels Aug 5, 2024
@saethlin saethlin removed the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Aug 5, 2024
@saethlin
Copy link
Member

saethlin commented Aug 5, 2024

IMHO, the default settings should be made to behave deterministically. Either make codegen-units=1 the default or ensure that codegen-units=16 can produce deterministic results.

CGU partitioning is very deliberately designed to be deterministic.

@saethlin
Copy link
Member

saethlin commented Aug 5, 2024

@bmwiedemann It's unclear to me exactly what you are trying to report. In llvm/llvm-project#72206 you just generally refer to " different binaries in every build, even when trying to run builds as similar as possible."

But in pop-os/launcher#230 you are pointing out that changing the build environment but not the build settings changes the build. That has a fair chance of being a different rustc bug than the issue you filed on LLVM.

So I do not think it is accurate to say this started in 1.73.

@kpcyrd What are you trying to report? All you linked is a diff from a system that is completely unfamiliar, with no enlightening context. I can't tell whether you're just leaving a "seeing this too" comment (which is entirely unnecessary), or if you're trying to provide additional data for debugging, or if you're trying to report a third variation.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 5, 2024

Based on the response to @bmwiedemann's original report, binaries are reproducible based on codegen units if you use the same number of codegen units. Thus, closing as already-completed.

@kpcyrd "I'm having trouble making these binaries reproducible" doesn't distinguish your report from any other extant A-reproducibility issue on the tracker, so please open a new issue with an actual report. Build logs are not an issue report. Even if they were, each distribution uses a wildly different format so it is not possible for any lessons to be learned and applied from distro to distro. The day SUSE and Arch and Debian and everyone else standardize on using exactly the same tooling with exactly the same configuration is nowhere in sight.

@workingjubilee workingjubilee closed this as not planned Won't fix, can't repro, duplicate, stale Aug 5, 2024
@workingjubilee
Copy link
Member

@bmwiedemann If I have made a mistake in identifying what you are reporting and you wish to reopen this, then there is one key improvement that is required as a baseline:

Please start by linking to the exact Rust source and build instructions that you are working from, including the patches you are applying, or where they would be if they existed. Make sure they can be accessed without authentication, instead of behind a JIRA tracker that requires logging in to read.

You see, in order for us to determine whether the issue is in rustc's source code, we have to be able to read your patches. At least two distributions have patched rustc's source and then reported issues caused or aggravated by the patches. This is quite unfortunate, as it reduces our confidence that updating our source code will actually fix issues in distributed rustcs, as distros may prefer to simply continue patching out the fixed code instead, leaving the problem behind.

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 5, 2024
@bmwiedemann
Copy link
Author

On the openSUSE rust side, atm we have exactly one tiny downstream patch "to ignore GCC incompatible flag"
https://github.com/bmwiedemann/openSUSE/blob/master/packages/r/rust1.79/ignore-Wstring-conversion.patch

The https://github.com/bmwiedemann/openSUSE/blob/master/packages/p/pop-launcher/pop-launcher.spec is also not heavily patched and it did vary between builds on 1-core-VMs and 4-core-VMs, which stopped with codegen-units=1

That contradicts "CGU partitioning is very deliberately designed to be deterministic", so I still think this bug is valid.

@workingjubilee
Copy link
Member

Then, can you precisely describe the variance that you are seeing, with fresh builds using rust 1.80 and the current rust nightly, and what is different between which build, and if the result is deterministic for a given rustc between VMs that have the same amount of cores? So that we at least have eliminated that much possibility?

Can you replicate this variance without using both the Just build system and cargo? We have no special knowledge of Just and it is possible it is doing things that disturb this fact.

Can you replicate the variance with a simpler program that does not require cargo to build? There are very many things that cargo does itself, not all of them deterministic, which have little to do with how rustc operates.

@workingjubilee
Copy link
Member

Judging by how the LLVM issue has played out, it seems that some of the variance that is being described in the LLVM issue is a result of the source code that you are packaging, which has included nondeterminism in its build: llvm/llvm-project#72206 (comment)

This has little to do with the Rust toolchain. If you call getrandom() in the code supplied to a build system for build-time execution, then you get nondeterminism in the build. It is good if, when reporting a lack of reproducibility, others can reproduce that. It is best if you try to minimize the issue at least partially.

@workingjubilee
Copy link
Member

The degree to which this remains a concern is probably a complete duplicate with #126976

@bmwiedemann
Copy link
Author

Looking at Instagram/LibCST#1213 it seems there is still work left to do in rust or llvm. That was with rust 1.81.0

@bmwiedemann
Copy link
Author

bmwiedemann commented Oct 14, 2024

I found that the issue in pop-launcher and python-libcst is not with codegen-units, but these become reproducible with lto = "off" . It just happened that I had used lto = false and that behaves like off iff codegen-units = 1

Edit: they even build reproducibly without any changes now with rust-1.81.0

Edit2: they still vary, but only sometimes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants