Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mostly replace profile scripts with Rust #738

Closed
wants to merge 17 commits into from
Closed

Conversation

grahamc
Copy link
Member

@grahamc grahamc commented Nov 22, 2023

Try this PR:

curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix/pr/738 | sh -s -- install

Description

(Includes clippy-nits, #737)

We rewrote the Nix installer in Rust to escape horrible and unpredictable bugs in the install scripts.
This has been super successful, but upstream still has a four critical scripts: the nix-profile* scripts.

Recently, we introduced a patch to add Nix's paths to XDG_DATA_DIRS: NixOS/nix#8985
This had an oversight, which was fixed in a PR: NixOS/nix#9312
...which also had an oversight which was released as 2.19.0, and fixed in a PR: NixOS/nix#9425

This quicksand style development nightmare is exactly why we rewrote it in the first place.
I examined the four versions (nix-profile{-daemon}.{fish,sh}) -- which all have different peculiarities, behaviors, and bugs.

Here are some of the issues I identified while trying to understand the differences.
Note that this isn't intended to be a "call-out" thread where I drag people through the mud or make Nix look bad or whatever.
I personally introduced some, and folks at DetSys introduced others.
This code was written by smart people, trying to do the right thing, but didn't have the tools to because scripting languages suck.

  1. nix-profile-daemon.{fish,sh} protect against double-loading but nix-profile.{fish,sh} don't:

    if [ -n "${__ETC_PROFILE_NIX_SOURCED:-}" ]; then return; fi
    __ETC_PROFILE_NIX_SOURCED=1
    if test -n "$__ETC_PROFILE_NIX_SOURCED"
    exit
    end
    set __ETC_PROFILE_NIX_SOURCED 1
  2. nix-profile-daemeon.fish sometimes leaks the add_path function, because it registers the function and then detects if Nix was sourced already:

    function add_path --argument-names new_path
      # ...snip...
    end
    
    # Only execute this file once per shell.
    if test -n "$__ETC_PROFILE_NIX_SOURCED"
      exit
    end
    
    # ...snip...
    
    functions -e add_path
  3. nix-profile.{fish,sh} check that HOME is defined before using it, but nix-profile-daemon.{fish,sh} don't, exposing users with set -u to a potential crash in the most common use case:

    if [ -n "$HOME" ] && [ -n "$USER" ]; then
  4. nix-profile.{fish,sh} requires that USER is defined, despite never using it. Likely leftover from a refactor around how GC roots were configured.

  5. nix-profile{-daemon}.sh were updated to account for the XDG directory migration, but the Fish equivalent weren't:

    NIX_LINK="$HOME/.nix-profile"
    if [ -n "${XDG_STATE_HOME-}" ]; then
        NIX_LINK_NEW="$XDG_STATE_HOME/nix/profile"
    else
        NIX_LINK_NEW="$HOME/.local/state/nix/profile"
    fi

    vs. the naive Fish:

    set --export NIX_PROFILES "@localstatedir@/nix/profiles/default $HOME/.nix-profile"

    By way of note, the XDG migration included a useful, user-forward migration path for users who had both a legacy and an XDG-based path. It was made defunct by a logical inversion by mistake,

  6. nix-profile.sh, nix-profile.fish, nix-profile-daemon.sh all include the user's Nix profile in the XDG_DATA_DIRS, but nix-profile-daemon.fish does not:

    set --export XDG_DATA_DIRS "$XDG_DATA_DIRS:$NIX_LINK/share:/nix/var/nix/profiles/default/share"

    vs.

    set --export XDG_DATA_DIRS "$XDG_DATA_DIRS:/nix/var/nix/profiles/default/share"
  7. The profile scripts typically make an attempt at leaving the NIX_SSL_CERT_FILE environment variable alone if the user set it, but...

    • nix-profile.sh doesn't bother
    • nix-profile.{fish,sh} look to see if NIX_SSH_CERT_FILE (note the H!) is set instead of NIX_SSL_CERT_FILE (note the L).
  8. All but nix-profile-daemon.sh will check to see if $NIX_LINK/etc/ca-bundle.crt exists and use that.

  9. nix-profile-daemon.{sh,fish} both check for a file called etc/ssl/certs/ca-bundle.crt in all the defined NIX_PROFILES, but nix-profile.{sh,fish} don't.

  10. nix-profile.{sh,fish} will extend MANPATH if it is already set, but the -daemon scripts won't.

  11. The nix-profile-daemon.{fish,sh} scripts put /nix/var/nix/profiles/default/bin into the PATH, but the others don't. This is true, despite all four setting up the default profile.

So I experimented with rewriting these complicated and tricky bash profiles in Rust.

However, Rust can't set environment variables in the parent so we still need a bit of shell glue to finish the job.
The goal is to have as little shell as possible, in a way that ~never needs to be touched.

This is challenging, because we have to stick to the fundamentals of POSIX.
That means using read with a heredoc, or eval.

I don't like eval, but the heredoc method turns out to be very brittle. POSIX doesn't support \0, and this method implies trying to create a parser for the output in the most limited form of shell.

The aborted attempt at not using eval

The Rust program emits data like this:

ENVVARNAME ENVVARDATA
ENVVARNAME2 ENVVARDATA2

where ENVVARNAME is guaranteed to be a-zA-Z0-9_, and ENVVARDATA has no newlines or null bytes.
Then the profile script loads the environment variables like this:

# shellcheck shell=sh
if [ -f /nix/nix-installer ] && [ -x /nix/nix-installer ] && [ -z "${__ETC_PROFILE_NIX_SOURCED:-}" ]; then
    NIX_INSTALLER_EXPORT_DATA=$(/nix/nix-installer export --format space-newline-separated);
    while read -r nix_installer_export_key nix_installer_export_value; do
        export "$nix_installer_export_key=$nix_installer_export_value";
    done <<DATA_INPUT
$NIX_INSTALLER_EXPORT_DATA
DATA_INPUT

    unset NIX_INSTALLER_EXPORT_DATA nix_installer_export_key nix_installer_export_value
fi

...this is profoundly ugly, but it passes shellcheck and doesn't require touching for any logical changes to the environment handling.

Is it safe?

No.

Previous text before I started testing against a bunch of shells:

According to the POSIX spec and testing, yes.
IFS=' ' read -r key value is like a split_once, taking the first space-separated value of each line and storing it at $key, and then all remaining data of the line -- regardless of spaces -- is stored at $value.
This means any data with spaces, tabs, backslashes, etc. should be preserved.
The only rule is the data doesn't contain a newline -- which is protected against.

eval

Another way that can also be safe is to just use eval, since we know how to safely escape shell values:

eval "$(/nix/nix-installer export --format=sh)

The downside is this is much more "mystery-meat".
The nix-installer binary may be perceived as having more capabilities, though the capabilities aren't practically increased over the current implementation.
The user may also have concerns about the safety and correctness of the code we emit to be evaled, founded or unfounded.

Using eval is like a skepticism lightning-rod, however it is our safest and most resilient option.

Appendix of options that are not POSIX compliant

Null-byte separated output

This was the first try:

nix-installer export --format null-separated \
  | while IFS= read -r -d $'\0' key && IFS= read -r -d $'\0' value; do
    export "$key"="$value"
done

However, read -d and $'\0' aren't POSIX.

The space and newline separation, without the heredoc

nix-installer export --format space-newline-separated \
  | while IFS=' ' read key value; do
    export "$key"="$value"
done

Unfortunately this doesn't work either because of subshells.
The export "$key"="$value" step is executed in a subshell, which doesn't change the environment of the caller.
Bash 4.2+ does have shopt -s lastpipe, but again -- not POSIX.

Process substitution directly into the while

Like this:

while IFS=' ' read key value; do
  export "$key"="$value"
done < <(nix-installer export --format space-newline-separated)

However, < <(...) isn't POSIX.

Checklist
  • Formatted with cargo fmt
  • Built with nix build
  • Ran flake checks with nix flake check
  • Added or updated relevant tests (leave unchecked if not applicable)
  • Added or updated relevant documentation (leave unchecked if not applicable)
  • Linked to related issues (leave unchecked if not applicable)

TODO

Open Questions

  • Is making our own profile.sh hook a liability we don't want to take? How can we do this most safely?

Validating with install.determinate.systems

If a maintainer has added the upload to s3 label to this PR, it will become available for installation via install.determinate.systems:

curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix/pr/$PR_NUMBER | sh -s -- install

Base automatically changed from clippy-nits to main November 22, 2023 19:50
@grahamc grahamc force-pushed the replace-profile-scripts branch from 5685192 to f8dbb4d Compare November 22, 2023 19:51
@grahamc grahamc mentioned this pull request Nov 22, 2023
6 tasks
@abathur
Copy link
Contributor

abathur commented Nov 23, 2023

I can't find it at the moment, but I feel like I remember someone floating the idea of simplifying Nth shell support by replacing all of the scripts with a ~wrapper executable along the lines of env. Ensuring it's in place and used in the right places smells just as challenging, though?

@grahamc grahamc force-pushed the replace-profile-scripts branch from b5e894b to b5a4d15 Compare November 25, 2023 19:57
@grahamc grahamc added the upload to s3 The labeled PR is allowed to upload its artifacts to S3 for easy testing label Nov 27, 2023
@@ -61,6 +61,7 @@ which = "4.4.0"
sysctl = "0.5.4"
walkdir = "2.3.3"
indexmap = { version = "2.0.2", features = ["serde"] }
export = { git = "https://github.com/DeterminateSystems/export" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR can merge this git dependency either needs to be published to crates.io or pulled into this repo as a workspace.

Since nix-installer is published to crates.io, we cannot have git deps.

@grahamc
Copy link
Member Author

grahamc commented Nov 28, 2023 via email

{
let mut path = vec![
nix_link.join("bin"),
// Note: This is typically only used in single-user installs, but I chose to do it in both for simplicity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Note: This is typically only used in single-user installs, but I chose to do it in both for simplicity.
// Note: This is typically only used in single-user installs, doing it in both for simplicity.

},
None => {
match calculate_environment() {
e @ Err(Error::AlreadyRun) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is ideal. Returning errors invokes various backtrace handling which could add several microseconds to runtime.

Consider checking the env var on a new line between L82 and L83:

    if nonempty_var_os("__ETC_PROFILE_NIX_SOURCED") == Some("1".into()) {
         tracing::debug!("Already ran export");
         return Ok(ExitCode::SUCCESS);
    }

Then removing that error variant from the error type.

@Hoverbear
Copy link
Contributor

I ran our VM test suite over this and it was without issues. Nice.

@Hoverbear
Copy link
Contributor

Overall the code looks correct, only a few minor nits (above).

In this PR you describe several issues with the existing profile scripts: We should make sure each is reported (or fixed) upstream, even if we ultimately don't use them.


On a personal note, I don't love this. It makes me feel bad.

I don't love the installer running each time someone starts a new shell. (Okay fine, maybe we can consider it is 'installing' Nix into the shell.. Sure -- that's fine I guess.)

I don't love the installer doing something that seems more the domain of Nix itself.

I don't love having users look at the source script and seeing a call to a binary wrapped in an eval instead of a relatively plain bash file setting some environment variables.

I don't love that this will cause us to drift from upstream. I don't love that we'll now need to pay attention to changes in these files upstream and make sure we don't cause breakage for folks.

I also don't love that it's an several times (~8x in my testing) slower. I'm observing that bash -c '. /nix/nix-installer.d/profile.sh' takes approximately 8ms (7ms being our binary) using this branch. On the main branch of this installer, bash -c '. /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh' takes ~1.2ms. (a tool with a similar job, starship, takes only 2.8ms)

I believe much of that time is related to starting the tokio executor, setting up logging, and doing arg parsing etc. I believe we can overcome this by using a different small executable, avoiding slow syscalls, or getting a bit clever with when we start up our tokio runtime, start our logging, and parse our clap args.

Regardless: We should address this slowdown if you want to shepherd this into a merge. 8ms on each shell start is an incredibly long time, even moreso on Mac where the remote builder fix causes it to be executed much more often then just on shell start.

I understand the motivations behind this change, but if I may be so bold: Many of the bugs originating from the shell scripts are not a problem with the scripts themselves, but the processes in which changes to those scripts are made. For example, NixOS/nix#8985 was not ready to merge, it was basically an invite for the Nix team to consider if they'd want such a thing. It should have had another pass from me, underwent more review, and have had a test added. None of those things happened. I planned to do those things with it but it was merged rather abruptly after it was decided it was the direction Nix wanted to.

This process problem is not something I have the ability to fix, I don't know if any of us do. There cannot be a culture of excellence around these scripts when, for example, I can very trivially go onto the Nix repository and find examples of contributors posting and merging their own PRs without review or discussion. That is not the kind of software engineering that is expected from foundational build technology that companies and individuals need to rely on, like Nix... that's cowboy coding. The kind of thing you do when you're a lone developer on a new prototype.

It is because of those process problems that I think we may want to opt for this solution: To help mitigate potential damage from future process related issues... But Nix is a product of those processes as well, so I don't really know.

@grahamc grahamc closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upload to s3 The labeled PR is allowed to upload its artifacts to S3 for easy testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants