Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eval cache: fix and make re-evaluation of cached errors configurable #10368

Closed
wants to merge 2 commits into from

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Mar 30, 2024

Motivation

I'm on Nix 2.19 and I wanted to fix the dreaded cached failure of attribute thing (again).
I realized that it's gone on 2.20+, but it turns out the eval cache just regressed and doesn't seem to cache anything. Fix for that is in the first commit.
The second commit adds an eval setting to configure whether or not to force re-evaluation of cached failures.

Future work:

  • set this to false by default for a few commands such as nix search.
  • cache error messages (and traces?) as well.

Context

I'd recommend to review commit-wise.

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

Ma27 and others added 2 commits March 30, 2024 13:13
…ided

In 6558da4 the way the cache key for
the eval cache was computed was deferred to libfetchers. However, the
git fetcher only provides a fingerprint if a rev is given which seems
reasonable.

That breaks the eval cache however because calls like `nix build .#` are
uncached now because the input doesn't have a revision. Adding the old
behavior back only if libfetchers doesn't return a fingerprint and add a
regression test.

Not sure btw why I'd need to run `nix build` three times when having
`packages.$system.foo` fail on evaluation until it's cached, added a
FIXME for now since that's something we'll probably want to solve in the
future. Also interesting that if the attribute `foo` fails, it never
gets cached currently 🤔
When I run `nix build` on a broken attribute in a flake twice, I get the
dreaded

    error: cached failure of attribute 'snens'

This also happens when you re-run the previous command with
`--show-trace`. While trying to fix that, Linus and I discovered that
the eval cache was broken for use on local flakes (i.e. `nix build .#`)
was broken which got fixed in the commit before.

We decided to use an eval setting for that because `forceError` seems
to have been added somewhat arbitrarily as argument to the functions of
AttrCursor where this was needed at the time it was developed. The effect
was that it was hard for us to understand when an invocation can force
errors and when it doesn't. For instance, `AttrCursor::isDerivation()`
never forces re-evaluation of errors, so if `maybeGetAttr("type")`
is the first thing that breaks eval, you'd always run into a cached
failure.

For now, this is `true` by default because I think the current behavior
with a properly working eval cache is pretty painful while hacking on
(or with) a flake. Potential future improvements are

* Setting `evalSettings.forceErrorsInEvalCache` to `false` for
  commands like `nix search`.

* Allowing the eval cache to cache the full failure (right now,
  `failed_t` is empty). That way, you could get errors when re-running
  `nix build` without having to re-evaluate everything, perhaps even
  with traced.

  That's the solution Linus and I would prefer.

Co-authored-by: Linus Heckemann <[email protected]>
@Ma27 Ma27 requested a review from edolstra as a code owner March 30, 2024 12:19
@github-actions github-actions bot added the with-tests Issues related to testing. PRs with tests have some priority label Mar 30, 2024
@edolstra
Copy link
Member

edolstra commented Apr 2, 2024

Disabling caching for dirty trees was intentional (and part of the incremental lazy-trees backport). Computing a cache key for a dirty Git tree currently requires reading the entire workdir, so that's very not-lazy. (We could compute a cache key more cheaply from the Git revision and the contents of the dirty files, but that requires more work.)

Note that flake.path.to_string() is not a valid cache key, since for a dirty Git tree it will return something like /path/to/tree, which doesn't change if the contents of the tree change.

Regarding "cached failure of attribute": can you create an issue for that? Seeing that message should be considered a bug, since in case of a cached failure, we should always reevaluate to get the actual error.

@roberth
Copy link
Member

roberth commented Apr 2, 2024

Computing a cache key for a dirty Git tree currently requires reading the entire workdir,

Determining whether the tree is dirty already requires reading the entire workdir.
I don't know off the top of my head the details, but these are good starting points:

  • romkatv's work on optimizing and/or replacing "git status"
  • if something uses mtime, we could use the same info to produce "less perfect" but similarly valid cache keys

Furthermore, concurrent mutation could lead to incorrect cache entries if implemented naively:

  • It seems feasible to have cache keys that only involve the files that are actually read, although remembering which files may affect any given attribute is of course more complicated than a single key. (a lot more complicated, but also a lot more effective caching)
  • A simpler solution is to always "stage" the workdir, except to our own index instead of the normal git index. This gets rid of the fragile custom code paths that access the workdir directly.

We should probably do the latter anyway, so that git clean filters are applied, just as they would be by git when the files are committed.

@Ericson2314
Copy link
Member

We ought to also be able to cheaply get a git tree for staged data

@edolstra
Copy link
Member

edolstra commented Apr 3, 2024

Determining whether the tree is dirty already requires reading the entire workdir.

It doesn't, git status is very fast because it relies on inodes' mtimes (or ctimes?) instead of reading file contents.

@Ma27
Copy link
Member Author

Ma27 commented Apr 4, 2024

About the issue itself

can you create an issue for that? Seeing that message should be considered a bug, since in case of a cached failure, we should always reevaluate to get the actual error.

There's already #3872 and in fact I fixed the problem once already: #4688

The patch applies (short of the regression-test) cleanly on 2.19 and is the main motivation why I filed this PR, I wanted to get rid of this behavior. So we could actually backport to 2.19 & 2.18 (which is the stable minor in nixpkgs IIRC).

It probably exists on 2.20+ as well, however I didn't bother to check because it's rather hard to trigger the eval cache reliably / easy to hit issues (or I just have bad luck) and it was very hard for me to trigger the issue reliably. Upon investigating I found out that dirty trees are simply not cached anymore. But anyways, we'll probably need the second commit on 2.20+ as well.
And I should file an issue for the weird caching behavior with nested attributes.

I'd like to go into a direction where we cache errors again, but we also cache the error messages.

Whether or not to cache dirty trees

Note that flake.path.to_string() is not a valid cache key, since for a dirty Git tree it will return something like /path/to/tree, which doesn't change if the contents of the tree change.

Sure, it's part of a cache key with a few other things.

Computing a cache key for a dirty Git tree currently requires reading the entire workdir, so that's very not-lazy

Does it though? Which part of the old cache-key requires reading the entire tree?

It seems feasible to have cache keys that only involve the files that are actually read, although remembering which files may affect any given attribute is of course more complicated than a single key

That actually seems like an interesting follow-up task although it requires a major rework of the evaluation cache. Is this a change the Nix team would be interested in?
I'd strictly focus on solving the current issue first, but it may be a useful future improvement.

A simpler solution is to always "stage" the workdir, except to our own index instead of the normal git index

So my knowledge about the Nix codebase might've gotten a little dusty in the past year, but what does "our" own index" mean in this context?

@roberth
Copy link
Member

roberth commented Apr 5, 2024

It doesn't, git status is very fast because it relies on inodes' mtimes (or ctimes?) instead of reading file contents.

As I've said,

  • if something uses mtime, we could use the same info to produce "less perfect" but similarly valid cache keys

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants