ZERO Hydra Failures 22.05 #172160

Closed
dasJ opened this issue May 9, 2022 · 25 comments
Labels
6.topic: release process Issues or PRs which are parts of the NixOS release process

Comments

@dasJ
Member

dasJ commented May 9, 2022

Mission

Every time we branch off a release we stabilize the release branch.
Our goal here is to have as few failing jobs as possible on the trunk/master jobsets.
We call this effort "Zero Hydra Failures".
I'd like to emphasize that, while it's great to focus on zero as our goal, it's essential to
have all deliverables that worked in the previous release work here as well.

Please note the changes included in RFC 85.

Most significantly, branch off will occur on 2022 May 22; prior to that date, ZHF will be conducted
on master; after that date, ZHF will be conducted on the release channel using a backport
workflow similar to previous ZHFs.

Jobsets

trunk Jobset (includes linux, darwin, and aarch64-linux builds)
nixos/combined Jobset (includes many nixos tests)

How to help (textual)

  1. Select an evaluation of the trunk jobset

  2. Find a failed job ❌️. You can use the filter field to scope packages to your platform, or search for packages that are relevant to you.
    Note: you can restrict the list to one architecture via the filter parameter, e.g.: https://hydra.nixos.org/eval/1719540?filter=x86_64-linux&compare=1719463&full=#tabs-still-fail

  3. Check whether a PR is already open for the package. If there is one, please help review it.

  4. If there is no open PR, troubleshoot why it's failing and fix it.

  5. Create a Pull Request with the fix targeting master, wait for it to be merged.
    If your PR causes around 500+ rebuilds, it's preferred to target staging to avoid compute and storage churn. If your PR is fixing Haskell packages, target the haskell-updates branch instead.

  6. (after 2022 May 22) Please follow backporting steps and target the release-22.05 branch if the original PR landed in master or staging-22.05 if the PR landed in staging. Be sure to do git cherry-pick -x <rev> on the commits that landed in unstable. @jonringer created a video covering the backport process.
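
The branch rules in steps 5 and 6 can be summarized in a small sketch. This is only an illustration of the guidance above, not an official tool: the function names and the `REBUILD_THRESHOLD` constant are hypothetical, and "around 500+ rebuilds" is treated as a hard cutoff of 500 purely for simplicity. No backport rule is stated here for haskell-updates, so the sketch refuses to guess one.

```javascript
// Hypothetical helpers encoding the branch guidance from steps 5 and 6.
// Assumption: "around 500+ rebuilds" is modeled as rebuilds >= 500.
const REBUILD_THRESHOLD = 500;

// Which branch should the original fix target?
function fixTargetBranch({ rebuilds, isHaskell = false }) {
  if (isHaskell) return "haskell-updates";
  return rebuilds >= REBUILD_THRESHOLD ? "staging" : "master";
}

// Which release branch should the backport target (after branch-off)?
function backportTargetBranch(landedIn) {
  switch (landedIn) {
    case "master":
      return "release-22.05";
    case "staging":
      return "staging-22.05";
    default:
      // Only master and staging are covered by step 6.
      throw new Error(`no backport rule stated for branch: ${landedIn}`);
  }
}

console.log(fixTargetBranch({ rebuilds: 12 }));    // master
console.log(fixTargetBranch({ rebuilds: 2500 }));  // staging
console.log(backportTargetBranch("staging"));      // staging-22.05
```

In practice the rebuild count is an estimate, so treat the threshold as a rule of thumb and ask the release managers when in doubt.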

Always reference this issue in the body of your PR:

ZHF: #172160

Please ping @NixOS/nixos-release-managers on the PR and add the 0.kind: build failure label to the pull request.
If you're unable to because you're not a member of the NixOS org, please ping @dasJ, @tomberek, @jonringer, or @Mic92.

How can I easily check packages that I maintain?

I have created an experimental website that automatically crawls Hydra, lists packages by maintainer, and highlights the most important dependencies (failing packages with the most dependants).
You can reach it here: https://zh.fail

If you prefer the command-line way, you can also check failing packages that you maintain by running:

# from root of nixpkgs
nix-build maintainers/scripts/build.nix --argstr maintainer <name>

New to nixpkgs?

Packages that don't get fixed

The remaining packages will be marked as broken before the release (on the failing platforms).
You can do this as follows:

meta = {
  # Add a comment here that references the issue or explains the breakage.
  # Use `broken = true;` to mark the package as broken on every platform.
  broken = stdenv.isDarwin;
};

Closing

This is a great way to help NixOS, and it is a great time for new contributors to start their nixpkgs adventure. 🥳

As with the feature freeze issue, please keep discussion here to a minimum so you don't ping all maintainers (although relevant comments can of course be added here if they are directly ZHF-related), and ping me or the release managers team in the respective issues.

cc @NixOS/nixpkgs-committers @NixOS/nixpkgs-maintainers @NixOS/release-engineers

Related Issues

@dasJ dasJ added the 6.topic: release process Issues or PRs which are parts of the NixOS release process label May 9, 2022
@dasJ dasJ pinned this issue May 9, 2022
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zero-hydra-failures-22-05/19051/1

@raboof
Member

raboof commented May 9, 2022

Perhaps we should tag PRs that fix Hydra failures with 0.kind: build failure to encourage reviewing? https://github.com/NixOS/nixpkgs/pulls?q=is%3Aopen+is%3Apr+label%3A%220.kind%3A+build+failure%22

Edit by @dasJ: I added the instruction to the issue description.

@GuillaumeDesforges
Contributor

GuillaumeDesforges commented May 9, 2022

Some packages I maintain fail on Hydra with the error:

OSError: Too many open files

But they build ok on nixpkgs master locally.

Example: https://hydra.nixos.org/build/175654425/nixlog/1/tail

Not sure of what I can do on my end.

@vcunat
Member

vcunat commented May 9, 2022

I restarted some, but that scipy build failed many times in a row, so restarting it doesn't seem to make sense. I'd suggest trying to skip the tests that cause such problems. EDIT: #170143

@sternenseemann
Member

sternenseemann commented May 9, 2022

For Haskell, please remember to target any PRs to the haskell-updates branch! Edit by @dasJ: I added this hint to the issue description.

Since we've already marked (most) failures as broken, you need to check manually if your favorite package still works, instead of looking at failed builds on Hydra.

Additionally, here is a list of more prominent problems (of Haskell packages exposed via top-level pkgs) to look into. Note that some of these are unmaintained and probably not worth fixing / should be removed in the long run.

  • hyper-haskell (a problem here is also the electron version used)
  • jl (jl: build failure due to ghc 9 #168256)
  • (hasura-graphql-engine, mostly blocked on upstream)
  • krank
  • cedille
  • diagrams-builder
  • glirc
  • hedgewars (exception: fix can go to master)
  • icepeak
  • madlang
  • nix-delegate
  • nix-deploy
  • stack2nix
  • stutter
  • tweet-hs

davidak pushed a commit that referenced this issue May 9, 2022
Upstream mentions[1] the oldest tested kernel is 4.19, so mark anything
older as broken.

ZHF: #172160

[1] https://github.com/voutilad/vmm_clock#tested-platforms-and-configs
otavio added a commit to otavio/nixpkgs that referenced this issue May 9, 2022
06kellyjac added a commit to 06kellyjac/nixpkgs that referenced this issue May 9, 2022
otavio added a commit to otavio/nixpkgs that referenced this issue May 9, 2022
@cab404
Member

cab404 commented May 28, 2022

I've put together a list of packages broken by the stdenv update #zhfff
https://gist.github.com/cab404/96259f25450d778e744108c0ea9bfaa8
It's parsed from Hydra output with something like this:

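// Run in the browser console on a Hydra evaluation page.
[ ...document.querySelector("#tabs-now-fail > table:nth-child(1) > tbody:nth-child(2)").children ]
  // keep rows whose build-status icon has the alt text "Failed"
  .filter((e) => e.getElementsByClassName("build-status")[0].attributes["alt"].value === "Failed")
  // keep rows whose sixth cell is x86_64-linux
  .filter((e) => e.children[5].textContent === "x86_64-linux")
  // take the text of the third cell of each remaining row
  .map((r) => r.children[2].textContent)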

The list only includes packages broken in eval 1756238 that are still broken in eval 1763443.
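
Combining the two evaluations amounts to a set intersection. A minimal sketch of that step follows, assuming the console snippet above has been run once per evaluation and each result saved as a plain array of names. The `stillBroken` helper and the sample names are hypothetical and not part of the gist.

```javascript
// Hypothetical helper: keep the entries that failed in an older evaluation
// and still fail in a newer one. Both inputs are arrays of strings.
function stillBroken(failedInOlderEval, failedInNewerEval) {
  const newerSet = new Set(failedInNewerEval);
  // Deduplicate while preserving the order of the older list.
  return [...new Set(failedInOlderEval)].filter((job) => newerSet.has(job));
}

// Made-up names, for illustration only.
const olderFailures = ["foo", "bar", "baz"];
const newerFailures = ["baz", "foo", "qux"];
console.log(stillBroken(olderFailures, newerFailures)); // [ 'foo', 'baz' ]
```

Using a Set for the newer list keeps the lookup cheap even when an evaluation has thousands of failing jobs.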

@azahi azahi mentioned this issue May 29, 2022
13 tasks
@dasJ dasJ closed this as completed May 30, 2022
@dasJ dasJ unpinned this issue May 30, 2022