Experiment: use Tweag's self-hosted runners #2037
3deaa20 to 5639d04
Thanks for your precious help, @YorikSar. It seems the macOS workflow is failing because of some
I think we still need to investigate the results on our Linux runner, because even there the job seems to take 24 minutes, while a random run on master finished in 3 minutes. As for macOS, I think we could bring this up with the Cachix maintainers. Calling out to
I think the continuous integration for the merge queue (the one you linked) is different. I don't know exactly what happens, but some caching comes into play. If you take the CI on a random PR, it usually takes 20+ minutes, so 24 minutes doesn't stand out. Well, it does, in that it should still be noticeably faster on our own runner. To be honest, we seem to recompile a lot of stuff on each run, and we haven't really been able to work out whether Cachix is being hit every time, if
Fair. I can take care of looking for a similar issue and, if there isn't one, opening one.
Issue opened, and Domen suggests bumping Nixpkgs: cachix/cachix-action#190 (comment) (@YorikSar). I guess this means the Nixpkgs used by the build machines, as Cachix will use that one, not the input of the Nickel flake.
@yannham it’s used from this flake’s inputs:
Oh, my bad.
@yannham Looks like jobs are not getting triggered. Maybe because of the conflicts?
78da50c to 036e5c4
Ok, Cachix seems to work fine now, but we get a meson error when trying to build
From the macOS logs:
It seems this derivation wants to access something outside its sandbox... Although there is also this line at the top that might be related:
What does this
There is an experimental feature to evaluate a Nix expression in Nickel. It was written before the Nix C API existed, so it binds directly to the Nix C++ API. Before we just used the default package of the I saw that the
Maybe one solution would be to migrate that to the Nix C API (we only really need to evaluate expressions, which should be quite straightforward), but I didn't want to go that route in this PR, which has nothing to do with it, so I tried to get it working with minimal tweaking first.
I would assume something related to using meson was merged in the nixos/nix repo. Our macOS runner has the sandbox enabled in Nix, which is disabled by default; that is probably why this issue didn't come up before on hosted runners.
Right, it seems that at the beginning of this summer they started to switch to meson. I guess we're up for an issue.
flake.lock
Outdated
  "owner": "nixos",
  "repo": "nix",
- "rev": "0ab9369572f64b1ab70a8db29f79ae730ff31ab6",
+ "rev": "2c42a9dbaa805f4f29561d9a1c10b41dfe98dcfa",
  "type": "github"
@yannham I see you've updated all inputs here. Maybe roll back the nixos/nix input to the previous value to see if this helps?
Ah, we can do that of course, since Nix isn't taken from Nixpkgs. Good catch, I'll try that.
Another option would be to revert the changes to flake.lock and just add a new nixpkgs input from which we'll install cachix. We could also install it from the cachix repo directly instead. I think that should rule out issues that have arisen because of this update.
On the other hand, we haven't updated the nixpkgs input for some time, and I think it's good that we do, although I can cherry-pick the changes into a separate PR. But the thing is that the error now seems related to the extra Nix input, which should have been reverted to a previous version. And I can't get it to skip the checks for some reason.
Ah, but the inputs of
481dabe to 4d06761
John asks whether it's possible that build02 identifies as something other than
4d06761 to 86a7a83
Even though the Linux job failed to build Nix this time, it looks like the macOS one did something for 6 minutes, so we can assume it didn't break on the Nix build. I think we should make that weird step always output logs, with something like
It's really annoying: I'm basically trying many different tags, but either the Boehm or the Nix checks fail locally (so not even on macOS). It's surprising that something as central as Nix can't be built reproducibly on a pretty standard config (Debian unstable with Nix as a package manager).
Oh, I think this is because we follow the nixpkgs inputs from the main flake. It seems the Boehmgc update
Oh, interesting. I think that explains the test failure.
Does it? How so?
I just took a quick look, but it seems the test is supposed to error out with the same error message as ours on Darwin, because macOS does Unicode normalization. I think the test tries to create two files whose names differ byte-wise but represent the same Unicode string, if that means anything. macOS just thinks they're the same file, so it doesn't allow it (file already exists). For some reason the test expects that Linux does not normalize, and so should allow both files to be created separately. However, it seems that this assumption is false with ZFS and the normalization option (which implies
The test was added 3 weeks ago. |
@yannham Yeah, makes sense. I could disable normalisation for /nix on the server. If that removes a difference from upstream, it might really be needed for development.
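For illustration, the normalization behaviour discussed above can be reproduced with a few lines of Python. This is only a sketch: the file name `café` and the helper names are made up for this example and are not taken from the Nix test suite. Whether the second file creation collides depends entirely on the filesystem holding the temporary directory.

```python
import os
import tempfile
import unicodedata

# Two spellings of the same visible name (hypothetical example name):
# NFC uses the precomposed "é" (U+00E9), NFD uses "e" + combining accent (U+0301).
nfc_name = unicodedata.normalize("NFC", "caf\u00e9")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9")


def byte_different(a, b):
    """True if the two names differ once encoded as UTF-8 bytes."""
    return a.encode("utf-8") != b.encode("utf-8")


def same_after_normalization(a, b, form="NFC"):
    """True if the two names are equal after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def collides_on_this_filesystem(directory):
    """Create both names exclusively; report whether the second one collides."""
    with open(os.path.join(directory, nfc_name), "x"):
        pass
    try:
        with open(os.path.join(directory, nfd_name), "x"):
            pass
    except FileExistsError:
        # A normalizing filesystem treats both names as the same file.
        return True
    return False


if __name__ == "__main__":
    print("byte-different:", byte_different(nfc_name, nfd_name))
    print("same after normalization:", same_after_normalization(nfc_name, nfd_name))
    with tempfile.TemporaryDirectory() as tmp:
        print("collision here:", collides_on_this_filesystem(tmp))
```

On a filesystem that does not normalize names, both files are created and no collision is reported; on a normalizing one, the second creation fails with "file already exists", which matches the failure described in the thread.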
5ea7a20 to a1f3f81
Aha! It seems to build Nix correctly using my fork of Nix (there is a clippy error now). Let me see if I can get everything green, and then upstream the patch.
You did it! Congrats! :)
This PR seems to be much more than just switching to the self-hosted runners. Maybe it would be worth splitting out other changes and landing them separately?
I crawled my way forward until it worked, but now that it does (more or less), that's fair. I'll split the flake.lock update and the breakage fixes into another PR.
8147a05 to a17e64b
a17e64b to 8137afc
@yannham I've finished restoring our macOS builder and re-ran the checks on this PR. It complained that find and xargs were not found, so I took the liberty of attempting to fix this in a separate commit here and rebasing on top of master.
8137afc to 4676c0b
Ah, I forgot to remove
It is not installed by default on our self-hosted runners.
4676c0b to 21279f4
Thanks for all the help, @YorikSar! We'll see how it goes for new PRs, but I'm hopeful that this will speed up the CI.
By default, GitHub lists all arguments in brackets after the job name. This gets unwieldy now that we include runner labels in #2037. For example, the Linux job is now "build-and-test (self-hosted, Linux, X64, stable)". Shorten it to "build-and-test (linux, stable)" instead.
By default, GitHub lists all arguments in brackets after the job name. This gets unwieldy now that we include runner labels in #2037. For example, the Linux job is now "build-and-test (self-hosted, Linux, X64, stable)". Shorten it to "build-and-test (linux)" instead. Also, remove rust_channel as it wasn't used.
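To make the naming problem concrete, here is a toy Python model of the default job name described in the commit message. It is not GitHub's implementation: the matrix layout and the `os` key are assumptions made for this sketch, and only the resulting strings come from the commit message itself.

```python
def flatten(value):
    """Yield scalar values from a possibly nested list or tuple, as strings."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    else:
        yield str(value)


def default_job_name(job_id, matrix):
    """Toy model: job id followed by all matrix values in brackets."""
    parts = [part for value in matrix.values() for part in flatten(value)]
    return f"{job_id} ({', '.join(parts)})" if parts else job_id


def short_job_name(job_id, label):
    """Explicit name that only carries a short, human-chosen label."""
    return f"{job_id} ({label})"


if __name__ == "__main__":
    # Hypothetical matrix entry: runner labels plus the unused rust_channel.
    matrix = {"os": ["self-hosted", "Linux", "X64"], "rust_channel": "stable"}
    print(default_job_name("build-and-test", matrix))
    print(short_job_name("build-and-test", "linux"))
```

In a workflow file, the shorter form would presumably be obtained by giving the job an explicit `name` rather than relying on the default.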
Try to use our self-hosted runners, which should be beefier, don't need Nix installed again and again, and should be able to reuse the local store as a cache as well.