ci: Build action has been consistently failing macos-arm64-build in finch-core #367

Open
ginglis13 opened this issue Aug 5, 2023 · 10 comments

Comments

@ginglis13
Contributor

ginglis13 commented Aug 5, 2023

The Build action has been consistently failing for the last month: https://github.com/runfinch/finch-core/actions/workflows/release.yaml

GitHub-hosted runners use passwordless sudo. However, runners provisioned via finch infra don't allow passwordless sudo. (EDIT: this is observed consistently on the macOS 12 arm64 runner: https://github.com/runfinch/finch-core/actions/runs/6010202363/job/16301161154)

The last log lines in the "Make and release deps" step before the timeout have been:

if [ "Darwin" != "Linux" -a ! -e "/opt/homebrew/bin/nerdctl" ]; then ln -sf nerdctl.lima "/opt/homebrew/bin/nerdctl"; fi
if [ "Darwin" != "Linux" -a ! -e "/opt/homebrew/bin/apptainer" ]; then ln -sf apptainer.lima "/opt/homebrew/bin/apptainer"; fi
sudo may prompt for password to run FileMonitor
Error: The operation was canceled. # <-- timeout, cancelled workflow

This message is coming from https://github.com/runfinch/finch-core/blob/08a4ca2a9285f1dd2fac3bd4701087b1b2fdec87/bin/lima-and-qemu.pl#L46

Still looking to verify, but the smoking gun is that the script hangs on a password prompt.

On my machine (macOS Ventura 13.4, M1 chip):

./bin/lima-and-qemu.pl                                            
ls: /opt/homebrew/bin/limactl: No such file or directory
Missing argument in sprintf at ./bin/lima-and-qemu.pl line 213.
sudo may prompt for password to run FileMonitor
Password:
@weikequ
Contributor

weikequ commented Sep 5, 2023

Thanks for bringing this up! I am slightly confused: why does the x86 one work but the arm64 one doesn't? Shouldn't they be based on the same underlying infra? In addition, our e2e tests on runfinch/finch work fine with sudo commands. See this example. I wonder if it has anything to do with this being a Perl script rather than a normal bash/zsh script.

@ginglis13
Contributor Author

"I am slightly confused, why does the x86 one work, but the arm64 one not work? Shouldn't they be based off the same underlying infra?"

Yes, from what I can tell they should be. This discrepancy is the crux of the issue, and I don't have a root cause for it yet. Take a look at this recent execution of the Build action: https://github.com/runfinch/finch-core/actions/runs/6010202363/job/16301161154

You can see that the sudo prompt is blocking. This has been consistent over the last 3 months (at least from what I can see).

"I wonder if it has anything to do with this being a perl script that's run instead of a normal bash/zsh script"

Maybe, but this is only observed on a specific macOS version/architecture; the Perl script works fine on the others.

@weikequ
Contributor

weikequ commented Sep 5, 2023

Did a quick test on this runner by changing the workflow to the following:

...
          sudo echo hi
          ./bin/lima-and-qemu.pl
...

The runner gets stuck on ./bin/lima-and-qemu.pl, not on sudo echo hi.
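A variant of this check that cannot hang uses sudo's -n (non-interactive) flag, which makes sudo exit with an error instead of prompting when a password would be needed. This is a minimal sketch, assuming one wants to probe a runner before running the script; has_passwordless_sudo is a hypothetical helper name, and a still-valid cached sudo timestamp also counts as success.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if sudo can run without prompting.
# `sudo -n` exits with an error instead of asking for a password, so this
# probe cannot block a CI step the way a plain `sudo ...` might.
# Note: a cached sudo timestamp also makes `sudo -n` succeed.
has_passwordless_sudo() {
  if sudo -n true 2>/dev/null; then
    return 0   # sudo ran without asking for a password
  else
    return 1   # password required, sudo missing, or user not in sudoers
  fi
}

if has_passwordless_sudo; then
  echo "passwordless sudo: yes"
else
  echo "passwordless sudo: no (a plain sudo call could block on a prompt)"
fi
```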

@weikequ
Contributor

weikequ commented Sep 6, 2023

Hmm, it's also not a Perl thing:

Run echo '#!/usr/bin/env perl' >> test.pl
  echo '#!/usr/bin/env perl' >> test.pl
  echo 'system("sudo echo sudoed")' >> test.pl
  chmod u+x test.pl
  ./test.pl
  shell: /bin/zsh {0}
  env:
    GO111MODULE: on
sudoed

@weikequ
Contributor

weikequ commented Sep 6, 2023

It is this line, sleep(1) until -s $filemonitor;, that makes the workflow hang, not sudo's password prompt. @vsiravar, can you take a look at why it's not correctly evaluating the size change? From lima-and-qemu.pl:

...
END { system("sudo pkill FileMonitor") }
system("sudo echo this-should-show");                # this shows up
print "sudo may prompt for password to run FileMonitor\n";
system("sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor >$filemonitor 2>/dev/null");
system("sudo echo this-should-show");                # this shows up
sleep(1) until -s $filemonitor;
system("sudo echo this-probably-wont-show");         # this does not show up
...

@vsiravar
Contributor

vsiravar commented Sep 6, 2023

The weird thing, though, is that it does not hang on a manually provisioned self-hosted runner (log from a previous run). Does anything show up in filemonitor.log when the workflow hangs?

@weikequ
Copy link
Contributor

weikequ commented Sep 6, 2023

No. I tried inserting system("sudo cat $filemonitor"); right before the sleep, but nothing is displayed.

@vsiravar
Contributor

vsiravar commented Sep 6, 2023

"can you take a look at why it's not correctly evaluating the size changes?"

sleep(1) until -s $filemonitor; is behaving as expected: Perl's -s is false for an empty file, and $filemonitor is empty, so the loop never exits.
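Because that loop has no upper bound, the step only ends when the workflow is canceled. A minimal sketch of a bounded equivalent, written in shell rather than Perl and assuming a CI step would rather fail fast with a clear message: wait_for_nonempty and its timeout argument are hypothetical, and [ -s FILE ] has the same meaning as Perl's -s (file exists and has non-zero size).

```shell
#!/usr/bin/env bash
# Hypothetical bounded version of `sleep(1) until -s $filemonitor;`.
# Polls once per second until FILE is non-empty or TIMEOUT seconds pass.
# Usage: wait_for_nonempty FILE TIMEOUT_SECONDS
wait_for_nonempty() {
  local file="$1" timeout="$2" waited=0
  until [ -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      # Fail loudly instead of hanging until the workflow is canceled.
      echo "timed out after ${timeout}s waiting for $file to be non-empty" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a timeout, an empty filemonitor.log would surface as an explicit error in the job log rather than as "The operation was canceled."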

Did you also check whether the /Applications/FileMonitor.app/Contents/MacOS/FileMonitor process is running after system("sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor >$filemonitor 2>/dev/null");?

@weikequ
Copy link
Contributor

weikequ commented Sep 6, 2023

96674 ??         0:00.00 sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor
96675 ??         0:00.00 /Applications/FileMonitor.app/Contents/MacOS/FileMonitor
96676 ??         0:00.00 sh -c ps -ax | grep FileMonitor
96678 ??         0:00.00 grep FileMonitor
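The ps -ax | grep FileMonitor output above also lists the grep itself and its sh -c wrapper. A sketch of a check that avoids this, assuming pgrep is available (it ships with macOS and most Linux distributions): pgrep -x matches the exact process name, so it picks up the FileMonitor child but not the sudo -b parent, whose process name is sudo. is_running is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true if a process with exactly this name is running.
# Unlike `ps -ax | grep NAME`, pgrep never reports its own process.
is_running() {
  pgrep -x "$1" >/dev/null
}

if is_running FileMonitor; then
  echo "FileMonitor is running"
else
  echo "FileMonitor is not running"
fi
```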

@weikequ
Copy link
Contributor

weikequ commented Sep 6, 2023

Update after troubleshooting: FileMonitor requires (or makes Terminal require) Full Disk Access. It is still unclear why the x86 runner on macOS 11 works.
