
Play fails if temp directory is deleted mid-play #1061

Open
markafarrell opened this issue Apr 12, 2024 · 8 comments · May be fixed by #1062
Labels
affects-0.3 (Issues related to 0.3.X Mitogen releases), bug (Code feature that hinders desired execution outcome)

Comments

markafarrell commented Apr 12, 2024

If the Ansible remote temp directory is removed mid-play, Mitogen does not recreate it and the play fails.

An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     _os.mkdir(file, 0o700)
fatal: [172.17.0.9]: FAILED! => {"msg": "Unexpected failure during module execution: builtins.FileNotFoundError: [Errno 2] No such file or directory: '/tmp/.ansible-test/tmp/ansible_mitogen_runner_y_absj50'
  File \"<stdin>\", line 3876, in _dispatch_one
  File \"master:/home/xxxxxxxx/work/mitogen-repro/.venv/lib/python3.10/site-packages/ansible_mitogen/target.py\", line 415, in run_module
    return impl.run()
           ^^^^^^^^^^
  File \"master:/home/d384492/work/mitogen-repro/.venv/lib/python3.10/site-packages/ansible_mitogen/runner.py\", line 445, in run
    self.setup()
  File \"master:/home/d384492/work/mitogen-repro/.venv/lib/python3.10/site-packages/ansible_mitogen/runner.py\", line 934, in setup
    self._stdio = NewStyleStdio(self.args, self.get_temp_dir())
                                           ^^^^^^^^^^^^^^^^^^^
  File \"master:/home/d384492/work/mitogen-repro/.venv/lib/python3.10/site-packages/ansible_mitogen/runner.py\", line 361, in get_temp_dir
    self._temp_dir = tempfile.mkdtemp(
                     ^^^^^^^^^^^^^^^^^
  File \"/usr/lib/python3.11/tempfile.py\", line 507, in mkdtemp
    _os.mkdir(file, 0o700)
", "stdout": ""}

With the normal Ansible strategy, the temp directory is recreated and the play succeeds.

Ansible version: 2.14.15

Host OS: Ubuntu (WSL2)
Target OS: Debian 12 (Docker)

Host Python: Python 3.10.12
Target Python: Python 3.11.2

See https://github.com/markafarrell/mitogen-repro-issue-1061 for reproduction instructions

markafarrell added the affects-0.3 and bug labels Apr 12, 2024
markafarrell linked a pull request Apr 12, 2024 that will close this issue
moreati (Member) commented Apr 21, 2024

I'm attempting to reproduce this. Step 4 of https://github.com/markafarrell/mitogen-repro-issue-1061 doesn't leave a running container; instead, it exits immediately.

alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker run -dt --name target-server \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    --privileged \
    --rm \
    geerlingguy/docker-debian12-ansible:latest;

964532f2b017d53a6292b476e5e463e5157f8520db7e0a6ca6e4d3d3176885ee
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker --version
Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1
alex@ubuntu2004:~/mitogen-repro-issue-1061$ uname -a
Linux ubuntu2004 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

markafarrell (Author) commented

I'm guessing the issue is that you are using aarch64.


There is an arm64 version of that image, so it should work.

Do you get anything from:

docker logs target-server

moreati (Member) commented Apr 22, 2024

alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker rm target-server 
target-server
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker run -dt --name target-server -v /sys/fs/cgroup:/sys/fs/cgroup:ro --privileged geerlingguy/docker-debian12-ansible:latest;
dea854a953ce1386fcf0ca7b5a28065b5749c982dab711e98fb7210f5968ba39
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker logs target-server
systemd 252.22-1~deb12u1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture arm64.

Welcome to Debian GNU/Linux 12 (bookworm)!

Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

markafarrell (Author) commented

Can you try adding --cgroupns=host and changing the mount to rw?

https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva

moreati (Member) commented Apr 22, 2024

That did it, and I now see the _os.mkdir(file, 0o700) error. That leads to the next questions:

  1. Why don't the unit and integration tests see this? Which extra ingredient(s) matter - Debian 12? systemd? Something Jeff Geerling added?
  2. Can we reproduce it with the existing Mitogen CI images and/or the localhost test?

markafarrell (Author) commented Apr 29, 2024

  1. Why don't the unit and integration tests see this? Which extra ingredient(s) matter - Debian 12? systemd? Something Jeff Geerling added?

I think this will happen regardless of OS, systemd, etc. The issue is that at https://github.com/mitogen-hq/mitogen/blob/master/ansible_mitogen/runner.py#L361 we are essentially doing:

mkdir {{ ansible_remote_tmp }}/ansible_mitogen_runner_{{ random stuff }}/

If ansible_remote_tmp doesn't exist, this fails.
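A minimal standalone sketch of that failure, using only the Python standard library: tempfile.mkdtemp() creates only the final path component, so once its dir argument has been deleted, the underlying os.mkdir() raises FileNotFoundError, matching the innermost frame of the traceback. The throwaway base directory here is an assumed stand-in for ansible_remote_tmp, not the real remote path.

```python
import os
import shutil
import tempfile

# Assumed stand-in for ansible_remote_tmp: exists at "connect" time.
remote_tmp = tempfile.mkdtemp()

# First task: the base directory exists, so mkdtemp succeeds.
first = tempfile.mkdtemp(prefix="ansible_mitogen_runner_", dir=remote_tmp)
print("created", first)

# A task in the play deletes the base directory mid-play.
shutil.rmtree(remote_tmp)

# Next task: mkdtemp only creates the last path component, so os.mkdir fails.
error = None
try:
    tempfile.mkdtemp(prefix="ansible_mitogen_runner_", dir=remote_tmp)
except FileNotFoundError as exc:
    error = exc
print("failed:", error)
```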

The existence of ansible_remote_tmp is checked only once, just after we connect to the target, so if it is removed after the connection is made, we see this failure.
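One possible remedy, sketched under assumptions: recreate the base directory, if needed, immediately before each mkdtemp() call instead of relying on the check made at connection time. get_temp_dir_robust() is a hypothetical helper, not part of Mitogen, and the change proposed in #1062 may take a different approach.

```python
import os
import shutil
import tempfile


def get_temp_dir_robust(base_dir, prefix="ansible_mitogen_runner_"):
    """Hypothetical helper (not Mitogen's API): create a per-task temp dir
    under base_dir, recreating base_dir first if it was deleted mid-play."""
    # No-op if base_dir still exists; otherwise recreate it and any missing
    # parents. The 0o700 mode applies to base_dir itself and is masked by umask.
    os.makedirs(base_dir, mode=0o700, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)


# Usage: simulate ansible_remote_tmp disappearing between tasks.
remote_tmp = tempfile.mkdtemp()
shutil.rmtree(remote_tmp)
task_dir = get_temp_dir_robust(remote_tmp)
print(os.path.isdir(task_dir))  # True
shutil.rmtree(remote_tmp)  # clean up
```

Checking on demand like this is closer to the behaviour reported for the normal Ansible strategy. A deletion that lands between makedirs() and mkdtemp() could still race, so a real fix might also retry once.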

  2. Can we reproduce it with the existing Mitogen CI images and/or the localhost test?

It should be very easy to reproduce, both for localhost and for any other image, using a playbook similar to the one in my reproduction repo. If you can point me to where the test should live, I can quickly create one.

moreati (Member) commented Apr 29, 2024

There are unit tests that mention is_good_temp() in class FindGoodTempDirTest(testlib.TestCase).
Integration tests should probably be added amongst https://github.com/mitogen-hq/mitogen/blob/bb9c51b3e9cc39fceddd55578bb89680fa4e1acc/tests/ansible/integration/runner/all.yml.

To run tests, I'm relying on the Azure CI and (force-)pushing changes. We can squash any interim/WIP commits afterwards.

moreati (Member) commented May 19, 2024

  1. Why don't the unit and integration tests see this? Which extra ingredient(s) matter - Debian 12? systemd? Something Jeff Geerling added?

A factor I previously missed: the repro playbook in https://github.com/markafarrell/mitogen-repro-issue-1061/blob/262591aecadb3ae255c904de17617519f8389673/playbook.yml explicitly deletes $ANSIBLE_REMOTE_TMP; it isn't systemd or similar doing it behind the scenes. There's much less mystery here than I thought, if any.


2 participants