Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds wait_for_shutoff to resource_libvirt_domain #1066

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

marshallford
Copy link

@marshallford marshallford commented Jan 23, 2024

Open to feedback, there is likely a better way to implement this. I took a first swing at the docs, but I wasn't sure where to start on testing.

Tested with Terraform version 1.7.0.

# kvm host used for testing
ubuntu@test:~$ uname -a
Linux test 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@test:~$ kvm --version
QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.16)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
# /etc/systemd/system/test-wait.service in guest VM to validate delayed shut off.
[Unit]
Description=...

[Service]
Type=oneshot
RemainAfterExit=true
ExecStop=/usr/bin/sleep 90

[Install]
WantedBy=multi-user.target

Fixes: #356, #1060

@dmacvicar
Copy link
Owner

While I acknowledge the problem, I am not sure we are fixing it in the right place.

  • How do other cloud providers behave here? do we have some similar setting to model against?
  • Given that we already track "running" as state, wouldn't make more sense to implement graceful shutdown there?
  • If we implement it in Destroy, wouldn't make more sense to always do it graceful and just have a timeout for destroy? (those are customizable, so one could have a 0 timeout)

@Silvenga
Copy link

How do other cloud providers behave here?

I don't think cloud providers ultimately need to destroy the resource to update it e.g. scaling up typically requires just rebooting the node. This is mostly a TF'isum, where it's just easier to think of the resources as immutable.

To my knowledge, the cloud providers that I'm aware of don't do graceful shutdowns on destroying (Digital Ocean and Azure).

If we implement it in Destroy, wouldn't make more sense to always do it graceful and just have a timeout for destroy? (those are customizable, so one could have a 0 timeout)

IMO, this ultimately should be opt-in - as there are a lot of cases that would prevent libvirt from communicating with the guest for a graceful shutdown. Some OS's don't even respond to graceful shutdowns, one being at least some versions of VyOS (a router OS, "common" in public clouds). I also don't believe the agent is installed by default in at least the slim version of Ubuntu Server.

@scabala
Copy link
Contributor

scabala commented Sep 24, 2024

I really like this comment on #356 - there should be just a timeout variable that controls the logic. If it is greater than 0, do graceful shutdown and force shutdown when it expires. If it is 0, do forceful shutdown as currently it is done.

@dmacvicar
Copy link
Owner

Yes, it is a good solution. I hope people don't confuse it with the destroy timeout that terraform provides as a setting (which is the timeout of the destroy function itself).

@tiaden
Copy link
Contributor

tiaden commented Oct 20, 2024

Here is a link to an implementation of this issue that I worked on some time ago, which may provide an alternative perspective:
GitHub link.

I have successfully used this solution in my projects without encountering any problems. It leverages the "running" attribute to shut down the virtual machine, utilizing the graceful default shutdown method or the QEMU agent (if specified), and, as a last resort, forcibly powers off the machine if the default Terraform method timeout is reached. If you'd like to test it quickly, you can use the associated Terraform provider available here: Terraform Provider.

I hope this helps in some way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Destroying a domain should try to power it off first
5 participants