
Shutdown VMs before destroying them #212

Closed
wants to merge 2 commits

Conversation

@4censord (Contributor) commented Oct 6, 2022

Instead of simply terminating the VM without any warning, this tries to shut down the VM first.
This allows the VM to finish its work properly.

Because of how client.HaltVm() is implemented, this has a non-configurable two-minute timeout.
See: https://github.com/terra-farm/terraform-provider-xenorchestra/blob/master/client/vm.go#L410
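
For context, a minimal sketch of the intended order of operations, not the provider's exact code. It assumes this repository's client package (import "github.com/terra-farm/terraform-provider-xenorchestra/client"); the *client.Client receiver type and the DeleteVm call are placeholders for whatever resourceVmDelete actually uses.

// Sketch only: shut the VM down cleanly before removing it.
func haltThenDelete(c *client.Client, vmId string) error {
    // Ask Xen Orchestra for a clean shutdown first. Per the linked vm.go,
    // HaltVm waits on the halt with a fixed two-minute timeout that the
    // caller cannot change.
    if err := c.HaltVm(client.Vm{Id: vmId}); err != nil {
        return err
    }

    // Existing deletion step (the name DeleteVm is an assumption here).
    return c.DeleteVm(vmId)
}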

@userbradley

Just trying to understand why you would want to shut down an instance before nuking it? You're deleting them, so why worry?

@4censord (Contributor, Author) commented Oct 26, 2022

In my case I am running database clusters.
For upgrades, I recreate VMs one by one via Terraform. If I simply delete VMs, the cluster has to perform "recovery" steps to regain its quorum.
This, for example, stops all DB writes for some time.
If instead the VM shuts down cleanly and then gets replaced, the cluster doesn't go into a degraded state.

@userbradley

Understood! Completely forgot that was a use case!

TIL

vmReq := client.Vm{
    Id: d.Id(),
}
err := c.HaltVm(vmReq)
Collaborator

Have you tested what will happen if a VM is already shut down? I'm thinking about how this would behave if resourceVmDelete were to partially fail or time out in the halting step.

Collaborator

Testing this in my environment with the xo-cli shows that this will throw an error:

ddelnano@ddelnano-desktop:~/go/src/github.com/ddelnano/terraform-provider-xenorchestra$ xo-cli vm.stop id=affc9a76-c03f-435f-96e5-962ebfacd0a8
✖ VM state is halted but should be running
JsonRpcError: VM state is halted but should be running
    at Peer._callee$ (/usr/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:212:17)
    at tryCatch (/usr/lib/node_modules/xo-cli/node_modules/babel-runtime/node_modules/regenerator-runtime/runtime.js:62:40)
    at Generator.invoke [as _invoke] (/usr/lib/node_modules/xo-cli/node_modules/babel-runtime/node_modules/regenerator-runtime/runtime.js:296:22)
    at prototype.<computed> [as next] (/usr/lib/node_modules/xo-cli/node_modules/babel-runtime/node_modules/regenerator-runtime/runtime.js:114:21)
    at step (/usr/lib/node_modules/xo-cli/node_modules/babel-runtime/helpers/asyncToGenerator.js:17:30)
    at /usr/lib/node_modules/xo-cli/node_modules/babel-runtime/helpers/asyncToGenerator.js:35:14
    at new Promise (<anonymous>)
    at new F (/usr/lib/node_modules/xo-cli/node_modules/core-js/library/modules/_export.js:36:28)
    at Peer.<anonymous> (/usr/lib/node_modules/xo-cli/node_modules/babel-runtime/helpers/asyncToGenerator.js:14:12)
    at Peer.exec (/usr/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:256:21)
    at Peer.write (/usr/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:367:12)
    at Xo.<anonymous> (/usr/lib/node_modules/xo-cli/node_modules/jsonrpc-websocket-client/dist/index.js:92:12)
    at Xo.emit (node:events:513:28)
    at WebSocket.<anonymous> (/usr/lib/node_modules/xo-cli/node_modules/jsonrpc-websocket-client/dist/websocket-client.js:232:20)
    at WebSocket.onMessage (/usr/lib/node_modules/xo-cli/node_modules/ws/lib/EventTarget.js:99:16)
    at WebSocket.emit (node:events:513:28)

I think we should handle this case in the event it is partially applied. We could fetch the VM first and then only call HaltVm if it is running.
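
For illustration, a minimal sketch of that guard. GetVm, the PowerState field, and the *client.Client type are assumptions for this sketch; the provider's client may expose the lookup differently.

// Sketch: only request a clean shutdown when the VM is actually running, so a
// partially applied destroy (VM already halted) doesn't error out.
// Assumes: import "github.com/terra-farm/terraform-provider-xenorchestra/client"
func haltIfRunning(c *client.Client, vmId string) error {
    vm, err := c.GetVm(client.Vm{Id: vmId}) // hypothetical lookup by id
    if err != nil {
        return err
    }

    // Possible power states are Running, Halted, Paused and Suspended; stopping
    // a VM that is not running fails (e.g. "VM state is halted but should be
    // running" for a halted VM).
    if vm.PowerState != "Running" {
        return nil // nothing to shut down; proceed straight to deletion
    }

    return c.HaltVm(client.Vm{Id: vmId})
}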

Ideally we'd have a test case for this situation as well. I'm happy to write that test or provide details on how that could be done. Please let me know what you'd prefer.

In the meantime, I'm running the test suite to validate that the change in its current form is working.

Contributor Author

I have updated it so we only shut down running VMs.

Contributor Author

This also means that Suspended and Paused VMs will still be terminated without a proper shutdown or warning.

I don't see a good way to fix that, though, as resuming a paused VM on destroy might not be the expected action.
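
For illustration, the same guard written as an explicit switch over the four power states, using the same assumed client calls and PowerState field as the sketch above. It shows which states get a guest shutdown and which fall through to the plain terminate.

// Sketch: per-state handling before deletion. Only Running is shut down
// cleanly; Paused and Suspended are left as-is and will be terminated without
// a guest shutdown, as noted above.
func shutdownForDestroy(c *client.Client, vm client.Vm) error {
    switch vm.PowerState {
    case "Running":
        return c.HaltVm(vm) // clean shutdown, fixed two-minute timeout
    case "Halted":
        return nil // already down, nothing to do
    default:
        // Paused or Suspended: not resumed here, so the VM goes straight
        // to termination.
        return nil
    }
}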

@ddelnano (Collaborator) commented Dec 9, 2022

This also means that Suspended and Paused VMs will still be terminated without a proper shutdown or warning.

This should only trigger during a terraform destroy or when applying a plan that removes the VM, so Terraform should make it fairly obvious that it's going to perform a destructive action. My original concern is allowing a terraform destroy or terraform apply that can never converge because it stops the VM but can't proceed with the termination.

What specific scenarios are you worried about that would be unexpected?

@ddelnano (Collaborator) commented Dec 9, 2022

Thanks for the contribution; I agree that the situation you mentioned would benefit from a graceful deletion. I left a comment about handling a potential error case and providing test coverage for it.

This allows the VM to terminate its work properly.
@4censord force-pushed the shutdown_vm_before_destroy branch from 92c59d3 to c15010d on December 9, 2022 12:12
As discussed in vatesfr#212

Trying to stop a VM that is not running results in an error, so we only 
do shutdowns for running VMs now

Possible power-states are "Running, Halted, Paused, Suspended"
@4censord (Contributor, Author) commented Dec 9, 2022

Ideally we'd have a test case for this situation as well. I'm happy to write that test or provide details on how that could be done. Please let me know what you'd prefer.

I am fine with either.
I hadn't yet thought about how to test this, so I would just take the brute-force approach of creating four VMs, setting them to the different power states, and having them be destroyed.

If this is fine, I could add that.

@ddelnano (Collaborator) commented Dec 9, 2022

I don't think we need to test all 4 states. I think testing two use cases is fine: running VM is terminated and stopped VM is terminated. The former will already have coverage since the terraform test runner performs a destroy when the test is done.

In order to test the stopped case, we can stop the VM mid test case by using the PreConfig setting. Please see the TestAccXenorchestraVm_attachDisconnectedDisk test for an example of this. Then we can finish the test with a resource.TestStep that has the Destroy field set to true.
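
For reference, a rough sketch of that second case. PreConfig and Destroy are real fields on resource.TestStep in the terraform-plugin-sdk test helper; the testAcc* helpers, stopVm, and the test name below are placeholders rather than this provider's actual helpers.

// Sketch of an acceptance test that destroys an already-halted VM.
// Assumes: import "testing" and
// "github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
func TestAccXenorchestraVm_destroyWhenHalted(t *testing.T) {
    resource.Test(t, resource.TestCase{
        PreCheck:     func() { testAccPreCheck(t) },
        Providers:    testAccProviders,
        CheckDestroy: testAccCheckXenorchestraVmDestroy,
        Steps: []resource.TestStep{
            {
                // Create the VM normally.
                Config: testAccVmConfig(),
            },
            {
                // Stop the VM out of band first, so the destroy step sees a
                // Halted power state.
                PreConfig: func() { stopVm(t) },
                Config:    testAccVmConfig(),
                Destroy:   true,
            },
        },
    })
}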

ddelnano pushed a commit that referenced this pull request Dec 14, 2022
As discussed in #212

Trying to stop a VM that is not running results in an error, so we only
do shutdowns for running VMs now

Possible power-states are "Running, Halted, Paused, Suspended"
@ddelnano (Collaborator)

@4censord I should have a PR open soon that includes the extra testing. #220 created complications along the way so I am addressing that first.

ddelnano pushed a commit that referenced this pull request Dec 14, 2022
As discussed in #212

Trying to stop a VM that is not running results in an error, so we only
do shutdowns for running VMs now

Possible power-states are "Running, Halted, Paused, Suspended"
@ddelnano (Collaborator)

I'm going to close this in favor of #222. Adding the acceptance testing was more difficult than I anticipated (see new PR for details).

I preserved your commit history as part of my changes.

@ddelnano closed this Dec 14, 2022