-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lxd] Add 60 seconds timeout for state operations #1821
base: main
Are you sure you want to change the base?
Conversation
If no timeout is set, LXD uses a hardcoded 30 second timeout when waiting on operations to complete and if the wait timeout occurs, it can lead to incorrect behavior in the LXD backend. Fixes #1777
Codecov Report
@@ Coverage Diff @@
## master #1821 +/- ##
=======================================
Coverage 76.95% 76.95%
=======================================
Files 229 229
Lines 8512 8512
=======================================
Hits 6550 6550
Misses 1962 1962
Continue to review full report at Codecov.
|
The problem I see with this is that 60s just just as arbitrary as 30s… and the "Operation cancelled" error is somewhat devoid of detail (I know…). Can we try and convert a "Operation cancelled" to, say, "Likely timed out"? Is there no detail about why it was cancelled available at all? |
Well, as mentioned, without putting a timeout at all here, LXD has a hard coded timeout of 30 seconds, which is what is tripping us up on shutdown. It seems LXD has some sort of race between the hard coded operation timeout and the actual timeout of trying to shut down an instance. I've looked through the LXD code in this area and I really don't understand why they did what they did. That saidt, setting an explicit timeout for the state operation puts the onus on the state change, not the operation wait part, so it gets around this issue. I can easily make it 30 seconds since that is the arbitrary wait we have in other backends, but I think allowing more time for an instance to shut down is better. In fact, we've had complaints in the past about trying to cleanly shut down busy instances and 30 seconds is not enough time. Regarding |
OK, so I was missing that context, that this is different to the internal LXD timeout. Should we explain in a comment? |
Sure, I can add a comment, but it'll be verbose in order to explain why this was added. But really, adding an explicit timeout here puts us in control of the timeout and should be used regardless of working around LXD idiosyncrasies. |
In some testing to check on some behaviors, I found some issues with this, so will continue working on it. |
When I try to delete a multipass lxd instance it will quite often result in a timeout: $ multipass delete k8-devnode-2
[2022-09-22T09:30:58.018] [error] [lxd request] Timeout getting response for GET operation on unix://multipass/var/snap/lxd/common/lxd/[email protected]/operations/36a26883-b3ec-4f5a-bd00-325cd5dfc150/wait?project=multipass
[2022-09-22T09:30:58.018] [error] [lxd request] Timeout getting response for GET operation on unix://multipass/var/snap/lxd/common/lxd/[email protected]/operations/36a26883-b3ec-4f5a-bd00-325cd5dfc150/wait?project=multipass
delete failed: Timeout getting response for GET operation on unix://multipass/var/snap/lxd/common/lxd/[email protected]/operations/36a26883-b3ec-4f5a-bd00-325cd5dfc150/wait?project=multipass As for the VM it only got stopped (not deleted): $ multipass list
k8-devnode-2 Stopped -- Ubuntu 20.04 LTS Would this pull request fix this @townsend2010 or should I file a separate issue? $ multipass version
multipass 1.10.1
multipassd 1.10.1 $ multipass get local.driver
lxd $ lxd version
5.5 I also expose multipassd to the network and set the environment for this: echo $MULTIPASS_SERVER_ADDRESS
multipass.intra:51005 |
If no timeout is set, LXD uses a hardcoded 30 second timeout when waiting on
operations to complete and if the wait timeout occurs, it can lead to incorrect
behavior in the LXD backend.
Fixes #1777