Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less anxiety-inducing return values #771

Merged
merged 2 commits into from
Sep 30, 2024
Merged

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Sep 30, 2024

Currently, the Vm::get() and Vm::state_watcher() methods return a
Result<{something}, VmError>. This means that they can return any
VmError variant, but the current implementation of these methods will
only ever return VmError::NotCreated.

This gave me a bit of anxiety when I noticed that we were presently
handling the VmError::WaitingForInit variant identically to
NotCreated, which seemed wrong as it implied that we could,
potentially, return an error code indicating that no instance has ever
been ensured when in fact, one has been ensured but is still being
initialized. I freaked out about this a bit here:
oxidecomputer/omicron#6726 (comment)

It turns out the current behavior is actually fine, since if the VM is
still initializing, we still allow accessing it and just return a
Starting state, which is correct. But the type signatures of these
functions allow them to return any of these errors, and forces their
callers to handle those errors, and that's the part we were doing
somewhat incorrectly (although, again, in practice it doesn't matter).

Commit 81de422 changes those functions to return an Option rather than
a Result. Now, we return None if no VM has ever been created, making
that the only error case that callers have to handle. There's no
longer any risk of the HTTP API handlers accidentally returning a "VM
never ensured" error when the VM is ensured-but-initializing. Also, we
can remove the NotCreated variant, since it's now no longer returned
anywhere.

Also, I noticed that the PUT /instance/state API route would return a
NoInstance error when trying to change the state returned a
VmError::WaitingToInitialize, which seemed potentially bad: this would
result in a sled-agent that tries to send a state change request to a
still-initializing VM to believe it's Permanently Gone, and mark it as
Failed, tear down the zone, and so on. Which seems rude of it!

I don't think this is likely to be a problem in practice since, IIRC,
both sled-agent and Nexus will not try to send state change requests to
instances that they understand to be still initializing, but it seemed
good to not return the INSTANCE IS PERMANENTLY GONE error code here.
Therefore, commit ce6bb51 changes this to return a 503, so the
sled-agent will just know it needs to wait for a bit.

Currently, the `Vm::get()` and `Vm::state_watcher()` methods return a
`Result<{something}, VmError>`. This means that they can return _any_
`VmError` variant, but the current implementation of these methods will
only ever return `VmError::NotCreated`.

This gave me a bit of anxiety when I noticed that we were presently
handling the `VmError::WaitingForInit` variant identically to
`NotCreated`, which seemed wrong as it implied that we could,
potentially, return an error code indicating that no instance has ever
been ensured when in fact, one _has_ been ensured but is still being
initialized. I freaked out about this a bit here:
oxidecomputer/omicron#6726 (comment)

It turns out the current behavior is actually fine, since if the VM is
still initializing, we still allow accessing it and just return a
`Starting` state, which is correct. But the type signatures of these
functions allow them to return any of these errors, and forces their
callers to handle those errors, and that's the part we were doing
somewhat incorrectly (although, again, in practice it doesn't matter).

This commit changes those functions to return an `Option` rather than a
`Result`. Now, we return `None` if no VM has ever been created, making
that the _only_ error case that callers have to handle. There's no
longer any risk of the HTTP API handlers accidentally returning a "VM
never ensured" error when the VM is ensured-but-initializing. Also, we
can remove the `NotCreated` variant, since it's now no longer returned
anywhere.
Also, I noticed that the `PUT /instance/state` API route would return a
`NoInstance` error when trying to change the state returned a
`VmError::WaitingToInitialize`, which seemed potentially bad: this would
result in a sled-agent that tries to send a state change request to a
still-initializing VM to believe it's Permanently Gone, and mark it as
`Failed`, tear down the zone, and so on. Which seems rude of it!

I don't think this is likely to be a problem in practice since IIRC both
sled-agent and Nexus will not try to send state change requests to
instances that they understand to be still initializing, but it seemed
good to not return the INSTANCE IS PERMANENTLY GONE error code here.
Now, we return a 503, so the sled-agent will just know it needs to wait
for a bit.
@hawkw hawkw merged commit 7e2a3ca into master Sep 30, 2024
11 checks passed
@hawkw hawkw deleted the eliza/less-scary-state-watcher branch September 30, 2024 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants