Description
When running a guest, there are certain conditions which may cause one or more of the vCPUs to exit from VM context with an event we are unable to properly handle. Notable examples of this include #333 and #300, where the guests appear to have jumped off into "space", leaving the instruction fetch/decode/emulate machinery unable to do its job.
The course of action we choose in such situations involves trade-offs. The current behavior of aborting the propolis process has the advantage of preserving the maximum amount of state from both the userspace emulation (saved in the core) and the kernel VMM and instance state (residing in the vmm device instance, as long as it is not removed). This may be beneficial for development, but for hosts running in production, it is likely less than ideal.
Consumers in production likely expect a VM encountering such an error condition to reboot, as if it had tripped over something like a triple fault on a CPU. Rebooting the instance promptly at least allows it to return to service quickly. In such cases, we need to decide which bits of state we would want preserved from the machine and the fault condition so they can be used for debugging later. In addition to the details about the vm-exit on the faulting vCPU(s), we could export the entire emulated device state (not counting DRAM) as if a migration were occurring. Customer policy could potentially prune that down, or even augment it with additional state from the guest (perhaps the page of memory underlying %rip at the time of exit?).
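As a rough sketch, the debug artifact described above might look something like the following. All type and field names here are illustrative assumptions, not actual Propolis types; the real device-state payload would presumably reuse the migration serialization format.

```rust
/// Hypothetical per-vCPU record of the faulting exit.
#[derive(Debug, Clone)]
struct VmExitRecord {
    vcpu_id: u32,
    /// Raw exit reason reported by the kernel VMM.
    exit_reason: u64,
    /// Guest %rip at the time of exit.
    rip: u64,
    /// Optionally, the guest page underlying %rip, if it was readable.
    rip_page: Option<Vec<u8>>,
}

/// Hypothetical crash artifact: exit details for the faulting vCPU(s)
/// plus emulated device state serialized as if a migration were
/// occurring (DRAM deliberately excluded to keep the artifact small).
#[derive(Debug)]
struct CrashArtifact {
    exits: Vec<VmExitRecord>,
    /// Migration-format device-state payload.
    device_state: Vec<u8>,
}
```

Policy could then decide how much of this to retain: pruning `rip_page`, for example, or dropping `device_state` entirely.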
With such a mechanism in place, we could still preserve the abort-on-unhandled-vmexit behavior if it is desired by developer workflows, but default to the more graceful mechanism for all other cases.
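The choice between the two behaviors could be a simple policy knob selected at instance creation. A minimal sketch, assuming a hypothetical `UnhandledExitPolicy` type (not an existing Propolis API):

```rust
/// Hypothetical policy for what to do when a vCPU exits with an
/// event the emulation machinery cannot handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnhandledExitPolicy {
    /// Abort the propolis process, preserving a core alongside the
    /// kernel VMM state for maximum debuggability.
    Abort,
    /// Capture a debug artifact (exit details, device state), then
    /// reboot the instance so it returns to service quickly.
    RebootWithArtifact,
}

/// Pick a default: abort for developer workflows, graceful reboot
/// everywhere else.
fn default_policy(developer_workflow: bool) -> UnhandledExitPolicy {
    if developer_workflow {
        UnhandledExitPolicy::Abort
    } else {
        UnhandledExitPolicy::RebootWithArtifact
    }
}
```

The key design point is that the policy is chosen up front, so the vm-exit handling path only has to match on it rather than consult configuration at fault time.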