Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: log vCPU diagnostics on triple fault and for some unhandled exit types #795

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

gjcolombo
Copy link
Contributor

Add a propolis::vcpu::Diagnostics type that captures and pretty-prints the register state of a vCPU. Log this state if a vCPU triple faults or (in propolis-server) if it raises a Paging or InstEmul unhandled exit, which are the failure modes seen in #300, #333, #340, and #755.

To avoid logging guest data outside of development environments, the Diagnostics type only captures data if the dump-guest-state feature is enabled. Enable this feature by default for propolis-server and -standalone, then modify the Omicron image build job to disable it in Omicron zone images for now (so that the feature will be disabled for customer VMs). In the fullness of time we may want a more flexible way to configure this option.

Finally, hand-implement Debug for the InstEmul exit context to redact this exit's instruction bytes if dump-guest-state is disabled.

This helps with #335. In a subsequent change I'd like to add the memory region near guest %rip to the diagnostics, but this requires some additional plumbing to translate from a GVA to a GPA before trying to read memory (there's an ioctl for this already; just needs some track-laying in Propolis).

Tests: added a diagnostics log line to the "reset" suspend exit so that I could force diagnostics to be captured on demand via bhyvectl, then checked both that they're logged with a binary built in the default configuration and elided in a binary built with the Omicron configuration.

Comment on lines 218 to 232
VmExitKind::Bogus => {
error!(log,
"lib returned bogus exit from vCPU {}",
vcpu.id);

event_handler.unhandled_vm_exit(vcpu.id, exit.kind);
VmEntry::Run
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bogus is not an unhandled exit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it's unconditionally handled in the lib. It previously fell into the _ match arm, which called unhandled_vm_exit, but as long the lib is always supposed to handle this exit kind it's probably better just to panic directly here (since this is really more of an assertion failure than a truly unhandled exit).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use unreachable!, then. I think handling like this is misleading to the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fba85be. I also tried to describe what the bogus exit is for in a comment. LMK if it's not accurate; I'm going mostly off what I could infer from reading around where VM_EXITCODE_BOGUS is used in bhyve.

Comment on lines 104 to 108
#[cfg(not(feature = "dump-guest-state"))]
let inst_data = "<dump-guest-state disabled>";

#[cfg(feature = "dump-guest-state")]
let inst_data = &self.inst_data;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than gating this output on a feature, perhaps just censor it on the propolis-server side, depending on the configuration there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that but didn't like the ergonomics of it; it would mean that everyone in server who might want to debug-print a VmExitKind needs to remember to check what variant it is first to avoid printing an InstEmul. We could force that to happen by declining to derive Debug for VmExitKind, but that would put a big match into functions like unhandled_vm_exit to deal with the one variant that doesn't have a Debug impl, which I also don't love the look of. (I don't think there are any other call sites that would be affected, fortunately.)

@gjcolombo
Copy link
Contributor Author

The clippy failure is a new suggestion from 1.82. I'll fix that separately.

Comment on lines 31 to 40
# This job produces the propolis-server zone image that the Omicron repository
# imports to create full-blown Oxide software packages. Disable all default
# features and opt into the "omicron-build" feature instead to build libraries
# in their "production" configurations.
#
# The 'release' profile is configured for abort-on-panic, so we get an
# immediate coredump rather than unwinding in the case of an error.
ptime -m cargo build --release --verbose -p propolis-server --features omicron-build
ptime -m cargo build --release --verbose -p propolis-server \
--no-default-features \
--features omicron-build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me a second to connect the dots but from #755 and elsewhere we're clearly still collecting cores from a crashed Propolis - all we've done by disabling the logging is prevented ourselves from collecting more timely information (that in the case of #755 helped us tell the guest had gone off to space)

i think a consistent posture here would be either:

A: Propolis stderr/stdout is as sensitive as guest memory; do not collect, if you must collect then keep secret, if you must share, then do so carefully
B: keep Propolis stderr/stdout clean of user data, and when operating Propolis, do not retain user data (cores that could include pages of user data)

i'm inclined towards A on the grounds that if you're looking at or copying a Propolis log around, you probably also can access a core from that same process. and it means we're actually helping a situation like #755 with this change.

i still think the feature is useful, just that we can leave default features enabled in this job. or if we should consider ourselves responsible for keeping Propolis lib/standalone/server's logs free of potentially-sensitive bits, it'd be worth a note to ourselves that we should audit other logs for anything untoward - migration state comes to mind.

i don't want to be excessively black/white on this, but i'm trying to make sure there's a clear line for what is and isn't OK for Propolis to say. what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep finding myself drawn to approach B. At some point we're going to have a situation where we need to debug a Propolis issue using only information from operator-provided support bundles. That's going to be tricky if operators feel they can't ever include Propolis logs in those bundles because Propolis is too cavalier about logging guest data.

i'm inclined towards A on the grounds that if you're looking at or copying a Propolis log around, you probably also can access a core from that same process.

I would put it a bit differently: if you can extract a Propolis log from a sled, you can probably also extract core dumps from that sled. Because Oxide personnel are regularly in this position today (because we access deployed racks over the technician port in order to update/service them), I agree that redacting diagnostic data in logs currently has very little practical upside, at least from the point of view of an operator who wants to limit support personnel access to guest VM state. The long term goal, of course, is to wean ourselves off of the tech port via automated update, support bundles, and the like, and once we've done that I think redaction will be more obviously useful.

#755 is the only case I know of where we've seen one of the diagnostic-generating faults in a production Omicron deployment. All the other cases I know of have been in development environments (ad hoc Propolis server/standalone binaries or a4x2 deployments). So my lean was (and still is, I think) toward disabling this by default in Omicron images, though I think it's a somewhat close call (made even closer by the fact that #755 happened on an Oxide-managed rack and not a customer-operated rack).

With all that said: WWYT of taking some steps now to make this into a runtime-configurable setting? I think that the ideal world is one @pfmooney laid out in chat in yesterday's HV huddle: if this behavior can change dynamically, and Nexus knows how to control it, then an operator who can reproduce a problem in a "clean" VM can enable extra data collection for just that instance (or its project/silo/what-have-you) and provide us with the resulting "full-fat" logs without worry. In the short run (i.e. before Omicron learns about the new API), and in a dire pinch, support personnel accessing a rack through the tech port can access the API, enabling a "user starts clean VM -> support person sets option -> user repros problem" approach if we desperately needed it. I think I can adjust this PR to move in that direction pretty easily (and I suspect it might improve its ergonomics in the process); will probably go ahead and give that a try, at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wean ourselves off of the tech port via automated update, support bundles, and the like, and once we've done that I think redaction will be more obviously useful.

ah, yeah, in that context i see what you mean. i was over-indexing on today's diagnostics!

definitely a fan of making it runtime-configurable, then, but i'm even more glad to have piped up on RFD 496 - seems like we need a more nuanced treatment of the relationship between cores and support bundles.

@gjcolombo
Copy link
Contributor Author

I'm going to rebase to pick up #794 and #796 before trying to tweak this PR further (#796 to silence the clippy warning, #794 because I have some Ideas that I think would otherwise conflict with it).

@gjcolombo gjcolombo force-pushed the gjcolombo/triple-fault-diagnostics branch from fba85be to e4f891f Compare October 18, 2024 23:33
@@ -11,13 +11,25 @@ use crate::vmm::SubMapping;

/// Controls whether items wrapped in a [`GuestData`] are displayed or redacted
/// when the wrappers are printed via their `Display` or `Debug` impls.
#[no_mangle]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(definitely worth a note like // The #[no_mangle] here is not for internal correctness, but to ensure that two cohabitant lib-propolis versions produce an error rather than ambiguously work)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants