-
Notifications
You must be signed in to change notification settings - Fork 524
vmm/api: document and streamline process of externally passed FDs for virtio-net devices #7281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vmm/api: document and streamline process of externally passed FDs for virtio-net devices #7281
Conversation
bb90e70
to
cd93f50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I've already reviewed this code before, and written similar comments? This is getting very confusing.
I apologize for any confusion, @alyssais . Your review-comments are still available, splitt across #7150 and Context: I'm a strong believer in many small and reviewable PRs. The original PR (#7150), even the prerequisites split-out ( |
b1fd6ae
to
022cf64
Compare
Yes, I agree! The only reason for the confusion is that it started out as a big PR and was then split up, rather than being smaller PRs from the start, but hindsight is 20/20. :) |
@alyssais What did you think of this PR in the end? Are you happy with it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below. Also, the last commit is a refactoring and not directly related to the previous ones. let'd do it in a separate PR.
022cf64
to
f81ce4e
Compare
I see, and I agree. Before I split it out, ...
I'd like to wait for an approval by @alyssais before things get more confusing again by more PRs :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the trait approach a lot better. Thank you!
6d7bff6
to
5a87fdf
Compare
cc595da
to
7746fe2
Compare
7746fe2
to
d1d07fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really excellent work. I'm happy, although I spotted a couple of last minute formatting things in the doc comments.
vmm/src/api/http/http_endpoint.rs
Outdated
//! devices **backed by network FDs** to enable live-migration, state save/ | ||
//! resume (restore), and similar VM lifecycle events. | ||
//! Some of the HTTP handlers here implement special logic for devices | ||
//! **backed by network FDs** to enable live-migration, state save/resume (restore), and similar VM lifecycle events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like wrapping missing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, good catch!
/// Helper module for attaching externally opened FDs to config objects. | ||
/// | ||
/// # Difference between [`ConfigWithFDs`] and [`ConfigWithVariableFDs`] | ||
/// The base trait [`ConfigWithFDs`] type must be implemented by all config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually I'd expect a blank line after a heading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Just a drive-by comment: for memory zones, FDs can be passed to the VMM on snapshot restore for pmem/file-backed memory; we have a need to do the same (pass memory FDs) on VM creation (more details in the upcoming PR), so I expect to re-use the mechanism established in this PR and related. Thanks for this! |
d1d07fd
to
74e0fcb
Compare
-- @likebreath I see your point, thanks for clarifying! At the same time, since this PR has already gathered quite a bit of feedback, I’m a little worried that splitting it up now might create more confusion. Would it make sense to make an exception in this case and merge it as is? Or do you have a preference for how you’d like to proceed? I’m happy to adapt and work towards whatever approach works best for everyone. :) |
restored_net.fds = Some(fds[start_idx..end_idx].to_vec()); | ||
start_idx = end_idx; | ||
} | ||
if let Some(cfgs) = restore_cfg.net_fds.as_mut() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question here. The whole purpose of this PR seems to be to generalize "configs with FDs"; but here it seems that only Net FDs are meaningfully accepted with VmRestore (based on restore_cfg.net_fds staying 'net_fds'). As we are actually working on sending memory FDs with VmCreate and VmRestore API calls, I feel maybe this "net_fds" should be refactored somehow?
Or is the plan to just rename the field into something like "attached_fds" and let vm_restore() in vmm/src/lib.rs assign FDs to configs (MemoryConfig, NetConfig) based on their position in the vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair question! restore_cfg
is here of type RestoredNetConfig
. In Cloud Hypervisor, every device has it's own restore config type. So in future, we can also have this for block devices for example. The config struct itself must know how many FDs it wants to own and the management layer must then send the appropriate FDs along with the request.
I however do see currently the problem that we need a form of multiplexing of FDs in the future, otherwise, one does not know if the management layer starts by sending let's say FDs for virtio-net or virtio-blk devices.
This however needs a larger refactoring in a next-step, which I see as the natural follow-up of these pre-requisites.
Sure. I will try to get to it tomorrow (before I go disconnected for two weeks). Smaller PRs are more agile and likely move faster. Let's make sure we breakdown PRs like this early in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first three commits are looking good to me. I am not sure about the last commit - the abstraction around FD handlings. The current diff shows more code (even excluding comments and tests) while I am yet to see how such abstraction can make new support easier and cleaner, say for restore FDs for virtio-blk and memory regions. So I think it will need extra review and perhaps discussions.
I don't want to block this PR while I am away in the coming two weeks. So dropping my "Requested Changes".
I just posted a draft PR #7369 that is related to this one. I think we should discuss how the two PRs could be combined. |
Fair concern! Let me elaborate. I also updated the PR description. The motivation of the new abstraction is to provide a verbose, solid,
is not trivial. These factors justify encapsulating the complexity In addition, it allows us to perform unit testing. Further, We get rid of Finally, while this approach may initially result in more code, every new
It will be a simple |
Live migration, state save/resume, and hotplug are not trivial when it comes to virtio-net devices backed by externally provided FDs. As the mechanism behind it can be considered as quite "multi-step magic" even for experienced programmers, it makes sense to thoroughly document this to ease debugging and to improve the mental model of developers working on this in the future. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
(1) The old messages are missing the "why" part. With this change, users of Cloud Hypervisor have somehow more context and people looking at the code perfectly know what's going on. (2) Using warn! implies that the user should take action, but in this case, there’s nothing the user can do. If the API is used correctly and file descriptors are passed via an SCM_RIGHTS message over a UNIX domain socket, everything works as intended. In that case, there's no need to issue a warning — debug! is sufficient. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Deserializing values as `-1` makes sense to prevent errors, so let's keep it. However, serializing them differently adds confusion. For example, a `ch-remote info` call should not report `-1` but the actual FDs. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
808cc0f
to
ba691aa
Compare
## TL;DR This unifies error handling, implementation, and logging of config objects that are populated with additional FDs received as part of the HTTP API request from the management software. This prevents current and future code in repetition. In the future, if we want to support let's say virtio-blk with external FDs, the AddBlk API handler simply needs a call to `attach_fds_to_cfg()` and gets all the magic for free. ## Motivation The motivation of the new abstraction is to provide a verbose, solid, and bulletproof solution for a complex domain. The interaction between - the management layer, - the passing of file descriptors over UNIX domain sockets via SCM_RIGHTS, - the attachment of configurations to those FDs, - and the ability to give new developers clear insights into what happens under the hood is not trivial. These factors justify encapsulating the complexity behind a convenient and well-documented abstraction, making the system both robust in production and approachable for new developers. In addition, it allows us to perform unit testing. Further, We get rid of existing partial code duplication and inconsistencies. Finally, while this approach may initially result in more code, every new handler that accepts FDs benefits from reduced duplication and a correct implementation by relying on the shared abstraction. This will also enable future functionality, such as virtio-blk devices backed by FDs which can then be integrated with ease. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
ba691aa
to
8d7c647
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - this is nice improvement. I agree we should consider how to generalise for storage too (or other items where we might have FDs.)
452424b
Part of #7291 to work towards live-migration with FDs for virtio-net devices.
About
Part 1:
Document and streamline the current handling of externally provided FDs for virtio-net devices.
Part 2:
Regarding the introduced abstraction to attach FDs to config objects.
This version might be outdated. The most current form is in the commit message!
The motivation of the new abstraction is to provide a verbose, solid,
and bulletproof solution for an complex domain. The interaction between
SCM_RIGHTS,
new developers clear insights into what happens under the hood
is not trivial. These factors justify encapsulating the complexity
behind a convenient and well-documented abstraction, making the system
both robust in production and approachable for new developers.
In addition, it allows us to perform unit testing. Further, We get rid of
existing partial code duplication and inconsistencies.
Finally, while this approach may initially result in more code, every new
handler that accepts FDs benefits from reduced duplication and a correct
implementation by relying on the shared abstraction. This will also enable
future functionality, such as virtio-blk devices backed by FDs which can
then be integrated with ease.
Hints for Reviewrs