enhance virtio_queue to better support multi-threading #76
Conversation
Force-pushed 6f42186 to ccd3ae2
@jiangliu this is a super interesting PR! We are working now on a design to support save/restore state in rust-vmm components (while keeping the serialization/deserialization product specific), and we came up with a design that is quite similar to what you are proposing here. Looking at it from a high level, there are also a few differences between the proposals, so I would need to do an in-depth review. For example, we did not deem necessary a [...]

Hi @andreeaflorescu, any update on this topic? The vm-virtio/virtio-queue crate has become a blocking point for several of our projects :(

Hi, I just opened an RFC with our proposal, sorry for the delay and please have a look! We spent some time thinking about whether we should have the (de)ser and versionize support in rust-vmm or let the VMM completely handle that; it would be great to hear your feedback on this as well.
Force-pushed 22335e7 to f2c196a
Thanks for working on this. I think this is a step in the right direction, one that tackles both #89 and #90. It's also consistent with the direction of `vhost` and `vhost-user-backend`.

Of course, this is going to require changes in the crate consumers, but I checked them with `vhost-user-backend` and found them to be reasonable and quite straightforward.
Hi, I left some high-level comments for now. Also, it would be really nice to add the new abstractions (i.e. `QueueState`, `QueueStateSync`, etc.) in their own files/modules once we're happy with all the other details.
```diff
@@ -284,7 +284,7 @@ mod tests {
     assert_eq!(d.cfg.device_features & (1 << VIRTIO_F_RING_EVENT_IDX), 0);

     for q in d.cfg.queues.iter() {
-        assert_eq!(q.event_idx_enabled, false);
+        assert_eq!(q.state.event_idx_enabled, false);
```
Just as a mention, the config in devices should start holding `QueueState` objects instead of `Queue`s, but we can apply this change later.
Sure, let's cover it in a dedicated MR, so we can keep this MR easy to review :)
```rust
pub struct Queue<M: GuestAddressSpace> {
    mem: M,
    // ...

pub struct QueueState<M: GuestAddressSpace> {
```
Hmm, I think a `QueueState` object doesn't need to be parametrized with anything related to the `GuestMemory` or `GuestAddressSpace` traits. Instead, we can keep `QueueState` without any type parameters, and add generic parameters to the individual methods which expect to work with a `GuestMemory` handle. For example, this commit also showcases separating the state into a distinct object, but without any type parameters.
I have tried to follow the suggestion, but the result is neutral: `QueueState` has been simplified by removing the type parameter, but all of `QueueState`'s methods now take a `mem: &GuestAddressSpace::M`, so when calling those methods you need to write `queuestate.avail_idx::<M>(&self.mem.memory(), order)`.

So I suggest keeping the type parameter to ease method invocation.
Hmm, so I think most (if not all) methods can work with an `M: GuestMemory` generic type parameter and a `mem: &M` argument, instead of having an `M: GuestAddressSpace` generic type parameter and a `mem: &GuestAddressSpace::M` argument. This way we can get the best of both worlds: we remove the type parameter from `QueueState`, and we don't need to explicitly write any additional qualifications when calling the methods. If this still doesn't work for the current approach, please nudge me and I'll give it a go as well, since removing the generic from `QueueState` can improve readability and conciseness by quite a lot.
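A minimal sketch of the shape being suggested here, using a stand-in `GuestMemory` trait rather than the real `vm-memory` API (the trait body, `FakeMem`, and `avail_ring` offset are illustrative only): the struct drops its type parameter and each memory-touching method takes the handle generically instead.

```rust
use std::sync::atomic::Ordering;

/// Stand-in for the `vm-memory` trait; not the actual crate API.
trait GuestMemory {
    fn read_u16(&self, addr: u64) -> u16;
}

struct FakeMem;
impl GuestMemory for FakeMem {
    fn read_u16(&self, _addr: u64) -> u16 {
        7
    }
}

/// `QueueState` no longer carries a type parameter; methods that need
/// guest memory accept it as a generic argument instead.
struct QueueState {
    avail_ring: u64,
}

impl QueueState {
    fn avail_idx<M: GuestMemory>(&self, mem: &M, _order: Ordering) -> u16 {
        // The avail index lives right after the flags field in the ring.
        mem.read_u16(self.avail_ring + 2)
    }
}

fn main() {
    let state = QueueState { avail_ring: 0x1000 };
    // Type inference fills in `M`; no turbofish needed at the call site.
    let idx = state.avail_idx(&FakeMem, Ordering::Acquire);
    println!("{}", idx);
}
```

Because `M` is inferred from the `mem` argument, callers don't need the explicit `::<M>` qualification mentioned above.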
Seems we also need `<M: GuestAddressSpace>` for `impl<M: GuestAddressSpace> QueueStateT<M> for QueueState<M>`.
Hmm, I think that can be worked around as well; I'll quickly try to better illustrate what I have in mind by adding an example commit on top of your changes in a branch somewhere, and ping here with a link when it's done a bit later.
I experimented some more with this, and I pushed an example to this branch. The last two commits are placed on top of the changes you introduce in this PR, and we can just take and use them if they make sense. Can you please have a look and share your thoughts? I'll also propose a small change to how we offer the `lock` semantics that you mentioned are required already; will ping back soon about that.
Btw, I just found out why that weird `Sized` trait bound had to appear in the above example; it's because of the `impl Bytes for T where T: GuestMemory` auto implementation from `vm-memory`, which implies a `Sized` bound for `T`. I think we'll have to get rid of the auto impl at some point because it prevents a couple of useful scenarios, but for now I'll add a small change to relax the bounds there to include `?Sized` as well (since there's really no reason not to).
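A small self-contained illustration of the mechanism behind this (unrelated to the actual `vm-memory` code; `ByteLen` is a made-up trait): generic parameters in Rust carry an implicit `Sized` bound, so a blanket impl silently excludes unsized types unless the bound is relaxed with `?Sized`.

```rust
trait ByteLen {
    fn byte_len(&self) -> usize;
}

// Without `+ ?Sized`, this blanket impl would only cover sized types,
// excluding unsized targets such as `str` and `[u8]` entirely.
impl<T: AsRef<[u8]> + ?Sized> ByteLen for T {
    fn byte_len(&self) -> usize {
        self.as_ref().len()
    }
}

fn main() {
    let s: &str = "abc"; // `str` is unsized; reachable only via `?Sized`.
    assert_eq!(s.byte_len(), 3);

    let bytes: &[u8] = &[1, 2, 3]; // `[u8]` is unsized as well.
    assert_eq!(bytes.byte_len(), 3);
}
```

The same implicit-`Sized` rule is what forces the extra bound in the blanket `Bytes` impl discussed above.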
> I experimented some more with this, and I pushed an example to this branch. The last two commits are placed on top of the changes you introduce in this PR, and we can just take and use them if they make sense. Can you please have a look and share your thoughts? I'll also propose a small change to how we offer the `lock` semantics that you mentioned are required already; will ping back soon about that.
Thanks for your work to simplify the interface; it looks attractive. One drawback of the simplification is that we can't make a trait object of `QueueStateT` anymore. I'm not sure whether there's any case needing a `QueueStateT` trait object.
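The drawback mentioned here can be sketched with simplified trait shapes (hypothetical names, not the actual crate code): a trait parametrized at the trait level stays object-safe, while one whose methods are generic over `M: GuestMemory` cannot be used behind `dyn`.

```rust
trait GuestMemory {}
struct Mem;
impl GuestMemory for Mem {}

// Trait-level parameter: object-safe, so `Box<dyn QueueStateT<Mem>>` works.
trait QueueStateT<M: GuestMemory> {
    fn ready(&self) -> bool;
}

// Method-level parameter: NOT object-safe. The compiler rejects
// `Box<dyn QueueStateNoParamT>` (error E0038) because `avail_idx`
// is a generic method, which cannot be dispatched through a vtable.
#[allow(dead_code)]
trait QueueStateNoParamT {
    fn avail_idx<M: GuestMemory>(&self, mem: &M) -> u16;
}

struct QueueState {
    ready: bool,
}

impl QueueStateT<Mem> for QueueState {
    fn ready(&self) -> bool {
        self.ready
    }
}

fn main() {
    // Trait object over the trait-level-parametrized version is fine.
    let boxed: Box<dyn QueueStateT<Mem>> = Box::new(QueueState { ready: true });
    println!("{}", boxed.ready());
    // let bad: Box<dyn QueueStateNoParamT> = ...; // would not compile (E0038)
}
```

So the simplification trades `dyn QueueStateT` support for the cleaner call sites; whether that matters depends on whether any consumer actually needs trait objects.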
```rust
/// Goes back one position in the available descriptor chain offered by the driver.
/// Rust does not support bidirectional iterators. This is the only way to revert the effect
/// of an iterator increment on the queue.
pub fn go_to_previous_position(&mut self) {
```
We can add a `go_to_previous_position` method to `AvailIter` as well, but I think we should keep the methods on `QueueState` and `Queue` too. `AvailIter` takes some mut references, which causes issues with the borrow checker in some scenarios (which can be addressed, for example, by adding a `pop_descriptor_chain` method to `Queue`/`QueueState` in the future).
The `go_to_previous_position()` method assumes there's only one thread consuming descriptor chains from the available ring, so it can safely rewind. With the new `Queue` and `QueueState`, that assumption no longer holds. That's why I have moved `go_to_previous_position()` to `AvailIter`: `AvailIter` is logically protected by a lock.
Let's take an example:

- Thread A locks the queue, consumes descriptor head 1, advances `next` to 2, unlocks.
- Thread B locks the queue, consumes descriptor head 2, advances `next` to 3, unlocks.
- Thread A locks the queue, rewinds the head to 1 by `go_to_previous_position()`, unlocks.

Then descriptor 2 will be consumed twice.
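The interleaving above can be replayed deterministically in a single thread with a toy queue (illustrative types, not the crate's actual code) to show why rewinding a shared cursor double-delivers a descriptor head even though every individual access is "locked":

```rust
struct ToyQueue {
    next: u16,
}

impl ToyQueue {
    fn pop_head(&mut self) -> u16 {
        let head = self.next;
        self.next += 1;
        head
    }
    fn go_to_previous_position(&mut self) {
        self.next -= 1;
    }
}

fn main() {
    let mut q = ToyQueue { next: 1 };
    let mut consumed = Vec::new();

    consumed.push(q.pop_head()); // thread A: takes head 1, next -> 2
    consumed.push(q.pop_head()); // thread B: takes head 2, next -> 3
    q.go_to_previous_position(); // thread A rewinds: next -> 2, but B already took 2
    consumed.push(q.pop_head()); // next consumer: takes head 2 AGAIN

    // Head 2 shows up twice: the rewind undid another thread's progress.
    assert_eq!(consumed, vec![1, 2, 2]);
}
```

The problem is not the locking per operation but that the rewind undoes progress made by a different thread, which is exactly why the method is unsafe to expose on a shared state object.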
Thanks for the explanation; let's suspend this discussion a bit until we get to the bottom of the one related to the `lock` method.
```rust
///
/// Logically this method will acquire the underlying lock protecting the `QueueState` object.
/// The lock will be released when the returned object gets dropped.
fn lock(&mut self) -> QueueStateGuard<'_, M>;
```
Hmm, I was just wondering, what's the purpose of this method? Since it takes a `&mut self`, we're already in a context where we have exclusive access to the object, and it looks like we can already perform all the operations using the methods from this trait.

If there's a good reason to have the `lock` method, then maybe we can get rid of `QueueStateGuard` by using a signature like `fn lock(&mut self) -> impl DerefMut<Target = QueueState>`, or have an associated type with a `: DerefMut<Target = QueueState>` bound (if the former doesn't actually work). The downside to doing either of these is (IIRC) not being able to have `QueueStateT` trait objects, but that doesn't seem like a use case we're trying to meet right now.
My first trial was to implement `fn lock(&mut self) -> impl DerefMut<Target = QueueState>`, but it's really hard to deal with the lifetime parameter needed. Then I used the ugly `QueueStateGuard` to simplify the implementation.

The overall goal of the `lock()` method is to support the following pattern:
```rust
fn process(queue: &mut Queue, eventfd: EventFd) -> Result<()> {
    loop {
        eventfd.read().unwrap();
        // Lock the queue, get the next descriptor chain, and unlock.
        let chain = queue.lock().iter().next();
        // ...
        // Lock the queue, add the used descriptor, unlock.
        queue.add_used(chain.head_index);
    }
}
```
Ok, so after thinking a bit in circles about this :D, it looks like the `lock` method ultimately adds to the interface some semantics that should probably be handled behind the scenes by specific implementations. For example, given that `QueueStateSync` is added in this PR as well (and it also implements `QueueStateT`), don't we get properly synchronized behavior implicitly whenever we (or a `Queue`) use that as the underlying state, without the need for an explicit `lock` method?
That's the real art part :)

We are trying to balance flexibility and performance. If we support multi-threading by simply replacing the single-threaded `QueueState` with the multi-threaded version `QueueStateSync`, we may not gain much, because `QueueStateSync` causes too many lock/unlock operations. So we want fine-grained control over the locking policy. Casual users can just replace `QueueState` with `QueueStateSync`, and everything should work as expected. Advanced users can call `QueueStateSync::lock()` to get the underlying guard object, and then optimize their locking policy.
You're definitely right that we should keep in mind that different consumers might need different policies with varying degrees of granularity. In my mind this can be achieved by having different backends for `QueueStateT`, instead of also adding `lock` to the `QueueStateT` interface explicitly. If, at some point, a new use case appears that cannot be efficiently addressed via either `QueueState` or `QueueStateSync` (it looks like they're enough for now, please correct me if I'm wrong), then a new backend can be plugged in (and also upstreamed if not super-specific to a particular project). Do you think this more incremental approach works based on what we know so far?
We already have some clients that need the flexibility. The virtiofsd project and the nydus image service project both aim to optimize performance via multi-threading. Our first trial has achieved some performance improvements, but it's still limited by the vm-queue interfaces. So I proposed this change to alleviate that.
Cool, so here's a simple pseudo-example of how we could redefine the presentation of the `lock` semantics (there are a couple of comments with bits of additional context in there as well). Basically, I think we can make things a bit clearer by defining a separate trait for sync semantics, and removing `lock` from the regular `QueueState` interface.

This gives consumers of the abstractions (and other abstractions built on top) the ability to clearly represent their expectations; for example, something that works with state in a context that needs synchronization will demand a `QueueStateSyncT` bound, whereas something that works with any `QueueStateT` makes it clear it doesn't have synchronization concerns (also, we can move the go-to-previous-position op back to the regular state, which is actually surprisingly useful to continue to have).
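The split being proposed might look roughly like the following sketch (hypothetical names and simplified signatures; in particular, tying the guard to a concrete `QueueState` is a simplification): the base trait carries no sync semantics, and a separate supertrait expresses the `lock` capability, so consumers state their expectations through bounds.

```rust
use std::sync::{Arc, Mutex, MutexGuard};

#[derive(Default)]
struct QueueState {
    ready: bool,
}

trait QueueStateT {
    fn set_ready(&mut self, ready: bool);
    fn ready(&self) -> bool;
}

/// Only backends that actually synchronize provide `lock`.
trait QueueStateSyncT: QueueStateT {
    fn lock(&self) -> MutexGuard<'_, QueueState>;
}

impl QueueStateT for QueueState {
    fn set_ready(&mut self, ready: bool) { self.ready = ready; }
    fn ready(&self) -> bool { self.ready }
}

#[derive(Clone, Default)]
struct QueueStateSync {
    state: Arc<Mutex<QueueState>>,
}

impl QueueStateT for QueueStateSync {
    fn set_ready(&mut self, ready: bool) { self.state.lock().unwrap().ready = ready; }
    fn ready(&self) -> bool { self.state.lock().unwrap().ready }
}

impl QueueStateSyncT for QueueStateSync {
    fn lock(&self) -> MutexGuard<'_, QueueState> { self.state.lock().unwrap() }
}

// A consumer with sync concerns demands the stronger bound...
fn needs_sync<S: QueueStateSyncT>(s: &S) -> bool { s.lock().ready }
// ...while one without them accepts any state backend.
fn no_sync_concerns<S: QueueStateT>(s: &S) -> bool { s.ready() }

fn main() {
    let mut plain = QueueState::default();
    plain.set_ready(true);
    assert!(no_sync_concerns(&plain));

    let mut sync = QueueStateSync::default();
    sync.set_ready(true);
    assert!(needs_sync(&sync));
}
```

With this shape, `QueueState` cannot be passed where synchronization is required, which is the compile-time guarantee the comment is after.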
Seems we have two possible ways to build the interface here.

One way is to let the virtio driver claim: "I'm a superman supporting multi-threaded concurrency, but you must supply a special queue with multi-threading capability to me, otherwise I will refuse to work."

Another way is that the virtio driver is optimized for multi-threading: when a multi-threading-capable queue is passed in, the driver works in multi-threaded mode; when a single-threaded queue is passed in, the driver still works.

The current interface is designed for the latter case.

It may also affect the implementation of the device manager in the VMM; I need more time to figure it out :)
Hi again, I gave it some more thought and I'm personally starting to lean towards a strong preference to move the `lock` semantics outside of `QueueState` into another abstraction. Let's say there's some component/consumer/code `A` that uses queue states without having multithreading concerns, and another `B` that does need to leverage MT for performance or other reasons. `A` won't need `lock` (and might want to use `go_to_previous_position` as part of the state interface), whereas `B` will explicitly call `lock` in different places.

The issue I see here is that we're keeping some functionality away from `A`, and then `B` has no way of ensuring that `lock` (which has to be provided by any `QueueStateT` implementation, since it's always part of the interface) actually provides synchronization. Moreover, since consumers such as `B` call `lock` explicitly, they can also explicitly request a trait bound for the object. With this in mind, I think the approach where we have an additional trait (i.e. `QueueStateSyncT` or something along those lines) lifts the limitations on `QueueStateT` itself (because it no longer carries synchronization semantics of its own), and removes some of the ambiguity and potential pitfalls for use cases where sync is required. Also, we can add `QueueStateSyncT: QueueStateT` such that it can seamlessly be used for implicit synchronization where appropriate.

That being said, this is probably better discussed over an actual proposal/PR. So far, the bigger discussion points are removing the `M` generic parameter from `QueueState` and the sync story. Should we just merge the current PR with the understanding that discussions will continue and changes may happen via subsequent PRs?
Sounds like a plan, let's do it step by step. It's ok to reach a relatively stable interface before publishing v0.1. I will rebase it tomorrow.
```rust
    pub fn set_next_avail(&mut self, next_avail: u16) {
        self.state.set_next_avail(next_avail);
    }
}

impl<M: GuestAddressSpace> Queue<M, QueueState<M>> {
    /// A consuming iterator over all available descriptor chain heads offered by the driver.
    pub fn iter(&mut self) -> Result<AvailIter<'_, M>, Error> {
```
What prevents this from being part of the more generic `impl Queue` above this one?
For `QueueStateSync` to implement the `iter()` method, it needs to lock the underlying `Mutex`, which means concurrency has actually been disabled by `iter()`. And to support `QueueStateSync::iter()`, we need to change the prototype of `iter()` to something like:

```rust
pub fn iter(&mut self) -> Result<(GuardForAvailIter, AvailIter<'_, M>), Error>;
```

A guard is needed to protect the returned `AvailIter` object. I plan to do the split in a dedicated MR :)
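One way the guard-protected iterator could look is sketched below (all types are hypothetical stand-ins, not the crate API): the iterator borrows from a `MutexGuard`, so the lock stays held for the whole iteration and is released only when the guard is dropped.

```rust
use std::sync::{Arc, Mutex, MutexGuard};

struct QueueState {
    heads: Vec<u16>,
    next: usize,
}

struct AvailIter<'a> {
    state: &'a mut QueueState,
}

impl<'a> Iterator for AvailIter<'a> {
    type Item = u16;
    fn next(&mut self) -> Option<u16> {
        let head = self.state.heads.get(self.state.next).copied();
        self.state.next += 1;
        head
    }
}

struct QueueStateSync {
    state: Arc<Mutex<QueueState>>,
}

impl QueueStateSync {
    /// Handing the guard to the caller keeps the lock alive across the
    /// iteration; a plain `iter(&mut self)` could not, because the guard
    /// would be dropped when the method returns.
    fn lock(&self) -> MutexGuard<'_, QueueState> {
        self.state.lock().unwrap()
    }
}

/// Build the iterator from a caller-held guard.
fn iter<'a>(guard: &'a mut MutexGuard<'_, QueueState>) -> AvailIter<'a> {
    AvailIter { state: &mut **guard }
}

fn main() {
    let q = QueueStateSync {
        state: Arc::new(Mutex::new(QueueState { heads: vec![3, 5], next: 0 })),
    };
    let mut guard = q.lock();
    let collected: Vec<u16> = iter(&mut guard).collect();
    drop(guard); // lock released here
    assert_eq!(collected, vec![3, 5]);
}
```

This is essentially the `(guard, iterator)` pairing suggested in the prototype above, with the guard owned by the caller rather than returned in a tuple.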
Hi Gerry, I left a couple of additional comments and some small examples. Looking forward to hearing your thoughts.

I guess one last high-level comment is whether we should actually split the new interfaces into non-mut and mut parts (i.e. `QueueStateT` and `QueueStateMutT`, depending on whether the methods need `&self` or `&mut self`), as this makes things easier with certain bounds and implementations for smart-pointer-like objects (for example). It's probably not super important right now, but it will become more complicated to change as time goes by.

Also, we can ultimately merge this as-is and add changes on top afterwards; I think the important thing right now is to get aligned on the direction, and then we can get there in as many steps as we need.
I think it's ok to split the trait into non-mut and mut parts, but I'm not sure about the use cases for a read-only queue :) As for the way forward, I prefer to merge this first and then add more changes piece by piece :)
Split out `struct QueueState` from `struct Queue`; it contains all the fields related to virtio queue state. It will be reused to support concurrent operations in multi-threaded contexts later. Signed-off-by: Liu Jiang <[email protected]>
The `AvailIter` structure is an iterator that consumes all available descriptors on the available ring. And `Queue::go_to_previous_position()` goes back one step on the available ring, essentially undoing the effect of `AvailIter::next()`. So let's move `go_to_previous_position()` to `AvailIter`. It helps to support multi-threading in the future. Signed-off-by: Liu Jiang <[email protected]>
Abstract queue state operations as `trait QueueStateT`, so we can provide different implementations for single-threaded and multi-threaded contexts. Signed-off-by: Liu Jiang <[email protected]>
Introduce `QueueStateSync`, which is essentially an `Arc<Mutex<QueueState>>`, for multi-threaded contexts. The `QueueStateSync` object can be cloned and sent to different threads for concurrently processing available descriptors. Signed-off-by: Liu Jiang <[email protected]>
Use atomic operations to access `queue.head`, to avoid reading inconsistent values. Signed-off-by: Liu Jiang <[email protected]>
Refine documentation a bit. Signed-off-by: Liu Jiang <[email protected]>
Force-pushed a935d0d to 9b985ef
@slp @alexandruag I have rebased onto the latest base, so let's move on and get things done step by step :)

Is it possible to add some sort of test for the multi-threading part?
Cool, let's merge this and continue to iterate on top of it. From my point of view, I'd like to continue with the changes that remove the generic type parameter from `QueueState` and the proposal to refactor how the sync semantics are provided (i.e. vs the current `lock` method of `QueueStateT`). We'll also have to add more testing (like the tests Laura mentioned) after things settle down more, and we'll see at that point how to split the work.
This API makes very little sense, since the `lock()` method takes a [...]

Plus, it is horribly inefficient. The synchronization should have been done at the `Queue` level, not at the `QueueState` level.
There were a couple of changes in #76 that are not yet tested. The PR that covers that functionality is in review here: #111. Until that one is merged, the coverage needs to be decreased. Signed-off-by: Laura Loghin <[email protected]>
Refactor `virtio_queue::Queue` to better support multi-threading, and also ease migration for existing consumers of the crate. The main changes are:

- Abstract queue state operations as `trait QueueStateT<M: GuestAddressSpace>`, so we can use different concrete queue state objects for different queues.
- `struct Queue` is a simple wrapper over `QueueState` with an associated memory object.
- Add `struct QueueStateSync` for multi-threaded contexts.
- Move `go_to_previous_position()` to `struct AvailIter`.

Fixes: #89 #90