-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers: xen: gnttab: rework grant table driver #96
drivers: xen: gnttab: rework grant table driver #96
Conversation
drivers/xen/gnttab.c
Outdated
@@ -34,53 +34,60 @@ | |||
LOG_MODULE_REGISTER(xen_gnttab); | |||
|
|||
/* Timeout for grant table ops retrying */ | |||
#define GOP_RETRY_DELAY 200 | |||
#define GOP_RETRY_DELAY 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray whitespace change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, all spurious style changes removed.
drivers/xen/gnttab.c
Outdated
/* Map one more frame if possible, need to hold mutex */ | ||
rc = extend_gnttab(); | ||
if (rc) { | ||
/* Failed to extend grant table, so need to wait for free entry */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to wait for a free entry? Why we can't just return an error here?
Your approach may block some device driver for undefined amount of time. Which is pretty bad, but is double bad for an RTOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was one of the controversial point in here. Previous implementation would wait on semaphore, but it was not dynamically expandable. So, now I think we can remove condvar and just return error (made in last version).
drivers/xen/gnttab.c
Outdated
@@ -311,7 +319,6 @@ int gnttab_unmap_refs(struct gnttab_map_grant_ref *unmap_ops, unsigned int count | |||
return HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count); | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see formatting changes in a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
drivers/xen/gnttab.c
Outdated
@@ -324,46 +331,150 @@ const char *gnttabop_error(int16_t status) | |||
} | |||
} | |||
|
|||
static int gnttab_init(void) | |||
static int map_grant_frames(unsigned int start_frame, unsigned int end_frame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is always called with (x, x+1)
arguments. Can we rename it to map_grant_frame(unsigned int start_frame)
? And remove for
loop inside.
drivers/xen/gnttab.c
Outdated
} | ||
|
||
/* Frames are mapped, so we can setup grant table */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this function not only maps frame but also sets it up. In this case it should have this reflected in its name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Divided into 2 separate functions.
__ASSERT((!rc) && (!setup.status), "Table setup failed; status = %s\n", | ||
gnttabop_error(setup.status)); | ||
if (rc || setup.status) { | ||
LOG_ERR("Table setup failed; status = %s", gnttabop_error(setup.status)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about state of the gnttab after this. We mapped a frame but we couldn't setup it. What will happen during the next call to this function?
Also, should you setup rc
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case XENMEM_add_to_physmap() is called before GNTTABOP_setup_table - the GNTTABOP_setup_table become a NOP as only real change which might be triggered by gnttab_setup_table() is to call gnttab_grow_table(), but the same is called by gnttab_get_status_frame_mfn() and gnttab_get_shared_frame_mfn(), which in turn causes GNT expand by 1 on every call.
From Linux and Xen it seems we do not need to call GNTTABOP_setup_table for ARM, the return results are used only by X86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Linux and Xen it seems we do not need to call GNTTABOP_setup_table for ARM, the return results are used only by X86.
It is supposed to be the common code. AFAIK, there are experiments with running Zephyr-xenvm on x86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen during the next call to this function?
I checked Linux driver - in same conditions it just calls BUG_ON()
and continues execution. In our case, as @GrygiriiS said, this call is not doing much inside Xen, but interesting that it may both return negative errno in rc
or zero rc
with error in setup.status
. Since we were able to add one more frame with memory_op
, I think this is definitely not OK condition, so I'm now setting rc
here also from setup.status
From Linux and Xen it seems we do not need to call GNTTABOP_setup_table for ARM, the return results are used only by X86.
As I can see, static int gnttab_map()
doing the same for both of them, only map_frames()
callbacks looks different in platform specific implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen during the next call to this function?
I checked Linux driver - in same conditions it just calls
BUG_ON()
and continues execution. In our case, as @GrygiriiS said, this call is not doing much inside Xen, but interesting that it may both return negative errno inrc
or zerorc
with error insetup.status
. Since we were able to add one more frame withmemory_op
, I think this is definitely not OK condition, so I'm now settingrc
here also fromsetup.status
From Linux and Xen it seems we do not need to call GNTTABOP_setup_table for ARM, the return results are used only by X86.
As I can see,
static int gnttab_map()
doing the same for both of them, onlymap_frames()
callbacks looks different in platform specific implementation.
If you check arch_gnttab_map_status() and arch_gnttab_map_shared() in Linux for ARM - they are NOP.
Hence there is NO proper implementation for Zephyr x86 and Zephyr XEN depends on ARMV8_A, I'd just drop this dead code and add comment why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you check arch_gnttab_map_status() and arch_gnttab_map_shared() in Linux for ARM - they are NOP.
Yeah, I mentioned it here:
only map_frames() callbacks looks different in platform specific implementation.
Thus, I see no dead code here (GNTTABOP_setup_table
and status check, which are the points of this discussion are performed for both platforms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like all these allocs - it really not a Zephyr approach.
How about Kconfig (thoughts):
- GRANT_TABLE_MAX_FRAMES : use to allocate grefs, should be >=max_grant_frames
- GRANT_TABLE_INIT_FRAMES (default 1): number of frames to init at boot
- may be more if needed
And no docs and no tests!!!
drivers/xen/gnttab.c
Outdated
DEVICE_MMIO_TOPLEVEL_MAP(grant_tables, K_MEM_CACHE_WB | K_MEM_PERM_RW); | ||
gnttab.table = (grant_entry_v1_t *)DEVICE_MMIO_TOPLEVEL_GET(grant_tables); | ||
|
||
LOG_DBG("%s: grant table mapped\n", __func__); | ||
LOG_INF("%s: grant table mapped\n", __func__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit useless as INFO, without additional info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returned to DBG, spurious change
__ASSERT((!rc) && (!setup.status), "Table setup failed; status = %s\n", | ||
gnttabop_error(setup.status)); | ||
if (rc || setup.status) { | ||
LOG_ERR("Table setup failed; status = %s", gnttabop_error(setup.status)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case XENMEM_add_to_physmap() is called before GNTTABOP_setup_table - the GNTTABOP_setup_table become a NOP as only real change which might be triggered by gnttab_setup_table() is to call gnttab_grow_table(), but the same is called by gnttab_get_status_frame_mfn() and gnttab_get_shared_frame_mfn(), which in turn causes GNT expand by 1 on every call.
From Linux and Xen it seems we do not need to call GNTTABOP_setup_table for ARM, the return results are used only by X86.
unsigned int flags; | ||
|
||
flags = irq_lock(); | ||
k_mutex_lock(&gnttab.lock, K_FOREVER); | ||
gnttab.gref_list[gref] = gnttab.gref_list[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way this crap is may be simple and fast, but terribly unsafe :(
For example: get 8 put 8 put 8 - undefined behavior.
May be there is smth. suitable in Zephyr which can be reused in MM area
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are not accessed directly, only via public functions for allocating or mapping foreign grants (however, yes, you still can put same entry twice, same as in linux because you can't check ownership and know only gref).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if I am not wrong, ASSERT(gnttab.gref_list[gref] != gnttab.gref_list[0]);
should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it will work only for last freed entry...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about writing some kind of flag (for example define GNTTAB_USED_GREF (UINT32_MAX - 1)
) to the taken cells in the gref_list? Then we would be able to check if the entry is allocated and can be safely deallocated.
8302d35
to
546b088
Compare
The problem not only in frames which are located in Xen reserved area, but in entries list for allocator. For our case, 4K frame contains 512 entries (every gref is uint32_t). If we statically allocate
All public functions in gnttab driver are Doxygen documented in |
I'd like to summarize my thoughts after slept over:
With bitarray changing size is safe, especially if size changed only on init. optional - CONFIG_GRANT_TABLE_INIT_FRAMES (default 1): number of frames to init at boot |
The main pros of this "magic" allocator is speed, you can get/put any entry in 2 operation (Linux does exactly the same, but for 2 dimension array).
All other statements are clear when we have statically allocated memory for max possible amount of frames/entries on init. |
Speed here is not so important as Zephyr is expected to do most of the Init work at boot - and here 1) not too many "Grant frames" are allocated 2) all allocations, I see, related to the DomU creation which is, by it self. very-very slow ( images loading, memory mapping, iomem + irq + dtdev mapping -- one SVC call each (for spider it will be probably >100 Xen SVC calls only for iomem/irq/dtdev). |
Well, on one hand there is no actual reason to follow Linux example, on other hand, this implementation is already familiar for Linux users. I am curious - is there a mechanism similar to Linux On third hand, goal of this PR is not to rework gref allocation mechanism...
This depends on use-case actually. In Dom0, yes, there generally not so many grant frames are used. But, DomU can use lots and lots of them. Take a look at PV-DRM for example, which operates with 8MB buffers. Each buffer requires 2048 grant references, which will need 4 grant frames. And for triple buffering you may need 3 such buffers. Similar story is for PV Audio and, to lesser extent, for PV Net and PV Block. |
@lorc oh, I missed @GrygiriiS comment above, thank you for the answer
Unfortunately, no. At least I did not find anything similar or suitable for our case yet, Zephyr use-cases are usually much simpler and don't require such powerful instruments for ID management.
That's one of the biggest problem now, because Zephyr contains only 2 suitable functions for searching free entries - Also, when it comes to setting/clearing required bits - Zephyr has
¯_(ツ)_/¯
And just imagine that on every get/free entry we will need to walk through corresponding bits for 4 already used grant frames (2048 bits for entries) to find first free entry. I tried to figure out how to optimize it by remembering last allocated/freed entry in bit field (at least corresponding index of uint32_t part of bitmap), but it becomes over complicated when we take into account that entries may be freed on random place in bitmap or moment of time. As a little optimization and compromise, I can take Linux approach with 2-dimensional array of grant references - first dimension is grant frame, second - is grant reference from this frame. This will allow to remove memcpy of all previous data after grant table expand. |
Well, that's the problem. Okay, I propose to leave gref allocation as is, at least for now. @GrygiriiS, patches are welcome :)
Yes, this is suboptimal. I can image edge case, when PV DRM allocates lots of grant references for its static buffers first and then PV Net tries to allocate/free gref for each new packet, traversing first 12 grant frames every time, which are all taken by DRM. Basically, Linux allocator guarantee O(1) time while bitmask-based - only O(N).
Table will expand only limited number of times, so I don't see big ussie there, performance-wise. |
drivers/xen/gnttab.c
Outdated
rc = extend_gnttab(); | ||
if (rc) { | ||
k_mutex_unlock(&gnttab.lock); | ||
LOG_WRN("Failed to extend gnttab rc = %d, waiting for free gref", rc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more waiting :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thank you, fixed
unsigned int flags; | ||
|
||
flags = irq_lock(); | ||
k_mutex_lock(&gnttab.lock, K_FOREVER); | ||
gnttab.gref_list[gref] = gnttab.gref_list[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if I am not wrong, ASSERT(gnttab.gref_list[gref] != gnttab.gref_list[0]);
should be sufficient.
546b088
to
0524f61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reviewed-by: Volodymyr Babchuk <[email protected]>
unsigned int flags; | ||
|
||
flags = irq_lock(); | ||
k_mutex_lock(&gnttab.lock, K_FOREVER); | ||
gnttab.gref_list[gref] = gnttab.gref_list[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, correct.
0524f61
to
f1cdfde
Compare
@lorc thank you for review, added RB. |
Hm. A you kidding people? ;..( Regarding ffs and etc - the everything is in place in bitarray. For example sys_bitarray_test_and_set_region() which is called sys_mem_blocks_get() which allows to search for N bits, check if they already set and set them if not. or there is sys_bitarray_alloc(). What is wrong with them? But yeah it's up to you - but you can forget about certification ;) and continue doing super universal linux-like code for features which most probably will never ever be implemented in Zephyr. |
Look, I agree with you that there is always a room for optimisation and making things better. My point was that purpose of this PR is not to rework gref allocation. This PR reworks grant frames allocation. Yes, we can rework gref allocator as well. No problem. But this should be separate PR. No need to add additional changes into this one. This particular PR will be upstreamed into Zephyr mainline, so we want amount of changes. Thank you for pointing to
Well, could you please point out why current approach prevents certification? |
@GrygiriiS, so, what do you say? Can we proceed with the merge? Or you insist on reworking gref allocator in this PR? |
f1cdfde
to
660a899
Compare
Thanks to @Deedone for great idea, I've added and tested this approach, it works OK. @lorc @GrygiriiS
|
660a899
to
606823c
Compare
|
I believe that
Should go into separate commit. No need to mix changes to grant frame allocator with grant ref allocator. That commit as already large enough. |
606823c
to
b631737
Compare
Unfortunately, it is partially not possible for correct functioning. (Since last push:)
Also, revoked RB tags for changed commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Reviewed-by: Volodymyr Babchuk <[email protected]>
Xen public headers were imported to Zephyr source tree, so only used structs were added (not copied as is). So, now for refactoring and improving gnttab driver we need to augument grant table public header with one more struct. It will be used for reading actual number of available Xen grant frames. Signed-off-by: Dmytro Firsov <[email protected]> Reviewed-by: Volodymyr Babchuk <[email protected]> Reviewed-by: Mykyta Poturai <[email protected]>
Previously Zephyr has simple gnttab driver adapted from MiniOS. It had a lot of problems related to non-optimal grant frame usage, ignoring Xen max_grant_frames limit etc. The main problem, that led to this improvement was issues with grant table DT region interpretation - all pages from Xen gnttab reserved region (first reg in hypervisor DT node) was treated as number of available grant frames for mapping, but it was not correct. Actual limit of grant frames is usually significantly less than this region, which caused Xen warning/errors during grant table init. Now grant table driver maps single frame at start and all other (prior to max amount) on demand by expanding gref list. Max amount is discovered by gnttab_query_size operation on driver init (Xen hypercall). Please note that Stage 1 mapping (region from device tree) is left as is (whole region on init with Zephyr top level map), changes are related only to Stage 2 mapping (actual grant table frame pages in hypervisor). As all accesses to grant table region is fully controlled by driver function this will not cause any problems. Also, as now we have a possibility to expand grant table, gref allocation is non-blocking anymore - API may return GNTTAB_INVAL_GREF for functions, that uses gref allocation. Signed-off-by: Dmytro Firsov <[email protected]> Reviewed-by: Volodymyr Babchuk <[email protected]> Reviewed-by: Mykyta Poturai <[email protected]>
Grant references are allocated with simple O(1) allocator, where every next free gref is stored in previous and first free gref is always stored in 0 element. Current implementation had a possibility for double-free of some gref, because it did not store any information about entries being claimed. It may led to invalid gref_list value. Add GNTTAB_GREF_USED value and mark all taken value to prevent double free in put_grant_entry(). Signed-off-by: Dmytro Firsov <[email protected]> Reviewed-by: Volodymyr Babchuk <[email protected]> Reviewed-by: Mykyta Poturai <[email protected]>
b631737
to
6f6a460
Compare
@Deedone confirmed that I may left his RB. Added @lorc @Deedone RB tags to commits, thank you for review. @GrygiriiS can we proceed with merge? |
Previously Zephyr has simple gnttab driver adapted from MiniOS. It had a lot of problems related to non-optimal grant frame usage, ignoring Xen max_grant_frames limit etc.
The main problem, that led to this improvement was issues with grant table DT region interpretation - all pages from Xen gnttab reserved region (first reg in hypervisor DT node) was treated as number of available grant frames for mapping, but it was not correct. Actual limit of grant frames is usually significantly less than this region, which caused Xen warning/errors during grant table init.
Now grant table driver maps single frame at start and all other (prior to max amount) on demand by expanding gref list. Max amount is discovered by gnttab_query_size operation on driver init (Xen hypercall).
Please note that Stage 1 mapping (region from device tree) is left as is (whole region on init with Zephyr top level map), changes are related only to Stage 2 mapping (actual grant table frame pages in hypervisor). As all accesses to grant table region is fully controlled by driver function this will not cause any problems.