-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-35551: Remove dependency on kernel blktap2 driver #364
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me except for some minor complaints about error handling.
control/tap-ctl-allocate.c
Outdated
return errno; | ||
} | ||
|
||
flock(f, LOCK_EX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially the only way this can fail is with an EINTR or ENOLCK but it's probably worth at least commenting that, similarly the other LOCK_EX and no check.
Only option would be to fail hard which is maybe better than continuing blindly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, EINTR can and should be retried.
ENOLCK is probably quite unlikely, but if it happens and you continue you'd end up in a situation where two processes could be in the critical section at the same time.
control/tap-ctl-free.c
Outdated
/* err = ioctl(fd, BLKTAP2_IOCTL_FREE_TAP, minor); */ | ||
/* err = (err == -1) ? -errno : 0; */ | ||
/* close(fd); */ | ||
flock(fd, LOCK_EX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return value?
drivers/tapdisk-image.c
Outdated
err = glob(pattern, 0, NULL, &glbuf); | ||
switch (err) { | ||
case GLOB_NOMATCH: | ||
EPRINTF("%s, failed to find parent NBD path", pattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the return value of this function? Above it returns -1 and sets errno, here it returns a value from the GLOB_*
namespace, and below it returns -errno.
Hi. Nice work on removing the blktap kernel module. Thank you! I'm trying to modify libxl so it writes xenstores entries to backend/vbd3 so tapback can pick them up. /etc/xen/scripts/block-tap uses tap-ctl to create and remove the tapdisks. One issue is that libxl waits for XenbusStateInitWait before launching the hotplug script. tapback doesn't move to InitWait without hotplug-status, but the hotplug scripts write hotplug-status. libxl then times out waiting for InitWait. I am using 96ffa61 to move tapback to InitWait and things seem to work with it. It seems like other code in tapback is waiting for hotplug-status connected, so moving to InitWait earlier should be okay for XAPI's use as well as libxl. With this branch merged into master, I get a "No such file or directory" error when running
It seems that the tapdisk process exits when With tapback, there is some trickiness with physical-device. It expects to lookup devices by major:minor. I modified the block-tap script to write_dev with the nbd path - that is needed since libxl will need that path to pass to QEMU. The hotplug script returns 0:0 for the device major:minor which won't be correct. I guess tapback sees physical-device first, and then physical-device-path second. The second one is correct and maybe takes precedence? I have only tested a single tapdisk at a time, so minor 0 is always correct. It seems like physical-device code should be removed and only physical-device-path should be used by tapdisk? I have more testing to do. There were a few instances where tapdisk prints repeats of:
This may have been when I had stray tapdisks hanging around. #377 is help address that issue since block-tap could find the wrong minor. |
Yeah, this is very much a work in progress and not ready for merging yet. It looks like there is some unanticipated change to refcounting on one layer or another that causes the VBD to determine it's all done and the server can exit. |
My libxl patches are here if anyone wants to look: https://github.com/jandryuk/xen/commits/libxl-blktap |
Signed-off-by: Mark Syms <[email protected]>
Signed-off-by: Mark Syms <[email protected]>
Signed-off-by: Mark Syms <[email protected]>
Signed-off-by: Mark Syms <[email protected]>
@jandryuk if you try this updated PR it should work better. I've now got it working and integrated with xapi and the storage manager code in XenServer without the kernel driver being present on the system. There are still some integration rough edges with the toolstack which need to be resolved but they all should be that side and not in blktap. |
Thanks, @MarkSymsCtx . tap-ctl destroy is working properly again. It's been wroking along with a couple of tapback changes I'll try to get posted soon. Above, I mention two VHDs showing these errors:
Both check as invalid:
I'm not sure how that happened. The fedora one is still booting for my testing, though it ran fsck one at least one or two boots. There is a decent chance tapdisks were left lingering on the VHD files between runs - in other words there could have been two tapdisks pointing at the same VHD file. |
In block-tap, I had to do this:
Would it make sense for |
I'm using this branch with 4 commits on top of this PR for libxl support: https://github.com/jandryuk/blktap/tree/pr-364-libxl it includes #377 because that fixes an issue with |
I was also thinking of changing to the newer (~2012!) uri syntax: jandryuk/xen@40e2f59 |
Hi 👋 I work at Vates and came across this PR. Love that we're getting closer to removing the dependency on blktap kernel patch to use NBD instead of it. Especially after this change in XAPI: xapi-project/xen-api@6433c79 . I was hoping to test this PR along with #377. But I was wondering @MarkSymsCtx if you also have some branch in SM with the related changes? We were running into this error and were wondering if that is something we can help with. But since you mentioned above that you have some storage manager code, thought to ask you first. Thanks :) |
As I said above #364 (comment), this is still very much a work in progress. We do have some of the SM changes but we've had to defer all of this work to the next major platform release to avoid breaking customer workloads. Taking the opportunity of the platform release to make these breaking changes. |
I had been hoping to revisit this $soon to see if we could move it forward. Breaking customer workloads is 😞 though. @MarkSymsCtx do you have any rough idea of your schedule for the next major release? Are you talking months or years? I was hoping to be able to drop the blktap kmod before the next time I have to uprev Linux versions. I'm working with libxl and not xapi, and that required some changes as seen in https://github.com/jandryuk/blktap/commits/pr-364-libxl. @v-thakkar if you could test jandryuk@dea81cf and jandryuk@680512a with XAPI, that would really help me out. jandryuk@e443937 would also be good to tested, but that might require the nbd changes in this PR. |
I agree with @jandryuk. Will appreciate the details about your schedule for the next major release @MarkSymsCtx , so that we all can work collaboratively to remove the dependency on blktap patch and don't have to keep backporting it to newer kernel versions. :) |
No description provided.