Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-392674: nbd_client_manager retry connect on nbd device busy #6021

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

liulinC
Copy link
Collaborator

@liulinC liulinC commented Sep 27, 2024

to connect to nbd devices, nbd_client_manager will

  1. protect the operation with /var/run/nonpersistent/nbd_client_manager file lock
  2. check whether nbd is being used by nbd-client -check
  3. load nbd kernel module by modprobe nbd
  4. call nbd-client to connect to nbd device

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

Note: the file lock in step 1 does NOT resovle the issue here, as it only coordinate multiple nbd_client_manager processes.

To fix the issue,

  • we patch nbd-client to report the device busy from kernel to nbd_client_manager
  • nbd_client_manager should check nbd-client exit code, and retry on device busy

@MarkSymsCtx
Copy link
Contributor

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

This would be better fixed by adding a rule to udev to ensure that system-udevd leaves the devices alone and thus doesn't result in them being found busy.

@liulinC
Copy link
Collaborator Author

liulinC commented Sep 27, 2024

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

This would be better fixed by adding a rule to udev to ensure that system-udevd leaves the devices alone and thus doesn't result in them being found busy.

XS8 stream udev handle the nbd device as well. (and has this race issue as well).
I have already raise some PR to clean some unnecessary udev rules.
I doubt should we totally ignore nbd devices in udev rules. (as XS8 steam do take care of it)

BTW, udev rules is only run and executed with the device been opened and lock been hold by systemd-udevd(workers),
https://github.com/systemd/systemd/blob/main/src/udev/udev-worker.c#L184-L203
This means the udev rules can only narrow down the race time window, but can not resolve the issue.

So I prefer to resolve the races issues other than just totally ignore nbd devices in the udev rules.

to connect to nbd devices, nbd_client_manager will
1. protect the operation with
/var/run/nonpersistent/nbd_client_manager file lock
2. check whether nbd is being used by `nbd-client -check`
3. load nbd kernel module by `modprobe nbd`
4. call `nbd-client` to connect to nbd device

However, step 3 will trigger systemd-udevd run asyncly, which would
open and lock the same nbd devices, run udev rules, etc. This introduce
races with step 4, e.g. both process want to open and lock the nbd
device.

Note: the file lock in step 1 does NOT resovle the issue here,
as it only coordinate multiple nbd_client_manager processes.

To fix the issue,
- we patch nbd-client to report the device busy from kernel to nbd_client_manager
- nbd_client_manager should check nbd-client exit code, and retry on device busy
- nbd_client_manager call `udevadm settle` to wait for udevd parsing udev rules

Note: checking nbd-client exit code is still necessary in case of racing with others

Signed-off-by: Lin Liu <[email protected]>
@liulinC
Copy link
Collaborator Author

liulinC commented Oct 9, 2024

job: 4126316~4126325 looks good.

@liulinC liulinC requested a review from gangj October 10, 2024 01:40
@lindig
Copy link
Contributor

lindig commented Oct 14, 2024

Can we get reviews from @MarkSymsCtx and @robhoes ?

@liulinC liulinC added this pull request to the merge queue Oct 14, 2024
Merged via the queue into xapi-project:master with commit 34a8796 Oct 14, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants