nvmeof/NVMeofGwMonitorClient: use a separate mutex for beacons #59053
I'm much less familiar with the NVMeofGwMonitorClient side, but this looks reasonable to me at a surface level. I do have some questions/comments below that are probably worth looking at.

First, a boring nit: include the subdirectory in the commit message.

Next, from a read of NVMeofGwMonitorClient.cc, I suppose that you are worried that slow handling of messages is causing beacons to lag -- specifically that handle_nvmeof_gw_map is taking a long time. Is that the case? If so, I suggest that you include a comment on beacon_lock explaining that reasoning.
My larger concern is that it's not clear to me what exactly about handle_nvmeof_gw_map is taking a long time. At a guess, it's the (synchronous?) gRPC call to the reactor? If that call is actually hanging long enough for the mon to notice the gap, is the GW really healthy? Suppose the SPDK reactor becomes unhealthy enough that no gRPC calls go through: it seems like you'd really want the GW to be declared dead and failed over.

The above doesn't mean that this change is incorrect, as I could well be missing something. I just want to point out that you probably do want to make sure that the arrival of a beacon at the monitor actually demonstrates that the GW can serve client IO, and not just that the tick thread is still running.
Probably worth adjusting the commit message and adding a short explanatory comment to the new mutex as I outline above, but this looks fine at a surface level.
It's worth taking a look at my comment immediately above, however, as I'm a bit worried that this will have an effect of allowing a non-functional gw to continue sending beacons (though I could very easily be missing some other mechanism that would prevent that).
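To make the concern above concrete, here is a minimal sketch of one way a beacon could be tied to data-path health rather than to the tick thread alone. It is not what this PR does and not an existing Ceph mechanism: `BeaconGateSketch`, `record_datapath_ok`, `should_send_beacon`, and the millisecond timestamps are all hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical liveness gate (illustrative only, not a Ceph API).
// A beacon is allowed only if some data-path health check (for example a
// successful call into the gateway) completed recently enough.
class BeaconGateSketch {
  std::atomic<std::int64_t> last_ok_ms{-1};  // -1: no successful check yet
  const std::int64_t max_age_ms;             // assumed freshness window

public:
  explicit BeaconGateSketch(std::int64_t max_age) : max_age_ms(max_age) {}

  // Called by whichever thread observes a successful data-path check.
  void record_datapath_ok(std::int64_t now_ms) { last_ok_ms.store(now_ms); }

  // Called from the tick thread before sending a beacon.
  bool should_send_beacon(std::int64_t now_ms) const {
    const std::int64_t last = last_ok_ms.load();
    return last >= 0 && (now_ms - last) <= max_age_ms;
  }
};
```

With a gate like this, a hung data path would stop beacons after the freshness window expires even though the tick thread keeps running, so the monitor could still fail the gateway over.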
Force-pushed from 4ed4cdc to 0802f51.
jenkins retest this please
Add beacon_lock to mitigate potential beacon delays caused by slow message handling, particularly in handle_nvmeof_gw_map.

Signed-off-by: Alexander Indenbaum <[email protected]>
Force-pushed from 0802f51 to 0dc4185.
nvmeof/NVMeofGwMonitorClient: use a separate mutex for beacons
Add beacon_lock to mitigate potential beacon delays caused by slow message handling, particularly in handle_nvmeof_gw_map.
CI Run: ceph/ceph-nvmeof#784
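The idea in the description can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual NVMeofGwMonitorClient code: `GwClientSketch`, `handle_map`, `send_beacon`, and the counters are hypothetical stand-ins, and only the name `beacon_lock` comes from the PR.

```cpp
#include <functional>
#include <mutex>

// Hypothetical stand-in for the gateway monitor client (not the Ceph class).
class GwClientSketch {
  std::mutex lock;         // guards message/map handling state
  std::mutex beacon_lock;  // guards only the state a beacon needs
  int maps_handled = 0;
  int beacons_sent = 0;

public:
  // Stand-in for a slow handler such as handle_nvmeof_gw_map: the caller
  // supplies the (possibly long-running) work performed under `lock`.
  void handle_map(const std::function<void()>& slow_work) {
    std::lock_guard<std::mutex> l(lock);
    slow_work();
    ++maps_handled;
  }

  // Stand-in for the periodic beacon path: it takes only `beacon_lock`,
  // so it never waits for a handler that is holding `lock`.
  void send_beacon() {
    std::lock_guard<std::mutex> l(beacon_lock);
    ++beacons_sent;
  }

  int maps() {
    std::lock_guard<std::mutex> l(lock);
    return maps_handled;
  }

  int beacons() {
    std::lock_guard<std::mutex> l(beacon_lock);
    return beacons_sent;
  }
};
```

In this sketch a beacon can be sent even while a handler is in the middle of its work, which is the behavior the separate mutex is meant to allow. If both paths shared one mutex, the beacon would have to wait for the handler to finish.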