Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reactor: improve reactor referencing API #6534

Merged
merged 18 commits into from
Dec 23, 2024

Conversation

garlick
Copy link
Member

@garlick garlick commented Dec 22, 2024

This replaces the libev style flux_reactor_active_incref() / _decref() interface, in which the watcher reference count is directly manipulated by callers, with a libuv style flux_watcher_ref() / _unref() interface. As discussed in #6510, the new interface is safer for users.

From flux_watcher_stop(3)

void flux_watcher_unref (flux_watcher_t *w);
void flux_watcher_ref (flux_watcher_t *w);
bool flux_watcher_is_referenced (flux_watcher_t *w);

flux_watcher_unref() drops the watcher's reference on the reactor.
This function is idempotent. flux_reactor_run() normally runs until
there are no more active watchers with references

flux_watcher_ref() restores the watcher's reference on the reactor.
This function is idempotent.

flux_watcher_is_referenced() returns true if the watcher is referenced.
All watchers are referenced unless updated with flux_watcher_unref().

How is it different for users?

The new interface doesn't require knowledge of how many reactor references are taken by a given watcher.

As an example, flux-start embeds a service that allows it to be controlled by tests. Since the original design of flux-start had it exiting once all child brokers have exited, it was convenient to arrange for the watchers associated with the server to not contribute the reactor's use count. The API change required the following code change:

diff --git a/src/cmd/flux-start.c b/src/cmd/flux-start.c
index 4634bcb1c..de5b73c2f 100644
--- a/src/cmd/flux-start.c
+++ b/src/cmd/flux-start.c
@@ -1038,13 +1038,9 @@ void start_server_initialize (const char *rundir, bool verbose)
         log_err_exit ("could not created embedded flux-start server");
     if (flux_msg_handler_addvec (ctx.h, htab, NULL, &ctx.handlers) < 0)
         log_err_exit ("could not register service methods");
-    /* Service related watchers:
-     * - usock server listen fd
-     * - flux_t handle watcher (adds 2 active prep/check watchers)
-     */
-    int ignore_watchers = 3;
-    while (ignore_watchers-- > 0)
-        flux_reactor_active_decref (ctx.reactor);
+
+    flux_watcher_unref (flux_get_handle_watcher (ctx.h));
+    flux_watcher_unref (usock_service_listen_watcher (ctx.h));
 }

 void start_server_finalize (void)

@garlick
Copy link
Member Author

garlick commented Dec 23, 2024

Pushed slightly improved test coverage.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement! Excellent!

Just a few comments inline, but nothing that should hold up the PR IMO.

Comment on lines 217 to 234
static void periodic_start (flux_watcher_t *w)
static void pdwatcher_start (flux_watcher_t *w)
Copy link
Contributor

@grondo grondo Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment, I don't think this quite rises to the level of requiring a change to whole PR:

Was there a reason NAME for these watchers was kept to only 2 characters? I'm finding pdwatcher_start almost more unclear than periodic_start. Could it have have struct periodic_watcher and periodicwatcher_start?

(Same with tm vs timer, and hwatcher vs handlewatcher)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that actually would be better. I'm not sure why I was stingy with the characters there. Old habits I suppose.

@@ -729,6 +729,12 @@ int flux_dispatch_requeue (flux_t *h)
return 0;
}

flux_watcher_t *flux_get_handle_watcher (flux_t *h)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message typo: handle watch ➡️ handle watcher

Comment on lines +1042 to +1043
flux_watcher_unref (flux_get_handle_watcher (ctx.h));
flux_watcher_unref (usock_service_listen_watcher (ctx.h));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so much nicer!

@@ -120,13 +120,28 @@ static void test_zmq (flux_reactor_t *reactor)
void *zs[2];
flux_watcher_t *r, *w;
const char *uri = "inproc://test_zmq";
flux_watcher_t *tmp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message: perhaps an extra space in incorrect zmqutil_watcher_create()


:func:`flux_watcher_is_referenced` returns true if the watcher is referenced.
All watchers are referenced unless updated with :func:`flux_watcher_unref`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add note that not all watchers may support flux_watcher_unref()? Perhaps check flux_watcher_is_referenced() to ensure the function has had the desired effect. (or do all watchers in practice now support this call?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All watchers in the public API support this call, but I didn't bother with the private ones (fbuf, sdbus). I'll make a note to that effect.

Problem: libev watcher wrapper code is hard to read due to poor
internal variable/function name choices and non-uniform structure.

Give implementations a unique internal name of the form: "NAME_watcher".
Give ops callbacks a "NAME_watcher_" prefix.
Name the ops struct "NAME_watcher_ops".
Always wrap the native ev_watcher in "struct NAME_watcher".

This commit is strictly cleanup and changes no watcher behavior.
Problem: passing incorrect arguments to reactor watcher functions
can trigger an assertion failure.

For functions that return success/failure, fail with errno=EINVAL
if the arguments are incorrect.  For void functions, do nothing if
the arguments are incorrect.
Problem: now that flux_stat_watcher_get_rstat() no longer asserts on
error, callers are unable to tell if the pointer arguments have been
assigned.

Change the function to return an int instead of void.
Update man page.
Add a unit test.
Problem: flux_reactor_active_incref() and _decref() are easy
to use incorrectly, resulting in a reactor hang.

libuv improved the interface by adding uv_ref(), uv_unref(),
uv_is_referenced() for its handles.  These accomplish the same thing
but are safer because they are associated with a handle (watcher) rather
than the reactor and they are idempotent.
See also: https://docs.libuv.org/en/v1.x/handle.html#refcount

Add the equivalent interface for the Flux reactor:

flux_watcher_ref()
flux_watcher_unref()
flux_watcher_is_referenced()

If a watcher doesn't implement the ref/unref callbacks then
flux_watcher_ref() and flux_watcher_unref() will be no-ops and
flux_watcher_is_referenced() will always return true.

Implement callbacks for the native libev watchers.
Problem: flux_watcher_ref() and _unref() don't work on handle
watchers.

Add the necessary internal callbacks for this to work.
Problem: the usock service provides no way to access the internal
watcher that accepts new connections for start/stop/ref/unref.

Add accessors:
usock_service_listen_watcher()
usock_server_listen_watcher() - needed by the above
Problem: there is no way to access the internal handle watcher
that is created for message dispatch.

Add an accessor:
flux_get_handle_watcher()
Problem: flux start uses the reactor active use count to ensure
that the reactor exits once the last client disconnects, but we now
have better interfaces for that.

Use flux_watcher_unref() instead.
Problem: some invalid arguments not detected at watcher creation.

Add code to fail with EINVAL if certain arguments are not set.
Problem: flux_watcher_ref() and _unref() don't work on zmq
watchers.

Add the necessary internal callbacks for this to work.
Update unit test.
Problem: there is no test coverage for incorrect zmqutil_watcher_create()
arguments.

Add a test.
Problem: the python bindings are using the deprecated
reactor active_incref/decref interfaces

Add ref() and unref() to the watcher base class.
Convert users to ref() and unref().
Drop the reactor incref() and decref() methods.
Problem: flux_reactor_active_incref/decref() is an inferior interface
compared to flux_watcher_ref/unref().

Drop these reactor methods.
Drop unit test.

Fixes flux-framework#6510
Problem: there is no test coverage for the new flux_watcher_ref(),
flux_watcher_unref(), and flux_watcher_is_referenced() interfaces.

Create a helper that tests both ref/unref/is_referenced and
start/stop/is_active.  Replace the repeated is_active code in
each watcher test with a call to the helper
Problem: handle watcher references have no test coverage.

The handle watcher doesn't have a standalone unit test, so add one
to the reactor unit test, including simple coverage for watcher
references.
Problem: flux_reactor_active_incref() and flux_reactor_active_decref()
appear in man pages but are no longer available.

Remove them.
Problem: flux_watcher_ref(), flux_watcher_unref(), and
flux_watcher_is_referenced() are undocumented.

Update man pages.
Problem: flux_get_handle_watcher() is not documented.

Add it to the message handler man page.
@garlick
Copy link
Member Author

garlick commented Dec 23, 2024

Re-pushed with the suggested changes.

I was able to pretty easily rename the internal watchers to the form NAME_watcher (structs, callbacks) which is more consistent with the names of the public flux_NAME_watcher_create() functions. Good suggestion to avoid the shorter names!

I also added a small cleanup in the zwatcher unit test.

@grondo
Copy link
Contributor

grondo commented Dec 23, 2024

Already approved, so feel free to set MWP!

@garlick
Copy link
Member Author

garlick commented Dec 23, 2024

Thanks! Doing that!

@mergify mergify bot merged commit 59cbf5e into flux-framework:master Dec 23, 2024
35 checks passed
Copy link

codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 95.87021% with 14 lines in your changes missing coverage. Please review.

Project coverage is 83.64%. Comparing base (d58f245) to head (2821d37).
Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
src/common/libflux/watcher_wrap.c 97.69% 6 Missing ⚠️
src/common/librouter/usock_service.c 68.75% 5 Missing ⚠️
src/bindings/python/flux/core/watchers.py 66.66% 2 Missing ⚠️
src/bindings/python/flux/cli/base.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6534      +/-   ##
==========================================
+ Coverage   83.63%   83.64%   +0.01%     
==========================================
  Files         522      522              
  Lines       87860    88007     +147     
==========================================
+ Hits        73479    73614     +135     
- Misses      14381    14393      +12     
Files with missing lines Coverage Δ
src/bindings/python/flux/core/handle.py 98.66% <100.00%> (+1.81%) ⬆️
src/bindings/python/flux/job/watcher.py 94.21% <100.00%> (ø)
src/cmd/flux-start.c 83.93% <100.00%> (ø)
src/common/libflux/hwatcher.c 89.28% <100.00%> (+6.42%) ⬆️
src/common/libflux/msg_handler.c 86.64% <100.00%> (+0.10%) ⬆️
src/common/libflux/reactor.c 95.31% <ø> (-0.53%) ⬇️
src/common/libflux/watcher.c 100.00% <100.00%> (ø)
src/common/librouter/usock.c 86.89% <100.00%> (+0.07%) ⬆️
src/common/libzmqutil/zwatcher.c 85.96% <100.00%> (+3.96%) ⬆️
src/bindings/python/flux/cli/base.py 95.73% <0.00%> (ø)
... and 3 more

... and 3 files with indirect coverage changes

@garlick garlick deleted the watcher_ref branch December 23, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants