Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace libev with libuv #6492

Open
garlick opened this issue Dec 9, 2024 · 3 comments
Open

replace libev with libuv #6492

garlick opened this issue Dec 9, 2024 · 3 comments

Comments

@garlick
Copy link
Member

garlick commented Dec 9, 2024

Edit: the primary motivation for this is libuv's ability to be embedded in another event loop via uv_backend_fd(). If flux could make a backend fd available for its event loop, then it would be easier to use Flux in applications that already have an event loop, and to create language bindings that use a native event loop rather than requiring Flux's be used. See also: test-embed.c. Additional benefits may accrue such as streams and thread pool work scheduling.

After a discussion in our meeting last week, I did an experiment to convert flux-core to use libuv.

First, I did some refactoring to localize all the libev usage to reactor.c. This involved reimplementing the buffer watchers, handle watcher, and zeromq watcher using the flux API instead of libev. Then I began systematically converting reactor.c to libuv. Although, as discussed, libuv derives from liibev and thus many of its idioms carry over, I ran into the following problems:

  • no equivalent to ev_child
  • no equivalent to ev_periodic
  • no equivalent to ev_stat (oops, I missed uv_fs_event_t)
  • no equivalent to ev_set_priority()
  • no equivalent to ev_ref()
  • memory for handle (libuv name for watcher) objects cannot be freed in flux_watcher_destroy(). One calls uv_close() on the handle (registering a callback), the reactor runs, and someday you get a close callback, after which memory can be freed. Grr. Because Windows I guess.

It's a fair bit of work to reimplement the missing watchers though probably not a huge deal. I'm not sure what to do about the destructor issue though. It doesn't map well onto our API.

This is on my port_libuv branch if anyone wants to take a look. libflux compiles with the missing functionality. The reactor unit test runs successfully, but when run under valgrind a bunch of invalid reads show up because watcher memory is freed too soon. memory is leaked. There is also an unexplained segfault in the composite watcher test.

garlick added a commit to garlick/flux-core that referenced this issue Dec 10, 2024
This is a partial conversion with the following gaps:
- no periodic watchers
- no child watchers
- no stat watchers
- flux_reactor_active_incref/decref
- no flux_watcher_set_priority

Memory will leak if watchers are destroyed outside of the reactor
because uv_close() works asynchronously.

Currently there is an unexplained segfault in the composite watcher
unit test.

See issue flux-framework#6492
@vchuravy
Copy link
Contributor

I hadn't had time in a while to look at flux, but from my perspective this would "fix" #3969 since the two event-loops that were fighting were libuv and libev (with some Julia idiosyncracies on top).

garlick added a commit to garlick/flux-core that referenced this issue Dec 10, 2024
This is a partial conversion with the following gaps:
- no periodic watchers
- no child watchers
- flux_reactor_active_incref/decref
- no flux_watcher_set_priority

Memory will leak if watchers are destroyed outside of the reactor
because uv_close() works asynchronously.

Currently there is an unexplained segfault in the composite watcher
unit test.

See issue flux-framework#6492
@garlick
Copy link
Member Author

garlick commented Dec 13, 2024

I did come across this just now, which lists some of the caveats to converting to libuv I mentioned above plus more. (I am not sure how up to date this is). Anyway, for reference

https://www.gevent.org/loop_impls.html#limitations-and-differences

garlick added a commit to garlick/flux-core that referenced this issue Dec 16, 2024
This is a partial conversion with the following gaps:
- no periodic watchers
- no child watchers
- no flux_reactor_active_incref/decref
- no flux_watcher_set_priority

Memory will leak if watchers are destroyed outside of the reactor
because uv_close() works asynchronously.

Currently there is an unexplained segfault in the composite watcher
unit test.

See issue flux-framework#6492
garlick added a commit to garlick/flux-core that referenced this issue Dec 16, 2024
This is a partial conversion with the following gaps:
- no periodic watchers
- no child watchers
- no flux_reactor_active_incref/decref
- no flux_watcher_set_priority

Memory will leak if watchers are destroyed outside of the reactor
because uv_close() works asynchronously.

Currently there is an unexplained segfault in the composite watcher
unit test.

See issue flux-framework#6492
@garlick
Copy link
Member Author

garlick commented Jan 10, 2025

I pushed an update to my branch referenced above, rebased on current master after we dropped child watchers and added watcher references. The major bits missing/broken are now watcher priorities and periodic watchers. The code builds as normal with libev now, but supports a --with-libuv configure option so, for now, support for both reactors can coexist. This makes it a bit easier to develop since any non-uv changes can be easily tested with libev without shuffling commits around.

The segfault in test/composite_future.t mentioned above appears to be a use-after-free, where a future initialization frees data passed as an argument and then gets called again (with different reactor pointer - so maybe "now" vs "then" contexts) and dereferences it. Note to self: with some debug added, the test goes like this:

good (libev):

ok 82 - flux_future_push success
# future_timeout: alloc dptr=0x578a0ddc4710
ok 83 - flux_future_push timeout success
# timeout_init: r=0x578a0ddc4550 dptr=0x578a0ddc4710
# timeout_init: free dptr=0x578a0ddc4710
ok 84 - flux_future_then worked
ok 85 - flux_future_is_ready returns false
ok 86 - async: composite future is ready
ok 87 - async: retrieved handle child future
ok 88 - async: flux_future_get on child successful
ok 89 - async: retrieved handle to timeout future
ok 90 - async: timeout future fulfilled
ok 91 - flux_reactor_run returned
ok 92 - asynchronous callback called

bad (libuv)

ok 82 - flux_future_push success
# future_timeout: alloc dptr=0x62adeaf2ba60
ok 83 - flux_future_push timeout success
# timeout_init: r=0x62adeaf2e330 dptr=0x62adeaf2ba60
# timeout_init: free dptr=0x62adeaf2ba60
ok 84 - flux_future_then worked
ok 85 - flux_future_is_ready returns false
ok 86 - async: composite future is ready
ok 87 - async: retrieved handle child future
ok 88 - async: flux_future_get on child successful
ok 89 - async: retrieved handle to timeout future
# timeout_init: r=0x62adeaf2fa50 dptr=0x62adeaf2ba60
free(): double free detected in tcache 2
Aborted (core dumped)

Anyway wanted to provide a checkpoint since other tasks beckon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants