
Add packet batch trigger for better packet handling #60

Draft: wants to merge 2 commits into base: dev from packet-batch-callback

Conversation

@jagerman (Member) commented Nov 2, 2023

This adds an optional packet batch callback to Endpoint that allows Endpoint to signal the application when it is done processing a set of incoming packets -- either because it has processed them all, or because it hit the max-recv-per-loop limit (64).

This is designed to allow an application to deal with packets, especially datagrams, more efficiently in batches: the application can use the datagram handler to collect incoming datagrams, then use the batch callback to process the accumulated datagrams in one go. Because the batch callback always fires before libquic goes back to (potentially blocking) waiting to read new packets, the application can do batch processing without resorting to timers or polling for packet handling.

This is particularly important for Lokinet, where we take the packet in the callback and transfer it to a job in the Lokinet main loop thread to process it there: doing this one packet at a time is not likely to scale well because of the sheer number of jobs that would have to be put on the logic thread event queue; by batching them into groups of up to 64 packets at a time we ought to be able to do considerably better.

@jagerman jagerman force-pushed the packet-batch-callback branch from b7af646 to 8a5b63b Compare November 2, 2023 21:19
@jagerman jagerman requested review from dr7ana and tewinget November 2, 2023 21:19
Comment on lines 40 to 41
```cpp
using receive_callback_t = std::function<void(const Packet& pkt)>;
using receive_batch_callback_t = std::function<void()>;
```
Collaborator

I have a knee-jerk reaction to suffixing with {}_t now and you only have yourself to blame

@@ -103,6 +113,8 @@ namespace oxen::quic

```cpp
event_ptr rev_ = nullptr;
receive_callback_t receive_callback_;
receive_batch_callback_t receive_callback_batch_;
```
Collaborator

Maybe batch_processing_cb ?

src/udp.cpp Outdated
Comment on lines 135 to 137
```cpp
if (self.pending_receive_batch_)
{
    self.pending_receive_batch_ = false;
    if (self.receive_callback_batch_)
        self.receive_callback_batch_();
}
```
@dr7ana (Collaborator) commented Nov 3, 2023

I'm wondering about this boolean (self.pending_receive_batch_). Few questions/thoughts we can chat about later:

  1. I feel like there shouldn't be a case where self.pending_receive_batch_ would be true and self.receive_callback_batch_ would not be set? If the user doesn't provide a batch processing cb, I would guess that the batch processing logic should be avoided entirely? Though that boolean is flipped to true at line 206 in ::process_packet, implying it's flipped and checked regardless of whether batch processing is "active" for this endpoint or not.
  2. What if we used the set-ness of the callback as an indicator of whether batch processing is "active" or "inactive" for this endpoint? Then the boolean could be a "we have a full batch" signal of sorts
  3. We could wrap the callback in a lightweight struct that has two members, a callback and an int/size_t specifying the size of the batch? The size parameter could even default to something sensible at or below the loop limit of 64 packets

Follow-on... it would be cool to speed-test this with different batch sizes to see what works best.

Follow-on follow-on... maybe even approximate grid search cross-validation across all our configurable parameters...

```cpp
auto batch_counter_before_final = f.get();
REQUIRE(data_counter == 31);
REQUIRE(batch_counter_before_final > batches_before_flood);
REQUIRE(batch_counter == batch_counter_before_final + 1);
```
@dr7ana (Collaborator) commented Nov 3, 2023

Interestingly, the last REQUIRE failed on a few debug CI builds.

@jagerman jagerman force-pushed the packet-batch-callback branch from 8a5b63b to 025fe0e Compare November 3, 2023 18:04
@jagerman (Member, Author) commented Nov 3, 2023

I think the accept-const-string change here is actually buggy: libquic ends up storing a string_view to be used when it can actually send the packet, but we have no guarantee that the string will still be around at that point.

This is dropped now.

This adds an optional packet post-receive callback to Endpoint that
allows Endpoint to signal the application when it is done processing a
set of incoming packets -- either because it has processed them all, or
because it hit the max-recv-per-loop limit (64).

This is designed to allow an application to more efficiently deal with
packets, especially datagrams, in batches: by using this an application
can use the datagram handler to collect incoming datagrams, then use the
post-receive callback to process accumulated datagrams in one go.
Because the post-receive callback always fires immediately *before*
libquic goes back to potentially blocking waiting to read new packets,
this means the application can do batch processing without needing to
resort to timers or polling for packet handling.

This is particularly important for Lokinet, where we take the packet in
the callback and transfer it to a job in the Lokinet main loop thread to
process it there: doing this one packet at a time is not likely to scale
well because of the sheer number of jobs that would have to be put on
the logic thread event queue; by batching them into groups of up to 64
packets at a time we ought to be able to do considerably better, and by
triggering the processing based on the post-receive trigger we ensure we
don't introduce unnecessary delays in terms of when packets get
processed.
@jagerman jagerman force-pushed the packet-batch-callback branch from 025fe0e to 8318b81 Compare November 3, 2023 21:21
@jagerman jagerman marked this pull request as draft November 3, 2023 21:29
@jagerman (Member, Author) commented Nov 3, 2023

Keeping as draft until I've tried making this fit into Lokinet.
