bpf: tcp: Exactly-once socket iteration #9026

Open: wants to merge 12 commits into base: bpf-next_base

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: bpf: tcp: Exactly-once socket iteration
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719

@kernel-patches-daemon-bpf

Upstream branch: 90b83ef
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

@kernel-patches-daemon-bpf

Upstream branch: bb1556e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

@kernel-patches-daemon-bpf

Upstream branch: cd2e103
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 7fdaba9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 919319b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 97744b4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

kernel-patches-daemon-bpf bot force-pushed the series/964616=>bpf-next branch from 050123c to a75050c on June 5, 2025 20:27
@kernel-patches-daemon-bpf

Upstream branch: a570f38
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

jrife added 6 commits June 5, 2025 13:57
Prepare for the next patch which needs to be able to choose either
GFP_USER or GFP_NOWAIT for calls to bpf_iter_tcp_realloc_batch.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>

Require that iter->batch always contains a full bucket snapshot. This
invariant is important to avoid skipping or repeating sockets during
iteration when combined with the next few patches. Before, there were
two cases where a call to bpf_iter_tcp_batch may only capture part of a
bucket:

1. When bpf_iter_tcp_realloc_batch() returns -ENOMEM.
2. When more sockets are added to the bucket while calling
   bpf_iter_tcp_realloc_batch(), making the updated batch size
   insufficient.

When the batch covers only part of a bucket, the iterator can lose track
of which sockets were already visited, especially if the bucket has to
be processed in more than two batches. That forces a choice between
repeating and skipping sockets, so don't allow partial batches:

1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
   fails instead of continuing with a partial batch.
2. Try bpf_iter_tcp_realloc_batch() with GFP_USER just as before, but if
   we still aren't able to capture the full bucket, call
   bpf_iter_tcp_realloc_batch() again while holding the bucket lock to
   guarantee the bucket does not change. On the second attempt use
   GFP_NOWAIT, since we are holding the spin lock and must not sleep.
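
The two-step strategy above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: struct bucket, struct batch, enum alloc_mode, realloc_batch() and snapshot_bucket() are hypothetical stand-ins, the 3/2 growth factor is an arbitrary choice, and the "arrivals" field simulates sockets added while the lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_NO_SLEEP }; /* ~GFP_USER, ~GFP_NOWAIT */

struct bucket {
	int nsocks;    /* sockets currently in the bucket */
	int arrivals;  /* simulated sockets added while the lock is dropped */
	bool locked;   /* models the bucket spin lock */
};

struct batch {
	int *items;
	int max;       /* capacity */
	int count;     /* entries captured */
};

/* Grow the batch; a sleeping allocation under the lock would be a bug. */
static int realloc_batch(struct batch *b, int new_max, enum alloc_mode mode,
			 const struct bucket *bkt)
{
	int *items;

	assert(!(bkt->locked && mode == ALLOC_MAY_SLEEP));
	items = realloc(b->items, (size_t)new_max * sizeof(*items));
	if (!items)
		return -ENOMEM;
	b->items = items;
	b->max = new_max;
	return 0;
}

/* Capture the whole bucket or fail; never return a partial snapshot. */
static int snapshot_bucket(struct batch *b, struct bucket *bkt)
{
	int err, expected;

	bkt->locked = true;
	expected = bkt->nsocks;
	if (expected <= b->max)
		goto fill;
	bkt->locked = false;

	/* Simulate concurrent insertions during the unlocked window. */
	bkt->nsocks += bkt->arrivals;
	bkt->arrivals = 0;

	/* First attempt: lock dropped, so the allocation may sleep. */
	err = realloc_batch(b, expected * 3 / 2, ALLOC_MAY_SLEEP, bkt);
	if (err)
		return err; /* propagate -ENOMEM instead of going partial */

	bkt->locked = true;
	if (bkt->nsocks > b->max) {
		/* Bucket grew meanwhile: retry under the lock, no sleeping. */
		err = realloc_batch(b, bkt->nsocks, ALLOC_NO_SLEEP, bkt);
		if (err) {
			bkt->locked = false;
			return err;
		}
	}
fill:
	/* The lock is held here, so the bucket cannot change while we copy. */
	for (b->count = 0; b->count < bkt->nsocks; b->count++)
		b->items[b->count] = b->count;
	bkt->locked = false;
	return 0;
}
```

Because the second resize happens with the lock held, the size it computes cannot go stale, which is what lets the sketch promise a full snapshot or an error.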

I did some manual testing to exercise the code paths where GFP_NOWAIT is
used and where ERR_PTR(err) is returned. I used the realloc test cases
included later in this series to trigger a scenario where a realloc
happens inside bpf_iter_tcp_batch and made a small code tweak to force
the first realloc attempt to allocate a too-small batch, thus requiring
another attempt with GFP_NOWAIT. Some printks showed both reallocs with
the tests passing:

May 09 18:18:55 crow kernel: resize batch TCP_SEQ_STATE_LISTENING
May 09 18:18:55 crow kernel: again GFP_USER
May 09 18:18:55 crow kernel: resize batch TCP_SEQ_STATE_LISTENING
May 09 18:18:55 crow kernel: again GFP_NOWAIT
May 09 18:18:57 crow kernel: resize batch TCP_SEQ_STATE_ESTABLISHED
May 09 18:18:57 crow kernel: again GFP_USER
May 09 18:18:57 crow kernel: resize batch TCP_SEQ_STATE_ESTABLISHED
May 09 18:18:57 crow kernel: again GFP_NOWAIT

With this setup, I also forced each of the bpf_iter_tcp_realloc_batch
calls to return -ENOMEM to ensure that iteration ends and that the
read() in userspace fails.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>

Get rid of the st_bucket_done field to simplify TCP iterator state and
logic. Before, st_bucket_done could be false if bpf_iter_tcp_batch
returned a partial batch; however, with the last patch ("bpf: tcp: Make
sure iter->batch always contains a full bucket snapshot"),
st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>

Prepare for the next patch that tracks cookies between iterations by
converting struct sock **batch to union bpf_tcp_iter_batch_item *batch
inside struct bpf_tcp_iter_state.
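
As a rough illustration of why a union helps here, the sketch below assumes each batch slot holds either a socket pointer (while the batch is being processed) or that socket's cookie (once the iterator stops between reads). The member layout, struct fake_sock and remember_cookies() are assumptions made for this example and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock; only the cookie matters here. */
struct fake_sock {
	uint64_t cookie;
};

/* Assumed layout: one slot is either a live pointer or a saved cookie. */
union batch_item {
	struct fake_sock *sk;
	uint64_t cookie;
};

/*
 * Convert the unprocessed slots [cur, end) from pointers to cookies so
 * the pointers no longer need to stay valid between reads.
 */
static void remember_cookies(union batch_item *batch, unsigned int cur,
			     unsigned int end)
{
	assert(cur <= end);
	for (unsigned int i = cur; i < end; i++) {
		uint64_t cookie = batch[i].sk->cookie;

		batch[i].cookie = cookie;
	}
}
```

Reusing the same array for both representations means no second allocation is needed just to remember where iteration stopped.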

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>

Replace the offset-based approach for tracking progress through a bucket
in the TCP table with one based on socket cookies. Remember the cookies
of unprocessed sockets from the last batch and use this list to
pick up where we left off or, in the case that the next socket
disappears between reads, find the first socket after that point that
still exists in the bucket and resume from there.

This approach guarantees that all sockets that existed when iteration
began and continue to exist throughout will be visited exactly once.
Sockets that are added to the table during iteration may or may not be
seen, but if they are, they will be seen exactly once.
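
The resume step can be modeled in a few lines of userspace C. In this sketch the bucket is an array of cookies in list order and "remembered" holds the cookies of the unprocessed sockets from the previous batch; find_cookie() and resume_index() are hypothetical names, and returning -1 when none of the remembered sockets remain is a choice made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of a cookie in the bucket, or -1 if that socket is gone. */
static ptrdiff_t find_cookie(const uint64_t *bucket, size_t n, uint64_t cookie)
{
	assert(bucket || n == 0);
	for (size_t i = 0; i < n; i++)
		if (bucket[i] == cookie)
			return (ptrdiff_t)i;
	return -1;
}

/*
 * Walk the remembered cookies in order and resume at the first one that
 * still exists in the bucket. Sockets that vanished between reads are
 * skipped over rather than causing a restart from the bucket head.
 */
static ptrdiff_t resume_index(const uint64_t *bucket, size_t nbucket,
			      const uint64_t *remembered, size_t nremembered)
{
	for (size_t i = 0; i < nremembered; i++) {
		ptrdiff_t pos = find_cookie(bucket, nbucket, remembered[i]);

		if (pos >= 0)
			return pos;
	}
	return -1; /* nothing remembered is left in this bucket */
}
```

For example, with bucket cookies {50, 10, 30, 40} and remembered cookies {20, 30, 40}, socket 20 has disappeared, so the sketch resumes at the slot holding 30. An offset-based scheme would have no way to notice that the bucket shifted underneath it.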

Signed-off-by: Jordan Rife <[email protected]>

Replicate the set of test cases used for UDP socket iterators to test
similar scenarios for TCP listening sockets.

Signed-off-by: Jordan Rife <[email protected]>

@kernel-patches-daemon-bpf

Upstream branch: 64a064c
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=967719
version: 2

jrife added 6 commits June 5, 2025 13:57
Prepare to test TCP socket iteration over both listening and established
sockets by allowing the BPF iterator programs to skip the port check.

Signed-off-by: Jordan Rife <[email protected]>

Add parentheses around the loopback address check to fix its logic, and
make the socket state filter configurable for the TCP socket iterators.
Iterators can skip the socket state check by setting ss to 0.
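
A userspace sketch of such a filter is shown below. The exact expression that was fixed is not visible on this page, so this only illustrates the general shape: struct fake_sock, the ST_* constants and sock_matches() are hypothetical, and the loopback test uses the POSIX IN6_IS_ADDR_LOOPBACK macro rather than whatever helper the selftest uses.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical state values and socket model for illustration only. */
enum { ST_ESTABLISHED = 1, ST_LISTEN = 10 };

struct fake_sock {
	int family;
	int state;
	struct in6_addr addr;
};

/*
 * Return true if the socket should be reported. ss == 0 disables the
 * state check; each sub-condition is fully parenthesized so negation
 * and comparison cannot bind in an unintended order.
 */
static bool sock_matches(const struct fake_sock *sk, int ss)
{
	assert(sk != NULL);
	if (sk->family != AF_INET6)
		return false;
	if ((ss != 0) && (sk->state != ss))
		return false;
	return IN6_IS_ADDR_LOOPBACK(&sk->addr) != 0;
}
```

Treating 0 as "no filter" works in this sketch because 0 is not one of the modeled state values, so it cannot be confused with a real state.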

Signed-off-by: Jordan Rife <[email protected]>

Prepare for bucket resume tests for established TCP sockets by making
the number of ehash buckets configurable. Subsequent patches force all
established sockets into the same bucket by setting ehash_buckets to
one.

Signed-off-by: Jordan Rife <[email protected]>

Prepare for bucket resume tests for established TCP sockets by creating
established sockets. Collect socket fds from connect() and accept()
sides and pass them to test cases.
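
One way to collect both ends of each connection is sketched below. It is a minimal example under assumptions: make_established() is a hypothetical helper, it uses IPv4 loopback for simplicity (the selftest may use a different address family), and it relies on blocking connect() and accept() on a single thread.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create n established loopback connections. client_fds[i] comes from
 * connect() and server_fds[i] from accept(). Returns the listening fd,
 * or -1 on failure with every fd opened so far closed again.
 */
static int make_established(int n, int *client_fds, int *server_fds)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, i = 0;

	assert(n > 0);
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, n) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto err;

	for (i = 0; i < n; i++) {
		client_fds[i] = socket(AF_INET, SOCK_STREAM, 0);
		if (client_fds[i] < 0)
			goto err;
		if (connect(client_fds[i], (struct sockaddr *)&addr,
			    sizeof(addr)) < 0) {
			close(client_fds[i]);
			goto err;
		}
		server_fds[i] = accept(lfd, NULL, NULL);
		if (server_fds[i] < 0) {
			close(client_fds[i]);
			goto err;
		}
	}
	return lfd;
err:
	/* Unwind only the pairs that were fully set up. */
	while (i-- > 0) {
		close(client_fds[i]);
		close(server_fds[i]);
	}
	close(lfd);
	return -1;
}
```

Each connection contributes two established sockets to the table, one per side, which is why both fd arrays are handed to the caller.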

Signed-off-by: Jordan Rife <[email protected]>

Prepare for bucket resume tests for established TCP sockets by creating
a program to immediately destroy and remove sockets from the TCP ehash
table, since close() is not deterministic.

Signed-off-by: Jordan Rife <[email protected]>

Replicate the set of test cases used for UDP socket iterators to test
similar scenarios for TCP established sockets.

Signed-off-by: Jordan Rife <[email protected]>

kernel-patches-daemon-bpf bot force-pushed the series/964616=>bpf-next branch from 3681c6f to 29170be on June 5, 2025 20:57