stream #8904
Open
kkdwivedi wants to merge 12 commits into kernel-patches:bpf-next_base from kkdwivedi:stream
Add a new bpf_dynptr_from_mem_slice kfunc to create a dynptr from a PTR_TO_BTF_ID exposing a variable-length slice of memory, represented by the new bpf_mem_slice type. This slice is read-only; a distinct type can be exposed in the future for a read-write slice. Since this is the first kfunc with potential local dynptr initialization, add it to the if-else list in check_kfunc_call. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
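To make the read-only slice idea concrete, here is a minimal userspace sketch. It is not the kernel implementation: `struct mem_slice`, `struct ro_view`, `view_from_slice()` and `view_read()` are hypothetical names invented for illustration, and the `-E2BIG` return for out-of-bounds reads is an assumption. It only shows the shape of the contract: a pointer plus a length, wrapped in a view that offers bounds-checked reads and no write path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the slice type: a pointer plus a length. */
struct mem_slice {
	const void *ptr;
	size_t len;
};

/* Hypothetical read-only view playing the role of the dynptr. */
struct ro_view {
	const unsigned char *data;
	size_t size;
};

/* Build a view over the slice. No write helper exists for this view. */
static int view_from_slice(const struct mem_slice *s, struct ro_view *v)
{
	assert(v != NULL);
	if (!s || (!s->ptr && s->len))
		return -EINVAL;
	v->data = s->ptr;
	v->size = s->len;
	return 0;
}

/* Copy len bytes starting at offset off, rejecting out-of-bounds reads.
 * The comparison is written so that off + len cannot overflow. */
static int view_read(const struct ro_view *v, void *dst, size_t len, size_t off)
{
	if (off > v->size || len > v->size - off)
		return -E2BIG; /* error code chosen for illustration only */
	memcpy(dst, v->data + off, len);
	return 0;
}
```

A read-write variant would need a separate slice type, as the commit message notes, so that read-only memory can never be reached through a writable view.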
Add support for a stream API to the kernel and expose related kfuncs to BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These can be used for printing messages that can be consumed from user space, making them similar in spirit to the existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any errors encountered at runtime. BPF programs themselves may use both streams for writing debug messages. BPF library-like code may use BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Every time a message is emitted from the kernel (directly, or through a BPF program), a record is allocated by bump allocating from a per-CPU region backed by a page obtained using try_alloc_pages. This ensures that we can allocate memory from any context. The eventual plan is to discard this scheme in favor of Alexei's kmalloc_nolock() [0].

This record is then locklessly inserted into a list (llist_add()) so that the printing side doesn't require holding any locks, and works in any context. Each stream has a maximum capacity of 4MB of text, and each printed message is accounted against this limit.

Messages from a program are emitted using the bpf_stream_vprintk kfunc, which takes a stream argument and otherwise works like bpf_trace_vprintk. The stream itself can be obtained using two kfuncs: bpf_stream_get for the current program, and bpf_prog_stream_get for a target program ID.

The bprintf buffer helpers are extracted out so that they can be reused to format the string before it is copied into the stream. This way we can (with the defined max limit) format a string and know its true length before allocating the stream element.

For consuming elements from a stream, bpf_stream_next_elem can be called, which returns a bpf_stream_elem object that contains a bpf_mem_slice struct representing the message contents. A dynptr can be created from this memory slice object to access the contents of the bpf_stream_elem. Once consumed, bpf_stream_free_elem can be used to release the message back to the memory allocator.

The internals of bpf_stream_next_elem merit some discussion. First, the lockless list bpf_stream::log is a LIFO stack. Elements obtained using a llist_del_all() operation are in LIFO order, and would thus break the chronological ordering if printed directly. Hence, this batch of messages is first reversed. Then, it is stashed into a separate list in the stream, i.e. the backlog_log. The head of this list is the actual message that should always be returned to the caller. For this purpose, we hold a lock around bpf_stream_backlog_pop(), as llist_del_first() (if we maintained a second lockless list for the backlog) wouldn't be safe from multiple threads anyway. Then, if we fail to find something in the backlog log, we splice out everything from the lockless log, place it in the backlog log, and return the head of the backlog. The next time we pop a message, we visit the remaining elements in the backlog log first.

We use rqspinlock for protecting the backlog log, to ensure we can invoke bpf_stream_next_elem in any context.

With the exception of bpf_prog_stream_get, these kfuncs are available to all program types. bpf_prog_stream_get takes a spin_lock_bh, and is thus susceptible to deadlocks if invoked in arbitrary kernel contexts. Hence, it is restricted to BPF_PROG_TYPE_SYSCALL. In the future, if the need arises, we can use rqspinlock to make it callable in any context.

From the kernel side, writing into the stream is a bit more involved than the typical printk. The kernel may print a collection of messages into the stream, and parallel writers into the stream may suffer from interleaving of messages. To ensure each group of messages is visible atomically, we can take advantage of using a lockless list for pushing in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel users to use bpf_stream_printk statements for the passed expression to write into the stream. Underneath the macro, we have a message staging API, where a bpf_stream_stage object on the stack accumulates the messages being printed into a local llist_head, and then a commit operation splices the whole batch into the stream's lockless log list.

This is especially pertinent for rqspinlock deadlock messages printed to program streams. After this change, each deadlock report appears as a contiguous, non-interleaved message, which makes the fault easier for the reader to debug.

While programs cannot benefit from this staged stream writing API, they could just as well hold an rqspinlock around their print statements to serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe, any-context printing of messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena page faults, rqspinlock deadlocks, and cond_break timeouts, and integration of this facility into bpftool for dumping messages to user space.

[0]: https://lore.kernel.org/bpf/[email protected]

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
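The ordering argument in this message (LIFO log, reversal into a backlog, and staged batches that stay contiguous) can be sketched in a small single-threaded userspace model. This is only an illustration under stated assumptions: `struct elem`, `struct stream`, `struct stage` and every function below are hypothetical stand-ins, plain pointers replace the kernel's lockless llist primitives, and the lock that the real code holds around the backlog pop is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct elem {
	struct elem *next;
	const char *msg;
};

struct stream {
	struct elem *log;     /* newest first, like a list built with llist_add() */
	struct elem *backlog; /* oldest first, ready to hand to the reader */
};

struct stage {
	struct elem *head; /* newest staged message */
	struct elem *tail; /* oldest staged message */
};

/* Models a single push onto the LIFO log. */
static void log_push(struct stream *s, struct elem *e)
{
	assert(e != NULL);
	e->next = s->log;
	s->log = e;
}

/* Accumulate a message in the local, not yet visible, staging list. */
static void stage_push(struct stage *g, struct elem *e)
{
	e->next = g->head;
	g->head = e;
	if (!g->tail)
		g->tail = e;
}

/* Splice the whole staged batch onto the log in one step, so no other
 * writer's message can land in the middle of it. */
static void stage_commit(struct stage *g, struct stream *s)
{
	if (!g->head)
		return;
	g->tail->next = s->log;
	s->log = g->head;
	g->head = g->tail = NULL;
}

/* Models taking everything off the log, then reversing the batch so it
 * is in chronological order. */
static struct elem *log_take_reversed(struct stream *s)
{
	struct elem *cur = s->log, *rev = NULL;

	s->log = NULL;
	while (cur) {
		struct elem *next = cur->next;

		cur->next = rev;
		rev = cur;
		cur = next;
	}
	return rev;
}

/* Return the oldest pending message. The log is only consulted once the
 * backlog is empty, which keeps the overall order chronological. */
static struct elem *stream_next(struct stream *s)
{
	struct elem *e;

	if (!s->backlog)
		s->backlog = log_take_reversed(s);
	e = s->backlog;
	if (e) {
		s->backlog = e->next;
		e->next = NULL;
	}
	return e;
}
```

In this model, a message pushed while the backlog still holds older entries stays on the log until the backlog drains, which is why draining the backlog first preserves ordering.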
Prepare a function for use in future patches that can extract the file info, line info, and the source line number for a given BPF program, given its program counter. Only the basename of the file path is provided, since the full path can be excessively long in some cases. This will be used in later patches to print source info to the BPF stream. The source line number is indicated by the return value, and the file and line info are provided through out parameters. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
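The calling convention described here (line number as the return value, file basename and source line through out parameters) could look roughly like the following userspace sketch. The `struct line_entry` table, its sort order, and the names `path_basename()` and `source_info_for()` are all assumptions made for illustration. The real function works on the program's BTF line info rather than a table like this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified line table entry, sorted by insn_off. */
struct line_entry {
	unsigned int insn_off; /* first instruction covered by this entry */
	const char *file;      /* full source path */
	const char *src;       /* text of the source line */
	int line_num;
};

/* Return the part of the path after the last '/', or the whole path. */
static const char *path_basename(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* Find the last entry at or before insn_off. Returns the source line
 * number, or a negative errno when no entry covers the offset. */
static int source_info_for(const struct line_entry *tbl, size_t n,
			   unsigned int insn_off,
			   const char **filep, const char **linep)
{
	const struct line_entry *best = NULL;
	size_t i;

	assert(filep && linep);
	for (i = 0; i < n && tbl[i].insn_off <= insn_off; i++)
		best = &tbl[i];
	if (!best)
		return -ENOENT;
	*filep = path_basename(best->file);
	*linep = best->src;
	return best->line_num;
}
```

Returning the line number directly keeps the common case to a single call, with a negative value doubling as the failure signal.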
In preparation for figuring out the closest program that led to the current point in the kernel, implement a function that scans through the stack trace and finds the closest BPF program when walking down the stack trace. Special care needs to be taken to skip over kernel and BPF subprog frames. We basically scan until we find a BPF main prog frame. The assumption is that if a program calls into us transitively, we'll hit it along the way. If not, we end up returning NULL. Contextually, the function will be used in places where we know the program may have called into us. Due to the reliance on arch_bpf_stack_walk(), this function only works on x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from arch_bpf_stack_walk as well, since we now call it outside the bpf_throw() context. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
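The scan described above reduces to a simple loop once the stack is viewed as a sequence of classified frames. The sketch below is a userspace model with hypothetical types (`enum frame_kind`, `struct frame`) and a hypothetical function name. It assumes the frames are already collected into an array ordered from innermost to outermost, whereas the real code classifies addresses as the unwinder visits them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a stack frame. */
enum frame_kind {
	FRAME_KERNEL,
	FRAME_BPF_SUBPROG,
	FRAME_BPF_MAIN,
};

struct frame {
	enum frame_kind kind;
	int prog_id; /* only meaningful for BPF frames */
};

/* Walk from the innermost frame outward, skipping kernel and subprog
 * frames, and return the first BPF main prog frame. Returns NULL when
 * no BPF program called into us. */
static const struct frame *find_closest_main_prog(const struct frame *frames,
						  size_t n)
{
	size_t i;

	assert(frames != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (frames[i].kind == FRAME_BPF_MAIN)
			return &frames[i];
	}
	return NULL;
}
```

Stopping at the first main prog frame means that when programs are nested, the one closest to the current point in the kernel is the one reported.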
Introduce a kernel function that is the analogue of dump_stack(), printing some useful information and the stack trace. This is not exposed to BPF programs yet, but can be made available in the future. When we have a program counter for a BPF program in the stack trace, additionally output the filename and line number to make the trace more helpful. The rest of the trace can be passed into ./decode_stacktrace.sh to obtain the line numbers for kernel symbols. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Begin reporting may_goto timeouts to the BPF program's stderr stream. To avoid spamming errors and filling up the stream when a program keeps failing repeatedly, emit at most 512 error messages from the kernel for a given stream. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Begin reporting rqspinlock deadlocks and timeouts to the BPF program's stderr. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Begin reporting arena page faults and the faulting address to the BPF program's stderr. This is limited to x86 for now, but arm64 support should be easy to add. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Introduce a new macro that allows printing data similar to bpf_printk(), but to BPF streams. The first argument is the stream ID; the rest of the arguments are the same as what one would pass to bpf_printk(). Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Add bpftool support for dumping streams of a given BPF program. The syntax is `bpftool prog tracelog { stdout | stderr } PROG`. The program's stdout stream is dumped to stdout, and its stderr stream is dumped to stderr. Cc: Quentin Monnet <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Add selftests to stress test the various facets of the stream API and its memory allocation pattern, and to ensure that dumping support is tested and functional. Create a symlink to bpftool's stream.bpf.c and use it to test the support for dumping messages to a ringbuf in user space, and verify the output. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
e4e98c9 to 8b8229f
No description provided.