Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tracing tc-bpf with --filter-track-skb-by-stackid #452

Merged
merged 1 commit into from
Nov 28, 2024

Conversation

Asphaltt
Copy link
Contributor

@Asphaltt Asphaltt commented Nov 7, 2024

For tracing tc-bpf (same for XDP), it is able to get FP from r10 to track skb by stack ID.

@jschwinger233 could you give me a hand to test this PR by pwru --filter-trace-tc --filter-track-skb-by-stackid?

@Asphaltt Asphaltt requested a review from a team as a code owner November 7, 2024 12:45
@Asphaltt Asphaltt requested review from jschwinger233 and removed request for a team November 7, 2024 12:45
@jschwinger233
Copy link
Member

Sorry but what's the problem you are fixing? 😬 Just went through issues but failed to find one, though I do remember there was an issue regarding --trace-tc + --bpf-helpers. 😭

Running pwru --filter-track-bpf-helpers --output-caller --filter-trace-tc 'src port 19233 and tcp[tcpflags]=tcp-syn' from main looks good to me...

@Asphaltt
Copy link
Contributor Author

Asphaltt commented Nov 8, 2024

Sorry. Should I raise an issue?

The issue was avoided by 5455e74.

Then, this PR is to fix the issue actually.

@jschwinger233
Copy link
Member

jschwinger233 commented Nov 8, 2024 via email

Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it doesn't work 😬

I used two different approaches to verify, both gave negative result.

Let me show the intuitive way:

diff --git a/bpf/kprobe_pwru.c b/bpf/kprobe_pwru.c
index a69f020..1674982 100644
--- a/bpf/kprobe_pwru.c
+++ b/bpf/kprobe_pwru.c
@@ -88,6 +88,7 @@ struct event_t {
        u64 param_second;
        u64 param_third;
        u32 cpu_id;
+
 } __attribute__((packed));
 
 #define MAX_QUEUE_ENTRIES 10000
@@ -508,6 +509,8 @@ cont:
        event->ts = bpf_ktime_get_ns();
        event->cpu_id = bpf_get_smp_processor_id();
 
+       event->print_stack_id = tracked_by;
+
        return true;
 }
 
diff --git a/internal/pwru/output.go b/internal/pwru/output.go
index 254b906..d3b128a 100644
--- a/internal/pwru/output.go
+++ b/internal/pwru/output.go
@@ -435,6 +435,7 @@ func (o *output) Print(event *Event) {
                fmt.Fprintf(o.writer, "%s", getShinfoData(event, o))
        }
 
+       fmt.Fprintf(o.writer, " %d", event.PrintStackId)
        fmt.Fprintln(o.writer)
 }

This patch unconditionally prints tracked_by variable from kernel to show the track reason.

With this PR + patch above, pwru --filter-track-skb-by-stackid --output-caller --filter-trace-tc 'src port 19233 and tcp[tcpflags]=tcp-syn' outputs:

0xffff91c8bbe772e8 2   ~bin/curl:192104 4026533015 0        ~010d98928d56:13 0x0800 1500  60    10.244.1.216:19233->10.244.3.176:8080(tcp) enqueue_to_backlog      netif_rx_internal 4
0xffff91c8bbe772e8 2   ~bin/curl:192104 4026533015 0        ~010d98928d56:13 0x0800 1500  60    10.244.1.216:19233->10.244.3.176:8080(tcp) __netif_receive_skb     process_backlog 4
0xffff91c8bbe772e8 2   curl:192104      4026533015 0        ~010d98928d56:13 0x0800 1500  60    10.244.1.216:19233->10.244.3.176:8080(tcp) __netif_receive_skb_one_core __netif_receive_skb 4
0xffff91c8bbe772e8 2   curl:192104      4026533015 0        ~010d98928d56:13 0x0800 1500  74    10.244.1.216:19233->10.244.3.176:8080(tcp) bpf_prog_a73bb86d02975a6f_cil_from_container[bpf](tc) __per_cpu_start 1
0xffff91c8bbe772e8 2   curl:192104      4026533015 0        ~010d98928d56:13 0x0800 1500  74    10.244.1.216:19233->10.244.3.176:8080(tcp) skb_ensure_writable                                   bpf_skb_pull_data 4
0xffff91c8bbe772e8 2   curl:192104      4026533015 0        ~010d98928d56:13 0x0800 1500  74    10.244.1.216:19233->10.244.3.176:8080(tcp) skb_ensure_writable                                   bpf_skb_pull_data 4
0xffff91c8bbe772e8 2   curl:192104      4026533015 81903e00 ~010d98928d56:13 0x0800 1500  60    10.244.1.216:19233->10.244.3.176:8080(tcp) ip_rcv                                                __netif_receive_skb_one_core 4

Have a look at the last colume of each line, which shows track reason of each event. The tc event still uses TRACKED_BY_FILTER (1) instead of TRACKED_BY_STACKID (4), demonstrating that get_stackid(is_kprobe=false) didn't unwind stack properly.

@Asphaltt
Copy link
Contributor Author

Asphaltt commented Nov 8, 2024

get_stackid(is_kprobe=false) didn't unwind stack properly.

Bad news.

But, why doesn't work expectedly?

Then, I write a simple demo to disasm

static __always_inline u64
get_tracing_fp(void)
{
    u64 fp;

    /* get frame pointer */
    asm volatile ("%[fp] = r10" : [fp] "+r"(fp) :);
    return fp;
}

It shows:

; lbr.c:27:5 asm volatile ("%[fp] = r10" : [fp] "+r"(fp) :);
0xffffffffc03ebc34: 48 89 ea            movq    %rbp, %rdx

And the stack layout at the time of executing get_tracing_fp() should be:

+-------+
|  ...  | callers stack
|       |
|  rip  | caller's rip
|  rip  | tgt prog's rip
|  rbp  | tgt prog's caller's rbp
+- - - -+ trampoline's rbp
|  ...  |
|       |
|  rip  | trampoline's rip
|  rbp  | trampoline's rbp
+- - - -+ fentry_tc's rbp   <-- fp
|  ...  |
|       |
|       |
|-------| rsp

Next, %[fp] = r10 should get fentry_tc's rbp.

Therefore, it should work expectedly.

What are we missing here?

@Asphaltt
Copy link
Contributor Author

Asphaltt commented Nov 8, 2024

@jschwinger233 can you bpf_printk() the unwinding FPs to compare FPs between kprobe and fentry_tc?

@jschwinger233
Copy link
Member

fp and stackid are correct. It's due to an issue of unshared bpf map which can be fixed by #459

@brb brb requested a review from jschwinger233 November 27, 2024 13:58
@jschwinger233
Copy link
Member

Can you do a rebase to include #459 (so I can test manually 😬 ), thank you!

For tracing tc-bpf (same for XDP), it is able to get FP from r10 to
track skb by stack ID.

Signed-off-by: Leon Hwang <[email protected]>
@Asphaltt Asphaltt force-pushed the fix/trace-tc-with-stackid branch from abcd431 to efcaa70 Compare November 27, 2024 14:23
@Asphaltt
Copy link
Contributor Author

Can you do a rebase to include #459

Done

Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@jschwinger233 jschwinger233 merged commit 55bdaac into cilium:main Nov 28, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants