Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: support XDP metadata #6

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion firmware/apps/nic/datapath.uc
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ actions#:
actions_execute(pkt_vec, egress#)

ebpf_reentry#:
ebpf_reentry()
ebpf_reentry(pkt_vec)

#pragma warning(disable: 4702)
fatal_error("MAIN LOOP EXIT")
13 changes: 12 additions & 1 deletion firmware/apps/nic/ebpf.uc
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ ebpf_init_cap_finalize()

#define _ebpf_pkt_vec *l$index1

#macro ebpf_reentry()
#macro ebpf_reentry(in_vec)
.begin
.reg egress_q_base
.reg stat
Expand All @@ -143,6 +143,10 @@ ebpf_init_cap_finalize()
.reg_addr ebpf_rc 0 A
.set ebpf_rc
.reg rc
.reg xdp_meta_len

// compute XDP meta length and save the meta in local memory
pv_save_xdp_meta(xdp_meta_len, in_vec)

pv_restore_meta_lm_ptr(_ebpf_pkt_vec)

Expand All @@ -160,6 +164,13 @@ ebpf_init_cap_finalize()
pv_set_tx_flag(_ebpf_pkt_vec, BF_L(PV_TX_HOST_RX_BPF_bf))
pv_invalidate_cache(_ebpf_pkt_vec)

// store xdp meta length if it is non-zero
alu[--, --, B, xdp_meta_len]
beq[pass#]
pv_meta_prepend(in_vec, xdp_meta_len)
pv_meta_push_type__sz1(in_vec, NFP_NET_META_XDP_META_LEN)
pass#:

__actions_restore_t_idx()
br_bset[rc, EBPF_RET_PASS, actions#]

Expand Down
123 changes: 120 additions & 3 deletions firmware/apps/nic/pv.uc
Original file line number Diff line number Diff line change
Expand Up @@ -74,23 +74,27 @@ passert(PV_MAX_CLONES, "EQ", 2)
* +-+---+-----+-------------------+-----+-------------------------+
* 10 |A| 0 | Packet Number | 0 | Offset | 2
* +-+---------+-------------------+-----+---------+---------------+
* 11 | Sequence Number | --- | Seq Ctx | Protocol | 3
* 11 | Sequence Number | XML | Seq Ctx | Protocol | 3
* +-------------------------------+-+-+-+---------+---+-+-+-+-+-+-+
* 12 | TX Host Flags |M|B|Seek (64B algn)|-|Q|I|i|C|c| 4
* 12 | TX Host Flags |M|B|Seek (64B algn)|X|Q|I|i|C|c| 4
Comment on lines -77 to +79
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes will clash with other work Netronome has incoming. Specifically, we want to grow the Seq Ctx field by one bit, and we have a different use for the X bit. We might have to add another word to the PV, which has a knock-on effect everywhere else. @mrapson-netronome and I are discussing this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was ready to re-implement the layout when pushing the patch. For me ideal would be to have at least 4+ bits to store the length of XDP meta (in words).

XDP meta is now <= 32 bytes and in 4 bytes chunks, so I need at least 4 bits to store the length. Maybe in future it will be extended to 64 or 128 bytes (still in 4 bytes chunks though), so I would like to have 6+ bits, so that we won't have to change firmware/kernel ABI for this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation is that we are running out of local memory on the datapath ME's:
This is the layout without your change:

512                 i32.me0._PKT_IO_PKT_VEC
256                 i32.me0.gro_cli_lm_ctx
32                  i32.me0.__actions_sriov_keys
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
2048                i32.me0.EBPF_STACK_BASE
Total used 2880 of 4096B

With your change:

1024                i32.me0._PKT_IO_PKT_VEC
256                 i32.me0.gro_cli_lm_ctx
32                  i32.me0.__actions_sriov_keys
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
2048                i32.me0.EBPF_STACK_BASE
Total used 3392B of 4096B

You will notice that changing PV_SIZE_LW from 16 to 24 doubles the _PKT_IO_PKT_VEC size. That is because it must be a power-of-2 size. See the alloc logic in pv.uc line 304. The upshot is that there is 8 more words available in the PV that is not shown in the ASCII art. Perhaps you can move your extra fields to this unused section?

If we add to the above the features that Netronome plans to introduce soon:

1024                i32.me0._PKT_IO_PKT_VEC
640                 i32.me0.gro_cli_lm_ctx
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
32                  i32.me0._PKT_IO_RX_META
256                 i32.me0.__actions_keys <- had to "break" alignment
2048                i32.me0.EBPF_STACK_BASE
Total used 4032B of 4096B

You can see that we are really out of LM, (and I have to violate the alignment requirement of symbol __actions_keys). Thus there really isn't space to grow the XDP metadata.

If we need to go above 32B, we'll have to consider alternatives, like maintaining the metadata in CLS, or directly in the packet buffer.

* +-------------------------------+-+-+---------------+-+-+-+-+-+-+
* 13 | 8B Header Offsets (stacked outermost to innermost) | 5
* +-----------------+-----+-----------------------+---------------+
* 14 | Ingress Queue | LMP | VLAN ID | Queue Offset | 6
* +-----------------+-----+-----------------------+---------------+
* 15 | Metadata Type Fields | 7
* +---------------------------------------------------------------+
* 16-23 | XDP Metadata | 0-7 <- ACTIVE_LM_ADDR_2 ()
* +---------------------------------------------------------------+
Comment on lines +87 to +88
Copy link
Contributor

@JohanM84 JohanM84 Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to lock down LM addr2 for this? Are we sure it's not used elsewhere?

Edit: I think you have to update pv_push and pv_pop as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am allocating data by extending PV_SIZE_LW to 24 words. Shouldn't pv_push and pv_pop work as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... It certainly looks like it from inspecting the code. (Note to self, write a unit test to confirm this)

*
* 0 - Intentionally zero for efficient extraction and manipulation
* S - Split packet
* A - 1 if CTM buffer is allocated (ie. packet number and CTM address valid)
* CBS - CTM Buffer Size
* BLS - Buffer List
* XML - XDP Meta Length = (XML + 1) << 2
* P - Packet pending (multicast)
* X - XDP Meta is present
* Q - Queue offset selected (overrides RSS)
* V - One or more VLANs present
* M - dest MAC is multicast
Expand Down Expand Up @@ -131,7 +135,7 @@ passert(PV_MAX_CLONES, "EQ", 2)
* +-+---------------+
*/

#define PV_SIZE_LW 16
#define PV_SIZE_LW 24

#define PV_LENGTH_wrd 0
#define PV_CTM_ISL_bf PV_LENGTH_wrd, 31, 26
Expand All @@ -151,6 +155,7 @@ passert(PV_MAX_CLONES, "EQ", 2)

#define PV_SEQ_wrd 3
#define PV_SEQ_NO_bf PV_SEQ_wrd, 31, 16
#define PV_XDP_META_LEN_bf PV_SEQ_wrd, 15, 13
#define PV_SEQ_CTX_bf PV_SEQ_wrd, 12, 8
#define PV_PROTO_bf PV_SEQ_wrd, 7, 0
#define PV_PROTO_IPV4_bf PV_SEQ_wrd, 1, 1
Expand Down Expand Up @@ -178,6 +183,7 @@ passert(PV_MAX_CLONES, "EQ", 2)
#define PV_MAC_DST_MC_bf PV_FLAGS_wrd, 15, 15
#define PV_MAC_DST_BC_bf PV_FLAGS_wrd, 14, 14
#define PV_SEEK_BASE_bf PV_FLAGS_wrd, 13, 6
#define PV_XDP_META_bf PV_FLAGS_wrd, 5, 5
#define PV_QUEUE_SELECTED_bf PV_FLAGS_wrd, 4, 4
#define PV_CSUM_OFFLOAD_bf PV_FLAGS_wrd, 3, 0
#define PV_CSUM_OFFLOAD_IL3_bf PV_FLAGS_wrd, 3, 3
Expand Down Expand Up @@ -450,6 +456,7 @@ passert(PV_MAX_CLONES, "EQ", 2)
.reg meta_base
.reg ref_cnt
.reg write $meta[9]
.reg xdp_meta_len
.xfer_order $meta
.sig sig_meta

Expand All @@ -464,6 +471,9 @@ passert(PV_MAX_CLONES, "EQ", 2)
alu[out_meta_len, --, B, ref_cnt, <<2]
alu[lm_ptr, lm_ptr, -, out_meta_len]

pv_restore_xdp_meta(xdp_meta_len, in_vec)
alu[out_meta_len, out_meta_len, +, xdp_meta_len]

alu[idx, 8, -, ref_cnt]
jump[idx, t8#], targets[t8#, t7#, t6#, t5#, t4#, t3#, t2#, t1#], defer[3]
local_csr_wr[ACTIVE_LM_ADDR_2, lm_ptr]
Expand Down Expand Up @@ -505,6 +515,113 @@ end#:
.end
#endm

#macro pv_get_xdp_meta_len(out_len, in_vec, RET_LABEL)
.begin

.reg xml

br_bclr[BF_AL(in_vec, PV_XDP_META_bf), RET_LABEL], defer[1]
immed[out_len, 0]

bitfield_extract__sz1(xml, BF_AML(in_vec, PV_XDP_META_LEN_bf))
alu[out_len, xml, +, 1]
alu[out_len, --, B, out_len, <<2]

.end
#endm

#macro pv_save_xdp_meta(xdp_meta_len, in_vec)
.begin
.reg offset
.reg lm_ptr
.reg in_pkt_addr_hi
.reg in_pkt_addr_lo
.reg xdp_meta_base
.reg read $tr
.sig sig_rd

pv_get_xdp_meta_len(xdp_meta_len, in_vec, end#)
pv_get_base_addr(in_pkt_addr_hi, in_pkt_addr_lo, in_vec)
alu[xdp_meta_base, in_pkt_addr_lo, -, xdp_meta_len]

local_csr_rd[ACTIVE_LM_ADDR_1]
immed[lm_ptr, 0]
alu[lm_ptr, lm_ptr, +, 32]
local_csr_wr[ACTIVE_LM_ADDR_2, lm_ptr]

alu[offset, 32, -, xdp_meta_len] // jump offset
jump[offset, t8#], targets[t8#, t7#, t6#, t5#, t4#, t3#, t2#, t1#]

#define LOOP 8
#while (LOOP > 1)

t/**/LOOP#:
mem[read32, $tr, in_pkt_addr_hi, <<8, xdp_meta_base, 1], ctx_swap[sig_rd]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing we do not have enough xfer registers to indirect_ref xdp_meta_base read this in a block?
This must be the case since this macro is called from a block of code that calls pv_invalidate_cache

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the case. What is the right way to fix this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can to re-use $__pv_pkt_data[32]. This is used to cache packet header data. When you use pv_save_xdp_meta in ebf_reentry, you can do pv_invalidate_cache(in_vec) afterward. On the next pv_seek the packet header will be re-read.

alu[*l$index2++, --, B, $tr]
alu[xdp_meta_base, xdp_meta_base, +, 4]
nop

#define_eval LOOP (LOOP - 1)
#endloop
#undef LOOP

t1#:
mem[read32, $tr, in_pkt_addr_hi, <<8, xdp_meta_base, 1], ctx_swap[sig_rd]
alu[*l$index2++, --, B, $tr]

end#:

.end
#endm

#macro pv_restore_xdp_meta(xdp_meta_len, in_vec)
.begin

.reg in_pkt_addr_hi
.reg in_pkt_addr_lo
.reg xdp_meta_base
.reg idx
.reg lm_ptr
.reg ref_cnt
.reg write $tr[8]
.xfer_order $tr
.sig sig_wr

aggregate_directive(.set_wr, $tr, 8)

pv_get_xdp_meta_len(xdp_meta_len, in_vec, end#)
pv_get_base_addr(in_pkt_addr_hi, in_pkt_addr_lo, in_vec)
alu[xdp_meta_base, in_pkt_addr_lo, -, xdp_meta_len]

local_csr_rd[ACTIVE_LM_ADDR_1]
immed[lm_ptr, 0]
alu[lm_ptr, lm_ptr, +, (32-4)]
alu[lm_ptr, lm_ptr, +, xdp_meta_len]
local_csr_wr[ACTIVE_LM_ADDR_2, lm_ptr]

alu[ref_cnt, --, B, xdp_meta_len, >>2]
alu[idx, 8, -, ref_cnt]
jump[idx, t7#], targets[t7#, t6#, t5#, t4#, t3#, t2#, t1#, t0#]

#define LOOP 7
#while (LOOP > 0)
t/**/LOOP#:
alu[$tr[LOOP], --, B, *l$index2--]
#define_eval LOOP (LOOP - 1)
#endloop
#undef LOOP
t0#:

ov_single(OV_LENGTH, ref_cnt, OVF_SUBTRACT_ONE)
#pragma warning(disable:5009)
mem[write32, $tr[0], in_pkt_addr_hi, <<8, xdp_meta_base, max_8], indirect_ref, ctx_swap[sig_wr], defer[1]
alu[$tr[0], --, B, *l$index2--]
#pragma warning(default:5009)

end#:

.end
#endm

#macro pv_multicast_init(io_vec, in_bls, CONTINUE_LABEL)
.begin
Expand Down