netdev CI testing #6666

kuba-moo · 2024-03-27T20:02:33Z

Reusable PR for hooking netdev CI to BPF testing.

Under the "TIS Config" tag, expose TIS diagnostic information: the tisn of each TC under each lag port.

    $ sudo devlink health diagnose auxiliary/mlx5_core.eth.2/131072 reporter tx
    ......
    TIS Config:
        lag port: 0 tc: 0 tisn: 0
        lag port: 1 tc: 0 tisn: 8
    ......

Signed-off-by: Feng Liu <[email protected]>
Reviewed-by: Aya Levin <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
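The diagnostic output above is a flat table of (lag port, TC) pairs, each mapped to a TIS number. The sketch below only illustrates that layout with a hypothetical `format_tis_config()` helper and `snprintf`. The real driver presumably emits this through devlink fmsg helpers, not through code like this. The tisn values in the usage come from the example output and nothing else.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for the "TIS Config" section: one line per
 * (lag port, TC) pair with its TIS number. tisn[] is indexed as
 * tisn[port * ntcs + tc]. Returns bytes written, or 0 if buf is too small. */
static size_t format_tis_config(char *buf, size_t len,
				unsigned int nports, unsigned int ntcs,
				const unsigned int *tisn)
{
	size_t off;
	int n = snprintf(buf, len, "TIS Config:\n");

	if (n < 0 || (size_t)n >= len)
		return 0;
	off = (size_t)n;
	for (unsigned int p = 0; p < nports; p++) {
		for (unsigned int tc = 0; tc < ntcs; tc++) {
			n = snprintf(buf + off, len - off,
				     "    lag port: %u tc: %u tisn: %u\n",
				     p, tc, tisn[p * ntcs + tc]);
			if (n < 0 || (size_t)n >= len - off)
				return 0;
			off += (size_t)n;
		}
	}
	return off;
}
```

With two lag ports, one TC, and tisn values {0, 8}, this reproduces the two lines shown in the example.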

TCA_MQPRIO_TC_ENTRY_INDEX is validated using NLA_POLICY_MAX(NLA_U32, TC_QOPT_MAX_QUEUE), which allows the value TC_QOPT_MAX_QUEUE (16) itself. This leads to a 4-byte out-of-bounds stack write in the fp[] array, which only has room for 16 elements (indices 0–15).

Fix this by changing the policy to allow only up to TC_QOPT_MAX_QUEUE - 1.

Fixes: f62af20 ("net/sched: mqprio: allow per-TC user input of FP adminStatus")
Reported-by: Maher Azzouzi <[email protected]>
Signed-off-by: Maher Azzouzi <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
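This is a classic off-by-one. A "max" policy accepts values up to and *including* the bound, but an array of N elements only has valid indices up to N - 1. The userspace sketch below models the inclusive "max" check with a hypothetical `policy_max_accepts()`. It does not use the kernel's netlink policy machinery, and only the value 16 is taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of TC_QOPT_MAX_QUEUE as stated in the commit message. */
#define TC_QOPT_MAX_QUEUE 16

/* Models an inclusive "max" policy: accept any val <= max. */
static bool policy_max_accepts(uint32_t val, uint32_t max)
{
	return val <= max;
}

/* True if idx is a valid index into an array like fp[TC_QOPT_MAX_QUEUE]. */
static bool index_in_bounds(uint32_t idx)
{
	return idx < TC_QOPT_MAX_QUEUE;
}
```

With the old bound, index 16 passes validation but is out of bounds. With the bound lowered to `TC_QOPT_MAX_QUEUE - 1`, validation and the array size agree.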

The s390x ISM device data sheet clearly states that only one request-response sequence is allowed per ISM function at any point in time. Unfortunately, as of today the s390/ism driver in Linux does not honor that requirement. This patch aims to rectify that.

This problem was discovered based on Aliaksei's bug report. It states that, for certain workloads, the ISM functions end up entering an error state after a while (with PEC 2, as seen from the logs). As a consequence, connections handled by the respective function break. For future connection requests the ISM device is not considered, since it is in a dysfunctional state. During further debugging, PEC 3A was observed as well.

A kernel message like

    [ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a

is a reliable indicator that the stated function has entered an error state with PEC 2. Let me also point out that a kernel message like

    [ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery

is a reliable indicator that the ISM function won't be auto-recovered, because the ISM driver currently lacks support for it.

On a technical level, without this synchronization, commands (inputs to the FW) may be partially or fully overwritten (corrupted) by another CPU trying to issue commands on the same function. There is hard evidence that this can lead to DMB token values being used as DMB IOVAs, leading to PEC 2 PCI events indicating invalid DMA. But this is only one of the imaginable failure modes. In theory, it is also possible to lose one command entirely and execute another one twice. The outputs would then be interpreted as if the intended command had actually been executed, and not the other one. Frankly, I don't feel confident about providing an exhaustive list of possible consequences.

Fixes: 684b89b ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <[email protected]>
Tested-by: Mahanta Jambigi <[email protected]>
Tested-by: Aliaksei Makarau <[email protected]>
Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Alexandra Winter <[email protected]>
Signed-off-by: Alexandra Winter <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
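The core idea is that one lock per function covers the whole write-request / read-response sequence. The userspace sketch below illustrates that idea. A pthread mutex stands in for whatever lock the driver actually uses. `struct fake_ism_fn`, `issue_cmd()` and the "firmware" computation `req * 2 + 1` are all hypothetical and are not s390/ism code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical ISM function with one shared request/response area. */
struct fake_ism_fn {
	pthread_mutex_t cmd_lock;   /* serializes request-response sequences */
	uint64_t req;               /* shared request slot */
	uint64_t resp;              /* shared response slot */
};

static struct fake_ism_fn ism_fn_demo = { .cmd_lock = PTHREAD_MUTEX_INITIALIZER };

/* Write the request, let the simulated firmware respond, and read the
 * response, all under the lock so no other thread can overwrite the
 * shared slots in the middle of the sequence. */
static uint64_t issue_cmd(uint64_t req)
{
	uint64_t resp;

	pthread_mutex_lock(&ism_fn_demo.cmd_lock);
	ism_fn_demo.req = req;
	ism_fn_demo.resp = ism_fn_demo.req * 2 + 1;   /* simulated FW */
	resp = ism_fn_demo.resp;
	pthread_mutex_unlock(&ism_fn_demo.cmd_lock);
	return resp;
}

struct worker_arg { uint64_t base; int iters; int mismatches; };

/* Issue iters commands and count responses that don't match the request. */
static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (int i = 0; i < a->iters; i++) {
		uint64_t req = a->base + (uint64_t)i;

		if (issue_cmd(req) != req * 2 + 1)
			a->mismatches++;
	}
	return NULL;
}

/* Run up to 8 concurrent workers. Returns the total number of mismatched
 * responses, or -1 if not every thread could be started. */
static int run_concurrent(int nthreads, int iters)
{
	pthread_t tids[8];
	struct worker_arg args[8];
	int started = 0, total = 0;

	if (nthreads > 8)
		nthreads = 8;
	for (int t = 0; t < nthreads; t++) {
		args[t] = (struct worker_arg){ .base = (uint64_t)t << 32, .iters = iters };
		if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
			break;
		started++;
	}
	for (int t = 0; t < started; t++) {
		pthread_join(tids[t], NULL);
		total += args[t].mismatches;
	}
	return started == nthreads ? total : -1;
}
```

If the lock were dropped, concurrent writers would race on `req` and `resp`. That race is the "overwritten command" scenario the commit describes.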

Don't add _req to helper names for pure types. We don't currently print those, so it makes no difference to existing codegen.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
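The naming rule can be modeled in a few lines. ynl-gen itself is Python, so the C sketch below, with its hypothetical `helper_name()`, only shows the resulting names. It assumes a `<family>_<type>[_req]_<op>` pattern, and the type names in the usage are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: request types get a "_req" suffix in helper
 * names, while pure (nested) types do not. */
static void helper_name(char *buf, size_t len, const char *family,
			const char *type, bool pure_type, const char *op)
{
	snprintf(buf, len, "%s_%s%s_%s", family, type,
		 pure_type ? "" : "_req", op);
}
```

Under these assumptions, a pure type such as queue_id yields `netdev_queue_id_alloc`, while a request type keeps its `_req` infix.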

Just to avoid making the main function even more enormous before adding more things to print, move the free printing to a helper which already prints the type.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

In general, YNL provides allocation and free helpers for types. For pure nested structs which are used as multi-attr (and therefore have to be allocated dynamically), we already print a free helper, as it's needed by the free of the containing struct. Add printing of the alloc helper for consistency.

The helper takes the number of entries to allocate as an argument, e.g.:

    static inline struct netdev_queue_id *netdev_queue_id_alloc(unsigned int n)
    {
        return calloc(n, sizeof(struct netdev_queue_id));
    }

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

For basic types we "flatten" setters. If a request "a" has a simple nest "b" with value "val", we print helpers like:

    req_set_a_b(struct a *req, int val)
    {
        req->_present.a = 1;
        req->b._present.val = 1;
        req->b.val = ...
    }

This is not possible for multi-attr because they have to be allocated dynamically by the user. Print "object level" setters so that the user preparing the object doesn't have to futz with the presence bits and other YNL internals.

Add the ability to pass in the variable name to generated setters. Using "req" here doesn't feel right: while the attr is part of a request, it's not the request itself, so it seems cleaner to call it "obj".

Example:

    static inline void
    netdev_queue_id_set_id(struct netdev_queue_id *obj, __u32 id)
    {
        obj->_present.id = 1;
        obj->id = id;
    }

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Use the just-added YNL helpers instead of manually setting "_present" bits in the queue attrs. Compile tested only.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Mina Almasry <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
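Here is how a caller might combine the alloc helper and the object-level setter quoted in the earlier commits. The sketch uses a simplified stand-in for `struct netdev_queue_id` that keeps only the `id` member. The real generated struct has more fields, and `make_queue_ids()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the generated struct (assumption). */
struct netdev_queue_id {
	struct {
		unsigned int id:1;
	} _present;
	uint32_t id;
};

/* Alloc helper in the shape quoted in the commit message. */
static inline struct netdev_queue_id *netdev_queue_id_alloc(unsigned int n)
{
	return calloc(n, sizeof(struct netdev_queue_id));
}

/* Object-level setter in the shape quoted in the commit message. */
static inline void netdev_queue_id_set_id(struct netdev_queue_id *obj, uint32_t id)
{
	obj->_present.id = 1;
	obj->id = id;
}

/* Hypothetical caller: build n queue IDs 0..n-1 without touching
 * _present directly. Returns NULL on allocation failure. */
static struct netdev_queue_id *make_queue_ids(unsigned int n)
{
	struct netdev_queue_id *q = netdev_queue_id_alloc(n);

	if (!q)
		return NULL;
	for (unsigned int i = 0; i < n; i++)
		netdev_queue_id_set_id(&q[i], i);
	return q;
}
```

Because the alloc helper uses calloc(), every presence bit starts cleared. Only the setter marks a member as present.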

My CC-adding automation returned nothing on a future patch to the include/linux/in6.h file, and I went looking for why. Add the missed in6.h to MAINTAINERS.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Add documentation clarifying that the ARP and routing UAPI structures are constrained to IPv4-only usage, making them safe for the coming fixed-size sockaddr conversion (with the 14-byte struct sockaddr::sa_data). These are fine as-is, but their use was non-obvious to me, so I figured they could use a little more documentation:

- struct arpreq: ARP protocol is IPv4-only by design
- struct rtentry: legacy IPv4 routing API; IPv6 uses different structures

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
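Why IPv4-only users are safe comes down to sizes. An IPv4 socket address fits in a 16-byte struct sockaddr (2-byte family plus 14-byte sa_data), but an IPv6 one does not. The compile-time checks below illustrate this using the glibc userspace headers. The sizes they assert hold for Linux, and they make no claims about other platforms.

```c
#include <assert.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* struct sockaddr carries 14 bytes of sa_data after the family. */
static_assert(sizeof(((struct sockaddr *)0)->sa_data) == 14,
	      "sa_data is 14 bytes");

/* An IPv4 address fits inside struct sockaddr... */
static_assert(sizeof(struct sockaddr_in) <= sizeof(struct sockaddr),
	      "sockaddr_in fits in sockaddr");

/* ...and struct arpreq embeds plain struct sockaddr for its addresses. */
static_assert(sizeof(((struct arpreq *)0)->arp_pa) == sizeof(struct sockaddr),
	      "arp_pa is a struct sockaddr");

/* An IPv6 address does not fit, which is why IPv6-capable code needs
 * a larger container once sockaddr becomes fixed-size. */
static_assert(sizeof(struct sockaddr_in6) > sizeof(struct sockaddr),
	      "sockaddr_in6 does not fit in sockaddr");
```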

There are cases in networking (e.g. wireguard, sctp) where a union is used to cover either IPv4 or IPv6 network addresses, and these unions also include an embedded "struct sockaddr" (for "sa_family" and raw "sa_data" access). The current struct sockaddr contains a flexible array. That means these unions should not be further embedded in other structs, because they do not technically have a fixed size (and they generate warnings for the coming -Wflexible-array-not-at-end flag addition). But the future changes to make struct sockaddr a fixed size (i.e. with a 14-byte sa_data member) make the "sa_data" uses with an IPv6 address a potential place for the compiler to get upset about object size mismatches.

Therefore, we need a sockaddr that cleanly provides both an sa_family member and an appropriately fixed-size sa_data member. It should do so without the bloat of the potential alternative, sockaddr_storage, for covering both IPv4 and IPv6, so as to avoid unseemly churn in the affected code bases.

Introduce sockaddr_inet as a unified structure for holding both IPv4 and IPv6 addresses (i.e. large enough to accommodate sockaddr_in6). The structure is defined in linux/in6.h because its maximum size is based on sockaddr_in6. It provides a more specific alternative to the generic sockaddr_storage for IPv4 with IPv6 address family handling. The "sa_family" member doesn't use the sa_family_t type, to avoid needing layer-violating header inclusions.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
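A structure matching that description can be sketched as a family field plus just enough sa_data to reach sizeof(struct sockaddr_in6). The type below is deliberately named `sockaddr_inet_sketch`. It is an assumption based on the description above, not the definition from linux/in6.h.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Sketch of a sockaddr_inet-like type: plain unsigned short family
 * (not sa_family_t) followed by fixed-size sa_data covering IPv6. */
struct sockaddr_inet_sketch {
	unsigned short sa_family;
	char sa_data[sizeof(struct sockaddr_in6) - sizeof(unsigned short)];
};

/* Large enough for IPv6, and no larger. */
static_assert(sizeof(struct sockaddr_inet_sketch) == sizeof(struct sockaddr_in6),
	      "same size as sockaddr_in6");

/* Much smaller than the generic sockaddr_storage alternative. */
static_assert(sizeof(struct sockaddr_inet_sketch) < sizeof(struct sockaddr_storage),
	      "smaller than sockaddr_storage");
```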

As part of the removal of the variably-sized sockaddr for kernel internals, replace struct sockaddr with sockaddr_inet in the endpoint union. No binary changes; the union size remains unchanged due to sockaddr_inet matching the size of sockaddr_in6.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

As part of the removal of the variably-sized sockaddr for kernel internals, replace struct sockaddr with sockaddr_inet in the sctp_addr union. No binary changes; the union size remains unchanged due to sockaddr_inet matching the size of sockaddr_in6.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
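The "no binary changes" claim for the wireguard and sctp unions can be checked with a simplified model. The sockaddr_in6 member already sets the union size, and a family-plus-sa_data type of exactly that size does not change it. Member names and `sockaddr_inet_sketch` below are assumptions that only mimic the shape described in the sockaddr_inet commit message.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Stand-in for sockaddr_inet: family + sa_data sized to sockaddr_in6. */
struct sockaddr_inet_sketch {
	unsigned short sa_family;
	char sa_data[sizeof(struct sockaddr_in6) - sizeof(unsigned short)];
};

/* Simplified shape of the address unions before the change... */
union addr_before {
	struct sockaddr sa;
	struct sockaddr_in v4;
	struct sockaddr_in6 v6;
};

/* ...and after: the generic member becomes the fixed-size type. */
union addr_after {
	struct sockaddr_inet_sketch sa;
	struct sockaddr_in v4;
	struct sockaddr_in6 v6;
};
```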

When building, the following warnings appear:

    pci_irq.c: In function ‘mlx5_ctrl_irq_request’:
    pci_irq.c:494:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    pci_irq.c: In function ‘mlx5_irq_request_vector’:
    pci_irq.c:561:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    eq.c: In function ‘comp_irq_request_sf’:
    eq.c:897:1: warning: the frame size of 1080 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    irq_affinity.c: In function ‘irq_pool_request_irq’:
    irq_affinity.c:74:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]

These warnings indicate that the stack frame size exceeds 1024 bytes in these functions. To resolve this, use kvzalloc to allocate the large buffers dynamically on the heap instead of on the stack. This reduces stack usage and eliminates the frame size warnings.

Acked-by: Junxian Huang <[email protected]>
Signed-off-by: Zhu Yanjun <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
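The pattern is "allocate zeroed, use, free" in place of a large local variable. A userspace sketch follows. calloc()/free() stand in for the kernel's kvzalloc()/kvfree(), and `struct big_desc` and both functions are hypothetical. The commit does not say what the large on-stack objects were.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object over 1 KiB: as a local variable it alone would
 * exceed a 1024-byte -Wframe-larger-than limit. */
struct big_desc {
	unsigned char scratch[1024];
	int flags;
};

/* Hypothetical consumer of the descriptor. */
static int use_desc(const struct big_desc *d)
{
	return d->flags;
}

/* Heap-allocate the descriptor instead of placing it on the stack.
 * calloc() returns zeroed memory, like kvzalloc() in the kernel. */
static int request_with_heap_desc(int flags)
{
	struct big_desc *d = calloc(1, sizeof(*d));
	int ret;

	if (!d)
		return -ENOMEM;
	d->flags = flags;
	ret = use_desc(d);
	free(d);   /* kernel counterpart: kvfree() */
	return ret;
}
```

The trade-off is a new failure path (-ENOMEM) that every caller must handle, in exchange for a bounded stack frame.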

The xdp_convert_buff_to_frame() function can return NULL when there is insufficient headroom in the buffer to store the xdp_frame structure or when the driver didn't reserve enough tailroom for skb_shared_info.

Currently, the sfc driver does not check for this NULL return value in the XDP_TX case within efx_do_xdp(). While the efx_xdp_tx_buffers() function has some defensive checks, passing a NULL xdpf can still lead to undefined behavior when the function tries to access xdpf->len and xdpf->data.

Fix by adding a proper NULL check in the XDP_TX case. If conversion fails, free the RX buffer and increment the bad drops counter, following the same pattern used for other XDP error conditions in this driver.

Signed-off-by: Chenyuan Yang <[email protected]>
Fixes: 1b698fa ("xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame")
Signed-off-by: NipaLocal <nipa@local>
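The fix pattern can be modeled in userspace. The conversion may fail, so the TX path checks for NULL, drops the buffer, and counts the drop instead of transmitting. Every type, counter and function below is a hypothetical stand-in, and the headroom requirement is an assumed placeholder, not the real XDP or sfc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the XDP objects. */
struct fake_buff { unsigned int headroom; unsigned int len; };
struct fake_frame { unsigned int len; };
struct rx_stats { unsigned long xdp_bad_drops; unsigned long xdp_tx; };

/* Assumed requirement: room for the frame metadata in the headroom. */
#define FAKE_FRAME_HEADROOM sizeof(struct fake_frame)

/* Models the documented failure mode of xdp_convert_buff_to_frame():
 * NULL when the headroom cannot hold the frame metadata. */
static struct fake_frame *convert_buff_to_frame(const struct fake_buff *b,
						struct fake_frame *storage)
{
	if (b->headroom < FAKE_FRAME_HEADROOM)
		return NULL;
	storage->len = b->len;
	return storage;
}

/* XDP_TX with the added NULL check. Returns 1 if the frame was queued
 * for TX, 0 if it was dropped (buffer freed, drop counted). */
static int handle_xdp_tx(const struct fake_buff *b, struct rx_stats *st,
			 int *buf_freed)
{
	struct fake_frame storage;
	struct fake_frame *xdpf = convert_buff_to_frame(b, &storage);

	if (!xdpf) {
		*buf_freed = 1;          /* stands in for freeing the RX buffer */
		st->xdp_bad_drops++;
		return 0;
	}
	st->xdp_tx++;                /* stands in for the TX submission */
	return 1;
}
```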

The xdp_convert_buff_to_frame() function can return NULL when there is insufficient headroom in the buffer to store the xdp_frame structure or when the driver didn't reserve enough tailroom for skb_shared_info.

Currently, the sfc siena driver does not check for this NULL return value in the XDP_TX case within efx_do_xdp().

Fix by adding a proper NULL check in the XDP_TX case. If conversion fails, free the RX buffer and increment the bad drops counter, following the same pattern used for other XDP error conditions in this driver.

Signed-off-by: Chenyuan Yang <[email protected]>
Fixes: d48523c ("sfc: Copy shared files needed for Siena (part 2)")
Signed-off-by: NipaLocal <nipa@local>

The xdp_convert_buff_to_frame() function can return NULL when there is insufficient headroom in the buffer to store the xdp_frame structure or when the driver didn't reserve enough tailroom for skb_shared_info.

Currently, the otx2 driver does not check for this NULL return value in two critical paths within otx2_xdp_rcv_pkt_handler():

1. XDP_TX case: passes a potentially NULL xdpf to otx2_xdp_sq_append_pkt()
2. XDP_REDIRECT error path: calls xdp_return_frame() with a potentially NULL xdpf

This can lead to kernel crashes due to NULL pointer dereference.

Fix by adding proper NULL checks in both paths. For XDP_TX, return false to indicate that the packet should be dropped. For the XDP_REDIRECT error path, only call xdp_return_frame() if conversion succeeded; otherwise, manually free the page.

Please correct me if any error path is incorrect.

This is similar to commit cc3628d ("xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()").

Signed-off-by: Chenyuan Yang <[email protected]>
Fixes: 94c80f7 ("octeontx2-pf: use xdp_return_frame() to free xdp buffers")
Signed-off-by: NipaLocal <nipa@local>
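The two checks described above can be modeled as follows. XDP_TX refuses a NULL frame by returning false. The XDP_REDIRECT error path returns the frame only if conversion succeeded, and otherwise frees the page directly. All names are hypothetical stand-ins, and counters record which cleanup ran instead of touching real pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the counters record which cleanup path ran. */
struct cleanup_log { int frames_returned; int pages_freed; };
struct fake_frame { int unused; };

static void return_frame(struct fake_frame *f, struct cleanup_log *log)
{
	(void)f;
	log->frames_returned++;   /* stands in for xdp_return_frame() */
}

static void free_page_manually(struct cleanup_log *log)
{
	log->pages_freed++;       /* stands in for freeing the RX page */
}

/* XDP_REDIRECT failure: only hand back a frame that actually exists. */
static void redirect_error_cleanup(struct fake_frame *xdpf,
				   struct cleanup_log *log)
{
	if (xdpf)
		return_frame(xdpf, log);
	else
		free_page_manually(log);
}

/* XDP_TX: returning false tells the caller to drop the packet. */
static bool xdp_tx_path(struct fake_frame *xdpf)
{
	if (!xdpf)
		return false;
	/* ...append to the send queue here... */
	return true;
}
```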

Move multiple copies of the same code snippet doing `gro_flush` and `gro_normal_list` into a separate helper function.

Signed-off-by: Samiullah Khawaja <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Prepare for adding an enum type for NAPI threaded states by adding the netif_threaded_enable API. De-export the existing netif_set_threaded API and only use it internally. Update existing drivers to use netif_threaded_enable instead of the de-exported netif_set_threaded.

Note that dev_set_threaded, used by the mt76 debugfs file, is unchanged.

Signed-off-by: Samiullah Khawaja <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Instead of using '0' and '1' for the NAPI threaded state, use an enum with 'disabled' and 'enabled' states.

Tested:

    ./tools/testing/selftests/net/nl_netdev.py
    TAP version 13
    1..7
    ok 1 nl_netdev.empty_check
    ok 2 nl_netdev.lo_check
    ok 3 nl_netdev.page_pool_check
    ok 4 nl_netdev.napi_list_check
    ok 5 nl_netdev.dev_set_threaded
    ok 6 nl_netdev.napi_set_threaded
    ok 7 nl_netdev.nsim_rxq_reset_down
    # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
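Replacing a bare 0/1 with named states can be sketched as below. The enum names are hypothetical, since the real identifiers come from the netdev netlink spec and may differ. Keeping the numeric values 0 and 1 is an assumption made here so that legacy values map over directly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum for the two states named in the commit. */
enum napi_threaded_state {
	NAPI_THREADED_DISABLED = 0,
	NAPI_THREADED_ENABLED  = 1,
};

/* Map a legacy 0/1 value onto the enum (any non-zero means enabled). */
static enum napi_threaded_state threaded_from_legacy(int val)
{
	return val ? NAPI_THREADED_ENABLED : NAPI_THREADED_DISABLED;
}

/* Human-readable name, as a netlink spec would expose it. */
static const char *threaded_state_name(enum napi_threaded_state s)
{
	switch (s) {
	case NAPI_THREADED_DISABLED:
		return "disabled";
	case NAPI_THREADED_ENABLED:
		return "enabled";
	}
	return "unknown";
}
```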

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

tc_actions.sh keeps hanging the forwarding tests.

sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th

Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: NipaLocal <nipa@local>

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

kuba-moo force-pushed the to-test branch from 6bd5e75 to bdd05e2 on March 27, 2024 21:49

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 on March 28, 2024 04:46

kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 on March 29, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 on March 29, 2024 02:14

kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 on March 29, 2024 18:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 9325308 to 7940ae1 on March 29, 2024 18:12

kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 on March 30, 2024 00:01

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 7940ae1 to 8f1ff3c on March 30, 2024 00:21

kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 on March 30, 2024 06:00

Feng Liu and others added 29 commits July 22, 2025 20:00

tools: ynl-gen: don't add suffix for pure types

472ff01

MAINTAINERS: Add in6.h to MAINTAINERS

352638c

timestamp - try waking [local patch]

d9a4e7b

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

selftests: net: enable profiling [local patch]

48d1331

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

forwarding: set timeout to 3 hours [local patch]

b8e88ec

tc_action dbg [local patch]

8c6184b

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

drv: net: add timeout [local patch]

3f6b416

Signed-off-by: NipaLocal <nipa@local>

dbg: tests: bonding: print info on failure [local patch]

8096cf8

Signed-off-by: NipaLocal <nipa@local>

profile patch [local patch]

63d8d77

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

disable random kunit tests [local patch]

52261cf

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>

Merge branch 'net-next-2025-07-23--03-00' into HEAD

2e3460b

kuba-moo force-pushed the to-test branch from c6794f1 to 2e3460b on July 23, 2025 03:01
