
[Bug]: Random ptf verify packet failures caused by PR#15349 #16585

Open
congh-nvidia opened this issue Jan 20, 2025 · 3 comments · May be fixed by #16669

Comments

@congh-nvidia
Contributor

congh-nvidia commented Jan 20, 2025

Issue Description

We have recently observed random PTF verify-packet failures. Root-cause analysis (RCA) showed that PR #15349 introduced the issue in the following code:

count = 1
while count in used_index:
    count = count + 1
if backplane_exist:
    iface_map[count] = "backplane"
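To make the effect of this loop concrete, here is a minimal, self-contained sketch that wraps the same logic in a hypothetical helper, `assign_backplane_index()` (the name and signature are illustrative, not taken from sonic-mgmt). It assumes `used_index` holds the PTF indices already taken by dataplane ports and `iface_map` maps a PTF index to an interface name.

```python
def assign_backplane_index(iface_map, used_index, backplane_exist):
    """Hypothetical wrapper around the loop quoted from PR #15349.

    Starting at 1, find the first PTF index not used by a dataplane
    port and, if a backplane exists, map the backplane interface to it.
    Returns the chosen index.
    """
    count = 1
    while count in used_index:
        count = count + 1
    if backplane_exist:
        iface_map[count] = "backplane"
    return count


# Assumed testbed: 32 dataplane ports on PTF indices 0-31.
used_index = set(range(32))
iface_map = {i: "eth{}".format(i) for i in used_index}
backplane_index = assign_backplane_index(iface_map, used_index, True)
print(backplane_index, iface_map[backplane_index])  # 32 backplane
```

With indices 0-31 taken, the backplane lands on index 32, which is the port number reported in the AssertionError in this issue.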

This change maps the PTF backplane interface to an unused index in the PTF adapter port list, which breaks methods such as ptf.testutils.verify_packet_any_port().
Methods such as verify_packet_any_port() verify not only that the packet is received on one of the expected ports, but also that it is not received on any unexpected port.
The problem is that when the destination IP of the test packet matches a prefix advertised by ExaBGP, the neighbor VM forwards the test packet to the PTF backplane interface, because ExaBGP advertises those routes to the VM through the backplane interface.
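One way to tell whether a given test packet is exposed to this problem is to check its destination against the prefixes ExaBGP advertises. The sketch below uses Python's standard `ipaddress` module; `may_reach_backplane()` is a hypothetical helper, and the prefix list is an assumption seeded only with 192.168.0.128/25, the ExaBGP route visible in the VM route output in this issue. A real testbed advertises more prefixes.

```python
import ipaddress

# Assumption: only the prefix seen in this issue's VM route output.
ADVERTISED = ["192.168.0.128/25"]


def may_reach_backplane(dst_ip, advertised_prefixes):
    """Return True if dst_ip falls inside any ExaBGP-advertised prefix.

    Such packets can be routed by the neighbor VM toward the PTF
    backplane interface (the ExaBGP next hop), so a copy can also
    appear on the backplane port in the PTF port list.
    """
    addr = ipaddress.ip_address(dst_ip)
    for prefix in advertised_prefixes:
        # `in` is always False across IP versions, so mixed
        # IPv4/IPv6 prefix lists are handled safely.
        if addr in ipaddress.ip_network(prefix):
            return True
    return False


print(may_reach_backplane("192.168.0.253", ADVERTISED))  # True
print(may_reach_backplane("192.168.0.1", ADVERTISED))    # False
```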

ptf backplane interface:

root@70229c21e672:~# ifconfig 
backplane: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.246.254  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fc0a::ff  prefixlen 64  scopeid 0x0<global>
        inet6 fc0a::ff  prefixlen 64  scopeid 0x0<global>
        ether 1a:f8:ed:0e:10:38  txqueuelen 1000  (Ethernet)
        RX packets 248605  bytes 162615977 (155.0 MiB)
        RX errors 0  dropped 282  overruns 0  frame 0
        TX packets 214828  bytes 38056135 (36.2 MiB)
        TX errors 0  dropped 20 overruns 0  carrier 0  collisions 0

Exabgp route in VM:

ARISTA01T0(config)#show ip route 192.168.0.253

VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route

 B I      192.168.0.128/25 [200/0] via 10.10.246.254, Ethernet5

Ethernet5 is the Arista VM interface connected to the PTF backplane interface.
The issue can be easily reproduced by running the ACL test tests/acl/test_acl.py.

Results you see

When a test case fails because of this issue, you will see an error like this:

AssertionError: Received expected packet on port 32 for device 0, but it should have arrived on one of these ports: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31].
========== RECEIVED ==========
0000  AE 03 1D 2B FC EB E2 CF E6 ED 0D 8B 86 DD 60 00  ...+..........`.
0010  00 00 00 2E 7E 3E 60 C0 A8 00 00 00 00 00 00 00  ....~>`.........
0020  00 00 00 00 00 05 20 C0 A8 00 00 00 00 00 00 00  ...... .........
0030  00 00 00 00 00 14 43 21 00 51 00 00 00 00 00 00  ......C!.Q......
0040  00 00 50 02 20 00 5D A2 00 00 74 65 73 74 5F 61  ..P. .]...test_a
0050  63 6C 20 74 65 73 74 5F 61 63 6C 20 74 65 73 74  cl test_acl test
0060  5F 61 63 6C                                      _acl
==============================

Here the testbed has only 32 dataplane interfaces, mapped to PTF indices 0-31; index 32 is the PTF backplane interface.
When ptf.testutils.verify_packet_any_port() runs, the test passes if the copy of the packet received on an expected port is seen before the copy received on the backplane interface. If the backplane copy arrives first, the test fails.
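The ordering dependency can be illustrated with a deliberately simplified model. This is not PTF's implementation: `check_first_arrival()` and its arguments are hypothetical, and the model only captures the behavior described in this issue, namely that the first copy of the expected packet seen on the device decides the result.

```python
# Assumed values from this issue: expected ports 16-31 (from the
# AssertionError) and the backplane mapped to PTF index 32.
EXPECTED_PORTS = list(range(16, 32))
BACKPLANE_PORT = 32
PAYLOAD = b"test_acl"


def check_first_arrival(arrivals, expected_ports):
    """Simplified, hypothetical model of an "any port" check.

    `arrivals` is a list of (port, payload) tuples in arrival order.
    The first arrival carrying the expected payload decides the outcome:
    it passes if that port is expected, otherwise it fails immediately.
    Returns the accepting port on success.
    """
    for port, payload in arrivals:
        if payload != PAYLOAD:
            continue  # this model simply skips unrelated packets
        if port in expected_ports:
            return port
        raise AssertionError(
            "Received expected packet on port {}, but it should have "
            "arrived on one of these ports: {}".format(port, expected_ports))
    raise AssertionError("Expected packet was not received")


# Dataplane copy arrives first: the check passes.
print(check_first_arrival([(20, PAYLOAD), (BACKPLANE_PORT, PAYLOAD)], EXPECTED_PORTS))  # 20

# Backplane copy arrives first: the check fails, as in this issue.
try:
    check_first_arrival([(BACKPLANE_PORT, PAYLOAD), (20, PAYLOAD)], EXPECTED_PORTS)
except AssertionError as err:
    print("FAIL:", err)
```

Because both copies come from the same test packet, their relative arrival order depends on timing, which is why the failures appear random.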

Results you expected to see

All affected tests that use methods such as testutils.verify_packet_any_port() should pass.

Is it platform specific

generic

Relevant log output

Output of show version

Attach files (if any)

No response

@congh-nvidia
Contributor Author

Hi @wangxin @eddieruan-alibaba, could you please review this issue? It affects multiple tests.

@eddieruan-alibaba
Contributor

Please assign it to @LARLSN from my team. We will add a topology check so that the backplane port is only added in our topology.
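The proposed fix, gating the backplane mapping on the topology, could look roughly like the sketch below. It only illustrates the idea described in this comment and is not the code from #16669; `should_add_backplane()`, `build_iface_map()`, and the topology names used in the example ("topo-a", "topo-b") are all hypothetical.

```python
def should_add_backplane(topo_name, backplane_exist, backplane_topos):
    """Hypothetical topology gate for the backplane PTF port.

    Map the backplane interface into the PTF port list only when it
    exists AND the current topology needs it, so other topologies keep
    a port list made of dataplane ports only.
    """
    return backplane_exist and topo_name in backplane_topos


def build_iface_map(dataplane_ports, topo_name, backplane_exist, backplane_topos):
    """Hypothetical builder: dataplane ports plus an optional backplane."""
    iface_map = dict(dataplane_ports)
    if should_add_backplane(topo_name, backplane_exist, backplane_topos):
        # Same first-unused-index search as the code from PR #15349.
        count = 1
        while count in iface_map:
            count += 1
        iface_map[count] = "backplane"
    return iface_map


ports = {i: "eth{}".format(i) for i in range(32)}
print(len(build_iface_map(ports, "topo-a", True, {"topo-a"})))  # 33
print(len(build_iface_map(ports, "topo-b", True, {"topo-a"})))  # 32
```

In topologies outside the allow-list the backplane never enters the port list, so "any port" checks cannot see its copies.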

@LARLSN linked a pull request on Jan 24, 2025 that will close this issue
@LARLSN
Contributor

LARLSN commented Jan 24, 2025

Fixed in #16669.
