Deployed to two servers, client reported an error during testing #10
Hello. The basic usage you show should be fine, but there are a couple of things to also check. In addition to port 25000 being reachable on the server, the server must also be reachable on all UDP ephemeral ports (32768 - 60999 as of the Linux 2.4 kernel, available via cat /proc/sys/net/ipv4/ip_local_port_range). This is because after the setup request is received by the server on the control port (25000), it will open a UDP ephemeral port for the data traffic. The client will need to be able to finish the setup to that port and then use it for the data transfer. So, if there's a firewall in front of the server, it generally needs to open 25000 and 32768 - 60999 for UDP. Another suggestion, in case your server has multiple routing options on more than one NIC, is to bind the server to the local NIC interface address. Just specify the local IP address on the server command line (e.g., "./udpst <local IP address>"). If neither helps, can you please provide the output (from both sides) with the "-v -D" options. Thanks
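As a concrete illustration of the firewall change (a hypothetical sketch for a Linux server using iptables; adapt the rules to whatever firewall actually sits in front of the server), rules along the lines of "iptables -A INPUT -p udp --dport 25000 -j ACCEPT" and "iptables -A INPUT -p udp --dport 32768:60999 -j ACCEPT" would admit both the control port and the ephemeral data ports.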
Your error of "LOCAL WARNING: Incoming traffic has completely stopped", when doing an upstream test, seems to indicate that the status feedback messages from the server are not making it back to the client. Can you try these options...
The one thing I mentioned is to provide the local address for the server to bind to (instead of "Awaiting setup requests on :::25000"). So, on the server provide the local IP address "./udpst -v -D -j 10.53.5.253" (or whatever the local IP address of the test interface is). This will make sure that the server only communicates over that specific interface (which can be an issue when a server has multiple interfaces). Also, on both machines try testing locally as a sanity check. So, in one terminal window do "./udpst -v -D -j 127.0.0.1" for the server and in another terminal window (on the same machine) do "./udpst -v -D -j -u 127.0.0.1" for the client. Make sure this runs as expected. From the output you provided it's difficult to see what the issue might be; it looks like traffic starts and then stops. Assuming the previous local testing works, there could be a problem with UDP between those devices. It may be worth trying another tool such as iPerf to test UDP between those machines. Make sure to test with UDP for a valid comparison. Lastly, I see you're only doing upstream tests. Have you tried downstream tests? Do they work?
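For the iPerf comparison (assuming iperf3 is installed on both machines; the server IP here is a placeholder), running "iperf3 -s" on the server and "iperf3 -c <server IP> -u -b 50M" on the client sends a sustained UDP stream between the two machines; the "-u" flag is what makes it a valid UDP comparison rather than the default TCP test.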
Thanks for the detailed info. I don't think the issue is NAT/PAT since the setup handshake (setup request/response and test activation request/response) seems to be getting processed. Instead, I would guess that your issue has to do with your company's network security devices (firewall, intrusion detector, packet inspector, ...), since it appears that load traffic starts but is then shut down. After that the software watchdog expires and the test is ended. This isn't very surprising with advanced security devices, since the load traffic we test with can be easily mistaken for a DoS/DDoS attack (i.e., a high-rate UDP stream).
Thank you for looking into this. Can you tell from the output of both the client and the server side whether the failure is due to the client not receiving a response?
You are basically correct. In the client output for the downstream test (DEBUG Status Feedback...) you can see that the Mbps values are greater than 0.00 at first and the RTTVar is not -1 for a few samples. That means there was some initial traffic in both directions. But quickly the Mbps goes to 0.00, and a little bit later you see the "traffic has completely stopped" message - so it looks like the load traffic got blocked. In the upstream direction it's a little harder to see because it appears that no load PDUs ever make it to the server ("Skipping status transmission..."). Again, I can understand how a security device might see the downstream test as a DoS attack after the load starts to ramp up. And an upstream test may be seen as an infected company machine (or bot) that is sending out to the Internet. And since our port number and protocol are not well known (especially for UDP), a security device can't really validate it like it might a standard Ookla test. So just to summarize, the protocol is set up to only allow 3 seconds of no datagram reception (in either direction) before the software aborts the test.
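To illustrate the watchdog behavior just described (a minimal sketch, not the actual udpst source; the function names and structure here are assumptions based only on the 3-second figure above):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NO_TRAFFIC_ABORT_SEC 3 /* per the comment above */

time_t lastRxTime; /* refreshed whenever a datagram arrives */

/* Call on every received datagram (load or status PDU) */
void datagram_received(void) {
    lastRxTime = time(NULL);
}

/* Called periodically; ends the test if reception has stopped */
void watchdog_check(void) {
    if (time(NULL) - lastRxTime >= NO_TRAFFIC_ABORT_SEC) {
        fprintf(stderr, "Incoming traffic has completely stopped\n");
        exit(EXIT_FAILURE); /* abort the test */
    }
}
```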
Well, for an upload test it makes sense that the server capture wouldn't show any transmissions because it is waiting for an initial load PDU before sending the first status feedback message. However, for an upload test, a capture on the client machine should show load PDUs start to go out after the Test Activation response is received (they would use the same port numbers as the Test Activation PDUs). The client wouldn't know that the network might block those packets. Is it possible that the capture is missing or filtering the load PDUs? Can you test the capture mechanism with an upload test running to another machine locally (to make sure that it sees the packets when a test is working)?
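If it helps, one way to watch for the load PDUs on the client (assuming tcpdump is available; the interface name and server IP are placeholders) is "tcpdump -n -i <interface> udp and host <server IP>". With an upload test running, you should see a steady stream of outgoing UDP packets on the same ports as the Test Activation exchange.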
Hi, I just want to mention that after updating from udpst 7.X.X to the new version, we've experienced the same problem (Minimum connection required (1) / Incoming traffic has stopped completely) on a lot of different hardware platforms. We've tried the following things to get it working:
We also experienced that the upload test sometimes only reaches up to 10 Mbit/s or gives back a totally wrong value (~4 billion Mbit/s). The exact same firmware with the old version is working fine on all these devices. For now, we are telling all our customers to keep using the old (7.X.X) version (this works really well here). At the moment we are busy with a lot of other features, so the minimal-effort option is to continue with 7.X.X - but for the future it would be nice to support the newest version of the protocol. Thanks a lot for the good work
Thank you for the feedback. We'd certainly like to resolve these if possible. As for the large values being returned on the MaxLinear Seale grx550, it may be due to the introduction of a 64-bit value ("uint64_t rxBytes; // Received bytes") in the "subIntStats" structure -- and specifically the conversion to/from network byte order. Can you confirm the endianness of that system via: lscpu | grep "Byte Order". If it is not little endian, and you're willing to do a simple code change to test this, you could modify the ntohll(x)/htonll(x) macros in udpst_common.h to test for endianness explicitly (as they should). Something like...
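(A minimal sketch, assuming a GCC/Clang toolchain that defines __BYTE_ORDER__; the actual macros in udpst_common.h may differ:)

```c
#include <stdint.h>
#include <arpa/inet.h> /* for ntohl()/htonl() */

#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
/* Big-endian host: 64-bit values are already in network byte order */
#define ntohll(x) (x)
#define htonll(x) (x)
#else
/* Little-endian host: swap the two 32-bit halves and byte-swap each */
#define ntohll(x) ((((uint64_t)ntohl((uint32_t)((x) & 0xFFFFFFFFULL))) << 32) | \
                   (uint64_t)ntohl((uint32_t)((x) >> 32)))
#define htonll(x) ((((uint64_t)htonl((uint32_t)((x) & 0xFFFFFFFFULL))) << 32) | \
                   (uint64_t)htonl((uint32_t)((x) >> 32)))
#endif
```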
Thanks
I just wanted to do a follow-up on that first issue (large values shown). I completely understand that you are very busy, but would you be able to do a quick check on the endianness of the MaxLinear Seale grx550 via: lscpu | grep "Byte Order". Thank you
Hi, sorry for the delay. The fix in your comment above solved the issue with the "big result". Now we are running into the same problem as on "Hawkeye 2.2 GHz, qcaarmv8": we are measuring only up to 10 Mbit/s (not more) in the upstream direction. A log file with the "broken" and the "corrected - but slow" tests is attached, along with an updated test table (still with other open issues). Let me know if I can help with further tests or data
Thank you for the confirmation. We'll make sure that fix is part of the next point release. As for a low upstream rate, the first thing is to confirm that jumbo datagram sizes are disabled via "-j" if jumbo frames are not available. Also, if 1500-byte packets are supported end-to-end without fragmentation, you could use the "-T" option to slightly increase the max packet size from 1250 to 1500. This certainly helps... along with making sure the binary is compiled as 64-bit when the OS is 64-bit (particularly with ARM processors). However, in your case the most significant thing to try for a higher upstream rate is to disable GSO (via "cmake -D HAVE_GSO=OFF ."), on the assumption that it may not be fully supported. The interesting thing is that you are seeing loss in the very first sub-interval (when traffic is the lowest), along with reordering. If this does not resolve your issue, could you provide debug output (via "-vD") on both the client and server? This way we can see the socket buffer levels and loss progression during the test. Thanks
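For reference (assuming the usual CMake-plus-make flow for this project), the rebuild would be "cmake -D HAVE_GSO=OFF ." followed by "make", after which the upstream test can be rerun with the same client options as before.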
Hi, I performed new test runs on the different platforms with the byte-order fix (see comment above) and with "cmake -D HAVE_GSO=OFF" on the client side (my embedded devices). All devices have a 32-bit architecture. Here is the updated test matrix. It's looking better, but there are still remaining issues: "Intel Puma 7" -> downstream/upstream test is not working at all. Here are the log files (server/client side with options -v -D): And the Wireshark capture files (too big as attachment): Note: The exact same device firmware with udpst 7.X.X is working on the same test setup / device (see matrix PDF). Thanks for your support...
Thanks for the details - I'm still going through it. But if you have a chance, I think it would be worth trying to disable the remaining optimizations on the Intel Puma 7 devices (the recvmmsg/sendmmsg batching, i.e., "-D HAVE_RECVMMSG=OFF -D HAVE_SENDMMSG=OFF"). And unless your network interface is configured with a 9000-byte MTU, it makes sense to stick with jumbo sizes disabled. We've seen a lot of issues on some devices with fragmentation (e.g., insufficient Fragment Reassembly Memory - see README). Thanks
Ok - unfortunately this does not seem to help. New logs with "-D HAVE_RECVMMSG=OFF -D HAVE_SENDMMSG=OFF" are attached: I will also try to debug this issue in the near future. The same setup with the old udpst version (just the udpst binary exchanged) is working.
Well, that rules out a number of things - so it was a good experiment. And your idea to try 7.5.1 is a good one. In general, both directions are showing very strange values or some type of corruption of data fields. And for completeness, can you include the output from an "lscpu" command?
lscpu is not available on the system, but here is the output of "cat /proc/cpuinfo". I also tested with v7.5.1 now - the downstream/upstream test is working fine (same results as for 7.4.0)
Hi, please let me ask you a question: when I deployed the server side and client side of UDPST on different servers for a simulation test, I encountered the problem below.
UDP port 25000 is open and can be accessed.
The client side runs a command like this: ./udpst -u <ip>
The server runs a command like this: ./udpst
Is there something I'm missing or overlooking?