[BUG]: vsomeip slow to establish communication with lots of EventGroup #669

joeyoravec · 2024-04-10T18:30:51Z

vSomeip Version

v3.4.10

Boost Version

1.82

Environment

Android and QNX

Describe the bug

My automotive system has *.fidl with ~3500 attributes, one per CAN signal. My *.fdepl maps each attribute into a unique EventGroup.

Any time the network connection is established, or broken and re-established, I get an avalanche of ~3500 subscribes followed by ~3500 acknowledgements, transmitted one per frame. The entire sequence does not fit inside a 2-second Service Discovery interval. When the work does not complete within that interval, the routing manager issues StopSubscribe and SubscribeNack. The system retries, but recovery takes a long time, at least a couple of Service Discovery intervals.

The train logic is supposed to aggregate these together, sending a train only when it’s full or 5 ms elapse, but there are several places in the code that prevent this.
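To make the intended behavior concrete, here is a minimal sketch of "depart when full or when the oldest message has waited 5 ms". The class `message_train` and its methods are hypothetical names for illustration only; this is not vsomeip's actual train implementation, and the size limit is passed in rather than taken from any real configuration value.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified train: buffers serialized messages and hands back
// one aggregated buffer when it is full or when the oldest message is too old.
class message_train {
public:
    using clock = std::chrono::steady_clock;
    using bytes = std::vector<std::uint8_t>;

    message_train(std::size_t max_size, std::chrono::milliseconds max_wait)
        : max_size_(max_size), max_wait_(max_wait) {}

    // Queue one message. If appending it would exceed max_size, the current
    // train departs first and is returned; otherwise an empty buffer is returned.
    bytes add(const bytes& msg, clock::time_point now) {
        bytes departed;
        if (!buffer_.empty() && buffer_.size() + msg.size() > max_size_) {
            departed.swap(buffer_);  // buffer_ is empty after the swap
        }
        if (buffer_.empty()) {
            first_queued_ = now;     // the wait timer starts with the first passenger
        }
        buffer_.insert(buffer_.end(), msg.begin(), msg.end());
        return departed;
    }

    // Call from a timer: the train departs once the oldest queued message
    // has waited at least max_wait, even if the train is not full.
    bytes poll(clock::time_point now) {
        bytes departed;
        if (!buffer_.empty() && now - first_queued_ >= max_wait_) {
            departed.swap(buffer_);
        }
        return departed;
    }

private:
    std::size_t max_size_;
    std::chrono::milliseconds max_wait_;
    bytes buffer_;
    clock::time_point first_queued_{};
};
```

With this shape, a burst of subscribes arriving within a few milliseconds ends up in a handful of departures instead of one datagram per subscribe. Any code path that forces a departure after every message defeats the aggregation.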

Reproduction Steps

This behavior is easily reproduced when the system has a *.fidl with 1000s of attributes and *.fdepl puts each into a unique EventGroup.

Subscribe to all ~3500 attributes, run ifconfig down; sleep 10; ifconfig up to break and re-establish the network connection, then capture the traffic with tcpdump and observe the network behavior.

Expected behaviour

The train logic should do a "pretty good job" to aggregate many SUBSCRIBE and many SUBSCRIBEACK into each Service Discovery packet.

Logs and Screenshots

With the existing code you should see 1000s of back-to-back SUBSCRIBE like:

5039	9.333908	10.6.0.3	10.6.0.10	SOME/IP-SD	86	SOME/IP Service Discovery Protocol [SubscribeNack]
5040	9.334271	10.6.0.10	10.6.0.3	SOME/IP-SD	104	SOME/IP Service Discovery Protocol [Subscribe]
5041	9.335307	10.6.0.10	10.6.0.3	SOME/IP-SD	98	SOME/IP Service Discovery Protocol [Subscribe]
5042	9.335710	10.6.0.10	10.6.0.3	SOME/IP-SD	114	SOME/IP Service Discovery Protocol [Subscribe]
5043	9.336492	10.6.0.10	10.6.0.3	SOME/IP-SD	98	SOME/IP Service Discovery Protocol [Subscribe]
5044	9.336762	10.6.0.10	10.6.0.3	TCP	66	36651 → 30510 [FIN, ACK] Seq=142 Ack=1 Win=64256 Len=0 TSval=269564273 TSecr=2

Each frame is ~98 bytes and goes out as a separate packet, with little or no aggregation. In this region we see a SubscribeNack and a socket close because the entire sequence exceeded the 2-second Service Discovery timeout interval.
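As a rough sanity check on those sizes: a 98-byte frame is what you would expect for one SOME/IP-SD message carrying a single eventgroup entry and a single IPv4 endpoint option, assuming untagged Ethernet and IPv4. The sketch below works through that arithmetic with the field sizes from the SOME/IP-SD format. The 1400-byte payload budget is an assumption chosen for illustration, not a value read from vsomeip, and the helper names are hypothetical.

```cpp
#include <cstddef>

// Header sizes on the wire (assumes untagged Ethernet + IPv4 + UDP).
constexpr std::size_t kEthIpUdpOverhead = 14 + 20 + 8;
constexpr std::size_t kSomeipHeader = 16;        // message ID, length, request ID, 4 version/type bytes
constexpr std::size_t kSdFixedFields = 4 + 4 + 4; // flags+reserved, entries length, options length
constexpr std::size_t kEventgroupEntry = 16;
constexpr std::size_t kIpv4EndpointOption = 12;

// Size of one serialized SOME/IP-SD message (UDP payload only).
constexpr std::size_t sd_message_size(std::size_t entries, std::size_t options) {
    return kSomeipHeader + kSdFixedFields
         + entries * kEventgroupEntry
         + options * kIpv4EndpointOption;
}

// How many whole messages of a given size fit into one datagram payload.
constexpr std::size_t messages_per_datagram(std::size_t payload_budget,
                                            std::size_t message_size) {
    return payload_budget / message_size;
}

// Datagrams needed to carry `total` messages, rounding up.
constexpr std::size_t datagrams_needed(std::size_t total, std::size_t per_datagram) {
    return (total + per_datagram - 1) / per_datagram;
}

// One entry plus one option gives a 56-byte payload, i.e. a 98-byte frame.
static_assert(sd_message_size(1, 1) + kEthIpUdpOverhead == 98, "matches the capture");
```

Under these assumptions, 25 such messages fit in a 1400-byte payload, so ~3500 subscribes would need about 140 datagrams instead of ~3500. The 104- and 114-byte frames in the capture presumably carry slightly different options, so the real ratio will vary.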


joeyoravec · 2024-04-10T20:45:28Z

I've opened draft pull requests with the code changes that I've applied locally to address this issue. I would appreciate any feedback on the approach.

joeyoravec · 2024-05-22T16:24:28Z

I've updated the pull request for 3.4.x (but not 3.1.x) with an additional commit for a problem discovered in testing. I was getting this warning:

Received an unreliable vSomeIP SD message with too short length field local: 10.6.0.10:30490 remote: 10.6.0.3:30490

and the root-cause was here:

vsomeip/implementation/endpoints/src/udp_server_endpoint_impl.cpp

Lines 682 to 690 in 6c0e9db

    
    } else {
        if (its_service != VSOMEIP_SD_SERVICE ||
            (current_message_size > VSOMEIP_SOMEIP_HEADER_SIZE &&
                    current_message_size >= remaining_bytes)) {
            its_host->on_message(&_buffer[i],
                    current_message_size, this, _is_multicast,
                    VSOMEIP_ROUTING_CLIENT,
                    nullptr,
                    its_remote_address, its_remote_port);

on_message_received supports multiple messages in a single UDP frame, but it only processes a message in one of two cases:

if the message is not SOMEIP-SD
else if the message is SOMEIP-SD and there’s no subsequent message in the frame

After changing the train logic to aggregate multiple SOMEIP-SD messages into a single UDP frame, we want it to process all messages found in the frame, whether they are SOMEIP or SOMEIP-SD.
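For illustration, walking every message in a frame could look roughly like the sketch below. It relies only on the SOME/IP header layout: a 16-byte header whose 32-bit big-endian length field sits at offset 4 and counts the bytes that follow it. The function name `for_each_someip_message` and the handler type are hypothetical; this is not the code from the pull request or from udp_server_endpoint_impl.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::size_t kSomeipHeaderSize = 16;   // full SOME/IP header
constexpr std::size_t kLengthFieldOffset = 4;   // length follows the 4-byte message ID
constexpr std::size_t kBytesNotInLength = 8;    // message ID + length field itself

using message_handler = std::function<void(const std::uint8_t*, std::size_t)>;

// Dispatches every complete SOME/IP message found in one UDP payload,
// regardless of service ID, and returns how many were dispatched.
// Stops at the first truncated or malformed message.
std::size_t for_each_someip_message(const std::uint8_t* data, std::size_t size,
                                    const message_handler& handler) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (size - i >= kSomeipHeaderSize) {
        const std::uint8_t* p = data + i + kLengthFieldOffset;
        const std::uint32_t length = (std::uint32_t{p[0]} << 24)
                                   | (std::uint32_t{p[1]} << 16)
                                   | (std::uint32_t{p[2]} << 8)
                                   |  std::uint32_t{p[3]};
        // 64-bit arithmetic so a bogus length cannot wrap around.
        const std::uint64_t message_size = std::uint64_t{length} + kBytesNotInLength;
        if (message_size < kSomeipHeaderSize || message_size > size - i) {
            break;  // length field too short, or message runs past the frame
        }
        handler(data + i, static_cast<std::size_t>(message_size));
        ++count;
        i += static_cast<std::size_t>(message_size);
    }
    return count;
}
```

The point of the sketch is that the loop treats SOMEIP-SD messages like any other message: whether more bytes follow in the frame has no influence on whether the current message is delivered.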

duartenfonseca · 2024-08-28T17:20:03Z

Hi @joeyoravec, I have been trying to reproduce your problem in my environment so that we can validate the fix, but I am having some problems.
I used one of the CommonAPI examples (link) to achieve this, with the following configurations:
example_configs.zip

Can you check whether these make sense, or provide the ones you used so that I can check them?

Thanks!

joeyoravec added the bug label Apr 10, 2024

This was referenced Apr 10, 2024

improve vsomeip sluggish connect (master 3.4.x branch) #670

Draft

improve vsomeip sluggish connect (maintain 3.1.x branch) #671

Draft
