improve vsomeip sluggish connect (master 3.4.x branch) #670

joeyoravec · 2024-04-10T20:34:02Z

fixes #669

As described in issue #669 the train logic should aggregate Service Discovery, but does not currently. This leads to very low performance on systems offering 1000s of EventGroups.

These changes improve performance on 3.4.x branch. I'm leaving this PR in draft status to gather feedback.

Allow passengers with same identifier aboard same train

File implementation/endpoints/src/server_endpoint_impl.cpp has a test:

        if (its_data.train_->passengers_.end()
                != its_data.train_->passengers_.find(its_identifier)) {
            must_depart = true;
        } else

this test will “hit” for every single Service Discovery message: Subscribe, SubscribeAck, etc. Each has the same service/identifier, forcing the train to depart immediately, and preventing the train logic from aggregating multiple SD passengers into the same train.

I've eliminated this entire check on my system. The only rationale I can think to keep this logic is debouncing, to have exactly 1 passenger in each train with the "final, debounced EventGroup value". I don't think the code implements this. If this was a design goal then I think it would need to be handled higher in the stack on an EventGroup-by-EventGroup basis, not as a calculation of train departure time.

Eliminate debounce logic

Debouncing would make sense if it was applied on a message-by-message basis so each gets transmitted “no more often than Xms” dropping all updates except the final one. This way vsomeip events could get low-pass filtered like CAN periodic: some transmitting 20ms, others transmitting 500ms.

The calculation in implementation/endpoints/src/server_endpoint_impl.cpp looks wrong because it applies this parameter to decide when an entire multi-passenger train should depart.

Apply retention time for Service Discovery

There is conditional logic in both 3.1.20 and 3.4.10 in implementation/endpoints/src/server_endpoint_impl.cpp to “always use zero” and skip configured retention time for service discovery:

    if (its_service != VSOMEIP_SD_SERVICE && its_method != VSOMEIP_SD_METHOD) {
        get_configured_times_from_endpoint(its_service, its_method,
                &its_debouncing, &its_retention);
    }

I've eliminated this override so we can use the configured 5ms for service discovery. The only rationale to keep the original behavior is to minimize latency and respond immediately without waiting. In my case it makes more sense to wait and aggregate because there is really a lot of traffic.

Process any number of SOMEIP-SD messages aggregated in single UDP frame

on_message_received supports multiple messages in a single UDP frame but only processes the message:

if the message is not SOMEIP-SD
else if the message is SOMEIP-SD and there’s no subsequent message

After changing the train logic to aggregate multiple SOMEIP-SD messages into a single UDP frame we want it to process everything found within the frame no matter if it’s SOMEIP or SOMEIP-SD.

Current performance

With all of these changes together, my Service Discovery is taking about 300ms end-to-end down from >>2s (usually even exceeding a single SD interval and needing to retry). In this example:

there’s nominal 200-400 microseconds gap between packets, sometimes back-to-back and sometimes up to 2ms. Significantly better system load because it’s aggregating packets together into fewer network send calls.

Our design uses separate EventIDs for each attribute. Currently we see ~3500 total subscribes, followed by ~3500 acknowledgements, transmitted one-per-frame. The entire sequence does not finish within a two second SD interval leading to timeouts, retries, and poor performance. All SD messages have the same identifier, so this logic forced the train must_depart ~3500 times carrying a single message. Very inefficient. This change improves performance by allowing multiple passengers with the same identifier aboard the same train; resulting in fewer, larger frames on the wire.

Upstream logic overrides retention and debounce to zero for SD messages, forcing any train containing SD passengers to depart immediately. Use configured times, allowing a train to aggregate multiple SD passengers.

Debouncing would make sense if it was applied on a message-by-message basis so each gets transmitted "no more often than Xms" dropping all updates except the final one. However upstream code applies this parameter to decide when an entire multi-passenger train should depart. Remove this logic; we're not using the feature, it doesn't work the way we expect, and it prevents aggregating multiple passengers into trains.

goncaloalmeida · 2024-05-03T14:47:48Z

@joeyoravec is this ready to review or are you still working on this?

joeyoravec · 2024-05-04T00:15:58Z

@joeyoravec is this ready to review or are you still working on this?

I've left this in draft so it's clear that this is not ready for merge. Yes this "works" and is worth reviewing.

on_message_received supports multiple messages in a single UDP frame but only processes the message: - if the message is not SOMEIP-SD - else if the message is SOMEIP-SD and there’s no subsequent message After changing the train logic to aggregate multiple SOMEIP-SD messages into a single UDP frame we want it to process everything found within the frame no matter if it’s SOMEIP or SOMEIP-SD.

lutzbichler · 2024-08-27T13:51:13Z

I think this mixes up things. The mechanism in the send-method must ensure that there are no two messages with the same message-id in one train. To avoid (too) many SD messages in case of (too) many eventgroup subscriptions, these should be collected in the SD module before sending.

Eugene Kozlov added 3 commits April 10, 2024 14:48

Use configured retention and debounce times for SD messages

1a1ea81

Upstream logic overrides retention and debounce to zero for SD messages, forcing any train containing SD passengers to depart immediately. Use configured times, allowing a train to aggregate multiple SD passengers.

This was referenced Apr 10, 2024

improve vsomeip sluggish connect (maintain 3.1.x branch) #671

Draft

[BUG]: vsomeip slow to establish communication with lots of EventGroup #669

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve vsomeip sluggish connect (master 3.4.x branch) #670

improve vsomeip sluggish connect (master 3.4.x branch) #670

joeyoravec commented Apr 10, 2024 •

edited

Loading

goncaloalmeida commented May 3, 2024 •

edited

Loading

joeyoravec commented May 4, 2024

lutzbichler commented Aug 27, 2024

improve vsomeip sluggish connect (master 3.4.x branch) #670

Are you sure you want to change the base?

improve vsomeip sluggish connect (master 3.4.x branch) #670

Conversation

joeyoravec commented Apr 10, 2024 • edited Loading

Allow passengers with same identifier aboard same train

Eliminate debounce logic

Apply retention time for Service Discovery

Process any number of SOMEIP-SD messages aggregated in single UDP frame

Current performance

goncaloalmeida commented May 3, 2024 • edited Loading

joeyoravec commented May 4, 2024

lutzbichler commented Aug 27, 2024

joeyoravec commented Apr 10, 2024 •

edited

Loading

goncaloalmeida commented May 3, 2024 •

edited

Loading