Reproduction Steps
My reproduction was:
1. Start with fully established communication between tse and tce.
2. tce enters suspend-to-ram with the TCP socket established.
3. Allow tse to continue running until it exceeds the TCP keepalive timeout and closes the TCP socket.
4. tce resumes from suspend-to-ram believing the TCP socket is still established, then discovers it to be closed.
Any use case where tse closes the TCP socket while UDP remains functional should be sufficient.
Expected behaviour
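Performance should be better: a burst of SubscribeNack should not trigger a tce restart() per EventGroup, and processing it should complete well within the 2s ServiceDiscovery interval.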
Logs and Screenshots
No response
- Eliminate the tce restart() call from service_discovery_impl::handle_eventgroup_subscription_nack(). It's not clear why it is required or how it helps.
- Modify tce restart() to "early terminate" better, perhaps an unlimited number of times within the 5 second timeout.
- Ensure that SOMEIP-SD is inhibited around any event such as suspend-to-ram where network communication will be lost. Try to prevent Subscribe until the TCP socket is re-established.

Interested in feedback on which would be most effective.
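To make the second option concrete, here is a minimal sketch of a purely time-based early termination. It is not vsomeip code: debounced_endpoint, connect_timeout and the explicit now parameter are hypothetical names chosen for illustration, and the 5 second value is the timeout mentioned above. The idea is that while a connect attempt is younger than the timeout, every restart() request is ignored, no matter how many arrive.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using clock_type = std::chrono::steady_clock;

// Assumption: the "5 second timeout" mentioned in the suggestion.
constexpr std::chrono::seconds connect_timeout{5};

// Hypothetical stand-in for a TCP client endpoint; not the vsomeip class.
struct debounced_endpoint {
    bool connecting = false;                  // a connect attempt is pending
    clock_type::time_point connect_started{}; // when that attempt began
    std::size_t full_restarts = 0;            // socket teardowns performed

    // The caller passes the current time so the logic can be tested
    // without sleeping.
    void restart(clock_type::time_point now) {
        if (connecting) {
            assert(now >= connect_started); // steady clock never goes back
            if (now - connect_started < connect_timeout) {
                return; // early terminate, with no limit on the count
            }
        }
        // Connect attempt is missing or too old: tear down and reconnect.
        ++full_restarts;
        connecting = true;
        connect_started = now;
    }
};
```

In this model a burst of 3500 restart() requests arriving within the timeout costs a single socket teardown; a further teardown happens only if the connect attempt is still pending once the timeout has elapsed.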
vSomeip Version
v3.4.10
Boost Version
1.82
Environment
Android and QNX
Describe the bug
My automotive system has a *.fidl with ~3500 attributes, one per CAN signal. My *.fdepl maps each attribute into a unique EventGroup.

Especially when resuming from suspend-to-ram, it's possible that UDP SOMEIP-SD will be operational but the TCP socket will be broken. This leads to a tce restart(), but during this time any Subscribe will receive SubscribeNack in response. As the number of EventGroups scales up, this becomes catastrophic to performance.

In service_discovery_impl::handle_eventgroup_subscription_nack(), each EventGroup calls restart():
vsomeip/implementation/service_discovery/src/service_discovery_impl.cpp
Lines 2517 to 2521 in cf49723

and in tcp_client_endpoint_impl::restart(), while ::CONNECTING, the code will "early terminate" for a maximum of 5 restarts:
vsomeip/implementation/endpoints/src/tcp_client_endpoint_impl.cpp
Lines 77 to 85 in cf49723

Thereafter the code will fall through, calling shutdown_and_close_socket_unlocked(), and perform the full restart even while a connection is in progress.

As the system continues processing 1000s of SubscribeNack, this becomes a tight loop at 100% CPU load that takes multiple seconds to plow through the workload. This can easily exceed a 2s ServiceDiscovery interval and cascade into further problems.
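The restart behaviour described above can be modelled in a few lines to show how the number of full restarts grows with the number of SubscribeNack messages. This is a simplified sketch, not the vsomeip implementation: model_endpoint, max_aborted_restarts and count_full_restarts are hypothetical names, the time-based part of the check is left out, and the model assumes the abort counter is reset after each full restart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the behaviour described in this report.
// None of these names are vsomeip identifiers.
enum class conn_state { CONNECTING, ESTABLISHED };

// "maximum of 5 restarts" from the description above.
constexpr std::size_t max_aborted_restarts = 5;

struct model_endpoint {
    conn_state state = conn_state::CONNECTING;
    std::size_t aborted_restarts = 0; // restarts skipped while connecting
    std::size_t full_restarts = 0;    // restarts that tore the socket down

    void restart() {
        if (state == conn_state::CONNECTING
                && aborted_restarts < max_aborted_restarts) {
            ++aborted_restarts; // early terminate: keep the pending connect
            return;
        }
        // Fall through: stands in for shutdown_and_close_socket_unlocked()
        // followed by a fresh connect attempt.
        ++full_restarts;
        aborted_restarts = 0; // assumption: counter resets on full restart
        state = conn_state::CONNECTING;
        assert(aborted_restarts <= max_aborted_restarts);
    }
};

// One restart() per SubscribeNack, mirroring the description of
// handle_eventgroup_subscription_nack() above.
inline std::size_t count_full_restarts(std::size_t nack_count) {
    model_endpoint ep;
    for (std::size_t i = 0; i < nack_count; ++i) {
        ep.restart();
    }
    return ep.full_restarts;
}
```

Under these assumptions every sixth SubscribeNack tears the socket down again, so count_full_restarts(3500) returns 583 even though a connection attempt is in progress the whole time.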