Details
Version(s) in which the bug has been detected
Luos engine 3.1.0 and all earlier versions
Description of the bug
Because Robus is a multi-master protocol, messages can collide on the network. After a collision, Robus has to retry sending the message and take steps to avoid colliding again. However, there still seems to be at least one condition in which collision avoidance does not work.
Context and environment
A few explanations about the basic protocol timeout
In Robus, a timeout is used to avoid transmitting during a reception, which would cause a collision. The idea is to lock transmission as soon as something is received and to unlock it after a timeout. More information about the timeout is available on the related documentation page.
To manage this, Robus resets a timer to a specific value on each received byte, so that after a period of inactivity on the bus all nodes can send messages again.
Timeout used for collision avoidance
Sometimes two nodes try to send messages at the same time. In this case the timeout cannot prevent it, and a collision still occurs on the network. The collision is detected, and Robus retries sending the message after a timeout period that depends on its node ID, to avoid colliding with the same node again.
The problem is that the collision avoidance timer is the same timer used for normal reception, so in practice node 2's collision avoidance timeout is overwritten when it receives node 1's transmission.
This leads to a case where collision avoidance can fail.
In this scenario, three nodes collide, and then a fourth node collides with node 1's retry. This leads to a collision loop.
How to reproduce the bug
@houkhouk has only seen it in one specific condition over several years, so it is almost impossible to reproduce on purpose.
Possible solution
To avoid this, the timeout timer could give priority to the latest deadline: if a normal reception timeout would trigger before a pending collision avoidance timeout, the timer should not be reset.
In other words, this timer should always keep the latest possible timeout.