Commit
fix: fix pipeline congestion shortcut synchronization (#1740)
* fix: fix pipeline congestion shortcut synchronization

#1627 introduced a congestion shortcut mechanism, but it was not correctly synchronized. Users hit situations in which the congested flag was set and never reset, which causes every subsequent message to be dropped (the publisher is effectively dead).

The congested flag is set after the deadline of a message is reached, and it is unset when batches are refilled, using only relaxed atomic operations. Also, after the deadline is reached, there is no further check of the queue.

The most obvious problematic flow is the following: between the instant the thread is woken up because the deadline has been reached and the instant the congested flag is set, the tx task may be unblocked, and all the batches may be sent and refilled. The congested flag would then be set afterwards, so there would be no batch left to refill to unset it. This flow seems hard to hit when there are many batches in the queue, but it is still theoretically possible. And when a fragmentation chain is dropped in the middle, pushing the ephemeral batch takes more time before the congested flag is set. This is precisely the case in which the bug was observed, so the described flow may be the cause, but that is not certain.

The proposed fix adds an additional synchronization step after setting the congested flag: we retry pushing the message, this time with an already expired deadline. If the batches were refilled, the message should be able to be sent and the congested flag will be reset. If batches were in fact refilled, but the message was fragmented and still ends up being dropped, it still means batches have been pushed and will be refilled later, which also unsets the congested flag.

* Fix typo

* fix: typo

---------

Co-authored-by: Oussama Teffahi <[email protected]>
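The race and the retry step described in this message can be replayed deterministically with a small model. This is only a sketch under stated assumptions: `Pipeline`, `try_push`, `send_and_refill`, `on_deadline` and `stuck_after_race` are hypothetical names invented for illustration, not the identifiers used in the actual code, and the interleaving is executed sequentially on one thread rather than with real concurrency.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for the transmission pipeline. None of these names
// come from the actual code base.
struct Pipeline {
    congested: AtomicBool,
    free_batches: Mutex<usize>,
}

impl Pipeline {
    fn new(free_batches: usize) -> Self {
        Pipeline {
            congested: AtomicBool::new(false),
            free_batches: Mutex::new(free_batches),
        }
    }

    // Models a push with an already expired deadline: take a batch if one is
    // free right now, never wait.
    fn try_push(&self) -> bool {
        let mut free = self.free_batches.lock().unwrap();
        if *free > 0 {
            *free -= 1;
            true
        } else {
            false
        }
    }

    // Models the tx task sending a batch and refilling it, which is the only
    // place where the congested flag is cleared in this model.
    fn send_and_refill(&self) {
        *self.free_batches.lock().unwrap() += 1;
        self.congested.store(false, Ordering::Relaxed);
    }

    // Deadline handling. `retry` enables the extra push attempt after the
    // flag is set. Returns true if the message ended up in a batch.
    fn on_deadline(&self, retry: bool) -> bool {
        self.congested.store(true, Ordering::Relaxed);
        retry && self.try_push()
    }
}

// Replays the problematic interleaving and reports whether the congested
// flag is still set at the end.
fn stuck_after_race(retry: bool) -> bool {
    let p = Pipeline::new(0); // no free batch: the deadline will be reached
    // 1. The pushing thread wakes up because the deadline has been reached.
    // 2. Before it sets the flag, the tx task sends and refills a batch.
    p.send_and_refill();
    // 3. Only now does the pushing thread set the flag (and maybe retry).
    let pushed = p.on_deadline(retry);
    // 4. The tx task only has something to refill if a batch was pushed.
    if pushed {
        p.send_and_refill();
    }
    p.congested.load(Ordering::Relaxed)
}
```

In this model, `stuck_after_race(false)` leaves the flag set with nothing left to clear it, while `stuck_after_race(true)` pushes a batch during the retry, and the later refill of that batch clears the flag.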