For anyone else running into this: the underlying cause is a quirk of how async events are handled in the combination of Rust and TypeScript that Hubble uses. It is more likely to occur on hubs under high load.
The fix is unlikely to be quick or easy: it would require porting the entire Hubble gRPC server to Rust as well, which is a large undertaking.
For that reason, we recommend anyone who processes events should:
Not reject an event just because it has an earlier event ID than the latest event ID you've seen so far
Use a "recently seen" cache to avoid double processing events, instead of relying on the event ID
FWIW, Warpcast uses the cache technique to avoid double processing.
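To make the two recommendations above concrete, here is a minimal sketch of a bounded "recently seen" cache. It is only an illustration, not Warpcast's or Hubble's actual code: `RecentlySeenCache`, `markIfNew`, and `processEvents` are hypothetical names, and it assumes event IDs fit in a JavaScript `number`.

```typescript
// Hypothetical helper: a bounded "recently seen" cache keyed by event ID.
// Assumption: event IDs fit safely in a JavaScript number.
class RecentlySeenCache {
  private readonly seen = new Set<number>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.capacity = capacity;
  }

  // Returns true the first time an ID is offered, false for a repeat.
  markIfNew(id: number): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.capacity) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value as number;
      this.seen.delete(oldest);
    }
    return true;
  }
}

// Processes every event that has not been seen recently, regardless of
// whether its ID is lower than the highest ID seen so far.
function processEvents(ids: number[], cache: RecentlySeenCache): number[] {
  const processed: number[] = [];
  for (const id of ids) {
    if (cache.markIfNew(id)) processed.push(id);
  }
  return processed;
}
```

The capacity is a trade-off: a duplicate that arrives after its ID has been evicted will be processed again, so the window should be sized to comfortably cover how far out of order events can arrive.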
What is the bug?
There is something wrong with how Hubble sends HubEvents to a subscription. They are supposed to be ordered by ID, but they are sometimes sent out of order.
How can it be reproduced? (optional)
I made these two simple apps that only subscribe and then log when messages arrive out of order: https://gist.github.com/BlinkyStitt/619706df5aac39e601ff0b5e6a85e88b. I can move them into a proper repo with the protobuf files if you need me to.
The Node one runs fine, but the Rust one throws errors very quickly. At first we thought this meant there was a bug in the Rust library, but when I captured some packets and inspected them in Wireshark, I could see the events out of order on the wire:
452947924307972
452947924307971
capture.pcap.zip
Even though the Node gist doesn't hit the bug, our production Node code does. In fact, it once saw an event ID out of order by more than 1 million.
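For readers who don't want to open the gist, the check it performs can be sketched roughly as follows. This is not the gist's code: `findOutOfOrder` and `OrderViolation` are hypothetical names, the function works on an already-collected array of IDs rather than a live gRPC subscription, and it assumes IDs fit in a JavaScript `number`.

```typescript
// Hypothetical record of one ordering violation.
interface OrderViolation {
  highestSeen: number; // highest event ID received before this one
  current: number;     // the ID that arrived late
  gap: number;         // how far behind the highest ID it was
}

// Reports every ID that is lower than the highest ID seen before it.
function findOutOfOrder(ids: number[]): OrderViolation[] {
  const violations: OrderViolation[] = [];
  let highestSeen: number | undefined = undefined;
  for (const id of ids) {
    if (highestSeen !== undefined && id < highestSeen) {
      violations.push({ highestSeen, current: id, gap: highestSeen - id });
    } else {
      highestSeen = id;
    }
  }
  return violations;
}

// The two IDs from the packet capture, in the order they were received.
const violations = findOutOfOrder([452947924307972, 452947924307971]);
console.log(violations);
```

Tracking the highest ID rather than just the previous one means a run of late events is measured against the same reference point, which is how a gap as large as the one seen in production would show up.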