(Discussion of Lab: KeyedProcessFunction and Timers (Long Ride Alerts))
These cases are worth noting:
- The START event is missing. Then the END event will sit in state indefinitely (this is a leak!).
- The END event is missing. The timer will fire and the state will be cleared (this is ok).
- The END event arrives after the timer has fired and cleared the state. In this case the END event will be stored in state indefinitely (this is another leak!).
These leaks could be addressed by using either state TTL or another timer to eventually clear any lingering state.
Regardless of how clever we are with what state we keep, and how long we choose to keep it, we should eventually clear it, because otherwise our state will grow without bound. But once that information has been cleared, we run the risk of late events causing incorrect or duplicated results.
This tradeoff between keeping state indefinitely versus occasionally getting things wrong when events are late is a challenge that is inherent to stateful stream processing.
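The cases above, and the cost of the cleanup, can be made concrete with a small model. What follows is a minimal sketch in plain Java, not the Flink API and not the lab's reference solution: `LongRideModel`, `onEvent`, `advanceWatermark`, and `retentionMs` are hypothetical names, and the two-hour threshold and the "one timer per key" rule are assumptions made for illustration. In a real job the same idea would presumably be expressed with a `KeyedProcessFunction`, keyed state, and event-time timers (or state TTL).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java model of one keyed operator. NOT the Flink API; all names are hypothetical.
class LongRideModel {
    static final long HOUR = 60 * 60 * 1000L;
    static final long MAX_RIDE = 2 * HOUR; // assumed alert threshold

    // The first event seen for a ride, kept until its partner or a timer arrives.
    private static final class Stored {
        final boolean isStart;
        final long timestamp;

        Stored(boolean isStart, long timestamp) {
            this.isStart = isStart;
            this.timestamp = timestamp;
        }
    }

    private final long retentionMs;                          // how long a lone END is kept
    private final Map<Long, Stored> state = new HashMap<>(); // rideId -> first event
    private final Map<Long, Long> timers = new HashMap<>();  // rideId -> timer timestamp
    final List<Long> alerts = new ArrayList<>();             // rideIds reported as long rides

    LongRideModel(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    int stateSize() {
        return state.size();
    }

    void onEvent(long rideId, boolean isStart, long timestamp) {
        Stored first = state.get(rideId);
        if (first == null) {
            state.put(rideId, new Stored(isStart, timestamp));
            // START: alert timer. END: cleanup timer, so a lone END cannot leak.
            timers.put(rideId, isStart ? timestamp + MAX_RIDE : timestamp + retentionMs);
            return;
        }
        if (first.isStart == isStart) {
            return; // duplicate of the stored event; ignored in this sketch
        }
        long start = isStart ? timestamp : first.timestamp;
        long end = isStart ? first.timestamp : timestamp;
        if (end - start > MAX_RIDE) {
            alerts.add(rideId);
        }
        state.remove(rideId);
        timers.remove(rideId);
    }

    // Fires every timer whose timestamp is at or before the watermark.
    void advanceWatermark(long watermark) {
        Iterator<Map.Entry<Long, Long>> it = timers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> timer = it.next();
            if (timer.getValue() <= watermark) {
                Stored stored = state.remove(timer.getKey());
                if (stored != null && stored.isStart) {
                    alerts.add(timer.getKey()); // START with no END in time: long ride
                }
                it.remove(); // for a stored END this was purely a cleanup timer
            }
        }
    }
}
```

With a one-hour retention, a lone END is dropped once the watermark passes its timestamp plus one hour, so neither leak case keeps state forever. The price shows up when the matching START arrives after that cleanup: the model has forgotten the END, treats the START as the beginning of a new ride, and raises an alert for a ride that may have been short. A longer `retentionMs` makes this rarer at the cost of holding more state.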
For each of the following, add tests to check for the desired behavior:
- Extend the solution so that it never leaks state.
- Define what it means for an event to be missing, detect missing START and END events, and send some notification of this to a side output.