-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38477][runtime] Add FINISHED watermark status to support proper watermark aggregation #27083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…r watermark aggregation
@flinkbot run azure |
} | ||
return; | ||
} | ||
// Other watermark statuses (ACTIVE, IDLE) should not occur for finished tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits:
- you could could fail fast an issue the Exception if (!watermarkStatus.isFinished()) {.throw }
then you would not need thereturn
. - can we put out an identifier of the task in the Exception to help with diagnostics.
* override this method because they rely on StatusWatermarkValve to aggregate FINISHED status | ||
* from upstream inputs and propagate it downstream automatically. | ||
*/ | ||
protected void emitFinishedStatus() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we arrange the class hierarchy so that emitFinishedStatus() is only overridable for sources?
protected void emitFinishedStatusToOutputs() { | ||
try { | ||
if (operatorChain != null) { | ||
RecordWriterOutput<?>[] streamOutputs = operatorChain.getStreamOutputs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have no stream outputs? if so we could issue another LOG.warn("Cannot emit FINISHED watermark status: operator chain has no stream outputs");)
|
||
public static final int IDLE_STATUS = -1; | ||
public static final int ACTIVE_STATUS = 0; | ||
public static final int FINISHED_STATUS = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we turn this into an enum?
|
||
public WatermarkStatus(int status) { | ||
if (status != IDLE_STATUS && status != ACTIVE_STATUS) { | ||
if (status != IDLE_STATUS && status != ACTIVE_STATUS && status != FINISHED_STATUS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we get into this situation?
private final int inputCount; | ||
|
||
private int watermarksSeen = 0; | ||
private int finishedStatusSeen = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious on you thinking on how this new status effects https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream-v2/watermark/#emit-watermark customizations.
What is the purpose of the change
This PR fixes a watermark aggregation bug that can cause zero output from time-based operators (e.g., IntervalJoin) when some upstream subtasks finish while others later become temporarily idle.
Details of the issue are described in FLINK-38477.
Root cause:
Flink currently allows finished subtasks to dominate watermark aggregation by emitting Long.MAX_VALUE instead of being excluded like idle inputs. This prematurely advances downstream watermarks to infinity (Long.MAX_VALUE), halting event-time progress and eliminating output.
Brief change log
Goal: Ensure a finished input is excluded from operator watermark aggregation, just like an idle input. This prevents
Long.MAX_VALUE
from dominating aggregation when some channels are finished and others are idle.Implementation
FINISHED
.status = FINISHED
watermark = Long.MAX_VALUE
if the watermarks of all input channels are already Long.MAX_VALUE.status = ACTIVE
,watermark = min(all active channel watermarks)
.status = IDLE
,watermark = max(all idle channel watermarks)
.status = FINISHED
,watermark = Long.MAX_VALUE
.This ensures finished inputs are ignored during aggregation until every input has completed, at which point the operator watermark correctly advances to
Long.MAX_VALUE
.Core issues identified
Current state model:
ACTIVE
←→IDLE
IDLE
= “temporarily no data, may resume”FINISHED
= “permanently done, never resuming”Rejected approach: treating finished as idle.
Issue 1: IDLE status gets overwritten
StreamTask.endData()
emitsWatermarkStatus.IDLE
when a source finishs.operatorChain.finishOperators()
callsWatermarkAssignerOperator.processWatermark()
.WatermarkAssignerOperator
receivesLong.MAX_VALUE
, it forcibly emitsWatermarkStatus.ACTIVE
.Issue 2: Incorrect aggregation when all channels are IDLE
Long.MAX_VALUE
):StatusWatermarkValve
callsfindAndOutputMaxWatermarkAcrossAllChannels()
.Long.MAX_VALUE
watermark.Conclusion:
FINISHED
status is necessary to create a three-state WatermarkStatus (ACTIVE / IDLE / FINISHED) that properly distinguishes temporary idleness from permanent completion.Long-term refactoring (future follow-up)
This PR fixes the immediate bug, but the broader problem is fragmented watermark lifecycle management.
Current problems
Proposed long-term direction
Source-owned completion and embedded watermark strategy:
Sources should declare when they are active, idle, or finished, and emit both status and watermark atomically. Wrapper operators should not override this behavior.
This refactoring would prevent the entire class of bugs we've been fixing and make watermark completion semantics clear and maintainable.
Verifying this change
Unit tests (new)
Long.MAX_VALUE
.Manual/e2e tests
table.exec.source.idle-timeout=10s
).Long.MAX_VALUE
, all records dropped → no output.After fix: watermark does not jump; resumed records are processed correctly.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
): noDocumentation
FINISHED
watermark status)