[FLINK-19059]: Support non-time retract mode for OverAggregate operator #25753
base: master
Conversation
- Non-time attribute over aggregates can support retract mode
- Time attribute over aggregates support only append mode
Force-pushed from 169870d to d0b856b
Thanks for the PR @bvarghese1!
I've only taken a look at the operator implementation and left a few suggestions to improve the efficiency of the operator.
Best, Fabian
protected transient JoinedRowData output;

// state to hold the accumulators of the aggregations
private transient ValueState<RowData> accState;
What is stored in accState? Is it the accumulators for the last record in the ordered list?
registerProcessingCleanupTimer(ctx, ctx.timerService().currentProcessingTime());

// get last accumulator
RowData lastAccumulatorCurr = accState.value();
accState is never updated.
I don't think we need to have accumulator state the way it is used here.
Storing the accumulators for each input row would reduce the computational cost for updates.
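To make the suggestion concrete, here is a plain-Java sketch of the per-row accumulator idea (not the Flink state API; the class and method names are hypothetical, and a running SUM stands in for the generated aggregate functions): with one accumulator stored per input row, an insert only has to recompute the accumulators after the inserted row instead of re-aggregating from scratch.

```java
import java.util.List;

// Plain-Java sketch (hypothetical names): store one running-SUM accumulator
// per input row, so an insert only recomputes the suffix after the new row.
public class PerRowAccumulators {
    // values: rows in sort order; accs: the running SUM accumulator per row
    static void insertAt(List<Long> values, List<Long> accs, int pos, long v) {
        values.add(pos, v);
        // start from the previous row's accumulator (0 when inserting at the head)
        long acc = (pos == 0 ? 0L : accs.get(pos - 1)) + v;
        accs.add(pos, acc);
        // only the suffix after the insertion point needs to be recomputed
        for (int i = pos + 1; i < values.size(); i++) {
            accs.set(i, accs.get(i - 1) + values.get(i));
        }
    }
}
```

In the operator this per-row accumulator would live in a MapState keyed by the row ID rather than in a Java list.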
new ValueStateDescriptor<List<RowData>>("sortedKeyState", sortedKeyTypeInfo);
sortedKeyState = getRuntimeContext().getState(sortedKeyStateDescriptor);

MapStateDescriptor<Long, RowData> valueStateDescriptor =
If we don't persist the accumulators for all input rows, another optimization would be to split the input row data into two maps: one for the fields that are needed to compute the aggregates and one for those that are simply forwarded.
Then we could recompute the accumulators by just deserializing the first map.
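A rough sketch of that two-map layout, using plain Java collections in place of Flink MapState (all names here are hypothetical): the aggregate inputs and the pass-through fields live in separate maps, so recomputing the aggregates only has to read (and deserialize) the smaller aggregate-field map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the two-map split (hypothetical names, plain Java collections
// standing in for Flink MapState): aggregate inputs and forwarded fields are
// kept separately, keyed by a per-row ID.
public class SplitRowState {
    final Map<Long, long[]> aggFields = new LinkedHashMap<>();        // read by the aggregates
    final Map<Long, String[]> forwardedFields = new LinkedHashMap<>(); // only copied to the output

    long recomputeSum() {
        long sum = 0;
        // forwardedFields is never touched while recomputing the aggregate
        for (long[] fields : aggFields.values()) {
            sum += fields[0];
        }
        return sum;
    }
}
```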
while (iterator.hasNext()) {
    RowData curKey = iterator.next(); // (ID, sortKey)
    RowData inputKey = GenericRowData.of(-1L, sortKeyFieldGetter.getFieldOrNull(input));
Move this out of the while loop? inputKey doesn't depend on the loop variable.
    break;
}
// Can also add the accKey to the sortedKeyList to avoid reading from the valueMapState
RowData curRow = valueMapState.get(curKey.getLong(keyIdx));
I just realized that the Map approach won't help if we have to access all elements for any update (insert or delete). We can either persist the accumulators for all input rows (so we only need to recompute the accumulators after the inserted/deleted row), or split the input row into the fields that are needed for the aggregate computation and those that aren't.
// Generate UPDATE_BEFORE
output.setRowKind(RowKind.UPDATE_BEFORE);
output.replace(curValue, prevFunction.getValue());
out.collect(output);
// Generate UPDATE_AFTER
output.setRowKind(RowKind.UPDATE_AFTER);
output.replace(curValue, currFunction.getValue());
out.collect(output);
If we stored the accumulators for all input rows, we could check whether the aggregates really changed. For example, a MIN(), MAX(), LAG(), or LEAD() function might not change, even if we inserted or deleted a row. We should only emit updates if something really changed. Any update we filter out here will reduce the processing overhead in subsequent operators / sinks.
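The filtering could look roughly like the following (a hypothetical sketch, not the actual operator code; plain Longs stand in for the aggregate rows): the UPDATE_BEFORE/UPDATE_AFTER pair is emitted only when the old and new aggregate values differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Sketch of suppressing no-op updates (hypothetical names): only emit the
// UPDATE_BEFORE/UPDATE_AFTER pair when the aggregate value actually changed,
// e.g. a MAX that is unaffected by the inserted or deleted row.
public class ChangeFilter {
    static List<String> emitIfChanged(Long prevAgg, Long currAgg) {
        List<String> out = new ArrayList<>();
        if (Objects.equals(prevAgg, currAgg)) {
            return out; // no change: downstream operators see no traffic at all
        }
        out.add("UPDATE_BEFORE " + prevAgg);
        out.add("UPDATE_AFTER " + currAgg);
        return out;
    }
}
```

In the operator this comparison would be between the previous and recomputed accumulator rows rather than between two Longs.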
out.collect(output);

// Emit updated agg value for all records after newly inserted row
while (iterator.hasNext()) {
If we store the accumulators, we can exit this loop early as soon as the accumulators before and after the change are identical. Since the accumulator for row x depends only on the accumulator of row x-1 and on x itself, and x didn't change, we can stop traversing the list once the accumulators are identical before and after the row was inserted.
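The early exit can be sketched in plain Java like this (hypothetical names, a running SUM as the accumulator): once the recomputed accumulator equals the stored one, every later accumulator is unchanged too, so the traversal stops.

```java
// Plain-Java sketch of the early exit (hypothetical names): stop rewriting
// stored accumulators as soon as the recomputed accumulator equals the stored
// one, because every later accumulator is then unchanged as well.
public class EarlyOutTraversal {
    // returns how many rows actually had their accumulator (running SUM) updated
    static int recomputeFrom(long[] values, long[] storedAccs, int start) {
        long acc = start == 0 ? 0L : storedAccs[start - 1];
        int updated = 0;
        for (int i = start; i < values.length; i++) {
            acc += values[i];
            if (acc == storedAccs[i]) {
                break; // accumulators identical from here on: early out
            }
            storedAccs[i] = acc;
            updated++;
        }
        return updated;
    }
}
```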
output.setRowKind(RowKind.UPDATE_BEFORE);
output.replace(curValue, prevFunction.getValue());
out.collect(output);

output.setRowKind(RowKind.UPDATE_AFTER);
output.replace(curValue, currFunction.getValue());
out.collect(output);
If we persist per-row accumulators, we could check if the rows differ and only emit updates in that case.
List<RowData> sortedKeyList = sortedKeyState.value();
ListIterator<RowData> iterator = sortedKeyList.listIterator();

while (iterator.hasNext()) {
We could exit the loop early once the accumulators before and after the change are the same (see the detailed comment above).
out.collect(output);
iterator.remove();
valueMapState.remove(curKey.getLong(keyIdx));
currFunction.retract(curValue);
It would be better if we didn't need to rely on retractable accumulators; they are often less space- and/or compute-efficient.
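To illustrate the cost (a hypothetical example, not Flink's built-in implementation): a retractable MAX has to keep a multiset of all accumulated values so it can find the next maximum after a retraction, while an append-only MAX stores a single value.

```java
import java.util.TreeMap;

// Hypothetical example of why retraction costs more: a retractable MAX keeps
// a value -> multiplicity multiset to recover the next max after a retract;
// an append-only MAX would store just one long.
public class RetractableMax {
    private final TreeMap<Long, Integer> counts = new TreeMap<>();

    void accumulate(long v) {
        counts.merge(v, 1, Integer::sum);
    }

    void retract(long v) {
        Integer c = counts.merge(v, -1, Integer::sum);
        if (c != null && c <= 0) {
            counts.remove(v);
        }
    }

    Long max() {
        return counts.isEmpty() ? null : counts.lastKey();
    }
}
```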
@lincoln-lil I noticed that you made some similar changes (more specifically, the planner changes) for optimizing the Deduplicate operator. Would you mind giving me some suggestions on the planner-specific changes that need to be made if we want to support retract mode for a specific operator? I looked at your PR and I think I would also need to update the NDU files. Are there any other files/classes I should look into?
Reviewed by Chi on 12/12/24. It appears that this PR is progressing healthily.
Force-pushed from 7484013 to b76198c
What is the purpose of the change
INSERT INTO sink_t
SELECT key, val, ts,
  SUM(val) OVER (
    PARTITION BY key
    ORDER BY <NON-TIME-ATTRIBUTE>
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_val
FROM source_t
Brief change log
Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): (no)
- Documentation