
Address memory over-accounting in array_agg #16816


Open
gabotechs wants to merge 1 commit into main from fix-size-measurement-for-array-agg-accumulator

Conversation

@gabotechs (Contributor) commented Jul 18, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

Follow up on:

The ArrayAggAccumulator, unlike DistinctArrayAggAccumulator, OrderSensitiveArrayAggAccumulator, and many other accumulators, accumulates values by directly referencing the source ArrayRefs rather than converting them into ScalarValues. This is good for performance, as we can afford to just keep references to the original buffers, but it has the drawback of making it complicated to measure the memory consumed by the ArrayAggAccumulator.
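As a rough sketch of the two strategies (hypothetical struct names, not the actual DataFusion type definitions):

    use arrow::array::ArrayRef;
    use datafusion_common::ScalarValue;

    // Sketch of the reference-keeping strategy ArrayAggAccumulator uses:
    // cheap, but each ArrayRef may keep a much larger shared buffer alive.
    struct ByReference {
        values: Vec<ArrayRef>,
    }

    // Sketch of the owned-value strategy the distinct/order-sensitive
    // variants use: every value is copied into a ScalarValue it owns.
    struct ByValue {
        values: Vec<ScalarValue>,
    }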

When calling ArrayRef::get_array_memory_size() we get the memory occupied by the whole underlying buffer, but it's not technically true that ArrayAggAccumulator occupies all that space.

I found this other method, ArrayData::get_slice_memory_size(), with the following docs:

Returns the total number of bytes of memory occupied by the buffers of this slice of ArrayData (see also diagram on ArrayData).

This is approximately the number of bytes if a new ArrayData was formed by creating new Buffers with exactly the data needed.

For example, for a DataType::Int64 array with 100 elements, Self::get_slice_memory_size would return 100 * 8 = 800. If the ArrayData was then Self::sliced to refer to its first 20 elements, then Self::get_slice_memory_size on the sliced ArrayData would return 20 * 8 = 160.
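To make the contrast concrete, here is a minimal sketch (not part of the PR) comparing the two measurements on a sliced Int64Array:

    use arrow::array::{Array, Int64Array};

    fn main() {
        // 100 i64 values -> 800 data bytes in the underlying buffer.
        let array = Int64Array::from_iter_values(0..100);
        // A slice that references only the first 20 elements.
        let sliced = array.slice(0, 20);

        // Accounts for the whole shared buffer (plus per-array overhead),
        // even though only 20 elements are referenced.
        println!("full:  {}", sliced.get_array_memory_size());

        // Accounts for just the bytes covered by the slice: 20 * 8 = 160.
        println!("slice: {}", sliced.to_data().get_slice_memory_size().unwrap());
    }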

This leads me to think that rather than compacting the accumulated ArrayRefs by copying their data, it might be better not to copy at all and instead report the consumed memory by calling get_slice_memory_size(). It's still not technically true that ArrayAggAccumulator occupies that space, as the memory is not owned by ArrayAggAccumulator, only referenced by it while owned by someone else, but it's the closest thing to reality that I was able to come up with.

What changes are included in this PR?

Stops copying ArrayRef data in ArrayAggAccumulator and starts measuring its occupied size with get_slice_memory_size().
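Roughly, the new accounting amounts to something like the following (a simplified sketch, not the literal PR diff; estimated_values_size is a hypothetical helper name):

    use arrow::array::{Array, ArrayRef};

    // Simplified sketch: sum only the bytes each accumulated slice actually
    // references, instead of the full size of every shared buffer.
    fn estimated_values_size(values: &[ArrayRef]) -> usize {
        values
            .iter()
            .map(|a| a.to_data().get_slice_memory_size().unwrap_or(0))
            .sum()
    }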

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

People should stop seeing ResourceExhausted errors in aggregations that involve ArrayAgg.

@github-actions bot added the functions (Changes to functions implementation) label on Jul 18, 2025
@gabotechs force-pushed the fix-size-measurement-for-array-agg-accumulator branch from c90401b to ac407a1 on July 18, 2025 08:50
@fmonjalet (Contributor) left a comment


I like the approach. It's a case where the under-accounting seems more manageable than a vast over-accounting, and performance should be better than copying too.

@@ -1008,8 +1002,7 @@ mod tests {
         acc2.update_batch(&[data(["b", "c", "a"])])?;
         acc1 = merge(acc1, acc2)?;

-        // without compaction, the size is 2652.
-        assert_eq!(acc1.size(), 732);
+        assert_eq!(acc1.size(), 266);

Nice! To convince myself this approach is still accurate enough, I compared this size to the raw data size, which I got from:

        let data1 = data(["a", "c", "b"]);
        let data2 = data(["b", "c", "a"]);
        println!("Size of data: {} {}", data1.get_array_memory_size(), data2.get_array_memory_size());

I see that the data size is 1208 for each. But that's due to excess capacity in the Array's buffers, which I would hope does not happen too much for larger arrays (the ones that actually matter for memory accounting). I would also hope unused memory pages are not actually committed by the kernel, so they don't take up memory in practice (though I'm unsure about that in this case).

If we update the data helper to use shrink_to_fit:

    fn data<T, const N: usize>(list: [T; N]) -> ArrayRef
    where
        ScalarValue: From<T>,
    {
        let values: Vec<_> = list.into_iter().map(ScalarValue::from).collect();
        let mut array = ScalarValue::iter_to_array(values).expect("Cannot convert to array");
        // Drop any excess buffer capacity so the measured size reflects only the actual data.
        array.shrink_to_fit();
        array
    }

Then we get 139 bytes per array, 278 bytes in total, so we are under-accounting by only 12 bytes in practice, which sounds good.

         // not used here.
-        self.values
-            .push(make_array(copy_array_data(&val.to_data())));
+        self.values.push(val);

Love this one, as the call looks much cheaper now.
