Describe the bug
Quoting @onursatici from #6779:
Currently, interleaving ByteViewArrays is done with the fallback implementation, which uses a MutableArrayBuilder. The extend method on this builder copies all variadic buffers because it doesn't know whether some buffers are not referenced by any view in the array.
This is especially visible in DataFusion's TopK implementation, which uses a heap that interleaves Arrow arrays to produce the top k rows: the current interleave implementation makes the variadic buffer count of byte view arrays explode, adding the same set of buffers over and over again. It becomes really problematic when sending such arrays over Flight, because the current encoder materialises all variadic buffers.
This also came up recently on #6779 from @ShiKaiWi and in a conversation with @tustvold, @XiangpengHao and myself here: #6427 (comment)
I wonder if kernels are blindly concatenating identical buffers together, instead of using something like Buffer::ptr_eq to avoid a new entry for the exact same buffer allocation?
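To make the Buffer::ptr_eq idea quoted above concrete, here is a minimal, std-only sketch. It is not the arrow-rs implementation: `Buf` (an `Arc<[u8]>`) is a hypothetical stand-in for an Arrow buffer, `Arc::ptr_eq` plays the role of Buffer::ptr_eq, and `extend_naive` / `extend_dedup` are illustrative names. It contrasts appending every source buffer with appending only allocations that are not already in the destination list.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow data buffer: a shared, immutable allocation.
type Buf = Arc<[u8]>;

/// Naive merge: append every source buffer, even if the same allocation
/// is already present in `dst`.
fn extend_naive(dst: &mut Vec<Buf>, src: &[Buf]) {
    dst.extend(src.iter().cloned());
}

/// Deduplicating merge: append a source buffer only if no buffer in `dst`
/// points at the same allocation. Returns, for each source buffer, its
/// index in `dst`, which a real kernel would use to rewrite view indices.
fn extend_dedup(dst: &mut Vec<Buf>, src: &[Buf]) -> Vec<u32> {
    let mut remap = Vec::with_capacity(src.len());
    for b in src {
        // Pointer comparison, analogous to Buffer::ptr_eq; contents are not compared.
        let found = dst.iter().position(|d| Arc::ptr_eq(d, b));
        let idx = match found {
            Some(i) => i,
            None => {
                dst.push(Arc::clone(b));
                dst.len() - 1
            }
        };
        remap.push(idx as u32);
    }
    remap
}

fn main() {
    let a: Buf = Arc::from(vec![0u8; 16]);
    let b: Buf = Arc::from(vec![1u8; 16]);
    let source = vec![a, b];

    let mut naive: Vec<Buf> = Vec::new();
    let mut dedup: Vec<Buf> = Vec::new();
    // Merge the same two buffers four times, as repeated interleaving might.
    for _ in 0..4 {
        extend_naive(&mut naive, &source);
        let remap = extend_dedup(&mut dedup, &source);
        assert_eq!(remap, vec![0, 1]);
    }
    // prints "naive: 8 buffers, dedup: 2 buffers"
    println!("naive: {} buffers, dedup: {} buffers", naive.len(), dedup.len());
}
```

The linear scan is quadratic in the number of buffers, so a real kernel would probably want a map keyed on the buffer pointer. It would also have to rewrite the buffer index stored in each non-inlined view using the returned mapping, which this sketch only hints at.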
To Reproduce
Call interleave or concat with a bunch of StringViewArrays (I think)
Expected behavior
(ideally) if an existing buffer is already in a StringViewArray's variadic_buffer list, it shouldn't be added again
Additional context