Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringView: Using the Interleave kernel (and potentially others) results in many repeated buffers in variadic_buffers #6780

Open
alamb opened this issue Nov 23, 2024 · 1 comment · May be fixed by #6779
Labels

Comments

@alamb
Copy link
Contributor

alamb commented Nov 23, 2024

Describe the bug
Quoting @onursatici from #6779:

Currently interleaving ByteViewArrays are done with the fallback implementation, which uses a MutableArrayBuilder. The extend method on this builder copies all variadic buffers because it doesn't know if there are buffers not referenced by any views in the array.

Especially on datafusion's TopK implementation, which uses a heap that interleaves arrow arrays to produce the top k rows, current interleave implementation results in an explosion of variadic buffer count for byte view arrays, adding the same set of buffers over and over again. Where this becomes really problematic is when sending such arrays over flight, current encoder materialises all variadic buffers.

This also came up recently on #6779 from @ShiKaiWi and a converstaion with @tustvold @XiangpengHao and myself here: #6427 (comment)

To Reproduce
Call interleave or concat with a bunch of StringViewArrays (I think)

Expected behavior
(ideally) if an existing buffer is already in a StringViewArray's variadic_buffer list it shouldn't be added again

Additional context

@alamb alamb added the bug label Nov 23, 2024
@alamb
Copy link
Contributor Author

alamb commented Nov 23, 2024

Another suggestion is

I wonder if kernels are blindly concatenating identical buffers together, instead of using something like Buffer::ptr_eq to avoid a new entry for the exact same buffer allocation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant