Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Copy bitmap all at once when casting from string-view to offset string and binary types #43573

Open
felipecrv opened this issue Aug 5, 2024 · 10 comments · May be fixed by #44822
Open

Comments

@felipecrv
Copy link
Contributor

Describe the enhancement requested

Builders append one bit to the validity bitmap on every value, but that is unnecessarily slow when casting arrays as casts don't have to transform the validity bitmap in any way.

#43302 (comment)

Component(s)

C++

@felipecrv
Copy link
Contributor Author

Implementation tip from the linked comment thread:

How can I do that while still using the builders? I will create an issue and link it from the comments.

You could instead use individual TypedBufferBuilders for the offsets and data buffers. That shouldn't add a lot of complexity.

@llama90 @mapleFU, any of you interested in implementing this improvement?

@buaazhwb
Copy link
Contributor

@felipecrv I think the builder->Resize() operation has pre-allocate space for offset and validity buffer. Will it be much slower than use individual offsets and data buffers and copy the validity buffer?

@mapleFU
Copy link
Member

mapleFU commented Aug 12, 2024

@buaazhwb

I think the builder->Resize() operation has pre-allocate space for offset and validity buffer. Will it be much slower than use individual offsets and data buffers and copy the validity buffer?

I think this is due to "append bit to bitmap", Copy Bitmap is just like a memcpy, however, append bit to bitmap is a bit-or for everybit, which would be slower

@mapleFU
Copy link
Member

mapleFU commented Aug 12, 2024

@felipecrv

Sorry for late replying, should I start this after #43302 is merged ?

@buaazhwb
Copy link
Contributor

@buaazhwb

I think the builder->Resize() operation has pre-allocate space for offset and validity buffer. Will it be much slower than use individual offsets and data buffers and copy the validity buffer?

I think this is due to "append bit to bitmap", Copy Bitmap is just like a memcpy, however, append bit to bitmap is a bit-or for everybit, which would be slower

@mapleFU got it. It seems like we need to build array data without using builder. Can i have a chance to optimize it after #43302 merged?

@mapleFU
Copy link
Member

mapleFU commented Aug 12, 2024

Just go ahead 👍

@felipecrv
Copy link
Contributor Author

@felipecrv

Sorry for late replying, should I start this after #43302 is merged ?

Yes. This issue is about improving part of what #43302 does. And I recommend not trying to do it right away because aas @buaazhwb already noticed, it will require building arrays without the array builders and custom code will be needed for classic string arrays and string view arrays.

@mapleFU
Copy link
Member

mapleFU commented Sep 14, 2024

@buaazhwb the cast is merged, this can be move forward, would you still like to try this?

@buaazhwb
Copy link
Contributor

@buaazhwb the cast is merged, this can be move forward, would you still like to try this?

@mapleFU Yes,I will continue it later

@CrystalZhou0529
Copy link
Contributor

Hello, since this feature has not been worked on actively for a while, I just opened a PR for it. Please take a look and let me know if you have any suggestions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants