There has been recent work to move ChunkResolver into the public API.
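ChunkResolver uses an O(log(num_chunks)) binary search to find the chunk that holds a given logical index, which is optimised for random access. For sequential row-by-row access this is inefficient: repeating the search on every row is wasted work, because the next row is almost always in the same chunk or the one right after it.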
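To make the cost concrete, here is a minimal standalone sketch of resolving a logical index by binary search over cumulative chunk offsets. It is not Arrow's implementation: `SketchLocation`, `MakeOffsets`, and `ResolveByBinarySearch` are hypothetical names invented for this illustration, and only the general technique (binary search over offsets) comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for illustration only; not Arrow's actual type.
struct SketchLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// offsets[i] is the logical index of the first row of chunk i;
// offsets.back() is the total number of rows.
std::vector<int64_t> MakeOffsets(const std::vector<int64_t>& chunk_lengths) {
  std::vector<int64_t> offsets(chunk_lengths.size() + 1, 0);
  for (std::size_t i = 0; i < chunk_lengths.size(); ++i) {
    offsets[i + 1] = offsets[i] + chunk_lengths[i];
  }
  return offsets;
}

// O(log(num_chunks)) per lookup, regardless of where the previous lookup landed.
SketchLocation ResolveByBinarySearch(const std::vector<int64_t>& offsets,
                                     int64_t index) {
  assert(index >= 0 && index < offsets.back());
  // The first offset strictly greater than `index` belongs to the chunk
  // after the one we want; this also steps over empty chunks.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
  const auto chunk = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {static_cast<int64_t>(chunk), index - offsets[chunk]};
}
```

Calling this once per row costs O(num_rows * log(num_chunks)) in total, even though consecutive rows nearly always resolve to the same chunk.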
Sometimes a user needs to process the data in row-major order. To that end, the proposal is to add helper methods to the ChunkResolver API for more efficient sequential traversal, giving loops such as:
ChunkResolver resolver(batches);
for (ChunkLocation loc; resolver.Valid(loc); loc = resolver.Next(loc)) {
  // re-use loc for all the typed columns since they are split on the same offsets
}
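The proposed helper bodies are not reproduced here, so the following is only a sketch of how `Valid` and `Next` could behave, written against standalone hypothetical types rather than Arrow's `ChunkResolver`. `SequentialSketch`, `SketchLocation`, `Begin`, and `SkipEmpty` are assumptions made for this example. Starting from an explicit `Begin()` that skips leading empty chunks is also a choice of this sketch, not something the proposal states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not Arrow's actual type.
struct SketchLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Hypothetical sequential walker over chunk lengths; not Arrow's ChunkResolver.
class SequentialSketch {
 public:
  explicit SequentialSketch(std::vector<int64_t> chunk_lengths)
      : lengths_(std::move(chunk_lengths)) {}

  // Location of the first row, skipping any leading empty chunks.
  SketchLocation Begin() const { return SkipEmpty({0, 0}); }

  // True while `loc` points at an existing row.
  bool Valid(const SketchLocation& loc) const {
    return loc.chunk_index < NumChunks() &&
           loc.index_in_chunk < Length(loc.chunk_index);
  }

  // Advances by one row; moves to the next non-empty chunk when the
  // current chunk is exhausted. No binary search is involved.
  SketchLocation Next(SketchLocation loc) const {
    assert(Valid(loc));
    ++loc.index_in_chunk;
    return SkipEmpty(loc);
  }

 private:
  int64_t NumChunks() const { return static_cast<int64_t>(lengths_.size()); }
  int64_t Length(int64_t chunk) const {
    return lengths_[static_cast<std::size_t>(chunk)];
  }
  SketchLocation SkipEmpty(SketchLocation loc) const {
    while (loc.chunk_index < NumChunks() &&
           loc.index_in_chunk >= Length(loc.chunk_index)) {
      ++loc.chunk_index;
      loc.index_in_chunk = 0;
    }
    return loc;
  }

  std::vector<int64_t> lengths_;
};
```

With this shape a full traversal costs O(num_rows + num_chunks) in total, and the same location can be applied to every column that shares the chunk layout.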
Component(s)
C++
anjakefala changed the title from "[C++] Improve sequential access use of ChunkResolver" to "[C++] Improve performance of sequential access of ChunkResolver" on Nov 5, 2024
I think this is a great idea, though we might need to either rename ChunkResolver or otherwise just ensure that the public docs make all the limitations clear.
That's really my only concern: we want to avoid this becoming the "standard way to iterate a table" and instead ensure that users can easily tell when they should or shouldn't use the ChunkResolver.
The helper methods in the proposal were written by @felipecrv.