Multifile Merge optimization idea #36

Open
zilder opened this issue Jul 27, 2021 · 0 comments
zilder (Contributor) commented Jul 27, 2021

Multifile Merge uses a heap to produce sorted output. On each iteration we remove the top element from the heap, replace it with the next one from the same source, and heapify. Sometimes (possibly often) when we read a new row group, every element of that row group would reach the top of the heap before any other element; in other words, all elements of that row group are less than every other element in the heap. In that case it would be cheaper to skip the heapify step and read elements from that row group one after another until it is exhausted.
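
To make the idea concrete, here is a minimal Python sketch of it. It is not the project's code: `merge_row_groups`, `load_next_group`, and the representation of a source as a list of sorted row groups are all assumptions made for illustration, and it uses a plain pop/push instead of replace-and-heapify. It also assumes row groups within one source arrive in ascending order. The fast path compares the last (largest) element of a freshly loaded row group with the current heap top and, if it is not greater, emits the whole row group without touching the heap.

```python
import heapq
from typing import Iterator, List


def merge_row_groups(sources: List[List[List[int]]]) -> Iterator[int]:
    """Merge sources into one sorted stream.

    Assumption (hypothetical layout): sources[i] is a list of row groups,
    each row group is a sorted list, and row groups within one source
    are themselves in ascending order.
    """
    group_iters = [iter(src) for src in sources]
    current: List[List[int]] = [[] for _ in sources]  # current row group per source
    pos = [0] * len(sources)                          # next unread index in it
    heap: list = []                                   # entries: (value, source index)

    def load_next_group(i: int) -> bool:
        # Advance source i to its next non-empty row group, if any.
        for group in group_iters[i]:
            if group:
                current[i] = group
                pos[i] = 0
                return True
        return False

    # Seed the heap with the first element of every source.
    for i in range(len(sources)):
        if load_next_group(i):
            heapq.heappush(heap, (current[i][0], i))
            pos[i] = 1

    while heap:
        value, i = heapq.heappop(heap)
        yield value
        exhausted = False
        if pos[i] == len(current[i]):
            exhausted = not load_next_group(i)
            # Fast path: the largest element of the new row group does not
            # exceed the smallest element of any other source, so the whole
            # row group can be emitted without any heap operations.
            while not exhausted and (not heap or current[i][-1] <= heap[0][0]):
                yield from current[i]
                exhausted = not load_next_group(i)
        if not exhausted:
            heapq.heappush(heap, (current[i][pos[i]], i))
            pos[i] += 1


example = [[[1, 2, 3], [10, 11]], [[4, 5], [6, 7, 8]], [[9], [12]]]
print(list(merge_row_groups(example)))  # prints [1, 2, ..., 12] in order
```

In this example the row group `[6, 7, 8]` takes the fast path, because its last element is not greater than the heap top `9` at the moment it is loaded. The check costs one comparison per row group, so it should be cheap even when the fast path rarely applies; whether the last element is available up front (for instance from row group statistics) is something the real implementation would have to confirm.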
