Update LogMergePolicy to skip to a target number of documents #2627

Open
wants to merge 5 commits into
base: main

kryesh
Contributor

@kryesh kryesh commented Apr 18, 2025

From the Discord conversation:
https://discord.com/channels/908281611840282624/915785344396439552/1341705839668625468

This change makes LogMergePolicy aim for a target number of documents per segment and opportunistically skip intermediate merge operations to reach that target.
Pros:

  • Reduced IO/CPU usage, because intermediate merge operations are skipped
  • No longer prone to huge merge operations in which many large segments are merged into a single segment many times larger than the target size (previously max_docs_before_merge)
    • With the updated logic, the theoretical maximum size of a segment is (target_segment_size * 2) - 2 documents

Cons:

  • If an index holds only slightly more than target_segment_size documents in total, it may be merged into a single segment and therefore not parallelize well when searching
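
To make the size bound and the con above concrete, here is a minimal sketch of one way a target-driven grouping could behave. It is not the code in this PR and not tantivy's API: `plan_merges` is a hypothetical helper, and it assumes that segments at or above `target_segment_size` are never merged again and that a group is closed as soon as its document count reaches the target. Under those assumptions a group holds at most `target_segment_size - 1` documents before its last segment is added, and that segment also has at most `target_segment_size - 1`, which gives the `(target_segment_size * 2) - 2` maximum.

```rust
/// Hypothetical helper for illustration only (not part of tantivy or this PR).
/// Takes the document count of each segment and returns groups of counts
/// that would each be merged into one segment.
fn plan_merges(doc_counts: &[u32], target_segment_size: u32) -> Vec<Vec<u32>> {
    // Assumption: segments already at or above the target are left alone.
    let mut small: Vec<u32> = doc_counts
        .iter()
        .copied()
        .filter(|&n| n < target_segment_size)
        .collect();
    // Largest first, so each group reaches the target with few segments.
    small.sort_unstable_by(|a, b| b.cmp(a));

    let mut merges = Vec::new();
    let mut current = Vec::new();
    let mut current_docs: u64 = 0;
    for n in small {
        current.push(n);
        current_docs += u64::from(n);
        // Close the group as soon as it reaches the target document count.
        if current_docs >= u64::from(target_segment_size) {
            merges.push(std::mem::take(&mut current));
            current_docs = 0;
        }
    }
    // A leftover group below the target is not merged in this sketch;
    // it waits for more segments instead of doing an intermediate merge.
    merges
}
```

With a target of 100, the counts `[99, 99]` form a single group of 198 documents, which is the worst case `(100 * 2) - 2`. The counts `[60, 45]` collapse into one segment of 105 documents, which is the situation described under Cons.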
