[Draft] Add BandwidthCappedMergeScheduler for enforcing a global merge bandwidth cap #14964
This draft PR introduces a prototype `BandwidthCappedMergeScheduler`, which extends `ConcurrentMergeScheduler` to enforce a global bandwidth cap on merge operations across all active merges within an `IndexWriter`. The scheduler is inspired by the issue/discussion in lucene#14148 and feedback from @mikemccand and others. The motivation is to provide a simple, global bandwidth cap for merge operations, especially useful during "update storms" or "war time" scenarios where aggressive merging can cause page faults.
Implementation
- `BandwidthCappedMergeScheduler` lives in `lucene.index`.
- It extends `ConcurrentMergeScheduler` to reuse merge management and threading.
- `updateMergeThreads()` dynamically adjusts per-merge IO rates.

Initially, I experimented with skipping merges in the scheduler (by aborting or refusing to run merges that would exceed the bandwidth cap). However, this approach proved problematic:
- [...] `IndexWriter` or the `MergePolicy`.
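To make the per-merge rate adjustment concrete, here is a minimal sketch of one way a global cap could be divided among active merges. It assumes an even split, which this description does not confirm, and `GlobalMergeBandwidthSplitter` and its method are hypothetical names rather than Lucene APIs. In the scheduler itself, a value like this would presumably be pushed to each running merge from `updateMergeThreads()`.

```java
/** Hypothetical helper, not part of Lucene: splits one global cap evenly across active merges. */
final class GlobalMergeBandwidthSplitter {
  private final double globalCapMBPerSec;

  GlobalMergeBandwidthSplitter(double globalCapMBPerSec) {
    if (!(globalCapMBPerSec > 0)) {
      throw new IllegalArgumentException("cap must be positive: " + globalCapMBPerSec);
    }
    this.globalCapMBPerSec = globalCapMBPerSec;
  }

  /**
   * Returns the rate each merge may use so that the sum over all active
   * merges never exceeds the global cap. An even split is an assumption.
   */
  double perMergeRateMBPerSec(int activeMerges) {
    if (activeMerges <= 0) {
      return globalCapMBPerSec; // no merge is running, so nothing needs to be divided
    }
    return globalCapMBPerSec / activeMerges;
  }

  public static void main(String[] args) {
    GlobalMergeBandwidthSplitter splitter = new GlobalMergeBandwidthSplitter(100.0);
    for (int n = 1; n <= 4; n++) {
      System.out.printf("%d active merge(s): %.1f MB/s each%n", n, splitter.perMergeRateMBPerSec(n));
    }
  }
}
```

An even split is the simplest policy that keeps the total under the cap; weighting by merge size would be fairer to small merges but needs more bookkeeping.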
Should We Integrate with CMS IO Rate Limiter?
One open question: should we also integrate CMS’s adaptive IO rate logic into this scheduler, or switch to the global cap only during "war time"?
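As a starting point for that discussion, the "war time" switch could look roughly like the sketch below. Everything in it is an assumption for illustration: `WarTimeRateSelector`, the pending-merge threshold used to detect "war time", and the even split of the cap are hypothetical and are not taken from CMS or from this PR.

```java
/** Hypothetical policy sketch; none of these names exist in Lucene. */
final class WarTimeRateSelector {
  private final double globalCapMBPerSec;
  private final int warTimePendingMerges;

  WarTimeRateSelector(double globalCapMBPerSec, int warTimePendingMerges) {
    this.globalCapMBPerSec = globalCapMBPerSec;
    this.warTimePendingMerges = warTimePendingMerges;
  }

  /** Assumed heuristic: a large merge backlog signals an update storm. */
  boolean isWarTime(int pendingMerges) {
    return pendingMerges >= warTimePendingMerges;
  }

  /**
   * In "peace time", keep whatever rate the adaptive logic proposes.
   * In "war time", ignore it and hand each merge an even share of the global cap.
   */
  double perMergeRateMBPerSec(double adaptiveRateMBPerSec, int activeMerges, int pendingMerges) {
    if (!isWarTime(pendingMerges)) {
      return adaptiveRateMBPerSec;
    }
    return globalCapMBPerSec / Math.max(1, activeMerges);
  }
}
```

For example, with a 100 MB/s cap and a threshold of 8 pending merges, a backlog of 3 leaves the adaptive rate untouched, while a backlog of 10 with 4 active merges gives each merge 25 MB/s.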
Future Improvements
For now, the implementation is intentionally simple. However, there are several ways to make it more efficient and fair:
- [...] `MergePolicy` to influence which merges are selected, or even skip merges earlier in the process.
Testing
- [...] `LuceneTestCase.java` and running the full test suite.
- [...] `NRTPerfTest` to simulate update storms and then generate segment traces to visualize bandwidth usage and merge behavior. Hopefully this shows a less spiky segment tracing graph.
Next Steps
- [...] `IndexWriter` or implementing a `BandwidthAwareTieredMergePolicy` that can respond to bandwidth constraints in real time. Thoughts on this?

Thanks for reviewing! Looking forward to feedback, suggestions, and further discussion. This PR is opened to propose and evaluate a bandwidth-capped merge scheduler design, and to gather feedback for further development.
Related to #14148