Decrease minimum deletes percentage in TMP #14893

Open
wants to merge 3 commits into
base: main
3 changes: 3 additions & 0 deletions lucene/CHANGES.txt
@@ -100,6 +100,9 @@ API Changes
* GITHUB#14426: Support determining desired off-heap memory requirements through
KnnVectorsReader::getOffHeapByteSize (Chris Hegarty)

* GITHUB#14893: TieredMergePolicy minimum deletes percentage decreased from 5% inclusive to 0% exclusive.
(Stefan Vodita)

New Features
---------------------
* GITHUB#14404: Introducing DocValuesMultiRangeQuery.SortedNumericStabbingBuilder into sandbox.
@@ -130,18 +130,25 @@ public double getMaxMergedSegmentMB() {
/**
* Sets the maximum percentage of doc id space taken by deleted docs. The denominator includes
* both active and deleted documents. Lower values make the index more space efficient at the
- * expense of increased CPU and I/O activity. Values must be between 5 and 50. Default value is
+ * expense of increased CPU and I/O activity. Values must be between 0 and 50. Default value is
Review comment (Member):

Can we enhance this javadoc to note the dangers of very low (< 5%) target deletions? Something like:

Values below 5% can lead to exceptionally high merge cost where indexing will continuously
merge nearly all segments, and select newly merged segments immediately for merging again,
often forcing degenerate merge selection like singleton merges.  If you venture into this dark
forest, consider limiting the maximum number of concurrent merges and threads (link to
ConcurrentMergeScheduler's setMaxMergesAndThreads) as a coarse attempt to bound the
otherwise pathological indexing behavior.

Reply (Contributor Author):

Thank you for the suggestion @mikemccand! I added the paragraph almost exactly as you wrote it.
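
To make the concern in this thread concrete, here is a minimal sketch of the arithmetic behind the threshold. It relies only on what the javadoc states: the percentage is deleted docs divided by live plus deleted docs. The class and method names (`DeletesPctSketch`, `deletedPct`, `exceedsAllowed`) are hypothetical, the strict `>` comparison is an assumption, and none of this is TieredMergePolicy's actual merge selection logic.

```java
final class DeletesPctSketch {
    // Percentage of doc id space taken by deleted docs.
    // The denominator includes both live and deleted documents.
    public static double deletedPct(long liveDocs, long deletedDocs) {
        long total = liveDocs + deletedDocs;
        return total == 0 ? 0.0 : 100.0 * deletedDocs / total;
    }

    // Assumption: the index counts as "over target" when the percentage
    // is strictly greater than the allowed value.
    public static boolean exceedsAllowed(long liveDocs, long deletedDocs, double allowedPct) {
        return deletedPct(liveDocs, deletedDocs) > allowedPct;
    }

    public static void main(String[] args) {
        long live = 990_000;
        long deleted = 10_000; // 1% of the doc id space is deleted
        for (double allowed : new double[] {20.0, 5.0, 0.5}) {
            System.out.println(
                "allowed=" + allowed + "% -> over target: "
                    + exceedsAllowed(live, deleted, allowed));
        }
    }
}
```

Under these assumptions, an index with only 1% deletes is already over a 0.5% target, which is consistent with the near-continuous merging described in the suggested javadoc paragraph.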

* 20.
*
* <p>When the maximum delete percentage is lowered, the indexing thread will call for merges more
* often, meaning that write amplification factor will be increased. Write amplification factor
* measures the number of times each document in the index is written. A higher write
* amplification factor will lead to higher CPU and I/O activity as indicated above.
*
* <p>Values below 5% can lead to exceptionally high merge cost where indexing will continuously
* merge nearly all segments, and select newly merged segments immediately for merging again,
* often forcing degenerate merge selection like singleton merges. If you venture into this dark
* forest, consider limiting the maximum number of concurrent merges and threads (see {@link
* ConcurrentMergeScheduler#setMaxMergesAndThreads}) as a coarse attempt to bound the otherwise
* pathological indexing behavior.
*/
public TieredMergePolicy setDeletesPctAllowed(double v) {
- if (v < 5 || v > 50) {
+ if (v <= 0 || v > 50) {
Review comment (Contributor):

I like that we still enforce > 0. It makes it more apparent that you are approaching a singular pathological case as you go to 0.001 or whatever.

throw new IllegalArgumentException(
- "indexPctDeletedTarget must be >= 5.0 and <= 50 (got " + v + ")");
+ "indexPctDeletedTarget must be > 0 and <= 50 (got " + v + ")");
}
deletesPctAllowed = v;
return this;
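
The effect of the new range check can be tried in isolation. The sketch below copies the updated condition and error message from the diff into a standalone, hypothetical class (`DeletesPctBoundsSketch`, `checkDeletesPctAllowed`) so it runs without Lucene on the classpath; it is an illustration of the bounds, not the library's code.

```java
final class DeletesPctBoundsSketch {
    // Mirrors the updated check: 0 is excluded, 50 is included.
    public static double checkDeletesPctAllowed(double v) {
        if (v <= 0 || v > 50) {
            throw new IllegalArgumentException(
                "indexPctDeletedTarget must be > 0 and <= 50 (got " + v + ")");
        }
        return v;
    }

    public static void main(String[] args) {
        // Values that were rejected under the old ">= 5" bound (0.001)
        // are now accepted; 0 itself is still rejected.
        for (double v : new double[] {0.0, 0.001, 5.0, 50.0, 50.1}) {
            try {
                checkDeletesPctAllowed(v);
                System.out.println(v + " accepted");
            } catch (IllegalArgumentException e) {
                System.out.println(v + " rejected: " + e.getMessage());
            }
        }
    }
}
```

Keeping the lower bound exclusive means a value such as 0.001 is legal but 0 is not, which matches the reviewer's point about making the pathological limit explicit.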