
Conversation

ashish159357

Problem

ByteBlockPool uses 32 KB buffers and tracks the pool-wide position with an int offset (byteOffset). When more than 65,535 buffers are allocated, the byteOffset calculation (byteOffset = bufferUpto * BYTE_BLOCK_SIZE) overflows, causing an ArithmeticException while indexing documents with very large numbers of tokens.

Root Cause

  • Each buffer is 32KB (BYTE_BLOCK_SIZE = 32768)
  • Maximum safe buffer count: Integer.MAX_VALUE / BYTE_BLOCK_SIZE = 65535
  • When bufferUpto exceeds 65,535, the computed offset exceeds Integer.MAX_VALUE and the overflow-checked arithmetic in nextBuffer throws ArithmeticException
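The arithmetic can be reproduced outside Lucene. A minimal standalone sketch (the constant mirrors ByteBlockPool.BYTE_BLOCK_SIZE, but the class itself is illustrative, not Lucene code):

```java
public class OverflowDemo {
    static final int BYTE_BLOCK_SIZE = 1 << 15; // 32768, same value as ByteBlockPool.BYTE_BLOCK_SIZE

    public static void main(String[] args) {
        // The last safe offset: 65,535 buffers of 32 KB still fits in an int.
        int safe = 65_535 * BYTE_BLOCK_SIZE;
        System.out.println(safe); // 2147450880

        // One more 32 KB step crosses Integer.MAX_VALUE (2^31 - 1);
        // plain int arithmetic silently wraps to a negative value:
        System.out.println(65_536 * BYTE_BLOCK_SIZE);

        // Overflow-checked arithmetic, as seen in the stack trace
        // (Math.addExact inside ByteBlockPool.nextBuffer), throws instead of wrapping:
        try {
            Math.addExact(safe, BYTE_BLOCK_SIZE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // "integer overflow"
        }
    }
}
```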

Solution
Implement proactive DWPT flushing when buffer count approaches the limit:

  1. Detection: Added isApproachingBufferLimit() method to detect when buffer count approaches the overflow threshold
  2. Propagation: Buffer limit status flows from ByteBlockPool → IndexingChain → DocumentsWriterPerThread → DocumentsWriterFlushControl
  3. Prevention: Force flush DWPT before overflow occurs, similar to existing RAM-based flushing.
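A sketch of the detection step (step 1) under these assumptions: the method name isApproachingBufferLimit() and the 65,000 threshold come from this PR's description, while the surrounding class is a simplified stand-in rather than Lucene's actual ByteBlockPool:

```java
public class BufferLimitDetectionSketch {
    static final int BYTE_BLOCK_SIZE = 1 << 15;                         // 32 KB per buffer
    static final int MAX_BUFFERS = Integer.MAX_VALUE / BYTE_BLOCK_SIZE; // 65535
    static final int FLUSH_THRESHOLD = 65_000;                          // safety margin (from the PR)

    int bufferUpto = -1; // index of the current buffer; -1 before the first allocation

    void nextBuffer() {
        bufferUpto++; // the real ByteBlockPool also allocates the block and updates byteOffset here
    }

    // Step 1 (Detection): true once the buffer count nears the overflow threshold,
    // signalling the flush-control layer (steps 2 and 3) to flush this DWPT early.
    boolean isApproachingBufferLimit() {
        return bufferUpto >= FLUSH_THRESHOLD;
    }
}
```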

Key Changes

  • Added buffer limit detection in ByteBlockPool
  • Integrated check into DocumentsWriterFlushControl.doAfterDocument()
  • Uses a threshold of 65,000 buffers to provide a safety margin before the actual limit of 65,535
  • Maintains existing performance characteristics while preventing crashes
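A hedged sketch of how the new check could sit alongside the existing RAM-based trigger after each document; the interface and method names below are illustrative stand-ins, not Lucene's real DocumentsWriterFlushControl API:

```java
public class FlushControlSketch {
    /** Minimal stand-in for the per-thread writer state visible to flush control. */
    interface Dwpt {
        long ramBytesUsed();
        boolean isApproachingBufferLimit();
    }

    private final long hardRamLimitBytes;

    FlushControlSketch(long hardRamLimitBytes) {
        this.hardRamLimitBytes = hardRamLimitBytes;
    }

    // Mirrors the idea behind DocumentsWriterFlushControl.doAfterDocument():
    // after each document, decide whether this DWPT must be flushed, now
    // considering the buffer-count limit next to the RAM limit.
    boolean shouldFlushAfterDocument(Dwpt dwpt) {
        return dwpt.ramBytesUsed() >= hardRamLimitBytes  // existing RAM-based trigger
            || dwpt.isApproachingBufferLimit();          // new buffer-limit trigger
    }
}
```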

@ashish159357 changed the title from "Fix ByteBlockPool integer overflow by implementing buffer limit detection #15152" to "Fix ByteBlockPool integer overflow by implementing buffer limit detection" on Oct 12, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@rmuir

rmuir commented Oct 12, 2025

When more than 65,535 buffers are allocated, integer overflow occurs in the byteOffset calculation (byteOffset = bufferUpto * BYTE_BLOCK_SIZE), causing ArithmeticException during indexing of documents with large numbers of tokens.

But this is not supported: the limits on IndexWriter are 2GB

@msokolov

maybe AI-generated? The bullet point formatting looks characteristic. Not that that is banned or anything, but it might need additional scrutiny

@bharath-techie

Hi @rmuir @msokolov,
I have yet to review this PR, but I see your point: the hard limit check should be enough, since it accounts for the ByteBlockPool as well.

For context, I originally created issue #15152, where an OpenSearch user encountered the ByteBlockPool overflow during translog recovery.

 message [shard failure, reason [index id[3458764570588151359] origin[LOCAL_TRANSLOG_RECOVERY] seq#[53664468]]], failure [NotSerializableExceptionWrapper[arithmetic_exception: integer overflow]], markAsStale [true]]
NotSerializableExceptionWrapper[arithmetic_exception: integer overflow]
    at java.lang.Math.addExact(Math.java:883)
    at org.apache.lucene.util.ByteBlockPool.nextBuffer(ByteBlockPool.java:199)
    at org.apache.lucene.index.ByteSlicePool.allocKnownSizeSlice(ByteSlicePool.java:118)
    at org.apache.lucene.index.ByteSlicePool.allocSlice(ByteSlicePool.java:98)
    at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:226)
    at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:266)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:86)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:197)
    at org.apache.lucene.index.TermsHashPerField.positionStreamSlice(TermsHashPerField.java:214)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:202)
    at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1287)
    at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
    at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:731)
    at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
    at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1516)
    at org.opensearch.index.engine.InternalEngine.addStaleDocs(InternalEngine.java:1291)
    at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1210)
    at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
    at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1226)

I think the IndexWriter hard-limit check in FlushControl runs only after DocumentsWriter.updateDocuments completes, so a single call that adds many documents could exceed the limit and hit this exception.

  1. Do we need a buffer below the writer limits to account for the next set of documents?
  2. Do we need to limit the number of docs that can be passed to this method?
