LiebingYu
Contributor

Purpose

Linked issue: close #1716

Brief change log

Tests

API and Format

Documentation

@LiebingYu LiebingYu force-pushed the fix-corrupt-index-new branch 2 times, most recently from b9b5f2d to ceaf704 Compare September 24, 2025 07:50
@LiebingYu
Contributor Author

Ready for CR @wuchong @swuferhong

Contributor

@swuferhong swuferhong left a comment

@LiebingYu Thanks for your work. I left some comments.

new WriterStateManager(
logSegments.getTableBucket(),
logTabletDir,
this.writerStateManager.writerExpirationMs());
Contributor

Why do we need to create a new WriterStateManager here? Maybe we could add a clear() method instead?

Contributor Author

This logic follows Kafka, and creating a new WriterStateManager is a lightweight operation. I think it's OK.
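
To make the trade-off in this thread concrete, here is a minimal Java sketch of the "fresh instance" approach. It is not the PR's code: `StateManager`, `freshFrom`, and the fields below are hypothetical stand-ins, and only the idea of carrying the expiration setting over from the existing manager comes from the snippet under review.

```java
import java.util.HashMap;
import java.util.Map;

class WriterStateSketch {

    /** Hypothetical, simplified stand-in for a writer state manager. */
    static final class StateManager {
        private final String bucket;
        private final int expirationMs;
        // Last sequence number seen per writer id (illustrative state only).
        private final Map<Long, Integer> lastSeqByWriter = new HashMap<>();

        StateManager(String bucket, int expirationMs) {
            this.bucket = bucket;
            this.expirationMs = expirationMs;
        }

        int expirationMs() {
            return expirationMs;
        }

        void update(long writerId, int sequence) {
            lastSeqByWriter.put(writerId, sequence);
        }

        int trackedWriters() {
            return lastSeqByWriter.size();
        }
    }

    /**
     * Builds an empty manager that reuses only the configuration of the
     * existing one, so no in-memory state can leak into recovery.
     */
    static StateManager freshFrom(StateManager existing) {
        return new StateManager(existing.bucket, existing.expirationMs());
    }
}
```

A fresh instance starts empty by construction, whereas a clear() method would have to reset every field correctly and would also wipe the live manager's state. Which of the two is preferable for the real class depends on details not shown in this thread.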

numUnflushed,
logSegments.getTableBucket());

int truncatedBytes = -1;
Contributor

Do we need to be this aggressive here? Deleting all subsequent logSegments just because one cannot be repaired seems to risk data loss. Also, we don't have test coverage for this logic.

Contributor Author

This logic also follows Kafka. From my perspective, data loss is unlikely because the data is stored in multiple replicas. Once the file is truncated to the correct position, the replica can synchronize the latest data from the leader. If truncation is not carried out, the file appears to be unrecoverable, and if this host becomes the leader afterward, unforeseen problems might occur.

Contributor Author

I have also added a test to cover this.
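
The truncate-then-refetch behavior discussed in this thread can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `Segment`, `validRecords`, and `recover` are hypothetical names, and corruption is modeled as a plain record count rather than real file validation.

```java
import java.util.ArrayList;
import java.util.List;

class RecoverySketch {

    /** Hypothetical stand-in for a log segment. */
    static final class Segment {
        final long baseOffset;
        int records;            // records currently in the segment
        final int validRecords; // records before the first corrupt one

        Segment(long baseOffset, int records, int validRecords) {
            this.baseOffset = baseOffset;
            this.records = records;
            this.validRecords = validRecords;
        }

        boolean isCorrupt() {
            return validRecords < records;
        }

        /** Drops everything after the last valid record; returns how many records were removed. */
        int truncateToValid() {
            int removed = records - validRecords;
            records = validRecords;
            return removed;
        }
    }

    /**
     * Walks the segments in order. At the first corrupt segment, truncates it
     * to its valid prefix and deletes every later segment. Returns the
     * resulting log end offset, from which a follower would refetch.
     */
    static long recover(List<Segment> segments) {
        for (int i = 0; i < segments.size(); i++) {
            Segment segment = segments.get(i);
            if (segment.isCorrupt()) {
                segment.truncateToValid();
                // Later segments are discarded so the log stays contiguous.
                segments.subList(i + 1, segments.size()).clear();
                break;
            }
        }
        if (segments.isEmpty()) {
            return 0L;
        }
        Segment last = segments.get(segments.size() - 1);
        return last.baseOffset + last.records;
    }
}
```

In this model, three segments of 10 records each with the second one corrupt after 5 records leave a log end offset of 15. The discarded range is expected to come back from the leader, which is the replication assumption stated in the reply above.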

@LiebingYu LiebingYu force-pushed the fix-corrupt-index-new branch from ceaf704 to 1e8a582 Compare September 25, 2025 14:06
Successfully merging this pull request may close these issues.

ReplicaFetcherThread keeps throwing UnknownServerException because of corrupt index file