[IOTDB-6355] Fix query scan returning duplicated or unordered timestamps while TsFileResource is degrading #14458
Assume we have the following file distribution, in which all the files shown have been degraded to FileTimeIndex:
While scanning, we first unpack the first seq file and the first and second unseq files. Then, during merge-on-read, we continue to fetch timestamp 4 and use the startTime of the second seq file, which is 5, to judge whether that file overlaps with timestamp 4. Since 5 is larger than 4, we conclude that the second seq TsFile does not overlap with timestamp 4, and because it is in seq space, we mistakenly assume that none of the subsequent files overlap with timestamp 4 either.
This processing is correct when there is no degraded resource in seq space. However, when a degraded resource exists in seq space, the fact that the current seq TsFile's startTime is larger than the current timestamp does not imply that the startTimes of all subsequent seq TsFiles are also larger than the current timestamp.
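To make the failure concrete, here is a minimal Java sketch of the early-termination check described above. It is a hypothetical model, not IoTDB's `TsFileResource` API: `SeqFile`, `filesToUnpack`, and the file names are invented for illustration. It assumes that a degraded file reports a single file-level startTime, so that, in the hypothetical case shown, a later degraded seq file can report an earlier startTime than the one before it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not IoTDB's TsFileResource API.
final class SeqPruningBeforeFix {

    // startTime is what the time index reports. For a degraded file this is
    // assumed to be a single file-level value rather than a per-device value.
    record SeqFile(String name, long startTime, boolean degraded) {}

    // Pre-fix logic: walk the remaining seq files in order and stop at the
    // first one whose startTime is larger than the current timestamp, on the
    // assumption that every later seq file starts even later.
    static List<String> filesToUnpack(List<SeqFile> remainingSeqFiles, long currentTime) {
        List<String> unpacked = new ArrayList<>();
        for (SeqFile file : remainingSeqFiles) {
            if (file.startTime() > currentTime) {
                break; // early termination: every subsequent file is skipped
            }
            unpacked.add(file.name());
        }
        return unpacked;
    }

    public static void main(String[] args) {
        // Both files are degraded, so their reported startTimes need not be
        // non-decreasing (hypothetical values).
        List<SeqFile> remaining = List.of(
                new SeqFile("seq2", 5, true),
                new SeqFile("seq3", 4, true));
        // Prints [] because the scan stops at seq2, so seq3 is never unpacked
        // even though its reported startTime (4) does not exceed timestamp 4.
        System.out.println(filesToUnpack(remaining, 4));
    }
}
```

If `seq3` really holds data at timestamp 4 for the queried series, skipping it is what lets a duplicated or out-of-order timestamp reach the merge-on-read output.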
The solution is to return Long.MIN_VALUE as the startTime of a FileTimeIndex in seq space, which means that all degraded seq resources will be unpacked regardless. We can then use the TimeSeriesMetadata to judge more precisely whether we need to stop searching:
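The fix can be sketched in the same hypothetical Java model. The names `SeqFile`, `orderTime`, and `filesToUnpack` are assumptions for illustration, not IoTDB's classes. The only change is that the time used for the early-termination check is Long.MIN_VALUE for a degraded seq file, so such a file can never trigger the stop. A non-degraded file is assumed to report a precise per-device startTime, so stopping at it remains safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not IoTDB's actual classes.
final class SeqPruningAfterFix {

    // For a non-degraded file, startTime is assumed to be the precise
    // per-device start time; for a degraded file it is only file-level.
    record SeqFile(String name, long startTime, boolean degraded) {}

    // Time used by the pruning check: a degraded seq file reports
    // Long.MIN_VALUE, so it is always treated as possibly overlapping.
    static long orderTime(SeqFile file) {
        return file.degraded() ? Long.MIN_VALUE : file.startTime();
    }

    static List<String> filesToUnpack(List<SeqFile> remainingSeqFiles, long currentTime) {
        List<String> unpacked = new ArrayList<>();
        for (SeqFile file : remainingSeqFiles) {
            if (orderTime(file) > currentTime) {
                break; // only a precise (non-degraded) start time can stop the scan
            }
            unpacked.add(file.name());
        }
        return unpacked;
    }

    public static void main(String[] args) {
        List<SeqFile> remaining = List.of(
                new SeqFile("seq2", 5, true),
                new SeqFile("seq3", 4, true),
                new SeqFile("seq4", 10, false));
        // Prints [seq2, seq3]: both degraded files are unpacked, and the scan
        // stops at seq4, whose precise startTime 10 is larger than 4.
        System.out.println(filesToUnpack(remaining, 4));
    }
}
```

This sketch does not model the second step: once a degraded file is unpacked, the precise start times from its TimeSeriesMetadata take over the stop-searching decision, so unpacking every degraded seq file costs extra metadata reads but no longer risks skipping overlapping data.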