Conversation

@hashmapybx

Key Improvements

  1. Code Structure Optimization
     - Split the complex testTwoLevelList() method into multiple, clear private methods
     - Extracted duplicate logic from writeAndValidate() into a separate method
  2. Enhanced Exception Handling
     - Each method has clear exception handling and detailed error messages
     - throw new IOException(..., e) preserves the original exception stack trace
     - Contextual information such as file paths is added to facilitate debugging
  3. Improved Readability
     - Detailed JavaDoc comments explain each method's functionality
     - Meaningful variable names (expectedIter, actualIter)
     - Assertion messages (as("Expected more rows at index %d", i))
  4. Resource Management Improvements
     - A separate createTempFile() method now handles file creation failures
     - A finally block in writeTestDataToParquet() ensures writer closure
     - Each try-with-resources block has clear exception handling
  5. New Methods
     - validateTwoLevelListConversion() - validates two-level list conversion
     - writeTestDataToParquet() - writes test data
     - readAndValidateTwoLevelList() - reads and validates a two-level list
     - writeRecordsToParquet() - writes records
     - readAndValidateRecords() - reads and validates records
     - createTempFile() - creates a temporary file
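
The exception-wrapping and temp-file handling described above can be sketched as follows. This is an illustrative stand-in, not the actual PR code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch (not the actual PR code) of the createTempFile() helper
// described above: a failure is wrapped in a new IOException that adds
// context while preserving the original stack trace via the cause.
public class TempFileHelpers {

  /** Creates a temporary Parquet file, wrapping failures with context. */
  public static Path createTempFile(String prefix) throws IOException {
    try {
      return Files.createTempFile(prefix, ".parquet");
    } catch (IOException e) {
      // Mirrors `throw new IOException(..., e)`: keeps the cause and adds
      // contextual information (here, the file prefix) for debugging.
      throw new IOException("Failed to create temp file with prefix: " + prefix, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = createTempFile("two-level-list-test");
    System.out.println(Files.exists(file)); // prints "true"
    Files.deleteIfExists(file);
  }
}
```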

raunaqmorarka and others added 4 commits July 26, 2024 18:20
…(backport apache#10691) (apache#10787)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables, which defaults to 2 * the number of CPU
cores. When one or more of the input iterables is large and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When the queue is full, the tasks yield and are removed from
the executor; they are resumed when the consumer catches up.

(cherry picked from commit 7831a8d)

Co-authored-by: Piotr Findeisen <[email protected]>
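The yield-on-full behavior this commit describes can be sketched with a bounded queue and a non-blocking offer. This is an illustrative model, not the actual ParallelIterable implementation; the class name and the `pending` stash are assumptions for the sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the bounding idea: a producer task offers into a bounded
// queue and returns ("yields") when the queue is full, freeing its
// executor thread. The stashed `pending` element is retried when the
// task is resumed after the consumer catches up; resubmission logic
// itself is omitted here.
public class YieldingProducer {
  private final ArrayBlockingQueue<Integer> queue;
  private final Iterator<Integer> input;
  private Integer pending; // taken from input but not yet queued

  public YieldingProducer(ArrayBlockingQueue<Integer> queue, Iterator<Integer> input) {
    this.queue = queue;
    this.input = input;
  }

  /** Runs until the input is exhausted or the queue is full. */
  public void run() {
    while (true) {
      if (pending == null) {
        if (!input.hasNext()) {
          return; // all input queued
        }
        pending = input.next();
      }
      if (!queue.offer(pending)) {
        return; // queue full: yield without losing `pending`
      }
      pending = null;
    }
  }

  public boolean done() {
    return pending == null && !input.hasNext();
  }

  public static void main(String[] args) {
    ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
    YieldingProducer producer =
        new YieldingProducer(queue, List.of(1, 2, 3, 4, 5).iterator());
    producer.run(); // fills the queue to capacity, then yields
    System.out.println(queue.size() + " " + producer.done()); // prints "2 false"
    queue.clear(); // consumer catches up
    producer.run(); // resumed: element 3 was retained, not dropped
    System.out.println(queue.peek()); // prints "3"
  }
}
```

The key property is that a full queue removes the task from the executor instead of blocking a worker thread, which is what bounds memory without starving other tasks.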
As part of the change in commit
7831a8d, a queue low-water mark was
introduced. However, it resulted in an increased number of manifests
being read when planning LIMIT queries in the Trino Iceberg connector.
To avoid the increased I/O, back out the change for now.
Bumps `orc` from 1.9.3 to 1.9.4.

Updates `org.apache.orc:orc-core` from 1.9.3 to 1.9.4

Updates `org.apache.orc:orc-tools` from 1.9.3 to 1.9.4

---
updated-dependencies:
- dependency-name: org.apache.orc:orc-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.apache.orc:orc-tools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…uetReader

- Extract complex logic into separate private methods for better maintainability
- Add comprehensive JavaDoc comments for all methods
- Enhance exception handling with detailed error messages and context
- Improve variable naming for clarity (expectedIter, actualIter)
- Add assertion messages for better test failure diagnostics
- Separate two-level list validation and record validation flows
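The indexed assertion messages mentioned in the bullets above can be sketched in plain Java. This is a hypothetical stand-in using AssertionError rather than the PR's AssertJ `as(...)` calls; `assertSameRows` and its messages are illustrative, not the test's actual API.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of comparing expected and actual row iterators with the row
// index in every failure message, as the refactor's
// `as("Expected more rows at index %d", i)` assertions do.
public class RowAssertions {

  /** Compares two row iterators, reporting the index of any divergence. */
  public static void assertSameRows(Iterator<?> expectedIter, Iterator<?> actualIter) {
    int i = 0;
    while (expectedIter.hasNext()) {
      if (!actualIter.hasNext()) {
        throw new AssertionError(String.format("Expected more rows at index %d", i));
      }
      Object expected = expectedIter.next();
      Object actual = actualIter.next();
      if (!expected.equals(actual)) {
        throw new AssertionError(
            String.format("Row mismatch at index %d: expected %s but was %s", i, expected, actual));
      }
      i++;
    }
    if (actualIter.hasNext()) {
      throw new AssertionError(String.format("Unexpected extra row at index %d", i));
    }
  }

  public static void main(String[] args) {
    assertSameRows(List.of("a", "b").iterator(), List.of("a", "b").iterator()); // passes
    try {
      assertSameRows(List.of("a", "b").iterator(), List.of("a").iterator());
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints "Expected more rows at index 1"
    }
  }
}
```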
@nastra
Contributor

nastra commented Oct 23, 2025

@hashmapybx did you mean to open this PR against Iceberg's main branch? It seems like the code is partially specific to Flink 1.17, which isn't maintained anymore

