Conversation

@hashmapybx

Key Improvements

  1. Code Structure Optimization
     - Split the complex testTwoLevelList() method into multiple, clear private methods
     - Extracted duplicate logic from writeAndValidate() into a separate method
  2. Enhanced Exception Handling
     - Each method has clear exception handling and detailed error messages
     - throw new IOException(..., e) preserves the original exception stack trace
     - Contextual information such as file paths is added to facilitate debugging
  3. Improved Readability
     - Detailed JavaDoc comments explain each method's functionality
     - Meaningful variable names (expectedIter, actualIter)
     - Assertion messages (as("Expected more rows at index %d", i))
  4. Resource Management Improvements
     - A separate createTempFile() method now handles file creation failures
     - A finally block in writeTestDataToParquet() ensures writer closure
     - Each try-with-resources block has clear exception handling
  5. New Methods
     - validateTwoLevelListConversion() - validates two-level list conversion
     - writeTestDataToParquet() - writes test data
     - readAndValidateTwoLevelList() - reads and validates a two-level list
     - writeRecordsToParquet() - writes records
     - readAndValidateRecords() - reads and validates records
     - createTempFile() - creates a temporary file
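
The exception-wrapping and temp-file handling described above can be sketched as follows. This is an illustrative stand-in, not the actual PR code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch (not the actual PR code) of the createTempFile() helper
// described above: a failure is wrapped in a new IOException that adds
// context while preserving the original stack trace via the cause.
public class TempFileHelpers {

  /** Creates a temporary Parquet file, wrapping failures with context. */
  public static Path createTempFile(String prefix) throws IOException {
    try {
      return Files.createTempFile(prefix, ".parquet");
    } catch (IOException e) {
      // Mirrors `throw new IOException(..., e)`: keeps the cause and adds
      // contextual information (here, the file prefix) for debugging.
      throw new IOException("Failed to create temp file with prefix: " + prefix, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = createTempFile("two-level-list-test");
    System.out.println(Files.exists(file)); // prints "true"
    Files.deleteIfExists(file);
  }
}
```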

raunaqmorarka and others added 4 commits July 26, 2024 18:20
…(backport apache#10691) (apache#10787)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables, which defaults to 2 * the number of CPU
cores. When one or more of the input iterables is large and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When the queue is full, the tasks yield and are removed from
the executor; they are resumed when the consumer catches up.

(cherry picked from commit 7831a8d)

Co-authored-by: Piotr Findeisen <[email protected]>
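The yield-on-full behavior this commit describes can be sketched with a bounded queue and a non-blocking offer. This is an illustrative model, not the actual ParallelIterable implementation; the class name and the `pending` stash are assumptions for the sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the bounding idea: a producer task offers into a bounded
// queue and returns ("yields") when the queue is full, freeing its
// executor thread. The stashed `pending` element is retried when the
// task is resumed after the consumer catches up; resubmission logic
// itself is omitted here.
public class YieldingProducer {
  private final ArrayBlockingQueue<Integer> queue;
  private final Iterator<Integer> input;
  private Integer pending; // taken from input but not yet queued

  public YieldingProducer(ArrayBlockingQueue<Integer> queue, Iterator<Integer> input) {
    this.queue = queue;
    this.input = input;
  }

  /** Runs until the input is exhausted or the queue is full. */
  public void run() {
    while (true) {
      if (pending == null) {
        if (!input.hasNext()) {
          return; // all input queued
        }
        pending = input.next();
      }
      if (!queue.offer(pending)) {
        return; // queue full: yield without losing `pending`
      }
      pending = null;
    }
  }

  public boolean done() {
    return pending == null && !input.hasNext();
  }

  public static void main(String[] args) {
    ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
    YieldingProducer producer =
        new YieldingProducer(queue, List.of(1, 2, 3, 4, 5).iterator());
    producer.run(); // fills the queue to capacity, then yields
    System.out.println(queue.size() + " " + producer.done()); // prints "2 false"
    queue.clear(); // consumer catches up
    producer.run(); // resumed: element 3 was retained, not dropped
    System.out.println(queue.peek()); // prints "3"
  }
}
```

The key property is that a full queue removes the task from the executor instead of blocking a worker thread, which is what bounds memory without starving other tasks.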
As part of the change in commit
7831a8d, a queue low-water mark was
introduced. However, it resulted in an increased number of manifests
being read when planning LIMIT queries in the Trino Iceberg connector.
To avoid the increased I/O, back out the change for now.
Bumps `orc` from 1.9.3 to 1.9.4.

Updates `org.apache.orc:orc-core` from 1.9.3 to 1.9.4

Updates `org.apache.orc:orc-tools` from 1.9.3 to 1.9.4

---
updated-dependencies:
- dependency-name: org.apache.orc:orc-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.apache.orc:orc-tools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…uetReader

- Extract complex logic into separate private methods for better maintainability
- Add comprehensive JavaDoc comments for all methods
- Enhance exception handling with detailed error messages and context
- Improve variable naming for clarity (expectedIter, actualIter)
- Add assertion messages for better test failure diagnostics
- Separate two-level list validation and record validation flows
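The indexed assertion messages mentioned in the bullets above can be sketched in plain Java. This is a hypothetical stand-in using AssertionError rather than the PR's AssertJ `as(...)` calls; `assertSameRows` and its messages are illustrative, not the test's actual API.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of comparing expected and actual row iterators with the row
// index in every failure message, as the refactor's
// `as("Expected more rows at index %d", i)` assertions do.
public class RowAssertions {

  /** Compares two row iterators, reporting the index of any divergence. */
  public static void assertSameRows(Iterator<?> expectedIter, Iterator<?> actualIter) {
    int i = 0;
    while (expectedIter.hasNext()) {
      if (!actualIter.hasNext()) {
        throw new AssertionError(String.format("Expected more rows at index %d", i));
      }
      Object expected = expectedIter.next();
      Object actual = actualIter.next();
      if (!expected.equals(actual)) {
        throw new AssertionError(
            String.format("Row mismatch at index %d: expected %s but was %s", i, expected, actual));
      }
      i++;
    }
    if (actualIter.hasNext()) {
      throw new AssertionError(String.format("Unexpected extra row at index %d", i));
    }
  }

  public static void main(String[] args) {
    assertSameRows(List.of("a", "b").iterator(), List.of("a", "b").iterator()); // passes
    try {
      assertSameRows(List.of("a", "b").iterator(), List.of("a").iterator());
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints "Expected more rows at index 1"
    }
  }
}
```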
@nastra
Contributor

nastra commented Oct 23, 2025

@hashmapybx did you mean to open this PR against Iceberg's main branch? It seems like the code is partially specific to Flink 1.17, which isn't maintained anymore

