[Spark] Fix resource leaking when a deletion vector file is not found #2113

Closed
wants to merge 2 commits into from


vkorukanti
Collaborator

@vkorukanti vkorukanti commented Sep 27, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

`DeltaParquetFileFormat` adds glue code on top of the iterator returned by `ParquetFileFormat` to filter out deleted rows (according to the deletion vector). When the DV file is not found, the iterator returned by `ParquetFileFormat` should be closed; otherwise the resources it holds are leaked.
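
For illustration, the close-on-failure idea described above could be sketched roughly as follows. This is not the code in this patch: `TrackedIterator` and `CloseOnFailure.wrapOrClose` are hypothetical names, and the sketch uses plain Scala with no Spark or Delta dependency. It only assumes that the underlying iterator may implement `java.io.Closeable` and that setting up the filtering wrapper can throw (for example, a `FileNotFoundException` for a missing DV file).

```scala
import java.io.{Closeable, FileNotFoundException}
import scala.util.control.NonFatal

// Hypothetical stand-in for the row iterator a Parquet reader returns.
// It records whether close() was called so a leak can be detected.
final class TrackedIterator[T](rows: Iterator[T]) extends Iterator[T] with Closeable {
  var closed: Boolean = false
  override def hasNext: Boolean = !closed && rows.hasNext
  override def next(): T = rows.next()
  override def close(): Unit = { closed = true }
}

object CloseOnFailure {
  // Applies `wrap` to the underlying iterator. If wrapping fails with a
  // non-fatal error, closes the underlying iterator when it is Closeable,
  // then rethrows the original error.
  def wrapOrClose[T](underlying: Iterator[T])(wrap: Iterator[T] => Iterator[T]): Iterator[T] = {
    try {
      wrap(underlying)
    } catch {
      case NonFatal(e) =>
        underlying match {
          case c: Closeable => c.close()
          case _ => // nothing to release
        }
        throw e
    }
  }
}
```

A caller would pass the data iterator together with a function that builds the filtering wrapper; if that function throws, the data iterator is closed before the error propagates, so the failure still surfaces to the caller without leaving the reader open.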

How was this patch tested?

An integration test that simulates deletion of the DV file and verifies that no resources are leaked.

Does this PR introduce any user-facing changes?

No

@vkorukanti vkorukanti added this to the 3.0.0 milestone Sep 27, 2023
iterToReturn.asInstanceOf[Iterator[InternalRow]]
} catch {
case NonFatal(e) =>
// Close the iterator if it is a closeable resource. The `ParquetFileFormat` opens
Collaborator Author


This pattern is similar to how Spark closes the iterator returned from the data sources.

Collaborator

@ryan-johnson-databricks ryan-johnson-databricks left a comment


Subtle... thanks for the fix!

@vkorukanti vkorukanti closed this in 2d92266 Oct 2, 2023
vkorukanti added a commit to vkorukanti/delta that referenced this pull request Oct 3, 2023
The `DeltaParquetFileFormat` adds glue code on top of the iterator returned by the `ParquetFileFormat` to filter out deleted rows (according to the deletion vector). When the DV file is not found, the iterator returned from `ParquetFileFormat` should be closed.

An integration test simulating the DV file deletion and verifying no resource leak.

Closes delta-io#2113

Signed-off-by: Venki Korukanti <[email protected]>
GitOrigin-RevId: d378b495630da31ff2af062dd9124874fdb69e12
Kimahriman pushed a commit to Kimahriman/delta that referenced this pull request Oct 3, 2023
@vkorukanti vkorukanti deleted the fixResourceLeak branch May 9, 2024 02:41

2 participants