Core: Move deleted files to Hadoop trash if configured #14501
Conversation
b0ba8b9 to 8d07d49 (force-push)
8d07d49 to 5cb16cf (force-push)
anuragmantri left a comment:
Thanks for the PR @jordepic. This is very useful.
However, it seems like a behavior change (even if trash was enabled previously, Iceberg was not honoring it). IMO, we should make this configurable using a property to avoid surprises (unexpected storage consumption).
private void deletePath(FileSystem fs, Path toDelete, boolean recursive) throws IOException {
  Trash trash = new Trash(fs, getConf());
I'm concerned about the number of Trash objects we create. Does the Hadoop API ensure we can reuse the Trash object for a given (fs, conf)?
I couldn't tell from https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Trash.html#Trash-org.apache.hadoop.fs.FileSystem-org.apache.hadoop.conf.Configuration-
That's a good call. I've added a new toggle that can be set in the Hadoop configuration to determine whether Iceberg should use the trash, following Russel Spitzer's example in other HadoopFileIO changes.
I've also looked into object reuse. The trash behavior depends on many configuration values (I'd have to build a cache keyed on 5+ configuration values, which may change in the future), unlike the FileSystem cache, whose key doesn't depend on the Configuration at all, only on the URI and the user group information. That said, the change to check the Hadoop configuration first means we don't create a Trash object unless the user has specifically opted in, so any possible object churn only affects users who enable this feature. I hope that's good enough for now.
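For illustration, a minimal sketch of what an opt-in delete path could look like. The property name `iceberg.hadoop.use-trash` and the class/method names are assumptions for this example, not the PR's actual code; the Hadoop calls (`new Trash(fs, conf)`, `Trash#moveToTrash`, `FileSystem#delete`) are real APIs.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

class TrashAwareDelete {
  // Hypothetical property key; the PR may use a different name.
  private static final String USE_TRASH = "iceberg.hadoop.use-trash";

  static void deletePath(FileSystem fs, Path toDelete, boolean recursive, Configuration conf)
      throws IOException {
    if (conf.getBoolean(USE_TRASH, false)) {
      // Only construct the Trash object when the user has opted in,
      // so callers that keep the default see no extra object churn.
      Trash trash = new Trash(fs, conf);
      if (!trash.moveToTrash(toDelete)) {
        throw new IOException("Failed to move " + toDelete + " to trash");
      }
    } else {
      // Existing behavior: plain FileSystem delete, which bypasses any trash directory.
      if (!fs.delete(toDelete, recursive)) {
        throw new IOException("Failed to delete " + toDelete);
      }
    }
  }
}
```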
5cb16cf to efc6a8f (force-push)
As of now, HadoopFileIO uses the Java delete API, which always bypasses any configured trash directory. If the table's Hadoop configuration has trash enabled, we should use it. For single-file deletions, we aim to mimic existing behavior by throwing an error when attempting to delete a directory. Note that when trash is enabled, this adds one RPC call to the single-file delete path (to check whether the path is a directory).
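As a sketch of that single-file delete path (with assumed helper and message text, not the PR's exact code), the extra RPC is the `getFileStatus` call used to reject directories before the file is moved to trash:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

class SingleFileTrashDelete {
  static void deleteFile(FileSystem fs, Path path, Configuration conf) throws IOException {
    // Extra RPC, needed only when trash is enabled: preserve the existing
    // behavior of refusing to delete a directory through the single-file API.
    if (fs.getFileStatus(path).isDirectory()) {
      throw new IOException("Cannot delete directory via single-file delete: " + path);
    }
    Trash trash = new Trash(fs, conf);
    if (!trash.moveToTrash(path)) {
      throw new IOException("Failed to move " + path + " to trash");
    }
  }
}
```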