Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(puffin): Move puffin crate contents inside iceberg crate #789

Merged

Conversation

fqaiser94
Copy link
Contributor

@fqaiser94 fqaiser94 commented Dec 13, 2024

Part of #744

Summary

  • Move contents of the puffin crate over to the existing iceberg crate
  • Delete the puffin crate

Context

@liurenjie1024 raised a good question about why we needed a separate puffin crate at all.
On reflection, I realized that there's no particularly strong reason to have a separate puffin crate.
Hence this PR.

I don't have a super strong opinion on this change; happy to leave things as-is if people object.
I was however persuaded to make this change when I noticed that in the Java implementation, the Puffin functionality is packaged as part of the iceberg-core module (rather than as a separate module).

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A concern is whether all Iceberg users will require Puffin support. It's important to note that users who only access the catalog will also need to pull the Iceberg core. However, I don't have a strong opinion on this.

I'll approve this to unblock the PR from merging. This decision can be left to @Fokko and @liurenjie1024.

@Fokko
Copy link
Contributor

Fokko commented Dec 13, 2024

I can see value in both. It could be that folks just want to have logic to read the Puffin files, and then they could just use the crate (for example PyIceberg) without having to pull in core. Not super strong on this either, if we feel like we should just move it into the core crate, than that's okay with me as well 👍

@Xuanwo
Copy link
Member

Xuanwo commented Dec 13, 2024

Thank you, @Fokko, for the feedback. Since neither of us has a strong opinion on this and the suggestion comes from @liurenjie1024 with agreement from @fqaiser94, I believe there’s no need to wait for another round of review. This change can easily be reverted at any time if we find it to be incorrect. So, let’s move forward.

@Xuanwo Xuanwo changed the title Move puffin crate contents inside iceberg crate refactor: Move puffin crate contents inside iceberg crate Dec 13, 2024
@Xuanwo Xuanwo merged commit 88ad671 into apache:main Dec 13, 2024
17 checks passed
@fqaiser94 fqaiser94 changed the title refactor: Move puffin crate contents inside iceberg crate refactor(puffin): Move puffin crate contents inside iceberg crate Dec 13, 2024
@fqaiser94 fqaiser94 deleted the move_from_puffin_crate_to_iceberg_crate branch December 13, 2024 12:14
@liurenjie1024
Copy link
Contributor

The reason why I suggest moving puffin into core crate is to avoid circulate dependency problem, thinking about the case we put it outside:

  1. Puffer Read/Writer -> FileIO -> core crate
  2. core crate -> planning/deletion vector processing -> puffin crate

Usually I'm leaning toward to smaller crates so that user could choose a minimum set of dependencies. But if we really want to do this, we need to split core crate into smaller crates, for example file io, data reader/writer.

@Fokko
Copy link
Contributor

Fokko commented Dec 17, 2024

That makes sense @liurenjie1024 thanks for the explanation 👍 Coming from Java/Python, everything in Rust is small anyway, so I think we're good by moving Puffin into Core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants