Introduce require_files for tracking the add files in table state #594

Merged
merged 1 commit into from
May 4, 2022

Conversation

mosyp
Contributor

@mosyp mosyp commented May 3, 2022

Description

Development on #454 has stalled, so we aim to address that issue gradually.
First, this PR. It introduces require_files, which works like require_tombstones: when the flag is set to false, add files are filtered out while loading, so the table state ends up holding metadata only. This suits append-only apps well (kafka-delta-ingest, for example).
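The filtering described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual delta-rs code: `Action`, `LoadConfig`, `TableState`, and `replay` are names invented for illustration, and only the two flag names come from this PR.

```rust
// Hypothetical, simplified model of log replay; NOT the real delta-rs types.
#[derive(Debug, Clone)]
enum Action {
    Add { path: String },
    Remove { path: String },
    Metadata { schema: String },
}

// Assumed config shape: only the two flag names are taken from the PR text.
#[derive(Debug, Clone, Copy)]
struct LoadConfig {
    require_tombstones: bool,
    require_files: bool,
}

#[derive(Debug, Default)]
struct TableState {
    files: Vec<String>,
    tombstones: Vec<String>,
    schema: Option<String>,
}

impl TableState {
    // Apply one log action, skipping whatever the config says is not needed.
    fn apply(&mut self, action: Action, config: LoadConfig) {
        match action {
            Action::Add { path } => {
                // With require_files = false, add files are never stored.
                if config.require_files {
                    self.files.push(path);
                }
            }
            Action::Remove { path } => {
                // A removed file is no longer part of the live file set.
                self.files.retain(|p| p != &path);
                // With require_tombstones = false, removes are never stored.
                if config.require_tombstones {
                    self.tombstones.push(path);
                }
            }
            // Metadata is always kept, whatever the flags say.
            Action::Metadata { schema } => self.schema = Some(schema),
        }
    }
}

// Replay a sequence of actions into a fresh state.
fn replay(actions: Vec<Action>, config: LoadConfig) -> TableState {
    let mut state = TableState::default();
    for action in actions {
        state.apply(action, config);
    }
    state
}
```

With both flags set to false, replaying any log leaves `files` and `tombstones` empty while `schema` is still populated, which is the metadata-only state an append-only writer needs.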

Second, the catch is how to create a checkpoint, because that requires both adds and removes. One possible approach is to create the checkpoint in a separate process, which is workable. However, we can avoid that and do it in process if we apply the work from #454, e.g. by operating on arrow/record batch objects to reduce memory usage and deserializing just enough to read/create a checkpoint.

Third, if the second step is successful, we can leverage that experience to finish/enhance the idea from #454.

Related Issue(s)

A follow-up to #445, but here we introduce ignoring add files.

@mosyp mosyp force-pushed the ignore-adds-removes branch from 8c0d7fd to bc2e009 Compare May 3, 2022 11:07
@mosyp mosyp requested review from houqp, xianwill and rtyler May 3, 2022 11:08
@mosyp mosyp marked this pull request as ready for review May 3, 2022 11:17
@houqp
Member

houqp commented May 4, 2022

nice optimization for append only workload!

@mosyp mosyp merged commit 812d827 into delta-io:main May 4, 2022
@mosyp mosyp deleted the ignore-adds-removes branch May 4, 2022 05:52
fvaleye pushed a commit to fvaleye/delta-rs that referenced this pull request May 4, 2022