-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: add note for day
transform
#11749
Open
xxchan
wants to merge
1
commit into
apache:main
Choose a base branch
from
xxchan:xxchan/weary-skink
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's unclear what is physical type versus logical type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have better suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using usingSeems there are discussions about this #9345date
directly here. In the context of iceberg,date
is primitive type. ref: https://iceberg.apache.org/spec/#nested-types:~:text=38%20or%20less-,date,-Calendar%20date%20without cc @FokkoThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I also don't fully understand why we can't use
date
directly here, since it's a primitive type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed some time to think about this. I would also lean towards stating that it returns a
date
. The confusion (and also the terminology) comes from that it is being encoded in Avro where the physical (bytes written to disk) is anint
, that's annotated with a logical type that it represents adate
.The relevant piece of code:
iceberg/api/src/main/java/org/apache/iceberg/transforms/Dates.java
Lines 92 to 98 in fe2f593
Curious to learn what others think @RussellSpitzer @rdblue @danielcweeks @nastra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many previous questions and discussions ended prematurely with "date is also just an integer". Or "physically as int", but displayed as "date", this is also very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the spec is clear here for
day
transform.days from 1970-01-01
is aint
. What might be confusing is withdate
and implementation details, which can be enhanced like apache/iceberg-python#1211There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not clear enough, and has lead to problems repeately in the wild like apache/iceberg-rust#478.
As also mentioned by Fokko, what is now persisted is really an "Avro Date". Parse it by assuming it's an Avro Int will lead to error.
Actually this looks a case of abstraction leak to me: We didn't specify
date
isint
(days from 1970-01-01
).But the
day
transform here requires:int
(days from 1970-01-01
)Date
(This is not mentioned in the spec here, but is in the reference implementation.)This implicitly forces
date
to beint
. (And thenday
transform's return should also bedate
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but since it is serialized with Avro, it will always be an
int
:Specifying this twice would lead to duplication of the Avro spec into the Iceberg spec. The error in Iceberg-Rust did raise my eyebrow a bit since I would expect it to read the
int
without thelogicalType
as well because there is no ambiguity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also encountered this before in apache/iceberg-python#1208 and apache/iceberg-python#1211. There's also this accompanying devlist thread https://lists.apache.org/thread/2gq7b54nvc9q6f1j08l9lnzgm5onkmx5