-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spec): Add primitive data types #18
Conversation
crates/paimon/Cargo.toml
Outdated
@@ -27,6 +27,7 @@ license.workspace = true | |||
version.workspace = true | |||
|
|||
[dependencies] | |||
chrono = "0.4.38" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been changed in #15.
crates/paimon/src/spec/types.rs
Outdated
} | ||
|
||
pub trait DataTypeTrait { | ||
/** Returns whether a value of this type can be {@code null}. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, please use ///
for comments.
#[repr(u8)] | ||
#[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)] | ||
#[serde(rename_all("camelCase"))] | ||
pub enum DataTypeRoot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we implementing bitset? How about using bitflags
instead?
crates/paimon/src/spec/types.rs
Outdated
type_root: DataTypeRoot, | ||
} | ||
|
||
pub trait DataTypeTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: We need this trait because we want users to have their own data type?
crates/paimon/src/spec/types.rs
Outdated
state.end() | ||
} | ||
|
||
fn equals(&self, other: &Self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust doesn't implement equals
or hash
in this way. We need to use Hash
and Eq
/PartialEq
.
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)] | ||
pub struct DecimalType { | ||
element_type: DataType, | ||
precision: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using rust_decimal
: https://docs.rs/rust_decimal/latest/rust_decimal/
Thanks @Xuanwo @SteNicholas I will modify it according to the comments later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XuQianJin-Stars Thanks for your contribution, Left some minor comments.
crates/paimon/src/spec/types.rs
Outdated
} | ||
|
||
/// Returns true if the data type is with the family. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L214> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line number seems not correct.
crates/paimon/src/spec/types.rs
Outdated
} | ||
} | ||
|
||
/// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
.any(|f: &DataTypeFamily| self.is_with_family(f.clone())) | ||
} | ||
|
||
/// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
self.type_root.families().contains(family) | ||
} | ||
|
||
/// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment
crates/paimon/src/spec/types.rs
Outdated
self.copy(self.is_nullable) | ||
} | ||
|
||
/// Returns true if the data type is with the family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
crates/paimon/src/spec/types.rs
Outdated
self.copy(false) | ||
} | ||
|
||
fn as_sql_string(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be moved to the subclass, Right?
crates/paimon/src/spec/types.rs
Outdated
} | ||
|
||
pub fn as_sql_string(&self) -> String { | ||
format!("ARRAY<{}>", self.element_type.as_sql_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nullability
is not displayed in the sql string.
/// | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/DataTypeFamily.java> | ||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct DataTypeFamily: u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we use bitset here instead of the original HashSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use a bitset here for easier readability and to avoid unnecessary allocation. DataTypeFamily
just a u32
which can store in stack but HashSet
needs heap allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it, thanks.
type_root: DataTypeRoot, | ||
} | ||
|
||
impl Display for DataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should impl Display for the concrete types such as BigIntType
not for the DataType
? Otherwise, we can not directly print the subtypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your work. We are on the right track. There are some comments for detailed API design, please take a look.
crates/paimon/src/spec/types.rs
Outdated
/// The root of data type. | ||
/// | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataTypeRoot.java#L49> | ||
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make DataTypeRoot
Copy
?
crates/paimon/src/spec/types.rs
Outdated
|
||
/// Returns a deep copy of this type. It requires an implementation of {@link #copy(boolean)}. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L120> | ||
fn copy_with_nullable(&self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Copy
in rust.
crates/paimon/src/spec/types.rs
Outdated
|
||
/// Returns a deep copy of this type with possibly different nullability. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113> | ||
fn copy(&self, is_nullable: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make this API called fn with_nullable(self, is_nullable: bool) -> Self
crates/paimon/src/spec/types.rs
Outdated
|
||
/// Compare two data types without nullable. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L129> | ||
fn copy_ignore_nullable(&self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicated with with_nullable(false)
?
crates/paimon/src/spec/types.rs
Outdated
self.copy(false) | ||
} | ||
|
||
fn serialize_json(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we don't need to provide API like this directly. If we have special requirement for Serialize
, we need to implement by hand instead of using #[derive(Serialize)]
crates/paimon/src/spec/types.rs
Outdated
} | ||
} | ||
|
||
pub fn default_value() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
|
||
impl Display for FloatType { | ||
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { | ||
if !self.element_type.is_nullable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
/// | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/TimestampType.java>. | ||
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)] | ||
pub struct LocalZonedTimestampType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, paimon has a special type for local zone timestamp.
crates/paimon/src/spec/types.rs
Outdated
self.precision | ||
) | ||
} else { | ||
write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can implement like:
write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision)?;
if !self.element_type.is_nullable() {
write!(f, " NOT NULL")?;
}
Ok(())
crates/paimon/src/spec/types.rs
Outdated
Self::new(true, precision) | ||
} | ||
|
||
pub fn default_value() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM to me now. There all only some small nits.
crates/paimon/src/spec/types.rs
Outdated
/// Returns whether the family type of the type equals to the family or not. | ||
/// | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L103> | ||
fn with_family(&self, family: DataTypeFamily) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it is_family
?
crates/paimon/src/spec/types.rs
Outdated
|
||
/// Returns whether the root of the type is part of at least one family of the families or not. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L94> | ||
fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about naming it is_any_of_family
to align with is_any_of
?
crates/paimon/src/spec/types.rs
Outdated
fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool { | ||
families | ||
.iter() | ||
.any(|f: &DataTypeFamily| self.with_family(f.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataTypeFamily
is Copy, so I'm guessing we don't need to clone it.
Have you tried:
families
.iter()
.any(self.is_family)
crates/paimon/src/spec/types.rs
Outdated
|
||
/// Returns a deep copy of this type with possibly different nullability. | ||
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113> | ||
fn with_nullable(&self, is_nullable: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can place construct API in the front to make it easier to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks a lot for your effort!
cc @SteNicholas, @Aitozi and @JingsongLi for another look. |
I have one more question, Why the DataType is not defined as trait ? If not, how the DataType used in other class. eg: |
Hi, thanks for bringing this up. I don't quite understand the issue you mentioned. Are you trying to say that it's possible for users to implement they own data types and we don't know how to display those types? If so, I believe current implemantion of impl Display for ArrayType {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
if self.element_type.is_nullable() {
write!(f, "ARRAY")
} else {
write!(f, "ARRAY NOT NULL")
}
}
} This is the part of json represent of array. In fact, we need to use |
After some offline discussion with @Aitozi, I got his point now. I think we need to build an enum for |
An enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This PR will add primitive data types.