feat(spec): Add primitive data types #18

Merged 18 commits on Jul 25, 2024

Conversation

XuQianJin-Stars
Contributor

This PR will add primitive data types.

@@ -27,6 +27,7 @@ license.workspace = true
version.workspace = true

[dependencies]
chrono = "0.4.38"
Member

This has been changed in #15.

}

pub trait DataTypeTrait {
/** Returns whether a value of this type can be {@code null}. */
Member

Hi, please use /// for comments.

#[repr(u8)]
#[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub enum DataTypeRoot {
Member

Are we implementing a bitset here? How about using the bitflags crate instead?

type_root: DataTypeRoot,
}

pub trait DataTypeTrait {
Member

Question: do we need this trait because we want users to define their own data types?

state.end()
}

fn equals(&self, other: &Self) -> bool {
Member

Rust doesn't implement equals or hash this way. We need to implement the Hash and Eq/PartialEq traits instead.
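
As a rough illustration of this suggestion, the sketch below derives `PartialEq`, `Eq`, and `Hash` instead of writing `equals`/`hash` methods by hand. The `DecimalType` fields and the `count_distinct` helper are hypothetical and simplified; they are not the PR's actual definitions.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a type from the PR; the field set is an assumption.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DecimalType {
    nullable: bool,
    precision: u32,
    scale: u32,
}

impl DecimalType {
    pub fn new(nullable: bool, precision: u32, scale: u32) -> Self {
        Self { nullable, precision, scale }
    }
}

/// Hypothetical helper: counts distinct types.
/// Collecting into a `HashSet` only works because `Eq` and `Hash` are derived.
pub fn count_distinct(types: &[DecimalType]) -> usize {
    types.iter().collect::<HashSet<_>>().len()
}
```

With the derives in place, `==` compares field by field, and the type can be used directly as a key in `HashMap` or `HashSet`.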

#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)]
pub struct DecimalType {
element_type: DataType,
precision: u32,
Member

@XuQianJin-Stars
Contributor Author

Thanks @Xuanwo @SteNicholas, I will update the PR according to the comments later.

Contributor

Aitozi left a comment

@XuQianJin-Stars Thanks for your contribution. I left some minor comments.

}

/// Returns true if the data type is with the family.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L214>
Contributor

The line number seems incorrect.

}
}

/// Returns true if the data type is with the family.
Contributor

Incorrect comment

.any(|f: &DataTypeFamily| self.is_with_family(f.clone()))
}

/// Returns true if the data type is with the family.
Contributor

Incorrect comment

self.type_root.families().contains(family)
}

/// Returns true if the data type is with the family.
Contributor

Incorrect comment

self.copy(self.is_nullable)
}

/// Returns true if the data type is with the family.
Contributor

ditto

self.copy(false)
}

fn as_sql_string(&self) -> String {
Contributor

This method should be moved to the subclass, right?

}

pub fn as_sql_string(&self) -> String {
format!("ARRAY<{}>", self.element_type.as_sql_string())
Contributor

The nullability is not displayed in the SQL string.

///
/// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/DataTypeFamily.java>
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataTypeFamily: u32 {
Contributor

I'm curious why we use a bitset here instead of the original HashSet?

Member

It's better to use a bitset here for readability and to avoid unnecessary allocation. DataTypeFamily is just a u32 that can be stored on the stack, whereas a HashSet needs heap allocations.

Contributor

Got it, thanks.
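
To make the trade-off discussed in this thread concrete, here is a minimal hand-rolled sketch of a `u32`-backed family set. The PR itself uses the bitflags crate; the newtype, the three constants, and `BIGINT_FAMILIES` below are illustrative assumptions, not the PR's definitions.

```rust
/// Hand-rolled stand-in for a bitflags-style type; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DataTypeFamily(u32);

impl DataTypeFamily {
    pub const NUMERIC: Self = Self(1 << 0);
    pub const INTEGER_NUMERIC: Self = Self(1 << 1);
    pub const CHARACTER_STRING: Self = Self(1 << 2);

    /// Union of two family sets: a single bitwise OR, no allocation.
    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Families of a hypothetical BIGINT root, computed at compile time.
pub const BIGINT_FAMILIES: DataTypeFamily =
    DataTypeFamily::NUMERIC.union(DataTypeFamily::INTEGER_NUMERIC);
```

The whole set fits in four bytes and membership is one AND plus one comparison, whereas a `HashSet` of families would allocate on the heap and hash on every lookup.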

type_root: DataTypeRoot,
}

impl Display for DataType {
Contributor

Aitozi, Jul 23, 2024

I think we should impl Display for the concrete types such as BigIntType, not for DataType. Otherwise, we cannot print the subtypes directly.

Member

Xuanwo left a comment

Thanks a lot for your work. We are on the right track. There are some comments on the detailed API design, please take a look.

/// The root of data type.
///
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataTypeRoot.java#L49>
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)]
Member

Maybe we can make DataTypeRoot Copy?
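
A small sketch of what this would look like, assuming a shortened variant list and a simplified `DataType` struct (both hypothetical): a fieldless enum can derive `Copy`, so accessors can return it by value.

```rust
/// Fieldless enum, so `Copy` is cheap; the variant list is shortened for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DataTypeRoot {
    Boolean,
    Int,
    BigInt,
}

/// Simplified stand-in for the PR's struct; fields are assumptions.
pub struct DataType {
    is_nullable: bool,
    type_root: DataTypeRoot,
}

impl DataType {
    pub fn new(is_nullable: bool, type_root: DataTypeRoot) -> Self {
        Self { is_nullable, type_root }
    }

    pub fn is_nullable(&self) -> bool {
        self.is_nullable
    }

    /// Returns the root by value; no `.clone()` or borrowed return is needed.
    pub fn type_root(&self) -> DataTypeRoot {
        self.type_root
    }
}
```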


/// Returns a deep copy of this type. It requires an implementation of {@link #copy(boolean)}.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L120>
fn copy_with_nullable(&self) -> Self {
Member

This is Copy in Rust.


/// Returns a deep copy of this type with possibly different nullability.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113>
fn copy(&self, is_nullable: bool) -> Self {
Member

We can name this API fn with_nullable(self, is_nullable: bool) -> Self.
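
One possible shape for the suggested signature, sketched on a hypothetical `IntType` whose only field is its nullability (the PR's types carry more state):

```rust
/// Simplified stand-in type; the single field is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IntType {
    nullable: bool,
}

impl IntType {
    /// Creates a nullable INT, assumed here to be the default.
    pub fn new() -> Self {
        Self { nullable: true }
    }

    /// Consumes `self` and returns it with the requested nullability.
    pub fn with_nullable(mut self, nullable: bool) -> Self {
        self.nullable = nullable;
        self
    }

    pub fn is_nullable(&self) -> bool {
        self.nullable
    }
}
```

Under this shape, `IntType::new().with_nullable(false)` covers the non-nullable case, so a separate `copy_ignore_nullable` method would not be needed.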


/// Compare two data types without nullable.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L129>
fn copy_ignore_nullable(&self) -> Self {
Member

Doesn't this duplicate with_nullable(false)?

self.copy(false)
}

fn serialize_json(&self) -> String {
Member

Ideally, we don't need to provide an API like this directly. If we have special requirements for Serialize, we need to implement it by hand instead of using #[derive(Serialize)].

}
}

pub fn default_value() -> Self {
Member

The same.


impl Display for FloatType {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
if !self.element_type.is_nullable() {
Member

The same.

///
/// Impl Reference: <https://github.com/apache/paimon/blob/master/paimon-common/src/main/java/org/apache/paimon/types/TimestampType.java>.
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize, Hash)]
pub struct LocalZonedTimestampType {
Member

Interesting, Paimon has a special type for local-zoned timestamps.

self.precision
)
} else {
write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision)
Member

We can implement like:

write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision)?;
if !self.element_type.is_nullable() {
    write!(f, " NOT NULL")?;
}
Ok(())
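
For reference, the suggestion above could be fleshed out into a complete `Display` impl roughly as follows. The struct here is a simplified assumption: it stores a plain `nullable` flag rather than the PR's `element_type` field.

```rust
use std::fmt::{self, Display, Formatter};

/// Simplified stand-in; the PR's struct reads nullability from `element_type`.
pub struct LocalZonedTimestampType {
    nullable: bool,
    precision: u32,
}

impl LocalZonedTimestampType {
    pub fn new(nullable: bool, precision: u32) -> Self {
        Self { nullable, precision }
    }
}

impl Display for LocalZonedTimestampType {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Write the common prefix once, propagating any formatter error with `?`.
        write!(f, "TIMESTAMP WITH LOCAL TIME ZONE({})", self.precision)?;
        // Append the suffix only for non-nullable types.
        if !self.nullable {
            write!(f, " NOT NULL")?;
        }
        Ok(())
    }
}
```

Writing the prefix once and appending the suffix conditionally avoids repeating the format string in both branches of an `if`/`else`.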

Self::new(true, precision)
}

pub fn default_value() -> Self {
Member

The same.

Member

Xuanwo left a comment

Mostly LGTM now. There are only some small nits.

/// Returns whether the family type of the type equals to the family or not.
///
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L103>
fn with_family(&self, family: DataTypeFamily) -> bool {
Member

How about naming it is_family?


/// Returns whether the root of the type is part of at least one family of the families or not.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L94>
fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool {
Member

How about naming it is_any_of_family to align with is_any_of?

fn is_any_with_family(&self, families: &[DataTypeFamily]) -> bool {
families
.iter()
.any(|f: &DataTypeFamily| self.with_family(f.clone()))
Member

DataTypeFamily is Copy, so I'm guessing we don't need to clone it.

Have you tried:

families
    .iter()
    .any(|f| self.is_family(*f))
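
A self-contained sketch of this point, using hypothetical simplified types: because the family type is `Copy`, the closure can dereference the borrowed item instead of cloning it. Note that a bare `self.is_family` cannot be passed to `any`, since Rust has no bound-method references, so a closure is still required.

```rust
/// Hand-rolled stand-in for the PR's bitflags type; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DataTypeFamily(u32);

impl DataTypeFamily {
    pub const NUMERIC: Self = Self(1 << 0);
    pub const CHARACTER_STRING: Self = Self(1 << 1);

    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Simplified data type that only records its families (an assumption).
pub struct DataType {
    families: DataTypeFamily,
}

impl DataType {
    pub fn new(families: DataTypeFamily) -> Self {
        Self { families }
    }

    pub fn is_family(&self, family: DataTypeFamily) -> bool {
        self.families.contains(family)
    }

    /// `*f` copies the value out of the slice; no `.clone()` call is needed.
    pub fn is_any_of_family(&self, families: &[DataTypeFamily]) -> bool {
        families.iter().any(|f| self.is_family(*f))
    }
}
```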


/// Returns a deep copy of this type with possibly different nullability.
/// Impl Reference: <https://github.com/apache/paimon/blob/db8bcd7fdd9c2705435d2ab1d2341c52d1f67ee5/paimon-common/src/main/java/org/apache/paimon/types/DataType.java#L113>
fn with_nullable(&self, is_nullable: bool) -> Self {
Member

We can place the constructor APIs first to make them easier to find.

Member

Xuanwo left a comment

Great, thanks a lot for your effort!

Member

Xuanwo commented Jul 25, 2024

cc @SteNicholas, @Aitozi and @JingsongLi for another look.

Contributor

Aitozi commented Jul 25, 2024

I have one more question: why is DataType not defined as a trait? If it isn't, how will DataType be used by other types? For example, in org.apache.paimon.types.DataField#asSQLString the type string cannot be obtained. CC @Xuanwo

Member

Xuanwo commented Jul 25, 2024

I have one more question: why is DataType not defined as a trait? If it isn't, how will DataType be used by other types? For example, in org.apache.paimon.types.DataField#asSQLString the type string cannot be obtained. CC @Xuanwo

Hi, thanks for bringing this up. I don't quite understand the issue you mentioned.

Are you trying to say that users may implement their own data types and we won't know how to display those types? If so, I believe the current implementation of Display is wrong:

impl Display for ArrayType {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        if self.element_type.is_nullable() {
            write!(f, "ARRAY")
        } else {
            write!(f, "ARRAY NOT NULL")
        }
    }
}

This is part of the JSON representation of an array. In fact, we need to use write!(f, "ARRAY<{}>", element_type) here.

Member

Xuanwo commented Jul 25, 2024

After some offline discussion with @Aitozi, I get his point now. I think we need to build an enum for DataType, instead of just DataTypeRoot, that carries the entire data type information. We can implement this as a follow-up to this PR.
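
A rough sketch of how such an enum might look. This is not the follow-up implementation: the two variants, the struct fields, and the SQL strings are assumptions chosen to show how an enum lets a container type such as `ArrayType` print its element type.

```rust
use std::fmt::{self, Display, Formatter};

/// Hypothetical concrete type carrying only nullability.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IntType {
    pub nullable: bool,
}

/// Hypothetical array type; `Box` is needed because the enum is recursive.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ArrayType {
    pub nullable: bool,
    pub element_type: Box<DataType>,
}

/// Each variant carries the full type information, not just the root.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DataType {
    Int(IntType),
    Array(ArrayType),
}

impl DataType {
    pub fn is_nullable(&self) -> bool {
        match self {
            DataType::Int(t) => t.nullable,
            DataType::Array(t) => t.nullable,
        }
    }
}

impl Display for IntType {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "INT")?;
        if !self.nullable {
            write!(f, " NOT NULL")?;
        }
        Ok(())
    }
}

impl Display for ArrayType {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // The element type is printed recursively through the enum.
        write!(f, "ARRAY<{}>", self.element_type)?;
        if !self.nullable {
            write!(f, " NOT NULL")?;
        }
        Ok(())
    }
}

impl Display for DataType {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Delegate to the concrete type held by each variant.
        match self {
            DataType::Int(t) => Display::fmt(t, f),
            DataType::Array(t) => Display::fmt(t, f),
        }
    }
}
```

With this shape, both the concrete types and the enclosing `DataType` are printable, and a field holding a `DataType` can obtain its SQL string without knowing the concrete variant.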

Contributor

Aitozi commented Jul 25, 2024

An enum DataType makes sense to me. +1 for the current shape. Let's move on to the next step.

Contributor

Aitozi left a comment

+1

@Aitozi Aitozi merged commit ce42c78 into apache:main Jul 25, 2024
7 checks passed
4 participants