Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add datatype new_list #4561

Merged
merged 2 commits into from
Aug 2, 2023
Merged

Conversation

fansehep
Copy link
Contributor

Which issue does this PR close?

Closes #4544

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

add a simple func to and the default name is "item".

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 22, 2023
@@ -576,6 +576,11 @@ impl DataType {
_ => self == other,
}
}

/// Create a List DataType default name is "item"
pub fn new_list(data_type: DataType, nullable: bool) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for consistency with Field::new_list we should take the item name as an argument. Thoughts @alamb ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think we should use the standard / convention "item" always (not take a new parameter) to encourage people to follow the convention. If people want a different name, they can construct the DataType directly.

In terms of consistency with, https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html#method.new_list I would recommend we remove the name parameter from that method, due to the same rationale above

Copy link
Contributor

@tustvold tustvold Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend we remove the name parameter from that method

It can't be removed, it is part of the field parameter that is passed in, which needs to be kept for nullability, metadata, etc... I just think it will be more confusing for users if some APIs specify "item" and some don't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think it will be more confusing for users if some APIs specify "item" and some don't

I don't think it will increase confusion as it is already there: "I am making a list, what name should I use for its child" when the name never is used

To reduce the confusion I think we should reduce the need for specifying that field name ever when making lists

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @fansehep

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method will be helpful and make downstream code working with Lists easier to manage. Thank you @fansehep

@alamb
Copy link
Contributor

alamb commented Aug 2, 2023

@tustvold and I had some conversation about this PR that I wanted to add as context

I believe this PR is a step forward because the way DataType::List is represented in arrow-rs forces the user to pick a name for something that is essentially anonymous ("item") is very confusing.

So in my mind, it is a really bad user experience that users have know to cargo cult "item". For example DataFusion copies "item" in several places:

https://github.com/search?q=repo%3Aapache%2Farrow-datafusion%20%5C%22item%5C%22&type=code

I believe that @tustvold is worried that hiding the details of DataType::List construction also hides the construction of the Field (and thus encourages use of the API that might lose field level metadata)

While this is probably fine, it seems a bit incoherent to have some APIs in terms of DataType and some Field.

I plan follow on PRs to:

  1. Clarify the documentation that DataType::new_list does not carry forward field metadata
  2. Try and add Field::new_list_item or something

@alamb
Copy link
Contributor

alamb commented Aug 2, 2023

Thanks again @fansehep

@alamb
Copy link
Contributor

alamb commented Aug 2, 2023

Follow on: #4627

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convenience method to create DataType::List correctly
3 participants