Parquet schema hint doesn't support integer types upcasting #6891

Closed
gruuya opened this issue Dec 17, 2024 · 2 comments · May be fixed by #6892
Labels
development-process Related to development process of arrow-rs enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate

Comments

@gruuya
Contributor

gruuya commented Dec 17, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current matching logic for overriding a Parquet schema with a hint doesn't support integer up-casting:

fn apply_hint(parquet: DataType, hint: DataType) -> DataType {
    match (&parquet, &hint) {
        // Not all time units can be represented as LogicalType / ConvertedType
        (DataType::Int32 | DataType::Int64, DataType::Timestamp(_, _)) => hint,
        (DataType::Int32, DataType::Time32(_)) => hint,
        (DataType::Int64, DataType::Time64(_)) => hint,
        // Date64 doesn't have a corresponding LogicalType / ConvertedType
        (DataType::Int64, DataType::Date64) => hint,

Describe the solution you'd like
I'd like to be able to override any integer type as long as the override avoids precision loss (though it could be argued that even this is too conservative).
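
To make the "no precision loss" rule concrete, here is a minimal sketch of one way it could be checked. It is not the arrow-rs implementation: `IntType`, `is_lossless_upcast`, and `apply_int_hint` are hypothetical names, and `IntType` is only a stand-in for the integer variants of `DataType` so that the example compiles without the arrow crate.

```rust
/// Hypothetical stand-in for the integer variants of arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntType {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

impl IntType {
    /// Width of the type in bits.
    fn bits(self) -> u32 {
        match self {
            IntType::Int8 | IntType::UInt8 => 8,
            IntType::Int16 | IntType::UInt16 => 16,
            IntType::Int32 | IntType::UInt32 => 32,
            IntType::Int64 | IntType::UInt64 => 64,
        }
    }

    /// Whether the type can hold negative values.
    fn is_signed(self) -> bool {
        matches!(
            self,
            IntType::Int8 | IntType::Int16 | IntType::Int32 | IntType::Int64
        )
    }
}

/// Returns true when every value of `from` is representable in `to`.
fn is_lossless_upcast(from: IntType, to: IntType) -> bool {
    match (from.is_signed(), to.is_signed()) {
        // Same signedness: the target needs at least as many bits.
        (true, true) | (false, false) => to.bits() >= from.bits(),
        // Unsigned to signed: one extra bit is needed for the sign.
        (false, true) => to.bits() > from.bits(),
        // Signed to unsigned: negative values cannot be represented.
        (true, false) => false,
    }
}

/// Hypothetical integer-only analogue of `apply_hint`: accept the hint
/// only when it is a lossless widening, otherwise keep the Parquet type.
fn apply_int_hint(parquet: IntType, hint: IntType) -> IntType {
    if is_lossless_upcast(parquet, hint) {
        hint
    } else {
        parquet
    }
}
```

Under this rule a hint of `Int64` over a Parquet `Int32` column would be accepted, while a hint of `Int32` over `Int64`, or of `UInt32` over `Int32`, would be ignored. A less conservative variant, as hinted at above, could accept any integer-to-integer hint and leave overflow handling to the cast.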

Describe alternatives you've considered
Apply some kind of a schema adapter/mask at a higher level, e.g. via some DataFusion extension mechanism.

Additional context
Related to apache/iceberg-rust#813.

@gruuya gruuya added the enhancement Any new improvement worthy of a entry in the changelog label Dec 17, 2024
@tustvold
Contributor

Closing as duplicate of #6735

@tustvold tustvold closed this as not planned Dec 17, 2024
@alamb alamb added the parquet Changes to the parquet crate label Dec 18, 2024
@alamb
Contributor

alamb commented Dec 18, 2024

label_issue.py automatically added labels {'parquet'} from #6892

@tustvold tustvold added the development-process Related to development process of arrow-rs label Dec 18, 2024