Fix reading multiple complex data types exception #140

Merged
Jefffrey merged 1 commit into datafusion-contrib:main
Jan 3, 2025

Conversation

harveyyue
Contributor

Fix #139

Collaborator

Jefffrey left a comment

Thanks for raising this. The CI failures look unrelated and should be fixed by #141, so you just need to pull from main.

Could you also include a short explanation for this fix, to help me understand how it fixes the issue?

Comment on lines +162 to +169
let mut projection = Vec::with_capacity(projected_schema.fields().len());
for named_column in builder.file_metadata().root_data_type().children() {
    if let Some((_table_idx, _table_field)) =
        projected_schema.fields().find(named_column.name())
    {
        projection.push(named_column.data_type().column_index());
    }
}
Collaborator

Suggested change
- let mut projection = Vec::with_capacity(projected_schema.fields().len());
- for named_column in builder.file_metadata().root_data_type().children() {
-     if let Some((_table_idx, _table_field)) =
-         projected_schema.fields().find(named_column.name())
-     {
-         projection.push(named_column.data_type().column_index());
-     }
- }
+ let projection = builder
+     .file_metadata()
+     .root_data_type()
+     .children()
+     .iter()
+     .filter(|named_column| {
+         projected_schema
+             .fields()
+             .find(named_column.name())
+             .is_some()
+     })
+     .map(|named_column| named_column.data_type().column_index());

Thoughts on using a more iterator-based approach? It could potentially be simplified further.

@harveyyue
Contributor Author

harveyyue commented Dec 31, 2024

> Could you also include a short explanation for this fix, to help me understand how it fixes the issue?

The elements of a complex data type such as a map each take up their own column index, as shown below:
[image]

@Jefffrey
Collaborator

> > Could you also include a short explanation for this fix, to help me understand how it fixes the issue?
>
> The elements from complex data type like map will take up indexes as below [image]

But how does this relate to the code changes you made?

@harveyyue
Contributor Author

> > > Could you also include a short explanation for this fix, to help me understand how it fixes the issue?
> >
> > The elements from complex data type like map will take up indexes as below [image]
>
> But how does this relate to the code changes you made?

We need to use the original projection to get the projected Arrow schema, then map each Arrow field name to its ORC NamedColumn, which gives the real column index.
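
To make the index mismatch concrete, here is a minimal, self-contained sketch. It does not use orc-rust's real types: `OrcType`, `top_level_column_indexes`, and `project_by_name` are hypothetical names invented for illustration, and the sketch assumes ORC assigns column ids in pre-order with the root struct at 0. It shows that a top-level field's column index is not its position in the schema once a map or list precedes it, which is why the lookup is done by field name.

```rust
// Hypothetical, simplified stand-in for an ORC type tree (not orc-rust's API).
enum OrcType {
    Primitive,
    List(Box<OrcType>),
    Map(Box<OrcType>, Box<OrcType>),
}

impl OrcType {
    /// Number of column ids this type consumes, counting itself and all children.
    fn width(&self) -> usize {
        match self {
            OrcType::Primitive => 1,
            OrcType::List(child) => 1 + child.width(),
            OrcType::Map(key, value) => 1 + key.width() + value.width(),
        }
    }
}

/// Assign pre-order column indexes to the top-level fields.
/// Assumption: the root struct occupies index 0, so the first field starts at 1.
fn top_level_column_indexes(fields: &[(&str, OrcType)]) -> Vec<(String, usize)> {
    let mut next = 1;
    let mut indexed = Vec::with_capacity(fields.len());
    for (name, ty) in fields {
        indexed.push((name.to_string(), next));
        // Skip over the ids taken by this field's nested children.
        next += ty.width();
    }
    indexed
}

/// Resolve projected field names to column indexes by name, not by position.
fn project_by_name(indexed: &[(String, usize)], projected: &[&str]) -> Vec<usize> {
    indexed
        .iter()
        .filter(|(name, _)| projected.contains(&name.as_str()))
        .map(|(_, idx)| *idx)
        .collect()
}
```

With fields `id` (primitive), `tags` (map), `scores` (list), and `name` (primitive), the fourth field `name` ends up at column index 7 in this model, not 4, because the map takes three ids and the list takes two.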

@Jefffrey Jefffrey merged commit 5b805e7 into datafusion-contrib:main Jan 3, 2025
12 checks passed
@Jefffrey
Collaborator

Jefffrey commented Jan 3, 2025

Thanks for this

Development

Successfully merging this pull request may close these issues.

Reading multiple complex data types with column number mismatch exception
2 participants