
Commit

fix: use projected_table_schema for projection in DeltaSchemaAdapter
After upgrading from deltalake 0.20.1 to 0.22.3, Parquet column
projection appears to be broken when using DeltaTable::scan. Instead of
reading only the single projected column, all columns seem to be
fetched from storage.

Inspection with a debugger reveals that the adapted_projections are
wrong here:
https://github.com/apache/datafusion/blob/88f58bf929167c5c5e2250ad87caa88d4dff11e5/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L153-L159
The adapted_projections are obtained in
https://github.com/delta-io/delta-rs/blob/5b2f46b06e0eb508f932a8b39feb11b568a78a32/crates/core/src/delta_datafusion/schema_adapter.rs#L46-L60
Changing line 49 to use the projected_table_schema instead of the
table_schema seems to solve the problem.
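The effect of the change can be illustrated with a simplified sketch. It assumes schemas are reduced to plain lists of column names rather than Arrow schemas; the function `file_projection` and the column names are hypothetical and not part of delta-rs or DataFusion. Matching file columns against the full table schema keeps every column, while matching against the projected table schema keeps only the requested one.

```rust
/// Hypothetical, simplified model of the projection loop in
/// `DeltaSchemaAdapter::map_schema`: schemas are plain lists of column
/// names here, not Arrow `Schema`s.
fn file_projection(file_fields: &[&str], table_fields: &[&str]) -> Vec<usize> {
    let mut projection = Vec::with_capacity(file_fields.len());
    for (file_idx, name) in file_fields.iter().enumerate() {
        // Keep a file column only if the given table schema contains it.
        if table_fields.contains(name) {
            projection.push(file_idx);
        }
    }
    projection
}

fn main() {
    // Columns physically present in a Parquet file (hypothetical names).
    let file_fields = ["id", "value", "payload"];
    // Full table schema vs. the schema after projecting to a single column.
    let table_schema = ["id", "value", "payload"];
    let projected_table_schema = ["value"];

    // Matching against the full table schema selects every file column.
    let before = file_projection(&file_fields, &table_schema);
    // Matching against the projected schema selects only `value`.
    let after = file_projection(&file_fields, &projected_table_schema);

    println!("before: {:?}", before);
    println!("after: {:?}", after);
}
```

In this model, the first call returns `[0, 1, 2]` and the second returns `[1]`, which mirrors the symptom of all columns being fetched versus only the projected one.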

Signed-off-by: Jonas Irgens Kylling <[email protected]>
jkylling committed Dec 18, 2024
1 parent 99e39ca commit 35fdee8
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion crates/core/src/delta_datafusion/schema_adapter.rs
Original file line number Diff line number Diff line change
@@ -46,7 +46,12 @@ impl SchemaAdapter for DeltaSchemaAdapter {
         let mut projection = Vec::with_capacity(file_schema.fields().len());
 
         for (file_idx, file_field) in file_schema.fields.iter().enumerate() {
-            if self.table_schema.fields().find(file_field.name()).is_some() {
+            if self
+                .projected_table_schema
+                .fields()
+                .find(file_field.name())
+                .is_some()
+            {
                 projection.push(file_idx);
             }
         }
