Inconsistent schema handling for int64 columns in Delta Table updated with pandas object type #3034
Labels: bug, good first issue, help wanted
Environment
Delta-rs version: 0.20.2 (also checked on 0.22.0)
Binding: python
Bug
What happened:
If an int64 column (I haven't checked other types) is specified in the Delta table schema, and this table is updated using a pandas DataFrame where that column is of object type, the underlying Parquet file will store the data as string. However, when querying the schema, it will show int64, and the data returned will also be of int64 type. In this case, there seems to be an inconsistency. Please see the MRE.
What you expected to happen:
I expect it to:
bool and string): DeltaError: Generic DeltaTable error: type_coercion; or
How to reproduce it:
Output:
More details:
If one tries to scan such a Delta table with polars>=1.13.0, they will see a SchemaError.