from deltalake import DeltaTable
import pyarrow.parquet as pq

if __name__ == "__main__":
    # read it back with delta-rs
    dt = DeltaTable("test.parquet")
    print("\nDeltaTable schema:")
    print(dt.schema().to_pyarrow().to_string())

    # read it back with pyarrow
    table = pq.read_table("test.parquet")
    print("\nPyarrow schema:")
    print(table.schema.to_string())
We don't save this metadata in Delta Tables. We could perhaps create a custom key in the configuration field of the table metadata, and preserve that in the Arrow schemas returned (which get passed onto Pandas).
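A minimal sketch of the "custom key in the configuration field" idea, assuming table configuration values must be plain strings. Arrow schema metadata is a mapping of bytes to bytes, so the helpers below pack it into one JSON string under a hypothetical key, `arrow.schema.metadata`, which is not defined by delta-rs or the Delta protocol. The helper names `encode_schema_metadata` and `decode_schema_metadata` are also hypothetical.

```python
import base64
import json
from typing import Dict, Mapping, Optional

# Hypothetical configuration key; NOT part of delta-rs or the Delta protocol.
ARROW_METADATA_KEY = "arrow.schema.metadata"


def encode_schema_metadata(
    metadata: Optional[Mapping[bytes, bytes]],
) -> Dict[str, str]:
    """Pack Arrow schema metadata (bytes -> bytes) into a single
    string-valued configuration entry."""
    if not metadata:
        return {}
    # Base64 keeps arbitrary bytes safe inside a JSON string.
    payload = {
        base64.b64encode(k).decode("ascii"): base64.b64encode(v).decode("ascii")
        for k, v in metadata.items()
    }
    return {ARROW_METADATA_KEY: json.dumps(payload, sort_keys=True)}


def decode_schema_metadata(
    configuration: Mapping[str, Optional[str]],
) -> Dict[bytes, bytes]:
    """Inverse of encode_schema_metadata; returns {} if the key is absent."""
    raw = configuration.get(ARROW_METADATA_KEY)
    if not raw:
        return {}
    payload = json.loads(raw)
    return {base64.b64decode(k): base64.b64decode(v) for k, v in payload.items()}
```

On the write side, the output of `encode_schema_metadata(table.schema.metadata)` would be merged into the table configuration. On the read side, a client could call `arrow_table.replace_schema_metadata(decode_schema_metadata(dt.metadata().configuration))`. Whether a given delta-rs version accepts an unknown configuration key on write is an assumption to check, not something this sketch guarantees.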
Is this a limitation of Delta Tables or of the client library? I'm specifically wondering in the context of GeoParquet, which uses schema metadata to declare per-column information like geometry type. With pyarrow we'd set schema metadata on the table and then write to Parquet, but it doesn't look like that works here.
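For context, GeoParquet stores its per-column information as a JSON document under the `geo` key of the schema metadata, which is the part a Delta table currently drops. The sketch below builds a minimal `geo` entry with only the required fields for a WKB-encoded geometry column (GeoParquet 1.0.0). The function name `geoparquet_metadata` is hypothetical, and optional fields such as `crs` and `bbox` are deliberately left out.

```python
import json
from typing import Dict, Iterable


def geoparquet_metadata(
    geometry_column: str,
    geometry_types: Iterable[str],
    version: str = "1.0.0",
) -> Dict[bytes, bytes]:
    """Build a minimal GeoParquet-style "geo" entry for Arrow schema metadata.

    Only the required file-level and column-level fields for a WKB-encoded
    geometry column are included.
    """
    geo = {
        "version": version,
        "primary_column": geometry_column,
        "columns": {
            geometry_column: {
                "encoding": "WKB",
                "geometry_types": list(geometry_types),
            }
        },
    }
    return {b"geo": json.dumps(geo).encode("utf-8")}
```

With pyarrow, this would be merged into the existing metadata before writing, e.g. `table.replace_schema_metadata({**(table.schema.metadata or {}), **geoparquet_metadata("geometry", ["Point"])})` followed by `pq.write_table(...)`. The same metadata passed through a Delta table write is what gets lost.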
@kylebarron if you would like to do this, you could hijack the configuration key in the table metadata for it. But it still requires an implementation on the client side to use it, and it will only work for that specific client.
ion-elgreco changed the title from "PyArrow metadata lost in DeltaTable" to "Keep arrow metadata in Delta Table metadata" on Dec 7, 2024.
Environment
Delta-rs version: 0.10.0
Binding: Python
Environment:
Bug
delta-rs loses metadata for Parquet written with pandas (example data is attached: test.parquet.zip).
This outputs:
The schema metadata part in the `pyarrow.Table` is nowhere to be found in `DeltaTable`. Is it present but not public? How can it be accessed?