Description
In a compaction on DataFusion, we set the statistics truncate length in the DataFusion ParquetOptions under max_statistics_size. DataFusion does not actually use this property, so the setting has no effect.
Steps to reproduce
1. Set a statistics truncate length in the table property sleeper.table.parquet.statistics.truncate.length
2. Ingest data that will result in statistics longer than this truncate length
3. Run a compaction in DataFusion
4. See that the statistics in the file output by the compaction are not truncated, although they were truncated in the input file
Expected behaviour
The statistics truncate length should be applied in DataFusion.
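To make the expected effect concrete, here is a minimal sketch of what truncating min/max statistics to a byte length means. It is not the Parquet library's implementation. The function names `truncate_min` and `truncate_max` are hypothetical, and the sketch assumes plain byte-wise ordering. A truncated minimum is just a prefix, because a prefix never sorts after the full value. A truncated maximum has to be bumped up so that it still sorts at or after the full value.

```rust
/// Hypothetical helper: shorten a minimum statistic to at most `max_len` bytes.
/// A prefix sorts at or before the full value, so it is still a valid lower bound.
fn truncate_min(value: &[u8], max_len: usize) -> Vec<u8> {
    value[..value.len().min(max_len)].to_vec()
}

/// Hypothetical helper: shorten a maximum statistic to at most `max_len` bytes.
/// The prefix alone could sort before the full value, so its last byte is
/// incremented. Returns None when no shorter upper bound exists.
fn truncate_max(value: &[u8], max_len: usize) -> Option<Vec<u8>> {
    if value.len() <= max_len {
        return Some(value.to_vec());
    }
    let mut prefix = value[..max_len].to_vec();
    // Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix.last() == Some(&0xFF) {
        prefix.pop();
    }
    // If nothing is left, every byte was 0xFF (or max_len was 0).
    let last = prefix.last_mut()?;
    *last += 1;
    Some(prefix)
}

fn main() {
    let min = b"apple-long-row-key";
    let max = b"zebra-long-row-key";

    let truncated_min = truncate_min(min, 5);
    let truncated_max = truncate_max(max, 5).expect("an upper bound exists");

    // The truncated values must still bound the originals.
    assert!(truncated_min.as_slice() <= min.as_slice());
    assert!(truncated_max.as_slice() >= max.as_slice());

    println!("min: {}", String::from_utf8_lossy(&truncated_min));
    println!("max: {}", String::from_utf8_lossy(&truncated_max));
}
```

With a truncate length applied, the file written by the compaction should hold statistics no longer than the configured length, as the input file does.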
Background
DataFusion has deprecated the configuration option datafusion.execution.parquet.max_statistics_size, because it's not used. See apache/arrow-rs#6884 (on WriterProperties).
It seems to have been replaced in the Parquet library by statistics_truncate_length.
There doesn't seem to be a way to set this in DataFusion. We've raised this issue to add an option to configure it: apache/datafusion#14601 (statistics_truncate_length in Parquet writer).