This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Write page statistics to file #316

Open
wants to merge 1 commit into
base: main
from

Conversation

zsolt-haraszti

  • Copy page stats from page headers to chunk metadata so it ends up in
    the file trailer.

  • Test case to verify presence of statistics in a generated file

@achille-roussel achille-roussel self-assigned this Aug 23, 2022
@achille-roussel achille-roussel self-requested a review August 23, 2022 18:54
@achille-roussel

Hello @zsolt-haraszti, thanks for submitting a pull request.

The statistics are not generated unless the application opts in to generating them:

statistics = c.makePageStatistics(page)

We made it optional because the latest Parquet format recommends against writing page statistics in the page header, since the page index holds the same information in a more usable form. In its current form, this means the option would also control the creation of column chunk statistics.

On a different note, I believe the change you submitted may have an issue with the correctness of statistics set on the column chunk metadata. There may be multiple pages per column chunk, so the column chunk statistics should reflect the aggregate over all pages rather than hold the statistics of the last page written to the chunk.
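
To illustrate the aggregation point, here is a minimal, self-contained sketch of folding per-page statistics into chunk-level statistics. The `pageStats` and `chunkStats` types and the `aggregate` function are hypothetical stand-ins for illustration only, not the library's actual types, and the sketch assumes a single int64 column:

```go
package main

import "fmt"

// pageStats is a hypothetical, simplified stand-in for the statistics of
// one page of an int64 column. It is not a type from the library.
type pageStats struct {
	min, max  int64
	nullCount int64
	hasValues bool // false when the page holds only nulls
}

// chunkStats accumulates statistics over every page of a column chunk.
type chunkStats struct {
	min, max  int64
	nullCount int64
	hasValues bool
}

// observe folds one page into the running chunk statistics instead of
// overwriting them, so earlier pages are not lost.
func (c *chunkStats) observe(p pageStats) {
	c.nullCount += p.nullCount
	if !p.hasValues {
		return // an all-null page contributes no min/max
	}
	if !c.hasValues {
		c.min, c.max, c.hasValues = p.min, p.max, true
		return
	}
	if p.min < c.min {
		c.min = p.min
	}
	if p.max > c.max {
		c.max = p.max
	}
}

// aggregate returns the statistics covering all pages of a chunk.
func aggregate(pages []pageStats) chunkStats {
	var c chunkStats
	for _, p := range pages {
		c.observe(p)
	}
	return c
}

func main() {
	pages := []pageStats{
		{min: 10, max: 42, nullCount: 1, hasValues: true},
		{nullCount: 2}, // page containing only nulls
		{min: -3, max: 7, hasValues: true},
	}
	c := aggregate(pages)
	fmt.Printf("min=%d max=%d nulls=%d\n", c.min, c.max, c.nullCount)
}
```

Copying only the last page's statistics would report a minimum of -3 and a maximum of 7 for these pages, which misses the 42 seen in the first page.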

Let me know if you have any comments on the feedback.

@achille-roussel achille-roussel added the feature New feature or request label Aug 23, 2022