Clarify num-nulls handling in Statistics and ColumnIndex #449

Merged (9 commits) on Aug 23, 2024
6 changes: 5 additions & 1 deletion src/main/thrift/parquet.thrift
@@ -257,7 +257,11 @@ struct Statistics {
*/
1: optional binary max;
2: optional binary min;
/** count of null value in the column */
/**
* count of null value in the column
*
* Writers should write this field even if it is zero or in non-null columns.
Member

Suggested change
* Writers should write this field even if it is zero or in non-null columns.
* Writers SHOULD always write this field even if it is zero (a.k.a. no null value)

Member

We might also add the expectation to the reader implementations?

Member Author

> We might also add the expectation to the reader implementations?

I'd like to wait for others to reply with context on the ML first.

*/
3: optional i64 null_count;
/** count of distinct values occurring */
4: optional i64 distinct_count;
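Because `null_count` is an `optional i64`, a reader can observe three states: the field is absent, it is present and zero, or it is present and positive. The sketch below illustrates why writing the field even when it is zero matters. It is not taken from the PR or from any Parquet library: the `Statistics` dataclass, `write_stats`, and `null_status` are hypothetical names, and treating an absent field as "unknown" is an assumed conservative reading, since the thread leaves the reader-side expectation open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statistics:
    """Hypothetical in-memory mirror of a subset of the Thrift struct."""
    max: Optional[bytes] = None
    min: Optional[bytes] = None
    null_count: Optional[int] = None      # None models "field not written"
    distinct_count: Optional[int] = None


def write_stats(null_count: int) -> Statistics:
    """Writer side: always set null_count, even when it is zero."""
    return Statistics(null_count=null_count)


def null_status(stats: Statistics) -> str:
    """Reader side (assumed behavior): keep 'no nulls' and 'unknown' apart."""
    if stats.null_count is None:
        # Field absent: the writer did not record it, so nothing is known.
        return "unknown"
    return "no_nulls" if stats.null_count == 0 else "has_nulls"
```

A truthiness check such as `if not stats.null_count` would merge the absent and zero cases, which is exactly the ambiguity the comment change is meant to avoid.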