Clarify parquet-format with respect to repeated fields across boundaries #67

Closed
asfimport opened this issue May 15, 2024 · 0 comments

asfimport commented May 15, 2024

Several implementers have reported that the Parquet spec is currently unclear about when repeated fields can span page boundaries (i.e., whether a logical record can be split across a page and/or row group boundary).
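
To illustrate what "spanning a page boundary" means: in Parquet's Dremel-style encoding, a repetition level of 0 marks the first value of a new record, so a page that begins with a nonzero repetition level continues a record started on an earlier page. The sketch below assumes the column's repetition levels are already decoded into a Python list and that a writer cuts pages purely by value count. Both helper functions are hypothetical illustrations, not part of any Parquet library.

```python
def split_into_pages(rep_levels, values_per_page):
    """Cut a column's repetition levels into fixed-size pages (hypothetical writer policy)."""
    return [rep_levels[i:i + values_per_page] for i in range(0, len(rep_levels), values_per_page)]


def pages_continuing_a_record(pages):
    """Return indices of pages whose first value continues a record from an earlier page.

    A repetition level of 0 starts a new record; any other level means the value
    belongs to a record that began earlier, possibly on a previous page.
    """
    return [i for i, page in enumerate(pages) if page and page[0] != 0]


# Three records of a repeated field: [a, b, c], [d], [e, f]
rep_levels = [0, 1, 1, 0, 0, 1]
pages = split_into_pages(rep_levels, values_per_page=2)
print(pages)                             # [[0, 1], [1, 0], [0, 1]]
print(pages_continuing_a_record(pages))  # [1]
```

Here the first record `[a, b, c]` straddles pages 0 and 1, which is exactly the situation the spec is ambiguous about.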

Discussion on the mailing list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn

The conclusion seems to be that records cannot be split across page boundaries when "v2 data pages" are used or when a page index is present.
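
One way to see why this constraint matters: the v2 data page header carries a `num_rows` field, and the page index's offset index records a `first_row_index` for each page; both row counts are only well defined if every page starts at a record boundary. The following is a minimal sketch of a writer policy that satisfies the constraint, assuming decoded repetition levels in a Python list. The function names and the `target_values_per_page` parameter are hypothetical and do not correspond to any particular library's API.

```python
from typing import List, Tuple


def split_at_record_boundaries(rep_levels: List[int], target_values_per_page: int) -> List[List[int]]:
    """Cut pages only where a new record starts (repetition level 0).

    Once a page holds at least target_values_per_page values, the cut is
    deferred until the next value with repetition level 0, so no record is split.
    """
    pages: List[List[int]] = []
    current: List[int] = []
    for level in rep_levels:
        if level == 0 and len(current) >= target_values_per_page:
            pages.append(current)
            current = []
        current.append(level)
    if current:
        pages.append(current)
    return pages


def row_counts_and_first_rows(pages: List[List[int]]) -> List[Tuple[int, int]]:
    """Per page: (num_rows, first_row_index). Each level-0 value starts one row."""
    result: List[Tuple[int, int]] = []
    first_row = 0
    for page in pages:
        num_rows = sum(1 for level in page if level == 0)
        result.append((num_rows, first_row))
        first_row += num_rows
    return result


# Three records of a repeated field: [a, b, c], [d], [e, f]
rep_levels = [0, 1, 1, 0, 0, 1]
pages = split_at_record_boundaries(rep_levels, target_values_per_page=2)
print(pages)                             # [[0, 1, 1], [0, 0, 1]]
print(row_counts_and_first_rows(pages))  # [(1, 0), (2, 1)]
```

The trade-off is that pages can exceed the target size when a single record has many values; a writer following this policy accepts uneven page sizes in exchange for per-page row counts that readers can trust.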

The spec should be updated to state this explicitly.

Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb

PRs and other links:

Note: This issue was originally created as PARQUET-2473. Please see the migration documentation for further details.
