Create vParquet5 #4694

Open
joe-elliott opened this issue Feb 13, 2025 · 0 comments
Labels
area/storage keepalive Label to exempt Issues / PRs from stale workflow

Comments

@joe-elliott
Member

joe-elliott commented Feb 13, 2025

We have not yet begun work on the next iteration of the vParquet format, but I would like to start collecting ideas for what to include. The past two iterations focused on adding columns and features, so I propose this one focus on removing columns and cleaning up.

I would like to:

  • Remove "well known" columns and rely instead on dedicated columns
    • This would remove a significant amount of complexity from our read/write paths as well as reduce footer size
  • Remove the ServiceStats columns in favor of a marshalled proto or JSON representation. We only return these on search, so they don't need to be broken out into individual columns.
  • Drop .list.element from repeated field column names. We added this for compatibility with some ?? tooling, but it broke compatibility with other tooling. Personally I'd prefer the simpler names.
  • Add dictionary-less dedicated columns as a place for fields like JSON blobs and SQL queries. Maybe split the 10 we have into 5 with dictionaries and 5 without?
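To make the ServiceStats proposal concrete, here is a minimal sketch of collapsing per-service stats into a single JSON blob that would be stored in one byte-array column and decoded only on the search path. It is written in Python purely for illustration; the ServiceStats field names (span_count, error_count) and the encode_service_stats/decode_service_stats helpers are hypothetical and are not Tempo's actual schema or API.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical per-service stats; field names are illustrative, not Tempo's schema.
@dataclass
class ServiceStats:
    span_count: int
    error_count: int


def encode_service_stats(stats: dict[str, ServiceStats]) -> bytes:
    """Marshal the whole map into one JSON blob destined for a single column."""
    # sort_keys (applied recursively) keeps output deterministic across writes;
    # compact separators avoid storing needless whitespace.
    return json.dumps(
        {name: asdict(s) for name, s in stats.items()},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")


def decode_service_stats(blob: bytes) -> dict[str, ServiceStats]:
    """Unmarshal on the search path, the only place these stats are returned."""
    raw = json.loads(blob.decode("utf-8"))
    return {name: ServiceStats(**fields) for name, fields in raw.items()}


stats = {"frontend": ServiceStats(12, 1), "db": ServiceStats(3, 0)}
blob = encode_service_stats(stats)
print(blob)  # b'{"db":{"error_count":0,"span_count":3},"frontend":{"error_count":1,"span_count":12}}'
```

JSON keeps the blob human-readable when inspecting blocks; a marshalled proto would likely be more compact. That trade-off is left open in the bullet above.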

Everything is up for discussion. Please put ideas here!

cc @stoewer, @mdisibio, @ndk

@joe-elliott joe-elliott added area/storage keepalive Label to exempt Issues / PRs from stale workflow labels Feb 13, 2025