Move parquet-specific options to parquet::WriterOptions #10470

Closed

pedroerp
Contributor

Summary:
Moving parquet-specific options to the new parquet WriterOptions
polymorphic type to remove file format specific configuration code in the
general Hive connector.

Differential Revision: D59710079

@facebook-github-bot added the CLA Signed label Jul 15, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D59710079

netlify bot commented Jul 15, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 6f351a6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/669b2641cb6b61000816c314

std::optional<uint8_t> zlibCompressionLevel;
std::optional<uint8_t> zstdCompressionLevel;

// Writer implementations should provide this function to specify how to
// process writer-specific input session and connector configs.
virtual void processUserConfigs(const Config*) = 0;
Contributor

This needs to be concrete, and we need to translate the format-agnostic configurations here.

Contributor Author

Why does it need to be concrete?

Contributor

Because it needs to translate format-agnostic options, and the subclasses will handle the format-specific ones
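
The split the reviewer describes could look roughly like the sketch below: a concrete base-class method translates format-agnostic options and then calls a hook that subclasses override for format-specific ones. This is only an illustration under stated assumptions. `ConfigMap`, `lookup`, `processConfigs`, `processFormatSpecificConfigs`, `compressionKind`, and the key `writer.compression_kind` are hypothetical names, not the actual Velox API; only the key `hive.parquet.writer.timestamp_unit` appears in this PR.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real Config class, for illustration only.
using ConfigMap = std::unordered_map<std::string, std::string>;

// Returns the value stored under `key`, or std::nullopt if it is absent.
inline std::optional<std::string> lookup(
    const ConfigMap& config,
    const std::string& key) {
  auto it = config.find(key);
  if (it == config.end()) {
    return std::nullopt;
  }
  return it->second;
}

struct WriterOptions {
  virtual ~WriterOptions() = default;

  // Format-agnostic option (hypothetical field).
  std::optional<std::string> compressionKind;

  // Concrete entry point: handles the format-agnostic settings here, then
  // delegates to the subclass hook for format-specific ones.
  void processConfigs(const ConfigMap& config) {
    if (!compressionKind) {
      // "writer.compression_kind" is a made-up key for this sketch.
      compressionKind = lookup(config, "writer.compression_kind");
    }
    processFormatSpecificConfigs(config);
  }

 protected:
  // Subclasses override this to read their own keys; default is a no-op.
  virtual void processFormatSpecificConfigs(const ConfigMap& /*config*/) {}
};

struct ParquetWriterOptions : WriterOptions {
  std::optional<std::string> timestampUnit;

 protected:
  void processFormatSpecificConfigs(const ConfigMap& config) override {
    // Only fill the option if the caller has not set it explicitly.
    if (!timestampUnit) {
      timestampUnit = lookup(config, "hive.parquet.writer.timestamp_unit");
    }
  }
};
```

With this shape, callers invoke only `processConfigs`, so the format-agnostic translation cannot be skipped by a subclass that forgets to call the base implementation.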

Collaborator

@majetideepak left a comment

@pedroerp thanks for taking this up. I have been wanting something like this for a while now. I have one comment.

pedroerp added a commit to pedroerp/velox-1 that referenced this pull request Jul 19, 2024
…bator#10470)

Summary:
Pull Request resolved: facebookincubator#10470

Moving parquet-specific options to the new parquet WriterOptions
polymorphic type to remove file format specific configuration code in the
general Hive connector.

Differential Revision: D59710079
void WriterOptions::processSessionConfigs(const Config& config) {
  if (!parquetWriteTimestampUnit) {
    parquetWriteTimestampUnit =
        getTimestampUnit(config, "hive.parquet.writer.timestamp_unit");
Contributor

Can we have "hive.parquet.writer.timestamp_unit" and "hive.parquet.writer.timestamp-unit" as static constexpr const char* in parquet WriterOptions?
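
A minimal sketch of what this suggestion might look like follows. The struct and constant names are hypothetical and chosen for illustration. Treating the underscore form as the session property key matches the `processSessionConfigs` snippet above, while treating the hyphenated form as the connector config key is an assumption.

```cpp
#include <string_view>

// Hypothetical holder for the Parquet writer config keys. Since C++17,
// static constexpr data members are implicitly inline, so no out-of-class
// definition is needed.
struct ParquetWriterOptions {
  // Session property key (underscore form, as used in the snippet above).
  static constexpr const char* kSessionWriteTimestampUnit =
      "hive.parquet.writer.timestamp_unit";

  // Connector config key (hyphenated form; this mapping is an assumption).
  static constexpr const char* kConnectorWriteTimestampUnit =
      "hive.parquet.writer.timestamp-unit";
};

// Returns true when the two keys differ only in '_' versus '-', which makes
// it easy to guard against a typo in either literal.
constexpr bool keysDifferOnlyInSeparator(
    std::string_view a,
    std::string_view b) {
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool sameChar = a[i] == b[i];
    const bool separatorPair = (a[i] == '_' && b[i] == '-') ||
        (a[i] == '-' && b[i] == '_');
    if (!sameChar && !separatorPair) {
      return false;
    }
  }
  return true;
}
```

Call sites would then reference the constants instead of repeating string literals, so a misspelled key becomes a compile error rather than a silently ignored setting.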

pedroerp added a commit to pedroerp/velox-1 that referenced this pull request Jul 19, 2024
…bator#10470)

Summary:
Pull Request resolved: facebookincubator#10470

Moving parquet-specific options to the new parquet WriterOptions
polymorphic type to remove file format specific configuration code in the
general Hive connector.

Reviewed By: Yuhta

Differential Revision: D59710079

Collaborator

@yingsu00 left a comment

Thanks @pedroerp
We were trying to do similar things in #10150. We will rebase after this one gets merged.

@facebook-github-bot
Contributor

This pull request has been merged in 8156d7d.

Conbench analyzed the 1 benchmark run on commit 8156d7d4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@majetideepak
Collaborator

@pedroerp Multiple CI jobs are failing with this merge.

@pedroerp
Contributor Author

@pedroerp Multiple CI jobs are failing with this merge.

Ughh, looking at it now.

pedroerp added a commit to pedroerp/velox-1 that referenced this pull request Jul 22, 2024
Summary:
Fixing Parquet writer test build broken in
facebookincubator#10470

Differential Revision: D60066916
facebook-github-bot pushed a commit that referenced this pull request Jul 23, 2024
Summary:
Pull Request resolved: #10520

Fixing Parquet writer test build broken in
#10470

Reviewed By: Yuhta

Differential Revision: D60066916

fbshipit-source-id: 65c5d60373c60596db173fccde229261f99cbc10
// through insertTableHandle)
// 2. Otherwise, acquire user defined session properties.
// 3. Lastly, acquire general hive connector configs.
options->processSessionConfigs(*connectorSessionProperties);
Contributor

Can we let the Parquet writer convert the common options to Parquet-specific settings, as dwrf does? Thanks!
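
The precedence order in the quoted code comments (an explicitly set value first, then session properties, then general Hive connector configs) can be sketched as below. This is a simplified illustration, not the merged implementation. `ConfigMap`, `fillIfUnset`, `processHiveConnectorConfigs`, and `resolveWriterOptions` are hypothetical names, and the hyphenated connector key is an assumption. Only `processSessionConfigs` and the underscore key appear in this PR.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real Config class, for illustration only.
using ConfigMap = std::unordered_map<std::string, std::string>;

struct ParquetWriterOptions {
  // Step 1: may already be set explicitly by the caller.
  std::optional<std::string> timestampUnit;

  // Step 2: consult user-defined session properties.
  void processSessionConfigs(const ConfigMap& session) {
    fillIfUnset(session, "hive.parquet.writer.timestamp_unit");
  }

  // Step 3: fall back to connector-level configs (hypothetical method name;
  // the hyphenated key is an assumption).
  void processHiveConnectorConfigs(const ConfigMap& connector) {
    fillIfUnset(connector, "hive.parquet.writer.timestamp-unit");
  }

 private:
  // Each stage writes only when no earlier stage has supplied a value, so
  // the call order alone determines precedence.
  void fillIfUnset(const ConfigMap& config, const std::string& key) {
    if (timestampUnit) {
      return;
    }
    auto it = config.find(key);
    if (it != config.end()) {
      timestampUnit = it->second;
    }
  }
};

// Applies the stages in precedence order: session first, connector last.
inline void resolveWriterOptions(
    ParquetWriterOptions& options,
    const ConfigMap& session,
    const ConfigMap& connector) {
  options.processSessionConfigs(session);
  options.processHiveConnectorConfigs(connector);
}
```

Because every stage is a no-op once the option holds a value, an explicitly set option survives both calls, and a session property shadows the connector config of the same meaning.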

Labels: CLA Signed, fb-exported, Merged
6 participants