Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor HiveConfig #7725

Closed
wants to merge 1 commit into from
Closed

Refactor HiveConfig #7725

wants to merge 1 commit into from

Conversation

kewang1024
Copy link
Contributor

@kewang1024 kewang1024 commented Nov 25, 2023

To solve issue: #7659

Copy link

netlify bot commented Nov 25, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0da2fab
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65762512985f830008aac123

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 25, 2023
@kewang1024 kewang1024 force-pushed the config-test branch 4 times, most recently from 1b6fc88 to ff5a163 Compare November 27, 2023 18:52
@kewang1024 kewang1024 changed the title Config test Refactor HiveConfig Nov 27, 2023
@kewang1024 kewang1024 force-pushed the config-test branch 15 times, most recently from 08aa508 to 8f9fd41 Compare November 28, 2023 08:30
@kewang1024 kewang1024 marked this pull request as ready for review November 28, 2023 08:31
@kewang1024 kewang1024 force-pushed the config-test branch 4 times, most recently from 6dfe58d to a0e6ffd Compare November 28, 2023 22:01
velox/common/file/FileSystems.h Outdated Show resolved Hide resolved
velox/connectors/Connector.h Outdated Show resolved Hide resolved
velox/connectors/Connector.h Outdated Show resolved Hide resolved
velox/connectors/Connector.h Outdated Show resolved Hide resolved
velox/connectors/fuzzer/FuzzerConnector.h Outdated Show resolved Hide resolved
velox/connectors/hive/HiveConnector.h Outdated Show resolved Hide resolved
velox/connectors/tpch/TpchConnector.h Outdated Show resolved Hide resolved
velox/core/QueryCtx.h Outdated Show resolved Hide resolved
velox/connectors/hive/HiveConfig.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51814038

kewang1024 added a commit that referenced this pull request Dec 9, 2023
Summary:
To solve issue: #7659

Pull Request resolved: #7725

Reviewed By: xiaoxmeng

Differential Revision: D51814038

Pulled By: kewang1024

fbshipit-source-id: 9fd8cef21fbb98e594bb0ff05eb7059816cc2b79
@majetideepak
Copy link
Collaborator

majetideepak commented Dec 9, 2023

@majetideepak Can you elaborate?

@kewang1024 I meant HdfsFileSystem and AbfsFileSystem must also be modified similarly to GCS and S3 in this PR to be consistent.

hiveConfig_ = std::make_shared<HiveConfig>(
        std::make_shared<core::MemConfig>(config->values()));

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51814038

kewang1024 added a commit that referenced this pull request Dec 9, 2023
Summary:
To solve issue: #7659

Pull Request resolved: #7725

Reviewed By: xiaoxmeng

Differential Revision: D51814038

Pulled By: kewang1024

fbshipit-source-id: bc72ddeaea644b70705edd7bdb0e194c11e79429
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51814038

kewang1024 added a commit that referenced this pull request Dec 10, 2023
Summary:
To solve issue: #7659

Pull Request resolved: #7725

Reviewed By: xiaoxmeng

Differential Revision: D51814038

Pulled By: kewang1024

fbshipit-source-id: 30e3ae15ad41bd3162fb4610b0a5ad4c99c64dda
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51814038

kewang1024 added a commit that referenced this pull request Dec 10, 2023
Summary:
To solve issue: #7659

Pull Request resolved: #7725

Reviewed By: xiaoxmeng

Differential Revision: D51814038

Pulled By: kewang1024

fbshipit-source-id: 713733c762dfa549d77ec5cb90077b2672dd5219
Summary:
To solve issue: #7659

Pull Request resolved: #7725

Reviewed By: xiaoxmeng

Differential Revision: D51814038

Pulled By: kewang1024

fbshipit-source-id: 7ffd4a84738303aa5264676fa76dd87e752a30f5
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51814038

@kewang1024
Copy link
Contributor Author

kewang1024 commented Dec 11, 2023

@majetideepak Can you elaborate?

@kewang1024 I meant HdfsFileSystem and AbfsFileSystem must also be modified similarly to GCS and S3 in this PR to be consistent.

hiveConfig_ = std::make_shared<HiveConfig>(
        std::make_shared<core::MemConfig>(config->values()));

Sure, but looks like the way it's written, it won't break because of this change, but I can enhance those in the following PR

@facebook-github-bot
Copy link
Contributor

@kewang1024 merged this pull request in eb75367.

Copy link

Conbench analyzed the 1 benchmark run on commit eb75367c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Comment on lines 106 to 109
static constexpr const char* kFileColumnNamesReadAsLowerCase =
"file-column-names-read-as-lower-case";
static constexpr const char* kFileColumnNamesReadAsLowerCaseSession =
"file_column_names_read_as_lower_case";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kewang1024

The change broke some tests in our project that relies on Velox and then took us some time to debug. Would you like to share the reason why having to rename config key kFileColumnNamesReadAsLowerCase ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is an effort to separate config from session property.
And fix the config name convention to be "-" and session property name convention "_", otherwise people are using them sometime "randomly" and it has caused a lot of bugs for us. Also you can refer to the issue: #7659

int32_t HiveConfig::maxCoalescedDistanceBytes(const Config* config) {
return config->get<int32_t>(kMaxCoalescedDistanceBytes, 512 << 10);
bool HiveConfig::isFileColumnNamesReadAsLowerCase(const Config* session) const {
if (session->isValueExists(kFileColumnNamesReadAsLowerCaseSession)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kewang1024 @majetideepak @xiaoxmeng @mbasmanova

Apart from the issue mentioned by @zhztheplayer here. There is one another issue here. We have currently set kFileColumnNamesReadAsLowerCase to true in Gluten, but it didn't take effect. We must set kFileColumnNamesReadAsLowerCaseSession to true in order for it to work. Based on the code analysis, it seems that if the kFileColumnNamesReadAsLowerCaseSession parameter is not set, it will read the kFileColumnNamesReadAsLowerCase configuration. However, it appears that it is not taking effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf Would you create GitHub issue to explain the problem you are facing? We can discuss it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kFileColumnNamesReadAsLowerCase should be set when a HiveConnector is created, kFileColumnNamesReadAsLowerCaseSession should be set per query basis in the session property.

can you show me where your code is not working?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kewang1024 The related code is here. We set kFileColumnNamesReadAsLowerCase config to true without this PR. And after this PR, we need to set kFileColumnNamesReadAsLowerCaseSession to true and then the unit test can pass in Gluten. It seems the kFileColumnNamesReadAsLowerCase is not longer taking effect in our case.

Comment on lines +511 to +513
Impl(const Config* config) {
hiveConfig_ = std::make_shared<HiveConfig>(
std::make_shared<core::MemConfig>(config->values()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config is nullptr here.

FileSystem was created by FileHandleGenerator, but FileHandleGenerator's properties was set to nullptr in HiveConnector ctor.

facebook-github-bot pushed a commit that referenced this pull request Dec 14, 2023
Summary:
A recent refactor #7725 missed passing the connector config
to the FileHandleGenerator in the HiveConnector.

Pull Request resolved: #8038

Reviewed By: xiaoxmeng

Differential Revision: D52154732

Pulled By: mbasmanova

fbshipit-source-id: df6c0edcc5fb169877f9795f6a17eec2c52c1dd2
@kewang1024 kewang1024 deleted the config-test branch December 15, 2023 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants