Refactor HiveConfig #7725
This pull request was exported from Phabricator. Differential Revision: D51814038
@kewang1024 I meant that HdfsFileSystem and AbfsFileSystem must also be modified in this PR, in the same way as GCS and S3, to be consistent.
Sure, but the way it's written, they won't break because of this change. I can update those in a follow-up PR.
@kewang1024 merged this pull request in eb75367.
Conbench analyzed the 1 benchmark run on the commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
static constexpr const char* kFileColumnNamesReadAsLowerCase =
    "file-column-names-read-as-lower-case";
static constexpr const char* kFileColumnNamesReadAsLowerCaseSession =
    "file_column_names_read_as_lower_case";
Hi @kewang1024
This change broke some tests in our project, which relies on Velox, and it took us some time to debug. Could you share why the config key kFileColumnNamesReadAsLowerCase had to be renamed?
Hi, this is an effort to separate configs from session properties.
It also fixes the naming convention: config names use "-" and session property names use "_". Otherwise people sometimes use them "randomly", which has caused a lot of bugs for us. You can also refer to issue #7659.
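To illustrate the convention, here is a minimal sketch of how the two spellings relate. `toSessionKey` is a hypothetical helper written for this example and is not part of Velox; the PR itself declares both names as separate constants.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of Velox: derives the session property
// spelling ('_' separated) from a connector config key ('-' separated).
std::string toSessionKey(std::string configKey) {
  std::replace(configKey.begin(), configKey.end(), '-', '_');
  return configKey;
}
```

Under this sketch, passing "file-column-names-read-as-lower-case" yields the session spelling shown in the diff above.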
int32_t HiveConfig::maxCoalescedDistanceBytes(const Config* config) {
  return config->get<int32_t>(kMaxCoalescedDistanceBytes, 512 << 10);
bool HiveConfig::isFileColumnNamesReadAsLowerCase(const Config* session) const {
  if (session->isValueExists(kFileColumnNamesReadAsLowerCaseSession)) {
@kewang1024 @majetideepak @xiaoxmeng @mbasmanova
Apart from the issue mentioned by @zhztheplayer here, there is another issue. We currently set kFileColumnNamesReadAsLowerCase to true in Gluten, but it doesn't take effect. We must set kFileColumnNamesReadAsLowerCaseSession to true for it to work. From reading the code, it seems that if kFileColumnNamesReadAsLowerCaseSession is not set, the kFileColumnNamesReadAsLowerCase configuration should be read instead. However, that fallback does not appear to take effect.
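The lookup order described in this comment can be sketched as follows. This is an illustration only: `SimpleConfig` and `lookupBool` are stand-ins invented for the example rather than Velox's `Config` API, and the default of `false` when neither key is set is an assumption.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Stand-in for a string-keyed config; NOT Velox's Config class.
using SimpleConfig = std::unordered_map<std::string, std::string>;

constexpr const char* kFileColumnNamesReadAsLowerCase =
    "file-column-names-read-as-lower-case";
constexpr const char* kFileColumnNamesReadAsLowerCaseSession =
    "file_column_names_read_as_lower_case";

// Returns the boolean stored under `key`, or nullopt if the key is absent.
std::optional<bool> lookupBool(
    const SimpleConfig& config,
    const std::string& key) {
  auto it = config.find(key);
  if (it == config.end()) {
    return std::nullopt;
  }
  return it->second == "true";
}

// The session property wins when present; otherwise fall back to the
// connector config, and finally to an assumed default of false.
bool isFileColumnNamesReadAsLowerCase(
    const SimpleConfig& connectorConfig,
    const SimpleConfig& session) {
  if (auto value = lookupBool(session, kFileColumnNamesReadAsLowerCaseSession)) {
    return *value;
  }
  return lookupBool(connectorConfig, kFileColumnNamesReadAsLowerCase)
      .value_or(false);
}
```

With this order, a connector-level "true" is honored whenever the session leaves the property unset, which is the behavior the comment expects.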
@JkSelf Would you create a GitHub issue to explain the problem you are facing? We can discuss it there.
kFileColumnNamesReadAsLowerCase should be set when a HiveConnector is created; kFileColumnNamesReadAsLowerCaseSession should be set on a per-query basis in the session properties.
Can you show me where your code is not working?
@kewang1024 The related code is here. Before this PR, we set the kFileColumnNamesReadAsLowerCase config to true. After this PR, we need to set kFileColumnNamesReadAsLowerCaseSession to true for the unit test to pass in Gluten. It seems that kFileColumnNamesReadAsLowerCase is no longer taking effect in our case.
Impl(const Config* config) {
  hiveConfig_ = std::make_shared<HiveConfig>(
      std::make_shared<core::MemConfig>(config->values()));
config is nullptr here.
The FileSystem is created by FileHandleGenerator, but FileHandleGenerator's properties are set to nullptr in the HiveConnector ctor.
To solve issue: #7659
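One defensive option for the null dereference described above is to fall back to an empty config before copying values. The sketch below uses a stand-in `DemoConfig` type and a hypothetical `copyOrEmpty` helper; neither is Velox code, and the actual fix may instead pass real properties through from the HiveConnector ctor.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in type for illustration only; not Velox's Config or MemConfig.
struct DemoConfig {
  std::unordered_map<std::string, std::string> values;
};

// Hypothetical helper: copies a possibly-null config, returning an empty
// config instead of dereferencing nullptr.
std::shared_ptr<DemoConfig> copyOrEmpty(const DemoConfig* config) {
  if (config == nullptr) {
    return std::make_shared<DemoConfig>();
  }
  return std::make_shared<DemoConfig>(*config);
}
```

A guard like this avoids the crash, but it also silently drops any settings the caller meant to pass, so wiring the properties through is likely the better long-term fix.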