-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support scan filter for ORC decimal reader #11067
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to add some tests to E2EFilterTest
for decimal types
// Fill decimals before applying filter. | ||
fillDecimals(); | ||
|
||
const auto rawNulls = nullsInReadRange_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this logic to a static function or trait class so that it can be later reused in parquet as well
if constexpr (std::is_same_v<DataT, int64_t>) { | ||
processFilter(filter, rows, rawNulls); | ||
} else { | ||
VELOX_UNSUPPORTED("Unsupported filter: {}.", (int)filterKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be the case for schema evolution. Just throw VELOX_FAIL
here or VELOX_NYI
for these and add the requested type and file type information to the error message.
@Yuhta I find the E2EFilterTest relies on DWRF writer, but for now bigint-decimal will be written as int64_t, and hugeint-decimal is not supported. Do we need to support them first? velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Lines 2039 to 2041 in 3a1a60a
velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Line 2090 in 3a1a60a
|
@rui-mo Yes it would be ideal to support writing decimals first. Otherwise the tests are very limited and we have no confidence the new code would work correctly. |
2937bac
to
e1b7ef8
Compare
e1b7ef8
to
3117603
Compare
velox/exec/tests/TableScanTest.cpp
Outdated
|
||
auto filePath = facebook::velox::test::getDataFilePath( | ||
"velox/exec/tests", "data/short_decimal.orc"); | ||
auto split = HiveConnectorSplitBuilder(filePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should replace with "exec::test::HiveConnectorSplitBuilder" ?
3117603
to
edfb582
Compare
'leafCallToSubfieldFilter' converts decimal filter as Subfield filter, but
'SelectiveDecimalColumnReader::read' rejects scan spec filter. This PR supports
scan filter for ORC decimal reader.