-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Replace crash with failure in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector #11932
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
d052924
to
062a1b6
Compare
…in a ConstantVector and ArrowOptions.flattenConstant is true
062a1b6
to
63eb777
Compare
@pedroerp Would you please take a look at this? Thanks. |
@mbasmanova Would you mind to take a look at this? Thanks |
if (vec->encoding() == VectorEncoding::Simple::CONSTANT && | ||
options.flattenConstant) { | ||
VELOX_CHECK( | ||
vec->isScalar(), "Flattening is only supported for scalar types."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Thanks for your review. I don't know the reason too, but it seems this limitation is already there for 2 years:
velox/velox/vector/arrow/Bridge.cpp
Line 1088 in 6171b52
VELOX_CHECK(vec.isScalar(), "Flattening is only supported for scalar types."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whutjs Would you like to look into what would it take to support constant vectors? If not, please, update PR description to clarify that the change replaces a crash with a failure. Otherwise, it is not clear that constant vectors continue to be not supported after this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova The idea is that for complex types we should not even try to flatten it, we should implement the proper constant/dictionary encoding in arrow
The whole flatten thing in arrow conversion is a temporary hack added to support Parquet writer; it should not be used outside parquet writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova I agree with @Yuhta, so I prefer to just change the PR description.
It is not quite reasonable to flatten a map vector with constant encoding. In fact, in my opinion, I don't think a map vector with constant encoding is actually useful in production environment.
I ran into this bug because of this line of code:
velox/velox/expression/EvalCtx.cpp
Line 372 in 0c61b5e
result = BaseVector::createNullConstant(type, rows.end(), context.pool()); |
In our scenario, the
type
is a map type and the result
is expected to be a map vector. And when there is no result yet, EvalCtx
will create a map vector with constant encoding. We don't want this kind of behavior, so we not only fix the crash in Bridge.cpp
, but also modify the EvalCtx::addNulls
method internally in this way:
// static
void EvalCtx::addNulls(
const SelectivityVector& rows,
const uint64_t* rawNulls,
EvalCtx& context,
const TypePtr& type,
VectorPtr& result) {
// If there's no `result` yet, return a NULL ContantVector.
if (!result) {
if (type->isPrimitiveType()) {
// Only wrap primitive type in a ConstantVector
result = BaseVector::createNullConstant(type, rows.end(), context.pool());
} else {
// Create an empty complex type vector
result = BaseVector::create(type, rows.end(), context.pool());
result->addNulls(rawNulls, rows);
}
return;
}
....
}
I am not sure whether the velox community would accept this changes, so I just open this PR to fix the crash problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta We have to flatten the constant vector in our scenario, because the Java library doesn't support run-length encoding yet, and velox will try to export the constant encoding vector to a run-length encoding in arrow. So we flatten the constant vector to work around this issue: apache/arrow#44065
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whutjs Thank you for explaining. Would you update comments for exportToArrow method in .h file to describe this limitation? Let's also add tests for arrays and structs.
Please, create a GitHub issue to describe the crash and provide the debugging information that's currently in the PR description. Then, update PR description to (1) replace detailed problem description with a reference to the issue; (2) include a description of the actual change.
Thanks.
For example, when a
ConstantVector(MapVector)
is passed toBridge.exportToArrow()
witharrowOptions.flattenConstant=true
, the velox will crash.The reason is that with given input, the
type->kind() == TypeKind::MAP
is true, but thevec
is aConstantVector
actually, and the result ofauto& maps = *vec->asUnchecked<MapVector>();
is corrupted, as show below:This change replaces a crash with a failure: "Flattening is only supported for scalar types".