fix: Replace crash with failure in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector #11932

Open
wants to merge 1 commit into
base: main

Conversation

whutjs
Contributor

@whutjs whutjs commented Dec 22, 2024

For example, when a ConstantVector(MapVector) is passed to Bridge.exportToArrow() with arrowOptions.flattenConstant=true, Velox crashes.
The reason is that for this input type->kind() == TypeKind::MAP is true, but vec is actually a ConstantVector, so the result of auto& maps = *vec->asUnchecked<MapVector>(); is corrupted, as shown below:
image
image

This change replaces a crash with a failure: "Flattening is only supported for scalar types".
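
To illustrate the guard described above, here is a minimal, self-contained sketch. It does not use Velox: `ToyVector`, `ToyEncoding`, `ToyKind`, and `exportToy` are hypothetical stand-ins, and only the error message is taken from this PR. The point it tries to show is that the logical type and the physical encoding are separate, so the encoding has to be ruled out before any type-based downcast.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; these are not Velox classes.
enum class ToyEncoding { kFlat, kConstant };
enum class ToyKind { kBigint, kMap };

struct ToyVector {
  ToyEncoding encoding;  // Physical encoding of the vector.
  ToyKind kind;          // Logical type, which is independent of the encoding.

  bool isScalar() const { return kind != ToyKind::kMap; }
};

std::string exportToy(const ToyVector& vec, bool flattenConstant) {
  if (vec.encoding == ToyEncoding::kConstant) {
    if (!flattenConstant) {
      return "exported as constant";
    }
    // Fail with a descriptive error instead of treating the constant
    // wrapper as if it were the complex vector it wraps.
    if (!vec.isScalar()) {
      throw std::runtime_error(
          "Flattening is only supported for scalar types.");
    }
    return "flattened";
  }
  // Only past this point is a downcast based on the logical type safe.
  assert(vec.encoding == ToyEncoding::kFlat);
  return vec.kind == ToyKind::kMap ? "exported as map" : "exported as flat";
}
```

In this sketch, a constant-encoded map with `flattenConstant` set to true throws, while the same logical type with flat encoding is exported normally.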

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 22, 2024
@whutjs whutjs force-pushed the fix_crash_in_arrow_bridge branch from d052924 to 062a1b6 Compare December 22, 2024 05:53
@whutjs whutjs changed the title Fix crash in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector fix: crash in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector Dec 22, 2024
@whutjs whutjs changed the title fix: crash in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector fix: Crash in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector Dec 22, 2024
…in a ConstantVector and ArrowOptions.flattenConstant is true
@whutjs whutjs force-pushed the fix_crash_in_arrow_bridge branch from 062a1b6 to 63eb777 Compare December 22, 2024 06:08
@whutjs
Contributor Author

whutjs commented Dec 22, 2024

@pedroerp Would you please take a look at this? Thanks.

@whutjs
Contributor Author

whutjs commented Dec 24, 2024

@mbasmanova Would you mind taking a look at this? Thanks.

@mbasmanova mbasmanova requested review from Yuhta and xiaoxmeng January 2, 2025 14:09
  if (vec->encoding() == VectorEncoding::Simple::CONSTANT &&
      options.flattenConstant) {
    VELOX_CHECK(
        vec->isScalar(), "Flattening is only supported for scalar types.");
Contributor

Why does this limitation exist? Can we not implement flattening for all types?

CC: @Yuhta @pedroerp

Contributor Author

@mbasmanova Thanks for your review. I don't know the reason either, but it seems this limitation has been in place for 2 years:

VELOX_CHECK(vec.isScalar(), "Flattening is only supported for scalar types.");

Contributor

@whutjs Would you like to look into what it would take to support constant vectors? If not, please update the PR description to clarify that the change replaces a crash with a failure. Otherwise, it is not clear that constant vectors remain unsupported after this change.

Contributor

@Yuhta Yuhta Jan 7, 2025

@mbasmanova The idea is that for complex types we should not even try to flatten them; we should implement the proper constant/dictionary encoding in Arrow.

The whole flatten thing in the Arrow conversion is a temporary hack added to support the Parquet writer; it should not be used outside the Parquet writer.

Contributor Author

@whutjs whutjs Jan 8, 2025

@mbasmanova I agree with @Yuhta, so I prefer to just change the PR description.
Flattening a map vector with constant encoding is not really reasonable. In fact, I don't think a map vector with constant encoding is useful in a production environment at all.
I ran into this bug because of this line of code:

result = BaseVector::createNullConstant(type, rows.end(), context.pool());

image
In our scenario, the type is a map type and the result is expected to be a map vector. When there is no result yet, EvalCtx creates a map vector with constant encoding. We don't want this behavior, so besides fixing the crash in Bridge.cpp, we also modified the EvalCtx::addNulls method internally as follows:

// static
void EvalCtx::addNulls(
    const SelectivityVector& rows,
    const uint64_t* rawNulls,
    EvalCtx& context,
    const TypePtr& type,
    VectorPtr& result) {
  // If there's no `result` yet, return a NULL ConstantVector.
  if (!result) {
    if (type->isPrimitiveType()) {
      // Only wrap primitive type in a ConstantVector
      result = BaseVector::createNullConstant(type, rows.end(), context.pool());
    } else {
      // Create an empty complex type vector
      result = BaseVector::create(type, rows.end(), context.pool());
      result->addNulls(rawNulls, rows);
    }
    return;
  }
....
}

I am not sure whether the Velox community would accept these changes, so I opened this PR only to fix the crash.

Contributor Author

@Yuhta We have to flatten the constant vector in our scenario, because the Java library doesn't support run-end encoding (REE) yet, and Velox will try to export a constant-encoded vector as a run-end encoded array in Arrow. So we flatten the constant vector to work around this issue: apache/arrow#44065
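
For readers unfamiliar with the trade-off, here is a rough, self-contained sketch of the two representations, for a scalar value only. It models Arrow's run-end encoded layout (one child with the exclusive end index of each run, one child with one value per run) using plain `std::vector`s. `ToyRunEndEncoded`, `toRunEndEncoded`, and `flatten` are hypothetical names for illustration and are not part of Velox or Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a run-end encoded array: runEnds[i] is the exclusive end row
// of run i, and values[i] is the value repeated throughout that run.
struct ToyRunEndEncoded {
  std::vector<int32_t> runEnds;
  std::vector<int64_t> values;
};

// A constant vector with `size` rows maps to a single run ending at `size`.
ToyRunEndEncoded toRunEndEncoded(int64_t value, int32_t size) {
  assert(size > 0);
  return ToyRunEndEncoded{{size}, {value}};
}

// Flattening materializes one copy of the value per row, so a consumer that
// only understands plain arrays can still read the data.
std::vector<int64_t> flatten(const ToyRunEndEncoded& ree) {
  std::vector<int64_t> flat;
  int32_t start = 0;
  for (std::size_t i = 0; i < ree.runEnds.size(); ++i) {
    const auto runLength = static_cast<std::size_t>(ree.runEnds[i] - start);
    flat.insert(flat.end(), runLength, ree.values[i]);
    start = ree.runEnds[i];
  }
  return flat;
}
```

The encoded form stores a single value regardless of row count, while the flattened form grows linearly with the number of rows; that cost is what flattening pays for compatibility with consumers that cannot read the encoded layout.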

Contributor

@whutjs Thank you for explaining. Would you update the comments for the exportToArrow method in the .h file to describe this limitation? Let's also add tests for arrays and structs.

Please create a GitHub issue to describe the crash and provide the debugging information that's currently in the PR description. Then update the PR description to (1) replace the detailed problem description with a reference to the issue; (2) include a description of the actual change.

Thanks.

@whutjs whutjs changed the title fix: Crash in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector fix: Replace crash with failure in Bridge.exportToArrow when vector is a ComplexVector wrapped in ConstantVector Jan 8, 2025