Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix checksum Presto aggregate function #6910

Closed
wants to merge 1 commit into from

Conversation

mbasmanova
Copy link
Contributor

PrestoHasher used to ignore null flags for structs producing incorrect hashes.

Fixes #6715 and #6449

@netlify
Copy link

netlify bot commented Oct 5, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b4f66e6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/651f084da04a5f0008d36563

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 5, 2023
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I have a corner case, but I am not sure if its a valid case.

AlignedBuffer::allocate<int64_t>(elementRows.end(), baseRow->pool());
auto* rawCombinedChildHashes = combinedChildHashes->asMutable<int64_t>();
std::fill_n(rawCombinedChildHashes, rows.end(), 1);

std::fill_n(rawHashes, rows.end(), 1);

for (int i = 0; i < baseRow->childrenSize(); i++) {
children_[i]->hash(baseRow->childAt(i), elementRows, childHashes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reset childHashes , before computing for every child.
Can we have a child vector that has the null bit set at a point but the parent doesnt have the null bit set, in that case we will not compute hash at that point for child and reuse and older hash value which might be wrong.

Let me know if this is not possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation is that hash() function will compute hashes for all rows requested regardless whether they are null or not.

row->setNull(1, true);
assertChecksum(row, "DpXXC2Pzbjw=");

row = makeRowVector({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add a nested row test case ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in f747c38.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit f747c384.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
PrestoHasher used to ignore null flags for structs producing incorrect hashes.

Fixes facebookincubator#6715 and facebookincubator#6449

Pull Request resolved: facebookincubator#6910

Reviewed By: kgpai

Differential Revision: D49953433

Pulled By: mbasmanova

fbshipit-source-id: 828476371bc649af7b1bd29d80a072cd4bab739a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

checksum(ROW) doesn't handle null values correctly
3 participants