-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-52233][SQL] Fix map_zip_with for Floating Point Types #50967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
getKeysWithIndexesBruteForce | ||
keyType match { | ||
case properEqualsType if TypeUtils.typeWithProperEquals(properEqualsType) => | ||
getKeysWithIndexesFast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we simply changge getKeysWithIndexesFast
to normalize the float/double values before putting into the hashmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the code now, could you take one more look?
case None => | ||
val indexes = Array[Option[Int]](None, None) | ||
indexes(z) = Some(i) | ||
hashMap.put(key, indexes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My proposal is to simply do
val normalized = if (key == Float.NaN) {
Float.NaN
} else if (key == Double.NaN) {
Double.NaN
} else {
key
}
hashMap.put(key, indexes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to use the same NaN instance, so that the hash map can work out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So either we abstract separate method for Float/Double or we improve getKeysWithIndexesFast in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also normalize the key before lookup?
What changes were proposed in this pull request?
Fix to
map_zip_with
expression while handling floating point numbers.Why are the changes needed?
Previously we would run
getKeysWithIndexesFast
which would use LinkedHashMap, which does not use proper equality on keys for floating point numbers. All NaNs would be treated in a different way. This PR aims to fix this behaviour.Example:
Output before:
Output after:
Does this PR introduce any user-facing change?
Yes, fixing the way expression works.
How was this patch tested?
Added tests to golden files.
Was this patch authored or co-authored using generative AI tooling?
No.