[VL] Vanilla Spark broadcast exchange + R2C is slow sometimes #5136
Comments
Thank you @zhztheplayer. It's a good point: columnar broadcast broadcasts the original binary data, while vanilla Spark broadcasts a hash relation. So I think this issue is a common case even when there is no R2C. Is it possible to create a new PR for this issue?
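As a rough illustration of that difference, here is a minimal plain-Scala sketch (not Spark or Gluten APIs; all names are illustrative): vanilla Spark builds the hash relation once on the driver and broadcasts the ready-to-probe structure, while columnar broadcast ships the original data and leaves any hash-relation build to each consumer.

```scala
// Illustrative model of the two broadcast payloads; not actual Spark/Gluten code.
object BroadcastPayloads {
  // Vanilla Spark style: broadcast a prebuilt, ready-to-probe relation.
  def buildRelation(rows: Seq[(Long, String)]): Map[Long, Seq[String]] =
    rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

  // Columnar broadcast style: broadcast the original rows as-is; a consumer
  // that needs a hash relation must rebuild it on its side (the extra cost here).
  def broadcastRaw(rows: Seq[(Long, String)]): Seq[(Long, String)] = rows

  def main(args: Array[String]): Unit = {
    val rows = Seq(1L -> "a", 1L -> "b", 2L -> "c")
    val prebuilt = buildRelation(rows)                // shipped already built
    val rebuilt = buildRelation(broadcastRaw(rows))   // rebuilt per consumer
    assert(prebuilt == rebuilt)
  }
}
```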
I don't have dedicated UTs for it, so it was incorporated into the other PR. Still, I can open one for it if you think it's needed: #5141. The change was already tested, so I will proceed to merge after the code style check passes, if that's OK with you.
The major issue I have found is that the two `HashedRelation` implementations don't behave consistently. The following fix (the same as the one in #5141) can work, but I didn't dive into it deeply to find the root cause of the inconsistency (maybe related to the internals of `LongHashedRelation` vs. `UnsafeHashedRelation`):

```scala
private def reconstructRows(relation: HashedRelation): Iterator[InternalRow] = {
  // It seems that LongHashedRelation and UnsafeHashedRelation don't follow the same
  // criteria while getting values from them.
  // Should review the internals of this part of the code.
  relation match {
    case relation: LongHashedRelation if relation.keyIsUnique =>
      relation.keys().map(k => relation.getValue(k))
    case relation: LongHashedRelation if !relation.keyIsUnique =>
      relation.keys().flatMap(k => relation.get(k))
    case other => other.valuesWithKeyIndex().map(_.getValue)
  }
}
```
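For context, here is a minimal property check in plain Scala, using a simplified stand-in for a hashed relation rather than Spark's actual `HashedRelation` (all names here are hypothetical). It exercises the same idea as the fix above: pick the accessor that matches the relation's key uniqueness, and verify that reconstruction preserves the exact multiset of input rows with no duplicates introduced.

```scala
// A simplified, self-contained model of the reconstruction invariant.
// This is NOT Spark code; Relation is a hypothetical stand-in for HashedRelation.
object ReconstructCheck {
  // Keys map to one or more values; keyIsUnique mirrors HashedRelation.keyIsUnique.
  final case class Relation(entries: Map[Long, Seq[String]]) {
    def keyIsUnique: Boolean = entries.values.forall(_.size == 1)
    def keys: Iterator[Long] = entries.keysIterator
    def getValue(k: Long): String = entries(k).head           // unique-key accessor
    def get(k: Long): Iterator[String] = entries(k).iterator  // multi-value accessor
  }

  // Mirrors the structure of the fix above: branch on key uniqueness instead of
  // assuming one value shape for every relation.
  def reconstruct(r: Relation): Iterator[String] =
    if (r.keyIsUnique) r.keys.map(r.getValue) else r.keys.flatMap(r.get)

  def main(args: Array[String]): Unit = {
    val input = Seq(1L -> "a", 1L -> "b", 2L -> "c")
    val rel = Relation(input.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) })
    val out = reconstruct(rel).toSeq.sorted
    // The reconstructed rows must match the input multiset exactly.
    assert(out == input.map(_._2).sorted, s"row multiset changed: $out")
  }
}
```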
Fixed in #5141. I assume we can close this now.
Backend
VL (Velox)
Bug description
The slowdown occurs because the code that converts vanilla Spark's hashed relation into Gluten's representation sometimes produced duplicated rows.
The fix will be incorporated in #5058 (at commit 96c3fc7) since it can be tested by the ACBO changes.
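To see why duplicated rows from the conversion translate into slowness and not only wrong results, here is an illustrative plain-Scala sketch (the names are made up for illustration, not Gluten code): each duplicate in the build side inflates both the broadcast relation and the join output.

```scala
// Illustrative only: a toy hash join showing the cost of duplicated build-side rows.
object DuplicationEffect {
  // Build a simple hash relation from (key, value) rows and probe it.
  def hashJoin(build: Seq[(Long, String)], probe: Seq[Long]): Seq[(Long, String)] = {
    val relation = build.groupBy(_._1)
    probe.flatMap(k => relation.getOrElse(k, Nil))
  }

  def main(args: Array[String]): Unit = {
    val buildOk = Seq(1L -> "a", 2L -> "b")
    val buildDup = buildOk ++ buildOk // duplicates produced by a faulty conversion
    val probe = Seq(1L, 2L)
    println(hashJoin(buildOk, probe).size)  // 2 rows, as expected
    println(hashJoin(buildDup, probe).size) // 4 rows: output doubles with the duplicates
  }
}
```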