[VL] Vanilla Spark broadcast exchange + R2C is slow sometimes #5136
Comments
Thank you @zhztheplayer. It's a good point: columnar broadcast broadcasts the original binary data, while vanilla Spark broadcasts a hash relation. So I think this issue is a common case even when there is no R2C. Is it possible to create a new PR for this issue?
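As a rough illustration of that difference, here is a minimal plain-Scala sketch (not Spark or Gluten APIs; all names are illustrative): vanilla Spark builds the hash relation once on the driver and broadcasts the ready-to-probe structure, while columnar broadcast ships the original data and leaves any hash-relation build to each consumer.

```scala
// Illustrative model of the two broadcast payloads; not actual Spark/Gluten code.
object BroadcastPayloads {
  // Vanilla Spark style: broadcast a prebuilt, ready-to-probe relation.
  def buildRelation(rows: Seq[(Long, String)]): Map[Long, Seq[String]] =
    rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

  // Columnar broadcast style: broadcast the original rows as-is; a consumer
  // that needs a hash relation must rebuild it on its side (the extra cost here).
  def broadcastRaw(rows: Seq[(Long, String)]): Seq[(Long, String)] = rows

  def main(args: Array[String]): Unit = {
    val rows = Seq(1L -> "a", 1L -> "b", 2L -> "c")
    val prebuilt = buildRelation(rows)                // shipped already built
    val rebuilt = buildRelation(broadcastRaw(rows))   // rebuilt per consumer
    assert(prebuilt == rebuilt)
  }
}
```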
I don't have dedicated UTs for it, so it was incorporated into the other PR. Still, I can open one for it if you think it's needed: #5141. The change was already tested, so I will proceed to merge after the code style check passes, if that's OK with you.
The major issue I have found is that the two `HashedRelation` implementations don't behave consistently. The following fix (the same as the one in #5141) can work, but I didn't dive into it deeply to find the root cause of the inconsistency (maybe related to the internals of `LongHashedRelation` vs. `UnsafeHashedRelation`):

```scala
private def reconstructRows(relation: HashedRelation): Iterator[InternalRow] = {
  // It seems that LongHashedRelation and UnsafeHashedRelation don't follow the same
  // criteria while getting values from them.
  // Should review the internals of this part of the code.
  relation match {
    case relation: LongHashedRelation if relation.keyIsUnique =>
      relation.keys().map(k => relation.getValue(k))
    case relation: LongHashedRelation if !relation.keyIsUnique =>
      relation.keys().flatMap(k => relation.get(k))
    case other => other.valuesWithKeyIndex().map(_.getValue)
  }
}
```
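For context, here is a minimal property check in plain Scala, using a simplified stand-in for a hashed relation rather than Spark's actual `HashedRelation` (all names here are hypothetical). It exercises the same idea as the fix above: pick the accessor that matches the relation's key uniqueness, and verify that reconstruction preserves the exact multiset of input rows with no duplicates introduced.

```scala
// A simplified, self-contained model of the reconstruction invariant.
// This is NOT Spark code; Relation is a hypothetical stand-in for HashedRelation.
object ReconstructCheck {
  // Keys map to one or more values; keyIsUnique mirrors HashedRelation.keyIsUnique.
  final case class Relation(entries: Map[Long, Seq[String]]) {
    def keyIsUnique: Boolean = entries.values.forall(_.size == 1)
    def keys: Iterator[Long] = entries.keysIterator
    def getValue(k: Long): String = entries(k).head           // unique-key accessor
    def get(k: Long): Iterator[String] = entries(k).iterator  // multi-value accessor
  }

  // Mirrors the structure of the fix above: branch on key uniqueness instead of
  // assuming one value shape for every relation.
  def reconstruct(r: Relation): Iterator[String] =
    if (r.keyIsUnique) r.keys.map(r.getValue) else r.keys.flatMap(r.get)

  def main(args: Array[String]): Unit = {
    val input = Seq(1L -> "a", 1L -> "b", 2L -> "c")
    val rel = Relation(input.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) })
    val out = reconstruct(rel).toSeq.sorted
    // The reconstructed rows must match the input multiset exactly.
    assert(out == input.map(_._2).sorted, s"row multiset changed: $out")
  }
}
```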
Fixed in #5141. I assume we can close this now.
Backend
VL (Velox)
Bug description
The slowdown occurs because the code that converts vanilla Spark's hashed relation into Gluten's representation sometimes produced duplicated rows.
The fix will be incorporated in #5058 (at commit 96c3fc7) since it can be tested by the ACBO changes.
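To see why duplicated rows from the conversion translate into slowness and not only wrong results, here is an illustrative plain-Scala sketch (the names are made up for illustration, not Gluten code): each duplicate in the build side inflates both the broadcast relation and the join output.

```scala
// Illustrative only: a toy hash join showing the cost of duplicated build-side rows.
object DuplicationEffect {
  // Build a simple hash relation from (key, value) rows and probe it.
  def hashJoin(build: Seq[(Long, String)], probe: Seq[Long]): Seq[(Long, String)] = {
    val relation = build.groupBy(_._1)
    probe.flatMap(k => relation.getOrElse(k, Nil))
  }

  def main(args: Array[String]): Unit = {
    val buildOk = Seq(1L -> "a", 2L -> "b")
    val buildDup = buildOk ++ buildOk // duplicates produced by a faulty conversion
    val probe = Seq(1L, 2L)
    println(hashJoin(buildOk, probe).size)  // 2 rows, as expected
    println(hashJoin(buildDup, probe).size) // 4 rows: output doubles with the duplicates
  }
}
```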