
[VL] table scan fallback gets wrong query plan #6612

Closed
FelixYBW opened this issue Jul 27, 2024 · 2 comments · Fixed by #6627
Labels
bug Something isn't working triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

A query contains a Parquet scan on complex data types that falls back to vanilla Spark.
If I set spark.sql.parquet.enableVectorizedReader = False and run the query twice, the first run is correct, but the second run wrongly adds a ColumnarToRow, causing the error UnsafeRow cannot be cast to org.apache.spark.sql.vectorized.ColumnarBatch.

The first query plan:

+- ^ FilterExecTransformer (5)
   +- ^ InputIteratorTransformer (4)
      +- RowToVeloxColumnar (2)
         +- Scan parquet  (1)

The second query plan:

+- ^ InputIteratorTransformer (5)
   +- RowToVeloxColumnar (3)
      +- * ColumnarToRow (2)
         +- Scan parquet  (1)

Similarly, if I set spark.sql.parquet.enableVectorizedReader = True, the first run reports an error and the second run succeeds.

Spark version

Spark-3.2.x

Spark configurations

No response

System information

No response

Relevant logs

No response

FelixYBW added the bug and triage labels on Jul 27, 2024.
@FelixYBW
Contributor Author

So there are two issues:

  1. The output of the vectorized Parquet scan is wrongly recognized as row format.
  2. Some variables aren't reset after a query.

@FelixYBW
Contributor Author

FelixYBW commented Jul 29, 2024

  1. Because the customer backported the vectorized Parquet scan from Spark 3.3, ParquetFileFormat.scala in the shim layer also needs to be ported.
  2. This is the same issue as [GLUTEN-6151] Reset local property after finishing write operator #6163: isNativeAppliable is set during the first query, and the Parquet scan reads the same local property. #6163 only resets it for the Parquet write.
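The second root cause is a per-thread flag that leaks from one query into the next. The sketch below is a minimal, hypothetical Java model of that pattern, assuming a ThreadLocal<Properties> similar in spirit to Spark's SparkContext.setLocalProperty / getLocalProperty (where setting a null value removes the key). The class and method names (LocalPropertyLeak, runQueryLeaky, runQueryWithReset, scanSeesStaleFlag) are illustrative only; this is not Gluten's code and not the actual fix in #6627. It shows why a later scan can see a stale isNativeAppliable value unless the flag is cleared once the query finishes.

```java
import java.util.Properties;

// Hypothetical model of per-thread "local properties", similar in spirit to
// SparkContext.setLocalProperty / getLocalProperty. Not Gluten's actual code.
class LocalPropertyLeak {
    private static final ThreadLocal<Properties> LOCAL =
        ThreadLocal.withInitial(Properties::new);

    static void setLocalProperty(String key, String value) {
        if (value == null) {
            LOCAL.get().remove(key); // a null value clears the property
        } else {
            LOCAL.get().setProperty(key, value);
        }
    }

    static String getLocalProperty(String key) {
        return LOCAL.get().getProperty(key);
    }

    // Hypothetical query step that sets the flag and never clears it.
    static void runQueryLeaky() {
        setLocalProperty("isNativeAppliable", "true");
        // ... plan and execute the query ...
    }

    // Same step, but the flag is cleared when the query finishes,
    // even if planning or execution throws.
    static void runQueryWithReset() {
        try {
            setLocalProperty("isNativeAppliable", "true");
            // ... plan and execute the query ...
        } finally {
            setLocalProperty("isNativeAppliable", null);
        }
    }

    // Hypothetical scan-side check: a later query on the same thread
    // reads whatever value the previous query left behind.
    static boolean scanSeesStaleFlag() {
        return getLocalProperty("isNativeAppliable") != null;
    }

    public static void main(String[] args) {
        runQueryLeaky();
        System.out.println("leaky: next query sees flag = " + scanSeesStaleFlag());
        setLocalProperty("isNativeAppliable", null); // start clean
        runQueryWithReset();
        System.out.println("reset: next query sees flag = " + scanSeesStaleFlag());
    }
}
```

Running main prints "leaky: next query sees flag = true" followed by "reset: next query sees flag = false". Clearing the flag in a finally block means the reset also happens when a query fails partway through, which matters when the same thread runs the next query.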
