
aggregate with varbinary type fails #385

Closed
gaoyangxiaozhu opened this issue Aug 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments


gaoyangxiaozhu commented Aug 7, 2023

Bug description

When running the Spark query below, which aggregates over a varbinary-typed column, the SQL fails with:

Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unknown input type for min aggregation VARBINARY
Retriable: False
Expression: false
Function: operator()
File: /home/gayangya/Work/Git/OSS/velox/velox/functions/prestosql/aggregates/MinMaxAggregates.cpp
Line: 516
Stack trace:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BinaryType, BooleanType, ByteType, DateType, Decimal, DecimalType, DoubleType, FloatType, IntegerType, LongType, ShortType, StringType, StructField, StructType, TimestampType}

import java.sql.{Date, Timestamp}

implicit class StringToDate(s: String) {
  def date: Date = Date.valueOf(s)
}

implicit class StringToTs(s: String) {
  def ts: Timestamp = Timestamp.valueOf(s)
}

val rows =
  Seq(
    Row(
      "sparkSQL",
      "Spark SQL".getBytes),
    Row(
      "parquet",
      "Parquet".getBytes),
    Row(
      "sparkML",
      "SparkML".getBytes)
  )

val schema = StructType(List(
  StructField("StringCol", StringType, true),
  StructField("BinaryCol", BinaryType, false)).toArray)

val rdd = sc.parallelize(rows)

spark.createDataFrame(rdd, schema).write.format("parquet").save("/tmp/spark1/datatest1")

spark.read.format("parquet").load("/tmp/spark1/datatest1").createOrReplaceTempView("test")
val df = sql("SELECT min(BinaryCol) FROM test")
df.collect

Expected behavior:

The Spark SQL query runs successfully.
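For context, Presto semantics (which Velox follows) presumably order varbinary values lexicographically on unsigned bytes, so `min(BinaryCol)` over the rows above should return the bytes of `Parquet`. A minimal, self-contained Scala sketch of that ordering (names here are illustrative, not taken from the Velox codebase):

```scala
// Sketch: min over a varbinary column amounts to keeping the
// lexicographically smallest byte array, comparing each byte as an
// unsigned value. Illustrative only, not the Velox implementation.
object VarbinaryMin {
  // Compare two byte arrays lexicographically, treating bytes as unsigned.
  def compare(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    // Equal prefix: the shorter array sorts first.
    a.length - b.length
  }

  // Fold over the inputs, keeping the smallest value seen so far.
  def min(values: Seq[Array[Byte]]): Array[Byte] =
    values.reduce((x, y) => if (compare(x, y) <= 0) x else y)
}
```

With the repro's data, `VarbinaryMin.min(Seq("Spark SQL".getBytes, "Parquet".getBytes, "SparkML".getBytes))` yields the bytes of `"Parquet"`.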

System information

Ubuntu 20.04


This also causes the Gluten UT `aggregate push down - different data types` of GlutenParquetV2AggregatePushDownSuite to fail, which is tracked by another issue: apache/incubator-gluten#2169.

@gaoyangxiaozhu (Author)

Closed; the PR has been merged.

marin-ma pushed a commit to marin-ma/velox-oap that referenced this issue Dec 15, 2023
* Minor: Reorganize Scala plan validation code