
aggregate with varbinary type fails #385

Closed
gaoyangxiaozhu opened this issue Aug 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments


gaoyangxiaozhu commented Aug 7, 2023

Bug description

When running the Spark query below, which aggregates over a varbinary-typed column, the SQL fails with:

Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unknown input type for min aggregation VARBINARY
Retriable: False
Expression: false
Function: operator()
File: /home/gayangya/Work/Git/OSS/velox/velox/functions/prestosql/aggregates/MinMaxAggregates.cpp
Line: 516
Stack trace:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BinaryType, BooleanType, ByteType, DateType, Decimal, DecimalType, DoubleType, FloatType, IntegerType, LongType, ShortType, StringType, StructField, StructType, TimestampType}

import java.sql.{Date, Timestamp}

implicit class StringToDate(s: String) {
  def date: Date = Date.valueOf(s)
}

implicit class StringToTs(s: String) {
  def ts: Timestamp = Timestamp.valueOf(s)
}

val rows =
  Seq(
    Row(
      "sparkSQL",
      "Spark SQL".getBytes),
    Row(
      "parquet",
      "Parquet".getBytes),
    Row(
      "sparkML",
      "SparkML".getBytes)
  )

val schema = StructType(List(
  StructField("StringCol", StringType, true),
  StructField("BinaryCol", BinaryType, false)).toArray)

val rdd = sc.parallelize(rows)

spark.createDataFrame(rdd, schema).write.format("parquet").save("/tmp/spark1/datatest1")

spark.read.format("parquet").load("/tmp/spark1/datatest1").createOrReplaceTempView("test")
val df = sql("SELECT min(BinaryCol) FROM test")
df.collect

Expected behavior:

The Spark SQL query runs successfully.
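For context, Presto semantics (which Velox follows) presumably order varbinary values lexicographically on unsigned bytes, so `min(BinaryCol)` over the rows above should return the bytes of `Parquet`. A minimal, self-contained Scala sketch of that ordering (names here are illustrative, not taken from the Velox codebase):

```scala
// Sketch: min over a varbinary column amounts to keeping the
// lexicographically smallest byte array, comparing each byte as an
// unsigned value. Illustrative only, not the Velox implementation.
object VarbinaryMin {
  // Compare two byte arrays lexicographically, treating bytes as unsigned.
  def compare(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    // Equal prefix: the shorter array sorts first.
    a.length - b.length
  }

  // Fold over the inputs, keeping the smallest value seen so far.
  def min(values: Seq[Array[Byte]]): Array[Byte] =
    values.reduce((x, y) => if (compare(x, y) <= 0) x else y)
}
```

With the repro's data, `VarbinaryMin.min(Seq("Spark SQL".getBytes, "Parquet".getBytes, "SparkML".getBytes))` yields the bytes of `"Parquet"`.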

System information

Ubuntu 20.04


This also causes the Gluten UT `aggregate push down - different data types` of GlutenParquetV2AggregatePushDownSuite to fail, which is tracked by another issue: apache/incubator-gluten#2169.

@gaoyangxiaozhu (Author)

Closed; the PR has been merged.

marin-ma pushed a commit to marin-ma/velox-oap that referenced this issue Dec 15, 2023
* Minor: Reorganize Scala plan validation code