
Spark needs micro precision in the 'in' function #11755

Open
rui-mo opened this issue Dec 5, 2024 · 4 comments · May be fixed by #11812
Labels
bug (Something isn't working), triage (Newly created issue that needs attention)

Comments

@rui-mo
Collaborator

rui-mo commented Dec 5, 2024

Bug description

The InPredicate is registered for Spark SQL and might need to provide configurable precision to adapt to both Presto and Spark.

values.emplace_back(simpleValues->valueAt(i).toMillis());

In Velox, the Spark 'in' implementation compares timestamps at millisecond precision (note the toMillis() call above). In vanilla Spark, the timestamp precision is microseconds.

struct InFunctionOuter {

System information

Velox System Info v0.0.2
Commit: edfb582
CMake Version: 3.28.3
System: Linux-5.4.0-200-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.1.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.1.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

@rui-mo rui-mo added the bug and triage labels Dec 5, 2024
@zuyu
Contributor

zuyu commented Dec 6, 2024

Probably toPrecision from #11719 would help.

@rui-mo
Collaborator Author

rui-mo commented Dec 10, 2024

Is the issue here that the precision of a Timestamp in Spark is in microseconds, and converting it to milliseconds for comparison would result in a loss of precision? For example, Timestamp(1, 999000) and Timestamp(1, 998000) would be considered the same?

@liujiayi771 I took a further look and found that Spark has its own specific implementation. Fixing it in #11812 to adapt to Spark's micro precision.

@rui-mo rui-mo changed the title Spark needs micro precision in the InPredicate Spark needs micro precision in the 'in' function Dec 10, 2024
@liujiayi771
Contributor

@rui-mo Do we still need to adapt to Spark's micro precision in the Iceberg equality deletes reader?

@rui-mo
Collaborator Author

rui-mo commented Dec 10, 2024

@rui-mo Do we still need to adapt to Spark's micro precision in the Iceberg equality deletes reader?

@liujiayi771 I think so, because if milli precision is used, the micros are lost and timestamps with the same millis but different micros will be treated as the same.
