Add support for allow-precision-loss in decimal operations (facebooki…

…ncubator#10383)

Summary:
Each of the decimal operation functions is registered as two functions such as `add_deny_precision_loss` and `add`.

When precision loss is allowed, the result type of an arithmetic operation is established according to Hive behavior and the SQL ANSI 2011 specification, i.e. the decimal part of the result is rounded if an exact representation is not possible. When precision loss is not allowed, the decimal part is not rounded and NULL is returned in those cases, as previously.

For example,

| | decimal(38, 7) + decimal(10, 0) result type | 1.1232154 + 1 | decimal(38, 18) * decimal(38, 18) | 0.1234567891011 * 1234.1 |
| -- | -- | -- | -- | -- |
| allow precision loss | decimal(38, 6) | 2.123215 | decimal(38, 6) | 152.358023 |
| deny precision loss | decimal(38, 7) | 2.1232154 | decimal(38, 36) | NULL |

```
spark-sql (default)> set spark.sql.decimalOperations.allowPrecisionLoss=true;
spark-sql (default)> select cast(0.1234567891011 as decimal(38, 18)) * cast(1234.1 as decimal(38, 18));
152.358023
spark-sql (default)> set spark.sql.decimalOperations.allowPrecisionLoss=false;
spark-sql (default)> select cast(0.1234567891011 as decimal(38, 18)) * cast(1234.1 as decimal(38, 18));
NULL
```

Spark implementation: https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L814

Pull Request resolved: facebookincubator#10383
Reviewed By: pedroerp
Differential Revision: D65612198
Pulled By: kevinwilfong
fbshipit-source-id: 4910aaeb0e375dbe8817c5f3fb41185c67c6dd5b
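
As a rough illustration (not Velox or Spark code), the result types in the table above can be reproduced with a small Python sketch of the two-stage computation documented in `decimal.rst` in this change. The function names `adjust`, `add_type`, `multiply_type`, and `divide_type` are hypothetical. Two points are assumptions rather than statements from the docs: the deny-precision-loss division branch uses integer division, and no adjustment is applied when the first-stage precision already fits in 38.

```python
MAX_PRECISION = 38
MIN_ADJUSTED_SCALE = 6


def adjust(p, s, allow_precision_loss):
    """Second stage: cap precision at 38 when the first stage exceeds it."""
    if p <= MAX_PRECISION:
        # Assumption: a type that already fits is left unchanged.
        return p, s
    if allow_precision_loss:
        # Keep the integer digits and give up scale, but keep at least min(s, 6).
        return MAX_PRECISION, max(MAX_PRECISION - (p - s), min(s, MIN_ADJUSTED_SCALE))
    # Deny precision loss: cap precision and keep the scale.
    return MAX_PRECISION, min(MAX_PRECISION, s)


def add_type(p1, s1, p2, s2, allow_precision_loss):
    """Result type of addition or subtraction."""
    p = max(p1 - s1, p2 - s2) + max(s1, s2) + 1
    s = max(s1, s2)
    return adjust(p, s, allow_precision_loss)


def multiply_type(p1, s1, p2, s2, allow_precision_loss):
    """Result type of multiplication."""
    return adjust(p1 + p2 + 1, s1 + s2, allow_precision_loss)


def divide_type(p1, s1, p2, s2, allow_precision_loss):
    """Result type of division; the first stage depends on the flag."""
    if allow_precision_loss:
        s = max(6, s1 + p2 + 1)
        p = p1 - s1 + s2 + s
        return adjust(p, s, True)
    whole_digits = min(38, p1 - s1 + s2)
    fractional_digits = min(38, max(6, s1 + p2 + 1))
    excess = whole_digits + fractional_digits - MAX_PRECISION
    if excess > 0:
        # Assumption: "/ 2" in the documented formula is integer division.
        return MAX_PRECISION, fractional_digits - excess // 2 - 1
    return whole_digits + fractional_digits, fractional_digits


# decimal(38, 7) + decimal(10, 0)
print(add_type(38, 7, 10, 0, True))    # (38, 6)
print(add_type(38, 7, 10, 0, False))   # (38, 7)
# decimal(38, 18) * decimal(38, 18)
print(multiply_type(38, 18, 38, 18, True))   # (38, 6)
print(multiply_type(38, 18, 38, 18, False))  # (38, 36)
```

The sketch covers only the result type. Whether a particular value rounds or becomes NULL depends on whether it fits the computed type.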
Yuhta · Nov 12, 2024 · 3884939 · 3884939
1 parent fcce674
commit 3884939
Showing 5 changed files with 332 additions and 68 deletions.
diff --git a/velox/docs/functions/spark/decimal.rst b/velox/docs/functions/spark/decimal.rst
@@ -2,49 +2,115 @@
 Decimal Operators
 =================
 
-When calculating the result precision and scale of arithmetic operators,
-the formulas follow Hive which is based on the SQL standard and MS SQL:
+The result precision and scale of arithmetic operators are computed in two stages.
+The first stage computes precision and scale using formulas based on the SQL standard and Hive when allow-precision-loss is true.
+The result may exceed the maximum allowed precision of 38.
+
+The second stage caps precision at 38 and, depending on the allow-precision-loss flag, either reduces the scale or keeps it.
+
+For example, addition of decimal(38, 7) and decimal(10, 0) requires precision of 39 and scale of 7.
+Since precision exceeds 38, it needs to be capped. When allow-precision-loss is true, precision is capped at 38 and scale is reduced by 1 to 6.
+When allow-precision-loss is false, precision is capped at 38 as well, but scale is kept at 7.
+With allow-precision-loss all additions will succeed, but accuracy (number of digits after the decimal point) of some operations will be reduced.
+Without allow-precision-loss, some additions will return NULL.
+
+For example,
+
+The following queries keep accuracy or return NULL when allow-precision-loss is false:
+
+::
+
+    select cast('1.1232154' as decimal(38, 7)) + cast('1' as decimal(10, 0)); -- 2.1232154
+    select cast('9999999999999999999999999999999.2345678' as decimal(38, 7)) + cast('1' as decimal(10, 0)); -- NULL
+
+These same operations succeed when allow-precision-loss is true:
+
+::
+
+    select cast('1.1232154' as decimal(38, 7)) + cast('1' as decimal(10, 0)); -- 2.123215, lost the last digit
+    select cast('9999999999999999999999999999999.2345678' as decimal(38, 7)) + cast('1' as decimal(10, 0)); -- 10000000000000000000000000000000.234568
+
+Decimal Precision and Scale Computation Formulas
+------------------------------------------------
+
+The HiveQL behavior:
 
 https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
 
-https://msdn.microsoft.com/en-us/library/ms190476.aspx
+Additionally, the computation of decimal division adapts to the allow-precision-loss flag,
+while the decimal addition, subtraction, and multiplication do not.
 
 Addition and Subtraction
-------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~
 
 ::
 
 	p = max(p1 - s1, p2 - s2) + max(s1, s2) + 1
 	s = max(s1, s2)
 
 Multiplication
---------------
+~~~~~~~~~~~~~~
 
 ::
 
 	p = p1 + p2 + 1
 	s = s1 + s2
 
 Division
---------
+~~~~~~~~
+When allow-precision-loss is true:
 
 ::
 
     p = p1 - s1 + s2 + max(6, s1 + p2 + 1)
     s = max(6, s1 + p2 + 1)
 
-For above arithmetic operators, when the precision of result exceeds 38,
-caps p at 38 and reduces the scale, in order to prevent the truncation of
-the integer part of the decimals. Below formula illustrates how the result
-precision and scale are adjusted.
+When allow-precision-loss is false:
+
+::
+
+    wholeDigits = min(38, p1 - s1 + s2)
+    fractionalDigits = min(38, max(6, s1 + p2 + 1))
+    p = wholeDigits + fractionalDigits
+    s = fractionalDigits
+
+Decimal Precision and Scale Adjustment
+--------------------------------------
+
+When allow-precision-loss is true, rounds the decimal part of the result if an exact representation is not possible.
+Otherwise, returns NULL.
+Notice: some operations succeed if precision loss is allowed and return NULL if not.
+
+For example,
+
+::
+
+    select cast(0.1234567891011 as decimal(38, 18)) * cast(1234.1 as decimal(38, 18));
+    -- 152.358023 if allow-precision-loss, NULL otherwise.
+
+The formula below illustrates how the result precision and scale are adjusted when precision loss is allowed.
 
 ::
 
     precision = 38
     scale = max(38 - (p - s), min(s, 6))
 
-Users experience runtime errors when the actual result cannot be represented
-with the calculated decimal type.
+When precision loss is not allowed, precision is capped at 38 and scale is kept as is.
+The formula below shows how the precision and scale are adjusted for decimal addition, subtraction, and multiplication.
+
+::
+
+    precision = 38
+    scale = min(38, s)
+
+When precision loss is not allowed, decimal division uses a different formula:
+
+::
+
+    precision = 38
+    scale = fractionalDigits - (wholeDigits + fractionalDigits - 38) / 2 - 1
+
+Returns NULL when the actual result cannot be represented with the calculated decimal type.
 
 Decimal Functions
 -----------------