Skip to content

Commit

Permalink
Add skewness Spark agg function (#7513)
Browse files Browse the repository at this point in the history
Summary:
There are some inconsistencies between the skewness calculations in Spark and
Presto. In Presto, the skewness calculation requires `count >= 3` to produce
a result, whereas in Spark, `count >= 1` is required. Additionally, Spark
also has a requirement for `m2 != 0`.

Therefore, it is necessary to move `CentralMomentsAggregates` to the
`functions/lib` directory for reuse by both Spark and Presto. Spark and
Presto can then implement their own respective `SkewnessResultAccessor`.

Spark skewness:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala#L291-L309

In addition, the algorithm for calculating kurtosis in Spark is different
from Presto, so currently they cannot be reused. However, there are plans to
continue working on adapting it in the future.

Pull Request resolved: #7513

Reviewed By: pedroerp

Differential Revision: D54699558

Pulled By: Yuhta

fbshipit-source-id: 1e9cbaecabd59d98b706d9a7de1c7bb747cbd9d4
  • Loading branch information
liujiayi771 authored and facebook-github-bot committed Mar 11, 2024
1 parent c5d3b93 commit 17f0ed8
Show file tree
Hide file tree
Showing 12 changed files with 739 additions and 460 deletions.
6 changes: 6 additions & 0 deletions velox/docs/functions/spark/aggregate.rst
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,12 @@ General Aggregate Functions

Returns b

.. spark:function:: skewness(x) -> double
Returns the skewness of all input values. When the count of `x` is greater than or equal to 1,
a non-null output will be generated. When the value of `m2` in the accumulator is 0, a null
output will be generated.

.. spark:function:: sum(x) -> bigint|double|real
Returns the sum of `x`.
Expand Down
6 changes: 4 additions & 2 deletions velox/functions/lib/aggregates/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

add_library(velox_functions_aggregates SingleValueAccumulator.cpp
AverageAggregateBase.cpp ValueSet.cpp)
add_library(
velox_functions_aggregates
AverageAggregateBase.cpp CentralMomentsAggregatesBase.cpp
SingleValueAccumulator.cpp ValueSet.cpp)

target_link_libraries(velox_functions_aggregates velox_exec
velox_presto_serializer Folly::folly)
Expand Down
52 changes: 52 additions & 0 deletions velox/functions/lib/aggregates/CentralMomentsAggregatesBase.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "velox/functions/lib/aggregates/CentralMomentsAggregatesBase.h"

namespace facebook::velox::functions::aggregate {

void checkAccumulatorRowType(
const TypePtr& type,
const std::string& errorMessage) {
VELOX_CHECK_EQ(type->kind(), TypeKind::ROW, "{}", errorMessage);
VELOX_CHECK_EQ(
type->childAt(kCentralMomentsIndices.count)->kind(),
TypeKind::BIGINT,
"{}",
errorMessage);
VELOX_CHECK_EQ(
type->childAt(kCentralMomentsIndices.m1)->kind(),
TypeKind::DOUBLE,
"{}",
errorMessage);
VELOX_CHECK_EQ(
type->childAt(kCentralMomentsIndices.m2)->kind(),
TypeKind::DOUBLE,
"{}",
errorMessage);
VELOX_CHECK_EQ(
type->childAt(kCentralMomentsIndices.m3)->kind(),
TypeKind::DOUBLE,
"{}",
errorMessage);
VELOX_CHECK_EQ(
type->childAt(kCentralMomentsIndices.m4)->kind(),
TypeKind::DOUBLE,
"{}",
errorMessage);
}

} // namespace facebook::velox::functions::aggregate
Loading

0 comments on commit 17f0ed8

Please sign in to comment.