-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add fluent expression builder API #11414
base: main
Are you sure you want to change the base?
Conversation
This pull request was exported from Phabricator. Differential Revision: D65371064 |
✅ Deploy Preview for meta-velox canceled.
|
Cc: @mbasmanova @xiaoxmeng @majetideepak @kgpai @Yuhta @bikramSingh91 @rui-mo What do you guys think about something like this as the API? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pedroerp this looks nice!
I wonder if it is more convenient to use f""
for fields and l*""
for literals.
eg. f"col" and lf"1.0" for literal float, ld"1.0" for literal double.
Something like the Arrow CDataInterface.
https://arrow.apache.org/docs/format/CDataInterface.html
@@ -44,7 +45,42 @@ class IExpr { | |||
return alias_; | |||
} | |||
|
|||
std::optional<std::string>& alias() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return string_view?
velox/exec/tests/utils/PlanBuilder.h
Outdated
@@ -255,7 +252,7 @@ class PlanBuilder { | |||
} | |||
|
|||
/// @param tableHandle Optional tableHandle. Other builder arguments such as | |||
/// the subfieldFilters and remainingFilter will be ignored. | |||
/// the `subfieldFilters` and `remainingFilter` will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "``" is not used consistently in the comments throughout.
return head; | ||
} | ||
|
||
// Macro to reduce builerplate when overloading C++ operators. The template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: boilerplate
@majetideepak thanks for the quick review.
You mean using the same arrow identifiers as custom C++ literals? That is a great idea for consistency, but I don't think C++ let's you define the literals as a prefix; if needs to be a trailing "_xx". But we could make the trailing characters align with arrow for literals. |
5acbf33
to
b2d958d
Compare
This pull request was exported from Phabricator. Differential Revision: D65371064 |
@pedroerp You are right. It has to be a suffix https://en.cppreference.com/w/cpp/language/user_literal. |
I learned from the above link that we can also have user-defined integer literals. The Arrow CDataInterface will allow us to define NULL literals as well. |
/// Conjuncts and "between": | ||
/// | ||
/// > ExprPtr e = "a"_f && "b"_f || "c"_f; // "a AND b OR c" | ||
/// > ExprPtr e = between("a"_f, 0L, 10L); // "a between 0 and 10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would 'in' look like?
/// execution using type binding from `core::Expressions::inferTypes()`. | ||
/// | ||
/// The API provided is as close to the actual expression trees as possible. | ||
/// Filters, arithmetics, function calls, literals, aliases, and more are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does 'cast' look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't written it yet, but it would follow the exact pattern in the CastExpr node, so something like
cast(BIGINT(), "col"_f)
This would actually be better than SQL as we can be very precise about the type being cast.
/// C++ literals are automatically converted into ConstantExpr (expression | ||
/// literals) when part of an expression. To explicitly create a literal: | ||
/// | ||
/// > ExprPtr e = literal(10.3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add some examples of complex types (arrays, maps and structs)?
Would be nice to clarify how to define literals of various integer types.
/// | ||
/// > ExprPtr e = literal(10.3); | ||
/// > ExprPtr e = literal("str"); | ||
/// > ExprPtr e = literalTimestamp("1970-01-01 01:30:00"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would timestamp with timezone look like?
/// When building long expressions, be careful about C++ operator precedence. | ||
/// For example: | ||
/// | ||
/// > ExprPtr e = "col"_f + 5 * 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you cautioning about plus(col, multiply(5, 100)) vs. plus(col, 500)? This wasn't obvious right away.
/// > ExprPtr e = "col"_f + 5 * literal(100); | ||
/// > ExprPtr e = "col"_f + literal(5) * 100; | ||
/// | ||
/// Both will generate "col + 5 * 100", which is "plus(col, multiply(5, 100))". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this implicitly convert + to "plus" function call? Isn't this assuming "semantics"? Perhaps, list all such implicit conversions somewhere.
using core::CallExpr; | ||
using core::ConstantExpr; | ||
using core::ExprPtr; | ||
using core::FieldAccessExpr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add an example of accessing a field in a struct?
TEST(ExpressionBuilderTest, fieldAccess) { | ||
ExprWrapper result; | ||
|
||
result = field("col"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that applications using Velox library often do not have the luxury of using Velox symbols unqualified and forced to use velox::BIGINT() vs. just BIGINT().
Since this API is all about brevity and readability, how do you think applications can avoid having to prefix all these names with velox::expr::builder?
velox::expr::builder::field("f") vs. field("f")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. We can naturally move this to the top-level namespace. My concern was to prevent inadvertent symbols clashes by requiring API users to explicitly import namespace. For example, maybe there could be other velox::field()
symbols defined somewhere else? This was out of abundant caution though; more likely than not moving to top-level should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what is the reason preventing applications from doing using namespace velox::expr::builder
. using namespace velox
is probably a bit too broad but using namespace velox::expr::builder
probably is fine.
In the worst case they can also do namespace VEB = velox::expr::builder
and use VEB::field()
TEST(ExpressionBuilderTest, combined) { | ||
ExprWrapper result; | ||
|
||
result = 10L * "c1"_f > call("func", 3.4, "g"_f / "h"_f, call("j")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression is not nearly as complex as expressions commonly found in production, yet, even that is notably less readable as compared to SQL.
I think it is useful to have this API, but, perhaps, not as a replacement for SQL.
@mbasmanova thanks for the comments. Will address them in the next few days. Overall I just wanted to prototype how this API could look like and get a feeling about how the community feels about it, but if we are aligned that this might be useful, I'm happy to complete it add support for the missing expressions you pointed out (complex types, casts, lambdas, etc) |
} | ||
|
||
// Define C++ operator overload for comparisons (filters). | ||
VELOX_EXPR_BUILDER_OPERATOR(operator==, "eq"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have namespace prefix for functions (e.g. presto.default.eq
) so this need to be configurable. Maybe a global variable for prefix is enough. But I would guess there will be other things people want to configure in the future. So a global singleton of config object probably is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Besides the prefix, I believe Spark functions have different names, e.g. 'eq' in Presto is called 'equalto' in Spark.
TEST(ExpressionBuilderTest, fieldAccess) { | ||
ExprWrapper result; | ||
|
||
result = field("col"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what is the reason preventing applications from doing using namespace velox::expr::builder
. using namespace velox
is probably a bit too broad but using namespace velox::expr::builder
probably is fine.
In the worst case they can also do namespace VEB = velox::expr::builder
and use VEB::field()
|
||
/// Field access (references to columns). | ||
inline ExprWrapper field(const std::string& name) { | ||
return {std::make_shared<FieldAccessExpr>(name, std::nullopt)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field access could contain 'input', which makes it a dereference expression. Do we need to support it? Thanks.
Summary: Adding an API to allow developers to fluently create (untyped) expression trees without having to rely on a SQL parser. Details and extensively examples provided in code comments and unit tests. Part of facebookincubator#11383 Differential Revision: D65371064
b2d958d
to
996f0af
Compare
This pull request was exported from Phabricator. Differential Revision: D65371064 |
Summary:
Adding an API to allow developers to fluently create (untyped)
expression trees without having to rely on a SQL parser. Details and
extensively examples provided in code comments and unit tests
Differential Revision: D65371064
Part of #11383