proposal 210
Introduce support for numeric comparisons as though on a continuous number line like Zeta SQL, and improve the type-inferences when working with JSON data.
Author(s) | Tristan Swadell |
Reviewer(s) | Alfred Fuller, Matthias Blume |
Status | Draft | Ready for Review | Approved |
Last-Modified | 2021-10-21 |
In CEL, equality (_==_) is declared as a parameterized function where the types from both sides must agree, e.g. T == T → bool for all types T. CEL forbids type-coercion as this is a common source of bugs where expressions such as json.value == 1 might be trivially false or prone to type-coercion failures at runtime. Similarly, mismatched expectations may also occur for numeric comparisons (<, <=, >=, >).
The strict type-signature ensures that most problematic expressions are caught during static analysis. The CEL runtimes enforce the same behavior as the type-checker, where comparing two values of differing types will result in a no_matching_overload error, thus ensuring that parse-only expressions and type-checked expressions produce a consistent set of errors. However, when a type is dynamic the type-checker cannot infer whether the comparison is valid. In these cases the type-check is deferred until evaluation. Writing valid runtime expressions using dynamically typed and nullable typed data has proven to be a pain point.
Provide equality and ordering comparisons aligned with the behavior of other C-like languages and Zeta SQL. Such improvements will ensure that operations on JSON and Protobuf wrapper values better match user expectations at runtime.
- Ensure dynamically typed and nullable values are easy to use correctly with equality and comparison operations
- Improve the semantic consistency between CEL and Zeta SQL
- Propose solutions which minimize breaking changes to stored expressions
The following issues are out of scope for this proposal:
- Standardize NaN handling
- Address semantic inconsistencies between CEL and Proto equality
- Ensure CEL's 64-bit float literals compare as expected to 32-bit float protobuf fields
- Introduce comparison overloads (<, <=, >=, >) for all permutations of int, uint, and double.
- Support numeric equality tests (==, !=) between all numeric types at runtime.
- Equality across types should be consistent with the expression !(x < y || x > y).
In Zeta SQL, numeric values are treated as though they exist on a continuous number line. This aligns nicely with user expectations where 1 < 1.5 and 1 >= 1.0. Adding comparison overloads to CEL for these cases is trivial and requires no changes to the existing CEL type-checking behavior.
For better or worse, CEL's equality function is expressed as a parametric operation, meaning the types on both sides must agree. This catches a lot of potential errors, but means that custom overloads for numeric equality cannot be expressed without overhauling CEL's concept of equality as the custom definitions would collide with the parametric definition. At runtime, however, dynamic dispatch can be used to address this case correctly.
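As a rough illustration of comparing values as though on a continuous number line, the sketch below models int, uint, and double as tagged Python values. The Num class and the compare and equals helpers are hypothetical names chosen for this example, not part of any CEL runtime, and the sketch relies on Python comparing int and float by exact mathematical value.

```python
import math
from dataclasses import dataclass

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1


@dataclass(frozen=True)
class Num:
    """Hypothetical tagged numeric value: kind is 'int', 'uint', or 'double'."""
    kind: str
    value: object

    def __post_init__(self):
        # Range checks stand in for conversion failures such as uint(-1).
        if self.kind == "int" and not INT64_MIN <= self.value <= INT64_MAX:
            raise ValueError("int64 out of range")
        if self.kind == "uint" and not 0 <= self.value <= UINT64_MAX:
            raise ValueError("uint64 out of range")


def compare(a: Num, b: Num) -> int:
    """Return -1, 0, or 1 by comparing mathematical values, never by casting."""
    x, y = a.value, b.value
    if any(isinstance(v, float) and math.isnan(v) for v in (x, y)):
        raise ValueError("NaN handling is out of scope")
    # Python compares int and float exactly, so no precision is lost here.
    return (x > y) - (x < y)


def equals(a: Num, b: Num) -> bool:
    # Keeps equality consistent with !(x < y || x > y).
    return compare(a, b) == 0
```

A runtime written in a language with fixed-width integers would need explicit range checks before any cast, for example treating a negative int as less than every uint, rather than relying on the language's built-in comparison.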
CEL Expression | SQL | Current Eval | SQL Eval | Proposed Eval |
Comparisons | ||||
1.0 == 1 | SELECT 1.0 == 1; | Error | TRUE | TRUE |
-1 != 1u | SELECT -1 != CAST(1 AS UINT64); | Error | TRUE | TRUE |
-1 != uint(-1) | SELECT -1 != CAST(-1 AS UINT64); | Error | Error | Error |
-1 != 18446744073709551615u | SELECT -1 != 18446744073709551615; | Error | TRUE | TRUE |
1 != 1u | SELECT 1 != CAST(1 AS UINT64); | Error | FALSE | FALSE |
1u > -2 | SELECT CAST(1 AS UINT64) > -2 | Error | TRUE | TRUE |
uint64max > -2 | SELECT 18446744073709551615 > -2 | Error | TRUE | TRUE |
int64Min < 42u | SELECT -9223372036854775808 < CAST(42 AS UINT64); | Error | TRUE | TRUE |
-0.0 == +0.0 | Error | TRUE | TRUE | |
-0.0 > 0 | SELECT -0.0 > 0; | Error | FALSE | FALSE |
-0.0 >= 0 | SELECT -0.0 >= 0; | Error | TRUE | TRUE |
-0.0 < 0 | SELECT -0.0 < 0; | Error | FALSE | FALSE |
1u >= 0.5 | SELECT CAST(1 AS UINT64) >= 0.5; | Error | TRUE | TRUE |
1.0 <= 1 | SELECT 1.0 <= 1; | Error | TRUE | TRUE |
1.0 < 1 | SELECT 1.0 < 1; | Error | FALSE | FALSE |
pb.Any{payload: pb.FloatValue{value: 1.0}} == 1 | ??? | Error | ??? | TRUE |
pb.FloatValue{value: 1.33} == 1.33 | ??? | FALSE | ??? | FALSE |
Set Membership | ||||
1 in [1.0, 2.1] | SELECT 1 IN UNNEST([1.0, 2.1]); | Error | TRUE | TRUE |
1.0 in [1, 2, 3.1] | SELECT 1.0 IN UNNEST([1, 2, 3.1]); | Error | TRUE | TRUE |
1.0 in [1, 2, 3] | SELECT 1.0 IN UNNEST([1, 2, 3]); | Error | TRUE | TRUE |
1 in [1.1, -2, 3.2] | SELECT 1 IN UNNEST([1.1, -2, -4.2]); | Error | FALSE | FALSE |
1 in [-4.2, "42"] | SELECT 1 IN UNNEST([-4.2, "42"]); | Error | Error | Error |
2u in { | SELECT CAST(2 AS UINT64) IN ( | Error | TRUE | TRUE |
Access | ||||
{1: 'hello', 2: 'world'}[1.0] | SELECT val | Error | hello | hello |
{1: 'hello', 2: 'world'}[1.1] | SELECT val | Error | | Error |
{1:'hello', | SELECT val FROM UNNEST([ | Error | ! | ! |
{1u: true, 2: false}[1.0] | SELECT val | Error | TRUE | TRUE |
- Permit the comparison of null to any other value at runtime.
- Express equality as type(a) == null_type && type(b) == null_type
- Express inequality as type(a) != null_type || type(b) != null_type
The expression json.value != null should also work as the user expects, meaning that the homogeneous equality constraint for CEL should be relaxed at runtime for numeric values and null values. This aligns with SQL's treatment of NULL and also solves a usability issue for wrapper type values.
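The null rules above can be sketched in a few lines of Python, using None as a stand-in for CEL's null. The names null_aware_equals and strict_equals are hypothetical; strict_equals only approximates the current same-type rule and is not taken from any CEL implementation.

```python
NULL = None  # stand-in for CEL's null value


def strict_equals(a, b):
    """Approximates the current rule: mismatched types are an error."""
    if type(a) is not type(b):
        raise TypeError("no_matching_overload")
    return a == b


def null_aware_equals(a, b, typed_equals=strict_equals):
    """Equality that tolerates null on either side instead of erroring."""
    if a is NULL or b is NULL:
        # Equal only when both sides are null.
        return a is NULL and b is NULL
    return typed_equals(a, b)


def null_aware_not_equals(a, b, typed_equals=strict_equals):
    return not null_aware_equals(a, b, typed_equals)
```

Under this sketch, a membership test such as null in [1, 2, 3] reduces to any(null_aware_equals(None, x) for x in [1, 2, 3]), which evaluates to False rather than raising.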
CEL Expression | SQL | Current Eval | SQL Eval | Proposed Eval |
Comparisons | ||||
0 == null | SELECT 0 IS NULL; | Error | FALSE | FALSE |
"" == null | SELECT "" IS NULL; | Error | FALSE | FALSE |
[] != null | SELECT [] IS NOT NULL; | Error | TRUE | TRUE |
{} != null | SELECT STRUCT() IS NOT NULL; | Error | TRUE | TRUE |
Msg{}.string_value == null | SELECT string_value IS NULL FROM UNNEST([NEW Msg()]); | TRUE | TRUE | TRUE |
Msg{string_value: pb.StringValue{}} == null | SELECT string_value IS NULL FROM UNNEST([ | Error | FALSE | FALSE |
Msg{string_value: pb.StringValue{}} != null | SELECT string_value IS NOT NULL FROM UNNEST([ | Error | TRUE | TRUE |
Set Membership | ||||
null in [1, 2, 3] | ??? | Error | ??? | FALSE |
|
- Introduce a JSON type, as it is not accurate to refer to it as a DYN type.
- The JSON type will decay to map<string, JSON>, list<JSON>, string, double, bool, and null.
Introducing a JSON type would provide higher fidelity type-inferences when working with dynamic data, meaning that json.value == 1 would rightly be flagged as an error, prompting the user to more accurately write json.value == 1.0 instead.
Another caveat to consider is that not all JSON values will be provided via protobuf.Struct values, and so there are no guarantees that the parser used to generate the input will parse numeric values as floating point values. Some parsers treat a {"field": 1} on the wire as a request to generate a map(string, int). Having flexibility in the runtime to make comparisons and equality checks between all numeric types is important for papering over minor semantic differences of the libraries that CEL depends upon.
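Python's standard json module is one example of a parser that behaves this way: an integral JSON number is decoded as an int and only a number with a fractional part or exponent becomes a float. The snippet is an illustration of the parser difference only and says nothing about how any CEL runtime ingests JSON.

```python
import json

# The same logical number arrives with different runtime types
# depending on how it was written on the wire.
as_int = json.loads('{"field": 1}')["field"]
as_float = json.loads('{"field": 1.0}')["field"]

print(type(as_int).__name__, type(as_float).__name__)  # int float
```

With strictly homogeneous equality, a check written against one of these types would fail against the other, even though both inputs describe the value 1.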
Support for a JSON type requires an update to the builtin union types supported by CEL's type-checkers. Currently, wrapper types are the only union type (primitive | null) supported by CEL. The mechanism exists, but needs to be generalized.
Note: JSON comparisons will be both safer and easier to use correctly, but comparison of JSON values will not mirror user expectations for comparison of JSON values in JavaScript unless heterogeneous equality is supported at runtime.
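A minimal sketch of the proposed check, assuming the JSON type is treated as a closed union of the members listed above. Type names are plain strings here, with "map" and "list" standing for map<string, JSON> and list<JSON>; the function name check_json_comparison is hypothetical, and operator-specific rules such as which members support ordering are deliberately ignored.

```python
# Members the JSON type may decay to, per the list above.
JSON_MEMBERS = frozenset({"map", "list", "string", "double", "bool", "null"})


def check_json_comparison(literal_type: str) -> str:
    """Check `json.value <op> literal` when json.value has the JSON union type.

    The comparison type-checks only if the literal's type is a union member.
    """
    return "OK" if literal_type in JSON_MEMBERS else "Error"
```

For example, check_json_comparison("int") models json.value == 1 and is rejected, while check_json_comparison("double") models json.value == 1.0 and is accepted.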
CEL Expression | Current Check | Proposed Check | Current Eval | Proposed Eval |
Comparisons | ||||
json.value == 1 | OK | Error | Error | Boolean (float|null) |
json.value != null | OK | OK | Boolean (if null) | Boolean (always) |
json.value == protobuf.Msg{} | OK | Error | Error | Error |
json.value < 20 | OK | Error | Error | Boolean (float) |
json.value >= "string" | OK | OK | Boolean (string) | Boolean (string) |
Set Membership | ||||
json.value in [1, 2, 3] | OK | Error | Error | Boolean (float|null) |
Access | ||||
[1, 2, 3][json.value] | OK | Error | Error | Boolean (float) |
Zeta SQL may coerce literals during static analysis time if it assists with overload resolution. Comparisons and equality checks between uint64 and int64 values are special-cased to ensure values are compared as if on a continuous number line. Signed and unsigned integers are promoted to doubles at analysis time if they are used in a comparison with a double value.
Coercion improves type-agreement automatically. It is unlikely that someone will compare a floating point value with an integer literal, but if they do, the right thing just happens. This is a great user experience and it optimizes some of the checks and conversions which might otherwise happen at runtime. That said, coercion is more of an optimization in the sense that the same behaviors could be achieved by other means. The complexity of type coercion and the possible bugs that could be introduced through it are riskier than the language council is willing to consider at this time.
Note: Zeta SQL supports other literal coercions which are out of scope for this discussion, though notably coercion from STRING value to Duration or Timestamp would improve ease of use and static validation: request.time < "2021-01-01T12:34:56Z", request.time + "1h".
Logically AND type equality with value equality at runtime: type(a) == type(b) && a == b. This is the simplest definition of equality and most consistently lines up with Leibniz's law of equality. However, maps and protobuf.Any values break this rule already as they require special equality treatment to meet user expectations. Considering special treatment for numeric values is then not much of a stretch.
At a future juncture we may consider heterogeneous equality, but the equality test will be significantly more complex than the simple rule stated above:
is_numeric(a) && is_numeric(b) ? numeric_equals(a, b)
: is_proto(a) && is_proto(b) ? proto_equals(a, b)
: type(a) == type(b) && a == b
At present, the changes for numeric and null equality are compatible with the more complex heterogeneous equality rule stated above.
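The rule above might look roughly like the following in Python. ProtoMessage and the field-wise proto_equals are placeholders invented for this sketch, since a consistent definition of protobuf equality is still open, and Python's built-in numeric comparison stands in for numeric_equals.

```python
from numbers import Real


class ProtoMessage:
    """Hypothetical stand-in for a protobuf message value."""

    def __init__(self, **fields):
        self.fields = fields


def is_numeric(v):
    # bool counts as a Real in Python, so exclude it explicitly.
    return isinstance(v, Real) and not isinstance(v, bool)


def proto_equals(a, b):
    # Placeholder: field-wise comparison only.
    return a.fields == b.fields


def heterogeneous_equals(a, b):
    if is_numeric(a) and is_numeric(b):
        return a == b  # Python compares int and float by value
    if isinstance(a, ProtoMessage) and isinstance(b, ProtoMessage):
        return proto_equals(a, b)
    # Everything else, including null, falls back to same-type equality.
    return type(a) is type(b) and a == b
```

Note that the fallback branch never raises: values of unrelated types simply compare as not equal, which is what makes null comparisons such as heterogeneous_equals(None, 0) return False.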
Note: Given the nature of the changes under consideration, if the more complex form of heterogeneous equality were accepted, it would be reasonable to introduce this sooner rather than later in the runtimes. The only consideration is that proto_equals has not yet been consistently defined for CEL or Zeta SQL.
There is a fair amount of complexity dedicated to the handling of protobuf wrapper types in the CEL type-checker, and they are part of the motivation for better runtime support of null. The Java-like unboxing to a scalar value or null also promotes the use of null within CEL outside of the context of JSON, which is the only null that CEL understands.
However, for the following reasons removing wrapper type support is not under consideration at this time:
- Does not remove the need for comparing JSON-values to null.
- Breaking change for stored type-checked expressions and would require coordination to fix end-user expressions.
- Complicates the user experience slightly: wrapper == "value" vs. has(wrapper.value) && wrapper.value == <scalar>
Removing special handling for wrapper types may be a good idea, but its removal is orthogonal to fixing the JSON user experience.
Concerns | Yes | No |
Alters AST representation | ✓ | |
Alters type-check semantics | ✓ | |
Alters evaluation semantics | ✓ | |
Impacts evaluation performance | ✓ | |
Introduces new runtime function | ✓ |
Heterogeneous null and numeric equality changes will change runtime semantics; however, they will convert error states to non-error states. This is generally regarded as safe. This feature will be introduced with a flag, and can be default enabled within a month.
Numeric comparisons will require additional runtime functions to be implemented in all stacks, followed by the addition of new function declarations in the CEL standard environment. The addition of the runtime functions should happen before the exposure of new function declarations in the type-checkers. The function declarations should be flag guarded with a long opt-in period before they are default enabled since many applications use different implementations of CEL in their data plane and control plane binaries.
JSON as a type will turn some successfully type-checked expressions into errors. This will surprise users even if their expressions are currently broken at runtime; however, if those errors are detected it means that the expressions are likely broken at runtime. Rollout of this as a standard may take a lot of customer interaction to fix broken expressions. Note that expressions which are currently broken at runtime may be fixed by the introduction of heterogeneous null and numeric equality changes.