-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CESQL v1 review changes #1286
CESQL v1 review changes #1286
Conversation
Signed-off-by: Calum Murray <[email protected]>
cesql/spec.md
Outdated
@@ -200,6 +200,8 @@ _Boolean_. | |||
When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to `false`. | |||
For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false. | |||
|
|||
When addressing an attribute represented by the _Integer_ type, if the value of that attribute does not fit into a 32 bit signed integer, an error MUST be returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A CloudEvent Integer attribute can't have such a value though... from the CE Spec:
A whole number in the range -2,147,483,648 to +2,147,483,647 inclusive. This is the range of a signed, 32-bit, twos-complement encoding. Event formats do not have to use this encoding, but they MUST only use Integer values in this range.
To put it another way: if we've got this for integers, do we need something similar for attributes represented by the Boolean type but with a value which isn't true or false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, if it isn't possible for this value to be in an event then I think we can remove this - I hadn't been aware that that was the case. Does this range also apply to any extension attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's the CE type system rather than about specific attributes.
* Link to ISO 9075 * Fix some typos * Clarify return types and errors * Simplify ebnf definitions for value identifiers * Clarify type casting and behaviour for operators Signed-off-by: Calum Murray <[email protected]>
cesql/spec.md
Outdated
transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. | ||
The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no | ||
error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this | ||
specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this better. Do you think we need to add something like:
Generation of an error does not halt processing. This specification does not mandate how these errors are exposed to a user.
? Is the first sentence true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure here, if you look at section 4.1 it states that:
When evaluating an expression, the evaluator can operate in two modes, in relation to error handling:
- Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the error, without any result.
- Complete evaluation mode: When an error is triggered, the evaluation is continued, and the evaluation of the expression returns both the result and the error(s).
But, as far as I can tell the SDKs implement the fail fast mode instead of the complete evaluation mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w/o any result... is that true even in the SDKs and Knative? I was assuming the was result 'false'.
I'm having a memory of us discussing what happens when a nested expression errors when that expression is part of an AND - doesn't it result in a false for the nested expression and then false again due to that side of the AND being false? Or am I thinking about some other situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of w/o any result, I think the problem (from a spec standpoint at least) is what type should the return value have? For example, if my query is LEFT(missingattribute, 10)
, then should I get (false, missing attribute error)
as my return value? Does it make more sense to return ("", missing attribute error)
?
But, similarly if my expression is LEFT(missingattribute, 10) LIKE prefix%
, then having a return value of (false, missing attribute error)
would be what I would expect while a return value of ("", missing attribute error)
would not be what I expect.
Perhaps it makes sense to say something along the lines of "Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the zero value for the return type of the root operation, along with the error". WDYT @duglin ?
Before we ship this - can we wrap it at 80 columns to match the other specs? |
Signed-off-by: Calum Murray <[email protected]>
Signed-off-by: Calum Murray <[email protected]>
Signed-off-by: Calum Murray <[email protected]>
I'm unlikely to get time to review this again I'm afraid. |
No worries, thank you so much for taking the time the last few weeks to review this! |
Signed-off-by: Calum Murray <[email protected]>
cesql/spec.md
Outdated
transparency][referential-transparency-wiki]. The output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_, and it might include an error. | ||
transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. | ||
The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no | ||
error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the entire representation of, or even access to, errors out of scope of the spec, not just the empty set of errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good point - I was trying to not specify that as I wrote the error stuff (apart from specifying the types of errors you could get). Let me clarify that!
cesql/spec.md
Outdated
- _MissingFunctionError_: An error that occurs due to a call to a function that has not been registered with the CESQL engine | ||
- _FunctionEvaluationError_: An error that occurs during the evaluation of a function | ||
|
||
Whenever an operator or function encounters an error, it MUST return a return value as well as an error value or values. In cases where there is not an obvious return value for the expression, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if this is easier to read:
...it MUST result in a "return value" as well as one or more "error values".
Up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a missing attribute an error? Kind of feel like it should be and therefore should have an error type, no? I could be swayed into believing that it's special, but if so we may want to say that a missing attribute isn't an error in the spec so people don't think we overlooked it.
Also, I can't help but wonder if there should be some kind of "generic error" define for cases that you didn't think of above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep going back to whether or not a missing attribute should be an error. I think with the introduction of zero values for each type along with the short circuiting of AND/OR, it makes sense to have missing attributes as an error again. A big part of the argument for not having an error was to try and understand an expression like:
missingAttribute AND true
as well as
true OR missingAttribute
But, with these new changes we have made I think that these can be reasoned about so including the error (with the zero type) makes more sense
cesql/spec.md
Outdated
@@ -197,8 +207,13 @@ Unless otherwise specified, every attribute and extension MUST be represented by | |||
Through implicit type casting, the user can convert the addressed value instances to _Integer_ and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... MUST be represented by the String type as its initial type"... I'm curious what this really means and why it's here. Does the user who is creating the expression need to do something with this info? Or is this for an implementation/SDK ? And if so, what do they do with it? If someone defines an extension called myint (an int), why would it matter if it starts out (internally) as an int or a string if it will be type cast as needed?
The spec says: The primary output of a CESQL expression evaluation is always a boolean, an integer or a string.
Does this mean that an expression of just myint
MUST result in a string? And if so, it's kind of interesting that the spec doesn't define an int() function to allow people to covert it back to an int if that's what they want to result to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah that is interesting, I'd somehow missed this when looking at the spec earlier.
I think part of what this does is it allows us to add in support for other types (e.g. timestamp) in the future without breaking anything, since the values would still start as a string representation of the timestamp, but then implicitly cast to the actual timestamp
But I think that always representing a type as a string is also awkward for users of the spec. Maybe the best approach would be to say that Unless otherwise specified or the value of the attribute or extension is one of the built in CESQL types, every attribute and extension MUST be represented by the _String_ type as its initial type
. This semantics is also what the SDKs currently do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not following. If I define an extension as an int, why must it be converted to a string ?
Especially if the cesql expression if myint = 5
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me ask this... what breaks if you just remove these sentences and this notion of "initial type"? Things will get converted automatically when needed. I'm just not seeing the point of (or the meaning of) the phrase "initial type".
Thinking about this more... is this exposing some implementation detail because the SDKs convert all attributes to strings initially? If so, then it really shouldn't be in the spec since the spec should focus on the cesql language syntax and resulting semantics, not how the code goes from "syntax" to "result".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "If the value of the attribute or extension is not one of the primitive CESQL types, it MUST be represented by the String type"
1. If the function is n-ary with `n > 1`: | ||
1. If it's not ambiguous, cast all the arguments to the corresponding parameter types. | ||
1. If it's ambiguous, raise an error and the cast results are undefined. | ||
1. Cast all the arguments to the corresponding parameter types. | ||
1. If the operator is n-ary with `n > 2`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there a reason, but why "n>2" and not "n>0" ?
And I guess on the previous one, why "n > 1" and not "n > 0" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "n>2" instead of "n>0" this is because we already handled the "n=1" and "n=2" earlier with the unary and binary operator/function sections.
As for why we only handle binary operator resolution and not binary function resolution (the "n>1" vs "n>2"), I am not sure, this comes from before my time working on the spec. I'm inclined to believe there was a reason for it though
Signed-off-by: Calum Murray <[email protected]>
Signed-off-by: Calum Murray <[email protected]>
is set to be the zero value for the type of the missing attribute. Rather, the subexpression with the missingAttribute returns the zero value of the return type of the subexpression. | ||
As an example, `1 / missingAttribute` does not raise a _MathError_ due to the division by zero, instead it returns `0, (missingAttributeError)` as that is the zero value for the return type of the subexpression. | ||
In cases where the return type of the subexpression cannot be determined by the CESQL engine, the CESQL engine MUST assume a return type of _Boolean_. In such cases, the return value would | ||
therefore be `false, (missingAttributeError)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples really help clarify things, even if I have to read it a few times to really let it sink in :-)
cesql/spec.md
Outdated
1. Function invocations | ||
1. Unary operators | ||
1. NOT unary operator | ||
1. `-` unary operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Calum Murray <[email protected]>
Signed-off-by: Calum Murray <[email protected]>
To me it seems like we are converging, not sure if @duglin or @jskeet have other opinions. My thoughts would be that we should:
|
cesql/spec.md
Outdated
@@ -56,7 +56,11 @@ producer, or in an intermediary, and it can be implemented using any technology | |||
The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and | |||
type-checked CloudEvent instance. An expression MUST NOT mutate the value of the input CloudEvent instance, nor any of | |||
the other input values. The evaluation of an expression observes the concept of [referential | |||
transparency][referential-transparency-wiki]. The output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_, and it might include an error. | |||
transparency][referential-transparency-wiki]. This means that any part of an expression can be replaced with it's output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's => its
cesql/spec.md
Outdated
|
||
#### 3.5.3 Function Errors | ||
|
||
As specified in 3.3, in the event of an error a function MUST still return a valid return value for it's defined return type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's => its
Signed-off-by: Calum Murray <[email protected]>
Approved on the 5/30 call. The current plans are to complete the final review by next week's call, and then (if the group approves), to start the official 1-week voting process. |
Addresses the comments on #1282
Proposed Changes
Look at the discussion on #1282