Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CESQL v1 review changes #1286

Merged
merged 11 commits into from
May 30, 2024
Merged

CESQL v1 review changes #1286

merged 11 commits into from
May 30, 2024

Conversation

Cali0707
Copy link
Contributor

Addresses the comments on #1282

Proposed Changes

Look at the discussion on #1282

@Cali0707
Copy link
Contributor Author

cc @jskeet @duglin @pierDipi

cesql/spec.md Outdated
@@ -200,6 +200,8 @@ _Boolean_.
When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to `false`.
For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false.

When addressing an attribute represented by the _Integer_ type, if the value of that attribute does not fit into a 32 bit signed integer, an error MUST be returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A CloudEvent Integer attribute can't have such a value though... from the CE Spec:

A whole number in the range -2,147,483,648 to +2,147,483,647 inclusive. This is the range of a signed, 32-bit, twos-complement encoding. Event formats do not have to use this encoding, but they MUST only use Integer values in this range.

To put it another way: if we've got this for integers, do we need something similar for attributes represented by the Boolean type but with a value which isn't true or false?

Copy link
Contributor Author

@Cali0707 Cali0707 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, if it isn't possible for this value to be in an event then I think we can remove this - I hadn't been aware that that was the case. Does this range also apply to any extension attributes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's the CE type system rather than about specific attributes.

* Link to ISO 9075
* Fix some typos
* Clarify return types and errors
* Simplify ebnf definitions for value identifiers
* Clarify type casting and behaviour for operators

Signed-off-by: Calum Murray <[email protected]>
cesql/spec.md Outdated
transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_.
The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no
error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this
specification.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this better. Do you think we need to add something like:

Generation of an error does not halt processing. This specification does not mandate how these errors are exposed to a user.

? Is the first sentence true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure here, if you look at section 4.1 it states that:

When evaluating an expression, the evaluator can operate in two modes, in relation to error handling:

  • Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the error, without any result.
  • Complete evaluation mode: When an error is triggered, the evaluation is continued, and the evaluation of the expression returns both the result and the error(s).

But, as far as I can tell the SDKs implement the fail fast mode instead of the complete evaluation mode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w/o any result... is that true even in the SDKs and Knative? I was assuming the was result 'false'.

I'm having a memory of us discussing what happens when a nested expression errors when that expression is part of an AND - doesn't it result in a false for the nested expression and then false again due to that side of the AND being false? Or am I thinking about some other situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of w/o any result, I think the problem (from a spec standpoint at least) is what type should the return value have? For example, if my query is LEFT(missingattribute, 10), then should I get (false, missing attribute error) as my return value? Does it make more sense to return ("", missing attribute error)?

But, similarly if my expression is LEFT(missingattribute, 10) LIKE prefix%, then having a return value of (false, missing attribute error) would be what I would expect while a return value of ("", missing attribute error) would not be what I expect.

Perhaps it makes sense to say something along the lines of "Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the zero value for the return type of the root operation, along with the error". WDYT @duglin ?

cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
@duglin
Copy link
Collaborator

duglin commented May 23, 2024

Before we ship this - can we wrap it at 80 columns to match the other specs?

@Cali0707
Copy link
Contributor Author

@duglin @jskeet would you be able to take another look when you have time? I think I should have addressed all your feedback (minus wrapping at 80 characters), but I'm happy to make further fixes or address anything I missed.

Thanks again for all of your feedback so far!!

@jskeet
Copy link
Contributor

jskeet commented May 28, 2024

I'm unlikely to get time to review this again I'm afraid.

@Cali0707
Copy link
Contributor Author

I'm unlikely to get time to review this again I'm afraid.

No worries, thank you so much for taking the time the last few weeks to review this!

Signed-off-by: Calum Murray <[email protected]>
cesql/spec.md Outdated
transparency][referential-transparency-wiki]. The output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_, and it might include an error.
transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_.
The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no
error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the entire representation of, or even access to, errors out of scope of the spec, not just the empty set of errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good point - I was trying to not specify that as I wrote the error stuff (apart from specifying the types of errors you could get). Let me clarify that!

cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated
- _MissingFunctionError_: An error that occurs due to a call to a function that has not been registered with the CESQL engine
- _FunctionEvaluationError_: An error that occurs during the evaluation of a function

Whenever an operator or function encounters an error, it MUST return a return value as well as an error value or values. In cases where there is not an obvious return value for the expression,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if this is easier to read:
...it MUST result in a "return value" as well as one or more "error values".

Up to you

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a missing attribute an error? Kind of feel like it should be and therefore should have an error type, no? I could be swayed into believing that it's special, but if so we may want to say that a missing attribute isn't an error in the spec so people don't think we overlooked it.

Also, I can't help but wonder if there should be some kind of "generic error" define for cases that you didn't think of above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep going back to whether or not a missing attribute should be an error. I think with the introduction of zero values for each type along with the short circuiting of AND/OR, it makes sense to have missing attributes as an error again. A big part of the argument for not having an error was to try and understand an expression like:

missingAttribute AND true

as well as

true OR missingAttribute

But, with these new changes we have made I think that these can be reasoned about so including the error (with the zero type) makes more sense

cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated
@@ -197,8 +207,13 @@ Unless otherwise specified, every attribute and extension MUST be represented by
Through implicit type casting, the user can convert the addressed value instances to _Integer_ and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... MUST be represented by the String type as its initial type"... I'm curious what this really means and why it's here. Does the user who is creating the expression need to do something with this info? Or is this for an implementation/SDK ? And if so, what do they do with it? If someone defines an extension called myint (an int), why would it matter if it starts out (internally) as an int or a string if it will be type cast as needed?

The spec says: The primary output of a CESQL expression evaluation is always a boolean, an integer or a string.

Does this mean that an expression of just myint MUST result in a string? And if so, it's kind of interesting that the spec doesn't define an int() function to allow people to covert it back to an int if that's what they want to result to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah that is interesting, I'd somehow missed this when looking at the spec earlier.

I think part of what this does is it allows us to add in support for other types (e.g. timestamp) in the future without breaking anything, since the values would still start as a string representation of the timestamp, but then implicitly cast to the actual timestamp

But I think that always representing a type as a string is also awkward for users of the spec. Maybe the best approach would be to say that Unless otherwise specified or the value of the attribute or extension is one of the built in CESQL types, every attribute and extension MUST be represented by the _String_ type as its initial type. This semantics is also what the SDKs currently do

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not following. If I define an extension as an int, why must it be converted to a string ?
Especially if the cesql expression if myint = 5 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me ask this... what breaks if you just remove these sentences and this notion of "initial type"? Things will get converted automatically when needed. I'm just not seeing the point of (or the meaning of) the phrase "initial type".

Thinking about this more... is this exposing some implementation detail because the SDKs convert all attributes to strings initially? If so, then it really shouldn't be in the spec since the spec should focus on the cesql language syntax and resulting semantics, not how the code goes from "syntax" to "result".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "If the value of the attribute or extension is not one of the primitive CESQL types, it MUST be represented by the String type"

cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
cesql/spec.md Outdated Show resolved Hide resolved
1. If the function is n-ary with `n > 1`:
1. If it's not ambiguous, cast all the arguments to the corresponding parameter types.
1. If it's ambiguous, raise an error and the cast results are undefined.
1. Cast all the arguments to the corresponding parameter types.
1. If the operator is n-ary with `n > 2`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there a reason, but why "n>2" and not "n>0" ?

And I guess on the previous one, why "n > 1" and not "n > 0" ?

Copy link
Contributor Author

@Cali0707 Cali0707 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "n>2" instead of "n>0" this is because we already handled the "n=1" and "n=2" earlier with the unary and binary operator/function sections.

As for why we only handle binary operator resolution and not binary function resolution (the "n>1" vs "n>2"), I am not sure, this comes from before my time working on the spec. I'm inclined to believe there was a reason for it though

@Cali0707 Cali0707 requested a review from duglin May 29, 2024 19:28
is set to be the zero value for the type of the missing attribute. Rather, the subexpression with the missingAttribute returns the zero value of the return type of the subexpression.
As an example, `1 / missingAttribute` does not raise a _MathError_ due to the division by zero, instead it returns `0, (missingAttributeError)` as that is the zero value for the return type of the subexpression.
In cases where the return type of the subexpression cannot be determined by the CESQL engine, the CESQL engine MUST assume a return type of _Boolean_. In such cases, the return value would
therefore be `false, (missingAttributeError)`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples really help clarify things, even if I have to read it a few times to really let it sink in :-)

@markpeek
Copy link
Contributor

I appreciate the review and refinement that @jskeet @duglin @Cali0707 have been doing in this PR. Have the review and changes converged yet? If so, it might make sense for another week for the rest of the team to review before voting on the v1. Thoughts?

cesql/spec.md Outdated
1. Function invocations
1. Unary operators
1. NOT unary operator
1. `-` unary operator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you got the indenting you were hoping for, I see this:
image

and this new direction is much clearer than what you had before.

@Cali0707
Copy link
Contributor Author

Have the review and changes converged yet? If so, it might make sense for another week for the rest of the team to review before voting on the v1. Thoughts?

To me it seems like we are converging, not sure if @duglin or @jskeet have other opinions. My thoughts would be that we should:

  1. If everyone is okay with it in the meeting, merge these changes this week
  2. Have another week for everyone to review the updated spec after these changes
  3. Have the vote next Thursday

cesql/spec.md Outdated
@@ -56,7 +56,11 @@ producer, or in an intermediary, and it can be implemented using any technology
The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and
type-checked CloudEvent instance. An expression MUST NOT mutate the value of the input CloudEvent instance, nor any of
the other input values. The evaluation of an expression observes the concept of [referential
transparency][referential-transparency-wiki]. The output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_, and it might include an error.
transparency][referential-transparency-wiki]. This means that any part of an expression can be replaced with it's output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's => its

cesql/spec.md Outdated

#### 3.5.3 Function Errors

As specified in 3.3, in the event of an error a function MUST still return a valid return value for it's defined return type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's => its

Signed-off-by: Calum Murray <[email protected]>
@Cali0707 Cali0707 requested a review from duglin May 30, 2024 16:14
@duglin
Copy link
Collaborator

duglin commented May 30, 2024

Approved on the 5/30 call.
Despite there being late changes to this PR (meaning, after Tuesday), we decided to merge it anyway so that people only need to look at the latest version of the spec to do their "v1.0" review. So, please look over the "main" branch version of the spec and add any comments to the "review issue": #1282

The current plans are to complete the final review by next week's call, and then (if the group approves), to start the official 1-week voting process.

@duglin duglin merged commit e30acf3 into cloudevents:main May 30, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants