clarify type casting in CESQL spec #1281

Merged
33 changes: 32 additions & 1 deletion cesql/spec.md
@@ -268,7 +268,7 @@ Similarly, whenever the left operand of the OR operation evaluates to `true`, th
| `x LIKE pattern: String x String -> Boolean` | Returns `true` if the value x matches the `pattern` |
| `x NOT LIKE pattern: String x String -> Boolean` | Same as `NOT (x LIKE PATTERN)` |

The pattern of the `LIKE` operator can contain:
The pattern of the `LIKE` operator MUST be a string literal, and can contain:

- `%` represents zero, one, or multiple characters
- `_` represents a single character
@@ -278,6 +278,10 @@ For example, the pattern `_b*` will accept values `ab`, `abc`, `abcd1` but won't
Both `%` and `_` can be escaped with `\`, in order to be matched literally. For example, the pattern `abc\%` will match
`abc%` but won't match `abcd`.

In cases where the left operand is not a `String`, it MUST be cast to a `String` before the comparison is made.
The pattern of the `LIKE` operator (that is, the right operand of the operator) MUST be a valid string literal without casting,
otherwise the parser MUST return a parse error.
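The `LIKE` rules above can be sketched in Python as follows. This is only an illustration, not the spec's or any SDK's implementation: the helper names `like_to_regex`, `cast_to_string`, and `like` are hypothetical, translating the pattern to a regular expression is one possible approach, and treating a `\` that is not followed by `%` or `_` as a literal backslash is an assumption the text above does not settle.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # An escaped % or _ is matched literally.
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_":
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            # Assumption: any other character, including a lone backslash,
            # matches itself.
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def cast_to_string(value):
    """Cast a non-String left operand to String before the comparison."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return value


def like(value, pattern):
    """Return True if the (cast) left operand matches the whole pattern."""
    return like_to_regex(pattern).fullmatch(cast_to_string(value)) is not None
```

With this sketch, `like("abc%", r"abc\%")` is `True` and `like("abcd", r"abc\%")` is `False`, mirroring the escaping example above, while `like(123, "1_3")` casts the integer to `"123"` before matching.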

#### 3.4.4. Exists operator

| Definition | Semantics |
@@ -353,6 +357,33 @@ left operand of the OR operation evalues to `true`, the right operand MUST NOT b

#### 3.7. Type casting

The following table indicates which type casts a CESQL engine MUST or MUST NOT support:

| Type | Integer | String | Boolean |
| ------- | -------- | ------ | -------- |
| Integer | N/A | MUST | MUST NOT |
| String | MUST | N/A | MUST |
| Boolean | MUST NOT | MUST | N/A |
Collaborator

This is a valid choice, but I'm curious why you didn't choose to map 1 to true and 0 to false?

Contributor Author

Tbh, I just based this off the current implementation in the SDKs...

The 1 mapping to true and 0 to false sounds reasonable though, especially because then the zero value for integers would be cast to false.

Which do you think is better?

Collaborator

I'm partial to allowing it, then everything can be implicitly cast to anything else - and I think you can then remove an error type.

@jskeet any concerns?

Contributor

I'm okay with it - personally I do like being clear about expressions in terms of types, but that's mostly in C# etc. I haven't done significant amounts of SQL in years, so I don't know how it would "feel". If others are happy, that's fine by me.

Contributor Author

@duglin if we do this:

map 1 with true and 0 with false

then how would we handle an integer value like 200? Would that be an error? Or, to avoid errors, would it be better to say that when casting from integer to boolean, any non-zero integer becomes true and 0 becomes false?

Collaborator

Yup, I like that. Int->bool is zero vs non-zero, but bool->int is 0 vs 1.
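For reference, the rule settled on in this thread could be sketched in Python like this. The function names are hypothetical, and this reflects only the proposal discussed above, not the cast table shown in this revision of the diff (which still lists Integer/Boolean casts as MUST NOT).

```python
def integer_to_boolean(value: int) -> bool:
    # Proposal from this thread: 0 casts to false, any non-zero integer to true.
    return value != 0


def boolean_to_integer(value: bool) -> int:
    # Proposal from this thread: true casts to 1, false casts to 0.
    return 1 if value else 0
```

Under this sketch an integer such as 200 casts to `true` rather than raising an error, and a round trip through Boolean collapses every non-zero integer to 1.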


For all of the type casts which a CESQL engine MUST support, the semantics which the engine MUST use are defined as follows:

| Definition | Semantics |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `Integer -> String` | Returns the string representation of the integer value in base 10, without leading `0`s. If the value is less than 0, the '-' character is prepended to the result. |
| `String -> Integer`  | Returns the result of interpreting the string as a 32-bit base 10 integer. The string MAY begin with a leading sign '+' or '-'. If the result will overflow or the string is not a valid integer, an error is returned along with a value of `0`. |
| `String -> Boolean`  | Returns `true` or `false` if the lower case representation of the string is exactly "true" or "false", respectively. Otherwise returns an error along with a value of `false`.                                                                     |
| `Boolean -> String` | Returns `"true"` if the boolean is `true`, and `"false"` if the boolean is `false`. |
Collaborator

@duglin duglin May 22, 2024

I think we should have a 3x3 table (or something) that shows all combinations in one spot. Then it's easy to see what's missing, like bool->int, int->bool


An example of how _Boolean_ values cast to _String_ combines with the case insensitivity of CESQL keywords is that:
```
TRUE = "true" AND FALSE = "false"
```
will evaluate to `true`, while
```
TRUE = "TRUE" OR FALSE = "FALSE"
```
will evaluate to `false`.
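The four casts in the semantics table, and the two comparisons above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, errors are modeled as a `(value, error)` tuple with `None` meaning no error, the error strings are invented, and the accepted integer syntax (an optional sign followed by ASCII digits only) is one reading of "valid integer".

```python
import re

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
# Assumption: a valid integer is an optional sign followed by ASCII digits.
_INT_RE = re.compile(r"([+-]?)([0-9]+)")


def integer_to_string(value):
    # Base 10, no leading zeros, '-' prepended for negative values.
    return str(value)


def string_to_integer(s):
    """Return (value, error); on error the value is 0."""
    m = _INT_RE.fullmatch(s)
    if m is None:
        return 0, "not a valid integer"
    digits = m.group(2).lstrip("0") or "0"
    if len(digits) > 10:  # more digits than any 32-bit integer has
        return 0, "integer overflow"
    n = int(digits)
    if m.group(1) == "-":
        n = -n
    if not INT32_MIN <= n <= INT32_MAX:
        return 0, "integer overflow"
    return n, None


def string_to_boolean(s):
    """Return (value, error); on error the value is False."""
    lowered = s.lower()
    if lowered == "true":
        return True, None
    if lowered == "false":
        return False, None
    return False, "not a valid boolean"


def boolean_to_string(value):
    return "true" if value else "false"
```

Reading the example above as casting the Boolean operand to String, `boolean_to_string(True) == "true"` holds while `boolean_to_string(True) == "TRUE"` does not, which matches the stated results; which operand gets cast is inferred here from those results rather than spelled out in this passage.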

When the argument types of an operator/function invocation don't match the signature of the operator/function being invoked, the CESQL engine MUST try to perform an implicit cast.

This section defines an **ambiguous** operator/function as an operator/function that is overloaded with another