[v1] Add IS [NOT] TRUE|FALSE|UNKNOWN; make IS NULL and IS MISSING separate AST nodes #1679
CROSS-ENGINE-REPORT ❌
Now FAILING Tests ❌: 5 test(s) were previously PASSING in BASE but are now FAILING in TARGET.
Now IGNORED Tests ❌: The complete list can be found in the GitHub CI summary, either from the Step Summary or in the Artifact.
Now Passing Tests: 180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-C08A6E9). Before merging, confirm they are intended to pass. The complete list can be found in the GitHub CI summary, either from the Step Summary or in the Artifact.
CROSS-COMMIT-REPORT ✅
// TODO remove `NULL` and `MISSING` variants from DataType
// <absent types>
public static final int NULL = 1;
public static final int MISSING = 2;
(self-review) Removed `NULL` and `MISSING` from `DataType`. `IS [NOT] NULL|MISSING` is handled by a different node than `ExprIsType`.
The rest of the changes in this file fix the numbering.
*/
@Builder(builderClassName = "Builder")
@EqualsAndHashCode(callSuper = false)
public class ExprBoolTest extends Expr {
(self-review) There are a few different ways to model the bool test predicate and the null/missing predicates. I'm open to other suggestions if the modeling could be improved. Some alternatives I considered:
- Separate nodes for all (i.e. ExprIsTrue, ExprIsFalse, ExprIsUnknown, ExprIsNull, ExprIsMissing)? <- seemed a bit redundant w/ all the visitor + rewriter methods that would get added.
- Modeling `IS [NOT] NULL` and `IS [NOT] MISSING` together as an absent node? <- creates the same number of classes (i.e. ExprIsAbsent and AbsentEnum) as just having them separated out. I feel like we won't be adding more absent predicates? But if we will, then an enum might be better.
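To make the modeling trade-off above concrete, here is a minimal sketch of the chosen shape: one node for all boolean tests, parameterized by a truth value, plus separate null and missing predicate nodes. Everything here is a simplified stand-in written for illustration. The names `ExprLit`, `ExprNullPredicate`, `ExprMissingPredicate`, `TruthValue`, and `AstSketch`, as well as the field layouts, are assumptions and not the library's actual API.

```java
// Hypothetical, simplified stand-ins; not the library's actual classes.
abstract class Expr {}

// A literal or identifier, kept as raw text for this sketch.
final class ExprLit extends Expr {
    final String text;
    ExprLit(String text) { this.text = text; }
}

// Int-coded truth values, mirroring the int-constant style in the diff.
final class TruthValue {
    static final int TRUE = 1;
    static final int FALSE = 2;
    static final int UNKNOWN = 3;
    private TruthValue() {}

    static String name(int code) {
        switch (code) {
            case TRUE: return "TRUE";
            case FALSE: return "FALSE";
            case UNKNOWN: return "UNKNOWN";
            default: throw new IllegalArgumentException("unrecognized truth value code: " + code);
        }
    }
}

// One node covers IS [NOT] TRUE|FALSE|UNKNOWN.
final class ExprBoolTest extends Expr {
    final Expr value;
    final boolean not;
    final int truthValue;
    ExprBoolTest(Expr value, boolean not, int truthValue) {
        this.value = value; this.not = not; this.truthValue = truthValue;
    }
}

// IS [NOT] NULL gets its own node (assumed name).
final class ExprNullPredicate extends Expr {
    final Expr value;
    final boolean not;
    ExprNullPredicate(Expr value, boolean not) { this.value = value; this.not = not; }
}

// IS [NOT] MISSING gets its own node (assumed name).
final class ExprMissingPredicate extends Expr {
    final Expr value;
    final boolean not;
    ExprMissingPredicate(Expr value, boolean not) { this.value = value; this.not = not; }
}

final class AstSketch {
    // Renders a node back to text to show what each node represents.
    static String render(Expr e) {
        if (e instanceof ExprLit) return ((ExprLit) e).text;
        if (e instanceof ExprBoolTest) {
            ExprBoolTest t = (ExprBoolTest) e;
            return render(t.value) + is(t.not) + TruthValue.name(t.truthValue);
        }
        if (e instanceof ExprNullPredicate) {
            ExprNullPredicate p = (ExprNullPredicate) e;
            return render(p.value) + is(p.not) + "NULL";
        }
        if (e instanceof ExprMissingPredicate) {
            ExprMissingPredicate p = (ExprMissingPredicate) e;
            return render(p.value) + is(p.not) + "MISSING";
        }
        throw new IllegalArgumentException("unsupported node: " + e.getClass().getSimpleName());
    }

    private static String is(boolean not) { return not ? " IS NOT " : " IS "; }

    public static void main(String[] args) {
        Expr x = new ExprLit("x");
        System.out.println(render(new ExprBoolTest(x, true, TruthValue.UNKNOWN))); // x IS NOT UNKNOWN
        System.out.println(render(new ExprNullPredicate(x, false)));               // x IS NULL
    }
}
```

With this shape, a visitor or rewriter needs three methods for these predicates rather than five, which is the redundancy concern raised in the first alternative.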
public static final int UNKNOWN = 0;
public static final int TRUE = 1;
public static final int FALSE = 2;
public static final int UNK = 3;
Interesting! Maybe OTHER, or maybe we don't need to define the `UNK` variant for this. It would still force non-exhaustive and we have -1 reserved. Maybe you make it
// implicit unknown is 0
TRUE = 1
FALSE = 2
UNKNOWN = 3
I wasn't too sure what to do w/ the name collision. Ideally, for the sake of consistency, I would like to keep the unknown/other variant the same across `AstEnum`s.
Perhaps we could rename all of the unknown/other variants to something that won't have a collision, like `_UNKNOWN` or `_OTHER`? The static functions could then be named `_UNKNOWN()`/`_OTHER()`.
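As a rough illustration of the renaming idea, the sketch below shows an int-coded enum where the fallback variant is called `_UNKNOWN`, leaving `UNKNOWN` free for the SQL truth value. The class name `TruthValueSketch`, the `parse` helper, and the code assignments are assumptions made for this example and are not taken from the PR.

```java
import java.util.Locale;

// Hypothetical int-coded enum: the fallback variant is prefixed with an
// underscore so it cannot collide with the SQL keyword UNKNOWN.
final class TruthValueSketch {
    static final int _UNKNOWN = 0; // reserved fallback for unrecognized input
    static final int TRUE = 1;
    static final int FALSE = 2;
    static final int UNKNOWN = 3;  // the SQL truth value UNKNOWN

    private final int code;

    private TruthValueSketch(int code) { this.code = code; }

    // Static factory functions named after the variants.
    static TruthValueSketch _UNKNOWN() { return new TruthValueSketch(_UNKNOWN); }
    static TruthValueSketch TRUE() { return new TruthValueSketch(TRUE); }
    static TruthValueSketch FALSE() { return new TruthValueSketch(FALSE); }
    static TruthValueSketch UNKNOWN() { return new TruthValueSketch(UNKNOWN); }

    int code() { return code; }

    String name() {
        switch (code) {
            case TRUE: return "TRUE";
            case FALSE: return "FALSE";
            case UNKNOWN: return "UNKNOWN";
            default: return "_UNKNOWN";
        }
    }

    // Maps a keyword to a variant; anything unrecognized becomes the fallback.
    static TruthValueSketch parse(String keyword) {
        switch (keyword.trim().toUpperCase(Locale.ROOT)) {
            case "TRUE": return TRUE();
            case "FALSE": return FALSE();
            case "UNKNOWN": return UNKNOWN();
            default: return _UNKNOWN();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("unknown").name()); // UNKNOWN
        System.out.println(parse("maybe").name());   // _UNKNOWN
    }
}
```

The prefix keeps both meanings addressable at the same time, at the cost of a naming convention that every enum in the family would have to follow.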
I don't think I have a preference on the exact name so let's check in with John on Tuesday's meeting. I think the important thing is avoiding current and future collisions with what you have suggested.
Yeah, I've been thinking about this. If there is good separation between our library APIs and serialization, then I believe we can remove the UNKNOWN variants of the other enums.
Yeah, let's discuss this tomorrow. I'll need to check the original rationale for including an `UNKNOWN` variant in all of the enums. Perhaps it's required for serde? But we could always add the variant and relevant functions back in the future.
input = "'foo' IS TRUE",
expected = Datum.missing(),
mode = Mode.PERMISSIVE()
Something interesting is Postgres does
'1' IS TRUE -- true
'foo' IS TRUE -- err!
So in strict mode, `'foo' IS TRUE` should error (assuming #1680 gets fixed) since the expression for `IS <truth value>` expects a boolean value.
This is also consistent w/ what we do for the other boolean constructs like `AND`, `OR`, `NOT` (i.e. data type mismatch error in strict, missing in permissive).
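The strict/permissive behavior described above could be sketched roughly as follows. This is not the evaluator's actual code: `BoolTestEval`, `Mode`, `Truth`, the `MISSING` sentinel, and the use of Java `null` to model SQL NULL are all assumptions made for illustration, and a MISSING input operand is deliberately left out of scope.

```java
// Hypothetical sketch of IS [NOT] <truth value> evaluation.
final class BoolTestEval {
    enum Mode { STRICT, PERMISSIVE }
    enum Truth { TRUE, FALSE, UNKNOWN }

    // Sentinel standing in for the MISSING value (assumption for this sketch).
    static final Object MISSING = new Object() {
        @Override public String toString() { return "MISSING"; }
    };

    // value is a Boolean, null (modeling SQL NULL), or some non-boolean object.
    static Object isTest(Object value, Truth target, boolean not, Mode mode) {
        if (value != null && !(value instanceof Boolean)) {
            // Non-boolean operand: data type mismatch in strict, MISSING in permissive.
            if (mode == Mode.STRICT) {
                throw new IllegalArgumentException("data type mismatch: expected boolean, got " + value);
            }
            return MISSING;
        }
        // NULL maps to the truth value UNKNOWN, so the test itself never yields NULL.
        Truth actual = (value == null)
                ? Truth.UNKNOWN
                : (((Boolean) value) ? Truth.TRUE : Truth.FALSE);
        boolean result = (actual == target);
        return not ? !result : result;
    }

    public static void main(String[] args) {
        System.out.println(isTest("foo", Truth.TRUE, false, Mode.PERMISSIVE)); // MISSING
        System.out.println(isTest(null, Truth.UNKNOWN, false, Mode.STRICT));   // true
        System.out.println(isTest(true, Truth.FALSE, true, Mode.STRICT));      // true
    }
}
```

The type check comes first so that both modes share the same three-valued comparison once the operand is known to be a boolean or NULL.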
Postgres allows certain strings as input to functions that expect booleans. From their docs:
The datatype input function for type `boolean` accepts these string representations for the “true” state:
- true
- yes
- on
- 1
and these representations for the “false” state:
- false
- no
- off
- 0
Unique prefixes of these strings are also accepted, for example `t` or `n`. Leading or trailing whitespace is ignored, and case does not matter.
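The quoted input rules can be approximated with a small parser. This is only a sketch of the documented behavior (accepted words, unique prefixes, whitespace trimming, case-insensitivity), not Postgres's implementation; the class name `PgBoolInput` and the use of `Optional` to signal invalid input are assumptions for this example.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of the documented boolean input rules.
final class PgBoolInput {
    private static final String[] TRUE_WORDS = {"true", "yes", "on", "1"};
    private static final String[] FALSE_WORDS = {"false", "no", "off", "0"};

    // Returns the parsed boolean, or empty when the input is invalid or ambiguous.
    static Optional<Boolean> parse(String input) {
        // Leading/trailing whitespace is ignored and case does not matter.
        String s = input.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) return Optional.empty();
        boolean matchesTrue = isPrefixOfAny(s, TRUE_WORDS);
        boolean matchesFalse = isPrefixOfAny(s, FALSE_WORDS);
        // Reject when nothing matches, or when the prefix is not unique
        // (for example "o", which could be "on" or "off").
        if (matchesTrue == matchesFalse) return Optional.empty();
        return Optional.of(matchesTrue);
    }

    private static boolean isPrefixOfAny(String s, String[] words) {
        for (String w : words) {
            if (w.startsWith(s)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse("t"));    // Optional[true]
        System.out.println(parse(" NO ")); // Optional[false]
        System.out.println(parse("o"));    // Optional.empty
        System.out.println(parse("foo"));  // Optional.empty
    }
}
```

An empty result here corresponds to the "invalid input syntax for type boolean" error in the Postgres examples.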
-- in Postgres
-- IS TRUE
SELECT 'no' IS TRUE -- false
SELECT 'foo' IS TRUE -- Query Error: invalid input syntax for type boolean: "foo"
-- IS FALSE
SELECT 'no' IS FALSE -- true
SELECT 'foo' IS FALSE -- Query Error: invalid input syntax for type boolean: "foo"
-- IS UNKNOWN
SELECT 'no' IS UNKNOWN -- false
SELECT 'foo' IS UNKNOWN -- Query Error: invalid input syntax for type boolean: "foo"
-- other boolean exprs
SELECT NOT 'no' -- true
SELECT NOT 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
SELECT true AND 'no' -- false
SELECT true AND 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
SELECT false OR 'no' -- false
SELECT false OR 'foo' -- Query Error: invalid input syntax for type boolean: "foo"
de2bb2c to 7c20a3a
I'm approving AS-IS to keep you moving forward. I've created a follow-up for tues. #1682
Relevant Issues
Description
- Adds the `IS [NOT] TRUE|FALSE|UNKNOWN` boolean test predicate
- Moves `NULL` and `MISSING` out of `DataType` and creates dedicated null and missing predicate nodes
Other Information
- Updated Unreleased Section in CHANGELOG: [NO]
  - `v1` yet to be released.
- Any backward-incompatible changes? [YES]
  - Removes `DataType` AST values and functions (for `NULL` and `MISSING`); `v1` not yet released so not an issue.
- Any new external dependencies? [NO]
- Do your changes comply with the Contributing Guidelines and Code Style Guidelines? [YES]
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.