-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add STARTS_WITH
expression and default impl
#3991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
minor comments.
kernel/kernel-api/src/main/java/io/delta/kernel/expressions/Predicate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluator.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluator.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluator.java
Outdated
Show resolved
Hide resolved
"LIKE", | ||
Arrays.asList( | ||
leftResult.expression, | ||
right.getValue() == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we could define a null literal with string types
https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/expressions/Literal.java#L190-L195
Functional wise
SELECT startswith('SparkSQL', NULL);
NULL
[1]https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/startswith#examples
.../src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluator.java
Outdated
Show resolved
Hide resolved
...ults/src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionUtils.java
Outdated
Show resolved
Hide resolved
val input = new DefaultColumnarBatch(col1.getSize, schema, Array(col1, col2)) | ||
|
||
val startsWithExpressionLiteral = startsWith(new Column("col1"), Literal.ofString("t%")) | ||
val expOutputVectorLiteral = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check in spark the behavior of starts_with(null, 't%')
is null as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran
SELECT startswith(NULL,'t%'); -> NULL
SELECT startswith('t%',NULL); -> NULL
also found the below documentation "If expr or startExpr is NULL, the result is NULL."
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/startswith#examples
STARTS_WITH
expression
STARTS_WITH
expressionSTARTS_WITH
expression and default impl
@@ -101,6 +101,11 @@ | |||
* <li>SQL semantic: <code>expr1 IS NOT DISTINCT FROM expr2</code> | |||
* <li>Since version: 3.3.0 | |||
* </ul> | |||
* <li>Name: <code>STARTS WITH</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* <li>Name: <code>STARTS WITH</code> | |
* <li>Name: <code>STARTS_WITH</code> |
if (!(StringType.STRING.equivalent(leftResult.outputType) | ||
&& StringType.STRING.equivalent(rightResult.outputType))) { | ||
throw unsupportedExpressionException( | ||
startsWith, "'STARTS_WITH' is expects STRING type inputs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startsWith, "'STARTS_WITH' is expects STRING type inputs"); | |
startsWith, "'STARTS_WITH' expects STRING type inputs"); |
for (int i = 0; i < len; i++) { | ||
char c = input.charAt(i); | ||
if (c == escapeChar) { | ||
escapedString.append('\\'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be escapeChar
instead of \\
?
@@ -183,4 +183,18 @@ private static String escapeLikeRegex(String pattern, char escape) { | |||
} | |||
return "(?s)" + javaPattern; | |||
} | |||
|
|||
/** Escapes characters escapeChar in the input String */ | |||
static String escape(String input, char escapeChar) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the _
in the input be escaped as well? Otherwise it will be treated as LIKE _
(match any single characters)?
Which Delta project/connector is this regarding?
Description
Initial implementation of STARTS_WITH expression following the idea mentioned in #2539 (comment), i.e. transform the STARTS_WITH(a, b) with LIKE(a, b+"%"), at this moment, we only support b as literal expression.
This is 1/n for addressing #2539, the logic of data skipping will be done in the following PRs
How was this patch tested?
Added test cases in DefaultExpressionEvaluatorSuite.scala
Does this PR introduce any user-facing changes?
No