[Kernel] Add STARTS_WITH expression and default impl #3991

Closed
wants to merge 6 commits into from

huan233usc
Contributor

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Initial implementation of the STARTS_WITH expression, following the idea mentioned in #2539 (comment): rewrite STARTS_WITH(a, b) as LIKE(a, b + "%"). For now, b is supported only as a literal expression.

This is 1/n toward addressing #2539; the data skipping logic will follow in later PRs.
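
The rewrite described above can be sketched in isolation. This is a minimal standalone illustration, not the Kernel implementation: the class `StartsWithAsLike`, its method names, and the choice of `\` as the escape character are all hypothetical. It also assumes that `%`, `_`, and the escape character in the literal prefix are escaped before the trailing `%` is appended, so that the prefix is matched literally.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of STARTS_WITH(a, b) evaluated as LIKE(a, escape(b) + "%"). */
class StartsWithAsLike {
  // Assumed escape character; the real expression may use a different one.
  static final char ESCAPE = '\\';

  /** Escapes LIKE wildcards and the escape character in the prefix, then appends '%'. */
  static String toLikePattern(String prefix) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < prefix.length(); i++) {
      char c = prefix.charAt(i);
      if (c == '%' || c == '_' || c == ESCAPE) {
        sb.append(ESCAPE);
      }
      sb.append(c);
    }
    return sb.append('%').toString();
  }

  /** Minimal LIKE evaluator that translates the pattern into a Java regex. */
  static boolean like(String input, String pattern) {
    StringBuilder regex = new StringBuilder("(?s)");
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (c == ESCAPE && i + 1 < pattern.length()) {
        // An escaped character is always matched literally.
        regex.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
      } else if (c == '%') {
        regex.append(".*"); // any sequence of characters
      } else if (c == '_') {
        regex.append('.'); // exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.matches(regex.toString(), input);
  }

  /** STARTS_WITH for non-null inputs, expressed through the LIKE rewrite. */
  static boolean startsWith(String input, String prefix) {
    return like(input, toLikePattern(prefix));
  }
}
```

With this sketch, `StartsWithAsLike.startsWith("t%st", "t%")` is true while `StartsWithAsLike.startsWith("test", "t%")` is false, because the `%` inside the prefix is escaped and only the appended `%` acts as a wildcard.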

How was this patch tested?

Added test cases in DefaultExpressionEvaluatorSuite.scala

Does this PR introduce any user-facing changes?

No

Collaborator

@vkorukanti vkorukanti left a comment

Looks great!

minor comments.

"LIKE",
Arrays.asList(
leftResult.expression,
right.getValue() == null
Collaborator

can this be null?

Contributor Author

val input = new DefaultColumnarBatch(col1.getSize, schema, Array(col1, col2))

val startsWithExpressionLiteral = startsWith(new Column("col1"), Literal.ofString("t%"))
val expOutputVectorLiteral =
Collaborator

Can you check that in Spark starts_with(null, 't%') also returns null?

Contributor Author

Ran
SELECT startswith(NULL,'t%'); -> NULL
SELECT startswith('t%',NULL); -> NULL

The documentation also states: "If expr or startExpr is NULL, the result is NULL."
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/startswith#examples
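
The NULL behavior quoted above can be mirrored in a few lines. This is only a sketch of the semantics, not code from the PR: the class `NullableStartsWith` is hypothetical, and a boxed `Boolean` with a Java `null` stands in for SQL NULL.

```java
/** Hypothetical sketch of the NULL semantics: any NULL input yields a NULL result. */
class NullableStartsWith {
  /** Returns null if either argument is null; otherwise a literal prefix check. */
  static Boolean startsWith(String expr, String startExpr) {
    if (expr == null || startExpr == null) {
      return null; // stands in for SQL NULL
    }
    return expr.startsWith(startExpr);
  }
}
```

Both `NullableStartsWith.startsWith(null, "t%")` and `NullableStartsWith.startsWith("t%", null)` return `null`, matching the two queries above.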

@vkorukanti vkorukanti changed the title [Kernel] Initial implementation of STARTS_WITH expression [Kernel] Add STARTS_WITH expression Dec 30, 2024
@vkorukanti vkorukanti changed the title [Kernel] Add STARTS_WITH expression [Kernel] Add STARTS_WITH expression and default impl Dec 30, 2024
@@ -101,6 +101,11 @@
* <li>SQL semantic: <code>expr1 IS NOT DISTINCT FROM expr2</code>
* <li>Since version: 3.3.0
* </ul>
* <li>Name: <code>STARTS WITH</code>
Collaborator

Suggested change
- * <li>Name: <code>STARTS WITH</code>
+ * <li>Name: <code>STARTS_WITH</code>

if (!(StringType.STRING.equivalent(leftResult.outputType)
&& StringType.STRING.equivalent(rightResult.outputType))) {
throw unsupportedExpressionException(
startsWith, "'STARTS_WITH' is expects STRING type inputs");
Collaborator

Suggested change
- startsWith, "'STARTS_WITH' is expects STRING type inputs");
+ startsWith, "'STARTS_WITH' expects STRING type inputs");

for (int i = 0; i < len; i++) {
char c = input.charAt(i);
if (c == escapeChar) {
escapedString.append('\\');
Collaborator

should this be escapeChar instead of \\?

@@ -183,4 +183,18 @@ private static String escapeLikeRegex(String pattern, char escape) {
}
return "(?s)" + javaPattern;
}

/** Escapes characters escapeChar in the input String */
static String escape(String input, char escapeChar) {
Collaborator

Should the _ in the input be escaped as well? Otherwise it will be treated as the LIKE wildcard _ (matches any single character).
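
To make the concern concrete, here is a small sketch of what happens to the prefix `a_` under each choice. It is not taken from the PR: the class `UnderscoreWildcardDemo` is hypothetical, and the two regexes are hand-written equivalents of the LIKE patterns, assuming `%` maps to `.*`, an unescaped `_` maps to `.`, and the `(?s)` prefix is applied as in `escapeLikeRegex`.

```java
import java.util.regex.Pattern;

/** Hypothetical demo: regex equivalents of LIKE patterns built from the prefix "a_". */
class UnderscoreWildcardDemo {
  // LIKE 'a_%' with '_' left unescaped: '_' matches any single character.
  static final Pattern UNESCAPED = Pattern.compile("(?s)a..*");

  // LIKE 'a\_%' with '_' escaped: '_' matches only a literal underscore.
  static final Pattern ESCAPED = Pattern.compile("(?s)a_.*");

  static boolean matchesUnescaped(String input) {
    return UNESCAPED.matcher(input).matches();
  }

  static boolean matchesEscaped(String input) {
    return ESCAPED.matcher(input).matches();
  }
}
```

Under these assumptions, `matchesUnescaped("abc")` is true even though `"abc"` does not start with `a_`, while `matchesEscaped("abc")` is false and `matchesEscaped("a_c")` is true.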

2 participants