chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr #1887

Draft
wants to merge 4 commits into
base: main
from

Conversation

mbutrovich
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

The existing code uses deprecated kernels. While working on Utf8View support for Comet, I found that my string filters fail because those kernels don't support Utf8View. Rather than extend the code we have, let's just use the new kernels.

What changes are included in this PR?

  • Remove Comet's definitions for Like, StartsWith, EndsWith, and Contains, and use the DataFusion scalar functions and LikeExpr instead.

How are these changes tested?

Existing tests.

@codecov-commenter

codecov-commenter commented Jun 14, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 3 lines in your changes missing coverage. Please review.

Project coverage is 58.76%. Comparing base (f09f8af) to head (03bdaf6).
Report is 275 commits behind head on main.

Files with missing lines | Patch % | Lines
...cala/org/apache/comet/parquet/ParquetFilters.scala | 75.00% | 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1887      +/-   ##
============================================
+ Coverage     56.12%   58.76%   +2.63%     
- Complexity      976     1141     +165     
============================================
  Files           119      130      +11     
  Lines         11743    12830    +1087     
  Branches       2251     2415     +164     
============================================
+ Hits           6591     7539     +948     
- Misses         4012     4065      +53     
- Partials       1140     1226      +86     

@mbutrovich
Contributor Author

There's one Spark SQL test that isn't binding attribute references correctly. I'll try to figure that out and maybe I'll get lucky and it'll fix the TPC-H correctness failure too.

@mbutrovich
Contributor Author

It turns out the cause is that the strings come out of the scan dictionary-encoded, which the scalar UDF can't handle. I will need to rethink the approach, or maybe stick a CopyExec somewhere.
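
To illustrate the dictionary-encoding problem described above, here is a minimal conceptual sketch using only the Rust standard library. It is not the Arrow, DataFusion, or Comet API: `DictColumn`, `unpack`, and `starts_with` are hypothetical names invented for this example. The idea is that a kernel written for plain string columns cannot consume a column stored as keys plus a values table, so the column has to be unpacked first.

```rust
/// Hypothetical, simplified stand-in for a dictionary-encoded string column:
/// each row stores an index into `values` (`None` means a null row).
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

/// Unpack the dictionary into a plain column with one owned string per row.
fn unpack(col: &DictColumn) -> Vec<Option<String>> {
    col.keys
        .iter()
        .map(|k| k.map(|i| col.values[i].clone()))
        .collect()
}

/// A kernel that only understands plain string columns (assumption: this
/// mirrors the limitation of the scalar UDF mentioned in the comment above).
/// Null rows stay null in the result.
fn starts_with(col: &[Option<String>], prefix: &str) -> Vec<Option<bool>> {
    col.iter()
        .map(|v| v.as_ref().map(|s| s.starts_with(prefix)))
        .collect()
}
```

In this sketch a caller would run `unpack` on the scan output and pass the result to `starts_with`. Whether the unpacking in Comet should happen through a CopyExec or somewhere else is the open question in this PR.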

@mbutrovich
Contributor Author

Converting back to draft for now since I likely won't sort out the dictionary unpacking for a couple of weeks.
