Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support for string matching in Snowflake #1406

Merged
merged 5 commits into from
Dec 20, 2023

Conversation

nathanhaigh
Copy link
Contributor

  • Added support for str_starts() and str_ends() by using Snowflake's REGEXP_INSTR() function
  • Refactored str_detect() to use Snowflake's CONTAINS() function instead of hacking with REGEXP() which anchors the start and end by default

Satisfies #1405

 * Added support for str_starts() and str_ends() by using Snowflake's
REGEXP_INSTR() function
 * Refactored str_detect() to use Snowflake's CONTAINS() function
instead of hacking with REGEXP() which anchors the start and end by
default
@fh-afrachioni
Copy link
Contributor

Thank you for working on Snowflake support!

CONTAINS does not support regular expressions. I would argue pattern matching is an expected and important behavior of str_detect(), worth the minor hack to get us there from REGEXP.

@nathanhaigh
Copy link
Contributor Author

nathanhaigh commented Nov 10, 2023

Thank you for working on Snowflake support!

CONTAINS does not support regular expressions. I would argue pattern matching is an expected and important behavior of str_detect(), worth the minor hack to get us there from REGEXP.

Agreed about pattern matching! Serves me right for working on this late a night and not reading the Snowflake document page for CONTAINS properly!

I wonder if swapping over to REGEXP_INSTR might be a less hacky alternative to REGEXP?

@fh-afrachioni
Copy link
Contributor

I wonder if swapping over to REGEXP_INSTR might be a less hacky alternative to REGEXP?

I like it! I think the need to compare the result of REGEXP_INSTR with 0 is cleaner than the existing pattern-padding.

I especially like it because I just realized REGEXP_INSTR takes an optional <regexp_parameters> argument, for which one option is

i: case-insensitive.

This isn't important for str_detect(), but it does open the door to grepl(..., ignore.case = TRUE), which has been an annoyance here for awhile. How would you feel about updating grepl() as well?

@nathanhaigh
Copy link
Contributor Author

This isn't important for str_detect(), but it does open the door to grepl(..., ignore.case = TRUE), which has been an annoyance here for awhile. How would you feel about updating grepl() as well?

Sure - I can take a look at that.

Move from using `REGEXP` to `REGEXP_INSTR` for cleaner syntax and support for
case-insensitive search capabilities.
@nathanhaigh
Copy link
Contributor Author

@fh-afrachioni Done - please take a look and let me know if there is anything further that might need adding.

@nathanhaigh nathanhaigh changed the title Better support for stringr in Snowflake Better support for string matching in Snowflake Nov 14, 2023
@fh-afrachioni
Copy link
Contributor

Hi @nathanhaigh , this looks great! I'm excited to finally have ignore.case working.

One observation: I think we might need another level of backslash escaping. In order to use \D, for example, I apparently need to write grepl('\\\\D', col_name) in my R IDE (using \ as the R escape character, to encode two literal \ in R code, which are sent to Snowflake without being escaped, and interpreted as a single, backslash-escaped, \). Snowflake needs the backslash character to be escaped in string literals.

@nathanhaigh
Copy link
Contributor Author

Hi @nathanhaigh , this looks great! I'm excited to finally have ignore.case working.

One observation: I think we might need another level of backslash escaping. In order to use \D, for example, I apparently need to write grepl('\\\\D', col_name) in my R IDE (using \ as the R escape character, to encode two literal \ in R code, which are sent to Snowflake without being escaped, and interpreted as a single, backslash-escaped, \). Snowflake needs the backslash character to be escaped in string literals.

Good catch @fh-afrachioni - fixed in the latest commit

@nathanhaigh
Copy link
Contributor Author

Is there anything further I need to do to get this PR merged?

@fh-afrachioni
Copy link
Contributor

Thanks for making the update! I'm not a maintainer, but maybe @mgirlich would be willing to consider this lightweight change.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you please add a bullet to the top of NEWS.md? It should briefly describe the change and end with (@yourname, #issuenumber).

@nathanhaigh
Copy link
Contributor Author

@hadley done :)

@hadley hadley merged commit e89aa96 into tidyverse:main Dec 20, 2023
13 checks passed
@hadley
Copy link
Member

hadley commented Dec 20, 2023

Thanks!

@yuhenghuang
Copy link

Came all the way here for the same issue in #1405 .

I got confused that main branch (the source code I was checking) == 2.4.0 branch (the dbplyr I was using) and couldn't find why the code seems to work whereas the error's still there.

Now that I am relieved this will be fixed in the next stable release and all I should do now is wait.

Great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants