Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str_detect() and str_replace() not treating period (.) detection same in snowflake translation #1472

Closed
hild0146 opened this issue Mar 5, 2024 · 3 comments

Comments

@hild0146
Copy link

hild0146 commented Mar 5, 2024

str_detect and str_replace not treating periods (.) same in dbplyr with a snowlfake connection

str_replace() recognizes "\\." as looking for and replacing the period character, but str_detect() does not recognize it the same.

Seems to be a translation issue, when looking at the show_query details for both

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.3.3
#> Warning: package 'ggplot2' was built under R version 4.3.3
#> Warning: package 'tidyr' was built under R version 4.3.3
#> Warning: package 'readr' was built under R version 4.3.3
#> Warning: package 'dplyr' was built under R version 4.3.2
#> Warning: package 'stringr' was built under R version 4.3.3
library(DBI)
#> Warning: package 'DBI' was built under R version 4.3.3
library(dbplyr)
#> Warning: package 'dbplyr' was built under R version 4.3.3
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

con <- 
  dbConnect(
    odbc::odbc(),
    driver     = *removed*,
    uid        = *removed*, 
    pwd        = *removed*,
    server     = *removed*,
    port       = *removed*,
    database   = *removed*,
    Warehouse  = *removed*,
    schema     = *removed*,
  )

tmp <- 
  copy_to(
    con, 
    tibble(var = c("string.period", "string_no_period")), 
    name = "test", 
    overwrite = TRUE
  )


# expected results
tmp %>% 
  mutate(across(var, ~ str_replace(.x, "\\.", "FILLER"), .names = "var_new")) 
#> # Source:   SQL [2 x 2]
#> # Database: Snowflake 8.8.4[philden@Snowflake/HEALTHVERITY]
#>   var              var_new           
#>   <chr>            <chr>             
#> 1 string.period    stringFILLERperiod
#> 2 string_no_period string_no_period

# not detecting as expected
tmp %>% 
  mutate(across(var, ~ str_detect(.x, "\\."), .names = "var_new")) 
#> # Source:   SQL [2 x 2]
#> # Database: Snowflake 8.8.4[philden@Snowflake/HEALTHVERITY]
#>   var              var_new
#>   <chr>            <lgl>  
#> 1 string.period    TRUE   
#> 2 string_no_period TRUE



tmp %>% 
  mutate(across(var, ~ str_replace(.x, "\\.", "FILLER"))) %>%
  show_query()
#> <SQL>
#> SELECT REGEXP_REPLACE("var", '\\.', 'FILLER', 1.0, 1.0) AS "var"
#> FROM "test"


# seems to be missing a \ based on the above
tmp %>% 
  mutate(across(var, ~ str_detect(.x, "\\."))) %>% 
  show_query()
#> <SQL>
#> SELECT ("var") REGEXP ('.*' || '\.' || '.*') AS "var"
#> FROM "test"


# seems to require double escape
tmp %>% 
  mutate(across(var, ~ str_detect(.x, "\\\\."))) %>% show_query()
#> <SQL>
#> SELECT ("var") REGEXP ('.*' || '\\.' || '.*') AS "var"
#> FROM "test"


# expected result
tmp %>% 
  mutate(across(var, ~ str_detect(.x, "\\\\."), .names = "var_new"))
#> # Source:   SQL [2 x 2]
#> # Database: Snowflake 8.8.4[philden@Snowflake/HEALTHVERITY]
#>   var              var_new
#>   <chr>            <lgl>  
#> 1 string.period    TRUE   
#> 2 string_no_period FALSE

Created on 2024-03-05 with reprex v2.1.0

@fh-mthomson
Copy link
Contributor

I believe this is resolved in the dev version of dbplyr, per #1406. After installing, I get consistent results for str_detect(). Can you confirm?

# 1. str_replace() ----
tmp %>% 
  mutate(across(var, ~ str_replace(.x, "\\.", "FILLER"), .names = "var_new")) 
#> # Source:   SQL [2 x 2]
#> # Database: Snowflake 8.10.1[@Snowflake/]
#>   var              var_new           
#>   <chr>            <chr>             
#> 1 string.period    stringFILLERperiod
#> 2 string_no_period string_no_period

tmp %>% 
  collect() %>% 
  mutate(across(var, ~ str_replace(.x, "\\.", "FILLER"), .names = "var_new")) 
#> # A tibble: 2 × 2
#>   var              var_new           
#>   <chr>            <chr>             
#> 1 string.period    stringFILLERperiod
#> 2 string_no_period string_no_period

# 2. str_detect() ----
# corrected:
tmp %>% 
  mutate(across(var, ~ str_detect(.x, "\\."), .names = "var_new")) 
#> # Source:   SQL [2 x 2]
#> # Database: Snowflake 8.10.1[@Snowflake/]
#>   var              var_new
#>   <chr>            <lgl>  
#> 1 string.period    TRUE   
#> 2 string_no_period FALSE

# same as local
tmp %>% 
  collect() %>% 
  mutate(across(var, ~ str_detect(.x, "\\."), .names = "var_new")) 
#> # A tibble: 2 × 2
#>   var              var_new
#>   <chr>            <lgl>  
#> 1 string.period    TRUE   
#> 2 string_no_period FALSE

cc: @fh-afrachioni

@hadley
Copy link
Member

hadley commented Mar 15, 2024

It should be possible to create a reprex that doesn't require an active snowflake connection using the advice in https://dbplyr.tidyverse.org/articles/reprex.html

@PHilden-RP
Copy link

Got it, super helpful thank you for that.

I see the correction made in dev, thanks the resolution on this!

@hadley hadley closed this as completed Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants