Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark SQL translate_sql for "add_years" is double-quoting column name 'foo' #1510

Open
bschulth opened this issue Jun 4, 2024 · 2 comments

Comments

@bschulth
Copy link

bschulth commented Jun 4, 2024

When translating sql on a Spark SQL connection, the 'add_year' translation is double quoting the column name:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)

This fails on Spark. I would expect single quotes like:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS(`dob`, 1.0 * 12.0)

This seems to be coming from the triple-bang expression here !!!x:

sql_expr(add_months(!!!x, !!n*12))

       add_years = function(x, n, ...) {
         check_dots_empty()
         sql_expr(add_months(!!!x, !!n*12))
       }

There even seems to be a bad unit test for this here:

https://github.com/tidyverse/dbplyr/blob/8f2fcb0852994afe11cc6792ef59ef5ffda84936/tests/testthat/test-backend-spark-sql.R#L3C35-L3C50

expect_equal(test_translate_sql(add_years(x, 1)), sql("ADD_MONTHS('`x`', 1.0 * 12.0)"))

If I change the triple-bang !!!x to a double-bang !!x:

       add_years = function(x, n, ...) {
         check_dots_empty()
         # sql_expr(add_months(!!!x, !!n*12))
         sql_expr(add_months(!!x, !!n*12))
       }

it seems to produce the correct output.

Here is the reprex:

dbplyr:::local_con(dbplyr:::simulate_spark_sql())
dbplyr:::test_translate_sql(add_years(dob, 1))
#> <SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)
reprex::reprex()
#> ℹ Non-interactive session, setting `html_preview = FALSE`.
#> CLIPR_ALLOW has not been set, so clipr will not run interactively
#> Error in switch(where, expr = stringify_expression(x_expr), clipboard = ingest_clipboard(), : EXPR must be a length 1 vector
@ablack3
Copy link
Contributor

ablack3 commented Jun 5, 2024

Here is a reprex on spark showing correct SQL works and current SQL does not.

con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = Sys.getenv("DATABRICKS_HTTPPATH"),
  useNativeQuery = FALSE,
  bigint = "numeric"
)


DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS('`observation_period_start_date`', 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date  day
#> 1                    1963-12-31 <NA>
#> 2                    2009-04-26 <NA>
#> 3                    2002-01-30 <NA>
#> 4                    1971-10-14 <NA>
#> 5                    2009-05-30 <NA>


DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS(`observation_period_start_date`, 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date        day
#> 1                    1963-12-31 1964-12-31
#> 2                    2009-04-26 2010-04-26
#> 3                    2002-01-30 2003-01-30
#> 4                    1971-10-14 1972-10-14
#> 5                    2009-05-30 2010-05-30


DBI::dbDisconnect(con)

Created on 2024-06-05 with reprex v2.1.0

@simonpcouch
Copy link
Collaborator

On current main:

library(odbc)
library(clock)
library(dplyr)
con <- dbConnect(databricks(), HTTPPath = "...")

df <- data.frame(x = as.POSIXct("2000-01-01"))
copy_to(con, df)

q <- tbl(con, "df") %>% mutate(y = add_years(x, 1))

q %>% show_query()
#> <SQL>
#> SELECT `df`.*, ADD_MONTHS('`x`', 1.0 * 12.0) AS `y`
#> FROM `df`

# note y is NA
q %>% collect()
#> # A tibble: 1 × 2
#>   x                   y     
#>   <dttm>              <date>
#> 1 2000-01-01 06:00:00 NA

Created on 2024-08-27 with reprex v2.1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants