Supporting persisted tables for Spark SQL backend #1502

Closed
zacdav-db opened this issue May 16, 2024 · 1 comment

@zacdav-db
Contributor

zacdav-db commented May 16, 2024

Spark SQL (in this case, against Databricks) should be able to support non-temporary writes. Currently, this errors like so:

> results <- tbl(con, I("samples.nyctaxi.trips")) %>%
+   group_by(pickup_zip) %>%
+   summarise(avg_trip_dist = mean(trip_distance))

> compute(results, I("zacdav.default.avg_trip_dist"), temporary = FALSE)
Error in `db_compute()`:
! Spark SQL only support temporary tables
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
Missing values are always removed in SQL aggregation functions.
Use `na.rm = TRUE` to silence this warning
This warning is displayed once every 8 hours. 

> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `db_compute()`:
! Spark SQL only support temporary tables
---
Backtrace:
 1. ├─dplyr::compute(results, I("zacdav.default.avg_trip_dist"), temporary = FALSE)
 2. └─dbplyr:::compute.tbl_sql(...)
 3.   ├─dbplyr::db_compute(...)
 4.   └─dbplyr:::`db_compute.Spark SQL`(...)
 5.     └─cli::cli_abort("Spark SQL only support temporary tables")
 6.       └─rlang::abort(...)

It looks like the following functions need to be adjusted:

  • Adjust db_copy_to.Spark SQL to not invoke NextMethod and directly invoke db_compute (code)
  • Adjust db_compute.Spark SQL to conditionally generate CTAS (code)
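
To illustrate the second point, here is a rough sketch of what "conditionally generate CTAS" could mean. It is not dbplyr's actual code: `build_compute_sql()` is a hypothetical helper written in base R, it does no identifier quoting or escaping, and it assumes the temporary path uses `CREATE OR REPLACE TEMPORARY VIEW` while the persisted path uses a plain `CREATE TABLE ... AS`.

```r
# Hypothetical helper, not part of dbplyr: choose the DDL prefix from `temporary`.
# Assumption: `table` is already a safely quoted/qualified name and
# `select_sql` is a complete SELECT statement.
build_compute_sql <- function(table, select_sql, temporary = TRUE) {
  stopifnot(is.character(table), length(table) == 1L)
  stopifnot(is.character(select_sql), length(select_sql) == 1L)

  prefix <- if (isTRUE(temporary)) {
    # Session-scoped result: a temporary view.
    "CREATE OR REPLACE TEMPORARY VIEW"
  } else {
    # Persisted result: CREATE TABLE AS SELECT (CTAS).
    "CREATE TABLE"
  }

  paste0(prefix, " ", table, " AS\n", select_sql)
}

query <- paste(
  "SELECT pickup_zip, AVG(trip_distance) AS avg_trip_dist",
  "FROM samples.nyctaxi.trips",
  "GROUP BY pickup_zip"
)

# Print the statement that would be sent for a non-temporary write.
cat(build_compute_sql("zacdav.default.avg_trip_dist", query, temporary = FALSE), "\n")
```

A real method would presumably build the statement with dbplyr's own SQL helpers so that table names are escaped for the connection, and would need to decide how to handle an existing table of the same name.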
@zacdav-db
Contributor Author

This is now resolved via #1514

zacdav-db pushed a commit to zacdav-db/dbplyr that referenced this issue Jun 11, 2024