
ODBC with Databricks - Temp Tables vs Views #1371

Closed

kmishra9 opened this issue Sep 27, 2023 · 1 comment

kmishra9 commented Sep 27, 2023

Hey there,

One of the primary recommendations for interfacing with Databricks is dbplyr translation to Spark SQL through the Databricks ODBC driver (used much like any other backend). However, Spark SQL apparently doesn't support temporary tables, only temporary views, which breaks every compute() call in my code (and possibly some joins and other operations I haven't hit yet).
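For context, a minimal sketch of the pattern that triggers this, assuming a hypothetical DSN and using the table from the error below:

```r
library(DBI)
library(dplyr)

# Hypothetical connection; any Databricks SQL warehouse reached through
# the ODBC driver behaves the same way
con <- dbConnect(odbc::odbc(), dsn = "Databricks")

claimline <- tbl(con, dbplyr::in_catalog("predictive_rewrite", "upmc", "claimline"))

claimline |>
  distinct(member_id, claim_id, claimline_nbr) |>
  compute()  # dbplyr emits CREATE TEMPORARY TABLE ... AS ..., which Spark SQL rejects
```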

```
Error in `db_save_query.DBIConnection()`:
! Can't save query to "dbplyr_001".
Caused by error in `new_result()`:
! nanodbc/nanodbc.cpp:1691: 00000: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_0035] org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead.(line 1, pos 0)

== SQL ==
CREATE TEMPORARY TABLE 
^^^
`dbplyr_001` AS
SELECT DISTINCT
  `member_id`,
  `claim_id`,
  `claimline_nbr`,
  `claimline_status`,
  `place_of_service_cd`,
  `hcpcs_cd`,
  `revenue_cd`,
  CAST(`allowed_amt` AS DOUBLE) AS `allowed_amt`,
  CAST(`paid_amt` AS DOUBLE) AS `paid_amt`,
  CAST(`claim_start_dt` AS DATE) AS `claim_start_dt`,
  CAST(`claim_end_dt` AS DATE) AS `claim_end_dt`
FROM `predictive_rewrite`.`upmc`.`claimline`

	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:693)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:571)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:422)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:420)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:25)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:25)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags0(DatabricksSparkUsageLogger.scala:70)
	at com.databricks.spark.util.DatabricksSparkUsageLogger.withAttributionTags(DatabricksSparkUsageLogger.scala:170)
	at com.databricks.spark.util.UsageLogging.$anonfun$withAttributionTags$1(UsageLogger.scala:495)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:607)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:616)
	at com.databricks.spark.util.UsageLogging.withAttributionTags(UsageLogger.scala:495)
	at com.databricks.spark.util.UsageLogging.withAttributionTags$(UsageLogger.scala:493)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withAttributionTags(SparkExecuteStatementOperation.scala:64)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.$anonfun$withLocalProperties$8(ThriftLocalProperties.scala:161)
	at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:160)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:51)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:64)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:400)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:385)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:434)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead.(line 1, pos 0)
```

There's no great way to make this a reprex, but I wanted to ask about any workarounds that may exist, and/or to request an argument or package-level option to switch between temporary tables and temporary views so compute() keeps working on this backend. My naive workaround at the moment is to shadow dplyr::compute() in my code with a user-defined function that simply returns the lazy query it's given, so the compute() calls can remain in place without actually computing anything (sketched below).
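For illustration, here's a sketch of that shadowing trick, plus a hand-rolled alternative that materializes to a temporary view (which the server's error message itself suggests). Both function names are my own, and the view helper is untested sketch code, not a dbplyr API:

```r
# 1) Naive pass-through: masks dplyr::compute() so existing compute() calls
#    stay in place but no longer materialize anything. Note this disables
#    computation for every backend in the session.
compute <- function(x, ...) x

# 2) Untested sketch: materialize the lazy query as a temporary view,
#    which Spark SQL accepts, and return a tbl pointing at it.
compute_as_view <- function(lazy_tbl, name = "dbplyr_tmp_view") {
  con <- dbplyr::remote_con(lazy_tbl)
  DBI::dbExecute(con, paste0(
    "CREATE OR REPLACE TEMPORARY VIEW `", name, "` AS ",
    dbplyr::remote_query(lazy_tbl)
  ))
  dplyr::tbl(con, name)
}
```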

The other way of connecting to Databricks remotely, via sparklyr and Databricks Connect, has its own fair share of issues at the moment; otherwise that's the workaround I'd adopt.

hadley commented Nov 2, 2023

Fixed in #1379 😄

hadley closed this as completed Nov 2, 2023