
ODBC with Databricks - Temp Tables vs Views #1371

Closed

kmishra9 opened this issue Sep 27, 2023 · 1 comment

kmishra9 commented Sep 27, 2023

Hey there,

One of the primary recommendations for interfacing with Databricks is dbplyr translation to Spark SQL through the Databricks ODBC driver (used much like any other backend). However, Spark SQL apparently doesn't support temporary tables, only temporary views, which breaks every compute() call in my code (and possibly some joins and other operations I haven't hit yet).
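For context, a minimal sketch of the pattern that triggers this, assuming a hypothetical DSN and using the table from the error below:

```r
library(DBI)
library(dplyr)

# Hypothetical connection; any Databricks SQL warehouse reached through
# the ODBC driver behaves the same way
con <- dbConnect(odbc::odbc(), dsn = "Databricks")

claimline <- tbl(con, dbplyr::in_catalog("predictive_rewrite", "upmc", "claimline"))

claimline |>
  distinct(member_id, claim_id, claimline_nbr) |>
  compute()  # dbplyr emits CREATE TEMPORARY TABLE ... AS ..., which Spark SQL rejects
```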

```
Error in `db_save_query.DBIConnection()`:
! Can't save query to "dbplyr_001".
Caused by error in `new_result()`:
! nanodbc/nanodbc.cpp:1691: 00000: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_0035] org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead.(line 1, pos 0)

== SQL ==
CREATE TEMPORARY TABLE 
^^^
`dbplyr_001` AS
SELECT DISTINCT
  `member_id`,
  `claim_id`,
  `claimline_nbr`,
  `claimline_status`,
  `place_of_service_cd`,
  `hcpcs_cd`,
  `revenue_cd`,
  CAST(`allowed_amt` AS DOUBLE) AS `allowed_amt`,
  CAST(`paid_amt` AS DOUBLE) AS `paid_amt`,
  CAST(`claim_start_dt` AS DATE) AS `claim_start_dt`,
  CAST(`claim_end_dt` AS DATE) AS `claim_end_dt`
FROM `predictive_rewrite`.`upmc`.`claimline`

	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:693)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:571)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:422)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:420)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:25)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:470)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:455)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:25)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags0(DatabricksSparkUsageLogger.scala:70)
	at com.databricks.spark.util.DatabricksSparkUsageLogger.withAttributionTags(DatabricksSparkUsageLogger.scala:170)
	at com.databricks.spark.util.UsageLogging.$anonfun$withAttributionTags$1(UsageLogger.scala:495)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:607)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:616)
	at com.databricks.spark.util.UsageLogging.withAttributionTags(UsageLogger.scala:495)
	at com.databricks.spark.util.UsageLogging.withAttributionTags$(UsageLogger.scala:493)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withAttributionTags(SparkExecuteStatementOperation.scala:64)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.$anonfun$withLocalProperties$8(ThriftLocalProperties.scala:161)
	at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:160)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:51)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:64)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:400)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:385)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:434)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead.(line 1, pos 0)
```

There's no great way to make this a reprex, but I wanted to ask about any workarounds that may exist, and/or to request an argument or package-level option to switch between temporary tables and temporary views so compute() keeps working on this backend. My naive workaround at the moment is to shadow dplyr::compute() in my code with a user-defined function that simply returns the lazy query it's given, so the compute() calls can remain in place without actually computing anything (sketched below).
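For illustration, here's a sketch of that shadowing trick, plus a hand-rolled alternative that materializes to a temporary view (which the server's error message itself suggests). Both function names are my own, and the view helper is untested sketch code, not a dbplyr API:

```r
# 1) Naive pass-through: masks dplyr::compute() so existing compute() calls
#    stay in place but no longer materialize anything. Note this disables
#    computation for every backend in the session.
compute <- function(x, ...) x

# 2) Untested sketch: materialize the lazy query as a temporary view,
#    which Spark SQL accepts, and return a tbl pointing at it.
compute_as_view <- function(lazy_tbl, name = "dbplyr_tmp_view") {
  con <- dbplyr::remote_con(lazy_tbl)
  DBI::dbExecute(con, paste0(
    "CREATE OR REPLACE TEMPORARY VIEW `", name, "` AS ",
    dbplyr::remote_query(lazy_tbl)
  ))
  dplyr::tbl(con, name)
}
```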

The other way of connecting to Databricks remotely, via sparklyr and Databricks Connect, has its own fair share of issues at the moment; otherwise that's the workaround I'd adopt.

hadley commented Nov 2, 2023

Fixed in #1379 😄

hadley closed this as completed Nov 2, 2023