What went wrong?
If we try to save an empty DataFrame with the qbeast format, the following error is thrown:
java.lang.RuntimeException: The DataFrame is empty, why are you trying to index an empty dataset?
at io.qbeast.spark.index.DoublePassOTreeDataAnalyzer$.analyze(OTreeDataAnalyzer.scala:351)
Instead, when we do the same with Delta, the library creates a folder at the path with the first commit information and no error is shown.
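For reference, a minimal sketch of the Delta side of the comparison, assuming a spark-shell session with delta-spark on the classpath (the target path is illustrative):

```scala
// Empty DataFrame with a single "id" column.
val emptyDF = spark.range(0).toDF()

// Succeeds: creates the target folder with the first commit in _delta_log.
emptyDF.write.format("delta").save("/tmp/delta-empty-test")
```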
How to reproduce?
1. Code that triggered the bug, or steps to reproduce:
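A minimal sketch that reproduces the error, assuming a spark-shell session with qbeast-spark on the classpath (the column name and target path are illustrative):

```scala
// Empty DataFrame with a single "id" column.
val emptyDF = spark.range(0).toDF()

// Throws: java.lang.RuntimeException: The DataFrame is empty, why are you
// trying to index an empty dataset?
emptyDF.write
  .format("qbeast")
  .option("columnsToIndex", "id") // qbeast requires the columns to index
  .save("/tmp/qbeast-empty-test")
```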
2. Branch and commit id:
main 6e6b5b4
3. Spark version:
On the spark shell run spark.version.
3.5.0
4. Hadoop version:
On the spark shell run org.apache.hadoop.util.VersionInfo.getVersion().
3.3.4
5. How are you running Spark?
Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local computer?
Locally
6. Stack trace:
Trace of the log/error messages.
java.lang.RuntimeException: The DataFrame is empty, why are you trying to index an empty dataset?
at io.qbeast.spark.index.DoublePassOTreeDataAnalyzer$.analyze(OTreeDataAnalyzer.scala:351)
at io.qbeast.spark.index.SparkOTreeManager$.index(SparkOTreeManager.scala:89)
at io.qbeast.spark.index.SparkOTreeManager$.index(SparkOTreeManager.scala:38)
at io.qbeast.spark.index.SparkOTreeManager$.index(SparkOTreeManager.scala:26)
at io.qbeast.spark.table.IndexedTableImpl.$anonfun$doWrite$2(IndexedTable.scala:467)
at io.qbeast.spark.delta.DeltaMetadataWriter.$anonfun$writeWithTransaction$5(DeltaMetadataWriter.scala:113)
at io.qbeast.spark.delta.DeltaMetadataWriter.$anonfun$writeWithTransaction$5$adapted(DeltaMetadataWriter.scala:108)
at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:223)
at io.qbeast.spark.delta.DeltaMetadataWriter.writeWithTransaction(DeltaMetadataWriter.scala:108)
at io.qbeast.spark.delta.SparkDeltaMetadataManager$.updateWithTransaction(SparkDeltaMetadataManager.scala:45)
at io.qbeast.spark.delta.SparkDeltaMetadataManager$.updateWithTransaction(SparkDeltaMetadataManager.scala:31)
at io.qbeast.spark.table.IndexedTableImpl.doWrite(IndexedTable.scala:466)
at io.qbeast.spark.table.IndexedTableImpl.$anonfun$write$3(IndexedTable.scala:429)
at io.qbeast.spark.table.IndexedTableImpl.$anonfun$write$3$adapted(IndexedTable.scala:421)
at io.qbeast.core.keeper.Keeper.withWrite(Keeper.scala:55)
at io.qbeast.core.keeper.Keeper.withWrite$(Keeper.scala:52)
at io.qbeast.core.keeper.LocalKeeper$.withWrite(LocalKeeper.scala:27)
at io.qbeast.spark.table.IndexedTableImpl.write(IndexedTable.scala:421)
at io.qbeast.spark.table.IndexedTableImpl.save(IndexedTable.scala:383)
at io.qbeast.spark.internal.sources.QbeastDataSource.createRelation(QbeastDataSource.scala:125)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
... 47 elided