Rest Catalog: spark catalog api fails to work with rest based catalog #11741
Comments
I think we can reproduce the problem with a unit test like this:

```java
@TestTemplate
public void testCreateTable() {
  assumeThat(catalogName).isEqualTo(SparkCatalogConfig.REST.catalogName());

  assertThat(validationCatalog.tableExists(tableIdent))
      .as("Table should not already exist")
      .isFalse();

  sql("CREATE TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg", tableName);

  Table table = validationCatalog.loadTable(tableIdent);
  assertThat(table).as("Should load the new table").isNotNull();

  StructType expectedSchema =
      StructType.of(
          NestedField.required(1, "id", Types.LongType.get()),
          NestedField.optional(2, "data", Types.StringType.get()));
  assertThat(table.schema().asStruct())
      .as("Should have the expected schema")
      .isEqualTo(expectedSchema);
  assertThat(table.spec().fields()).as("Should not be partitioned").hasSize(0);
  assertThat(table.properties().get(TableProperties.DEFAULT_FILE_FORMAT))
      .as("Should not have the default format set")
      .isNull();

  spark.sessionState().catalogManager().setCurrentCatalog(catalogName);
  assertThat(spark.catalog().tableExists(tableIdent.toString())).isTrue(); // success
  assertThat(spark.catalog().tableExists(tableIdent.namespace().toString(), tableIdent.name()))
      .isTrue(); // failure
}
```

I am wondering if anyone has run into this when using a REST-based catalog?
FYI @szehon-ho @huaxingao and @stevenzwu
@dramaticlly Thanks for pinging me. This seems to be a Spark bug. I'll investigate further.
Yeah, it's more like a Spark bug; probably the Iceberg REST catalog didn't implement this method in the class.
After taking a closer look at the Javadoc, I found that the API is intended only for the Hive Metastore. As specified in the Javadoc, it's a user error to use it with other catalogs.
Thank you @huaxingao, your finding is appreciated. FYI @sunny1154 @stevenzwu: it looks like Spark also explicitly mentions that these functions are only meant for the Hive Metastore.
Thanks @huaxingao for looking into this. I can open another bug for this.
Just to add to @huaxingao's point: for the Spark master branch (future Spark 4.0), this is due to backward compatibility. Is it possible for you to just use the dot notation?
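For readers following along, here is a minimal sketch of the dot-notation workaround suggested above. It assumes a live SparkSession with a REST catalog registered under the name `iceberg`, and uses the illustrative `restDb.restTable` names from the report below:

```scala
// Sketch only: requires a running SparkSession with the "iceberg" REST catalog configured.
// The two-argument (dbName, tableName) overload consults the session (Hive) catalog,
// while a single dot-separated identifier resolves through the current catalog.
spark.catalog.tableExists("restDb.restTable")          // resolved via the current catalog
spark.catalog.tableExists("iceberg.restDb.restTable")  // or fully qualified with the catalog name
```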
@sunny1154 I think you would need to specify the catalog in the table identifier. Otherwise, Spark tries to use the session catalog.
@sunny1154
Apache Iceberg version
1.5.0
Query engine
Spark
Please describe the bug 🐞
Hi,
I am observing issues when working with a REST-based catalog. My Spark session has a default catalog defined which is backed by a REST catalog, and the SparkSession.catalog API fails to work with it. Tested with Spark 3.4.
```shell
${SPARK_HOME}/bin/spark-shell --master local[*] \
  --driver-memory 2g \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --conf spark.sql.catalog.iceberg.uri=https://xx.xxx.xxxx.domain.com \
  --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
  --conf spark.sql.defaultCatalog=iceberg \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
```
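For reference, a sketch of the same configuration done programmatically instead of via spark-shell flags; the URI is the placeholder from the flags above, not a real endpoint:

```scala
// Programmatic equivalent of the --conf flags above (sketch only; placeholder URI).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.defaultCatalog", "iceberg")
  .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.iceberg.catalog-impl", "org.apache.iceberg.rest.RESTCatalog")
  .config("spark.sql.catalog.iceberg.uri", "https://xx.xxx.xxxx.domain.com")
  .getOrCreate()
```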
```scala
scala> spark.catalog.currentCatalog
res1: String = iceberg

scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id|      data|
+---+----------+
|  1|some_value|
+---+----------+

scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true

scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
```
Other APIs fail in the same way:

```scala
spark.catalog.getTable("restDb", "restTable")     // fails with "database not found"
spark.catalog.getTable("restDb.restTable")        // returns the table object
spark.catalog.tableExists("restDb", "restTable")  // returns false (even though the table exists)
spark.catalog.tableExists("restDb.restTable")     // returns true (if the table exists and is registered with the REST catalog)
spark.catalog.listColumns("restDb", "restTable")  // fails with "database not found"
spark.catalog.listColumns("restDb.restTable")     // returns the list of columns
```