
[FEATURE REQUEST] Support for creating nested namespaces recursively #543

Open
Karthilearns opened this issue Dec 12, 2024 · 7 comments
Labels
enhancement (New feature or request)

Comments

@Karthilearns

Is your feature request related to a problem? Please describe.

Currently, Polaris doesn't support recursively creating namespaces that do not exist. For spark.sql('create namespace n1.n2.n3') to work, n1 and n2 must already be in place, or the SQL fails.

For compatibility with other catalogs' SQL, would it make sense to support creating nested namespaces recursively? Or is there a reason for not doing this?
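
For example (Java, assuming a SparkSession named session, as in the reproduction later in this thread):

// Fails with NoSuchNamespaceException when n1 and n2 do not already exist:
session.sql("create namespace n1.n2.n3");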

Describe the solution you'd like

Allow users to create nested namespaces with a single SQL command.

Describe alternatives you've considered

The only alternative is to issue multiple SQL statements (one per namespace level, as sketched below), which is cumbersome compared to what other catalogs offer.
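
For illustration, the workaround looks roughly like this (a sketch assuming a SparkSession named session, as in the reproduction later in this thread):

// Workaround today: create each namespace level explicitly, outermost first.
session.sql("CREATE NAMESPACE IF NOT EXISTS n1");
session.sql("CREATE NAMESPACE IF NOT EXISTS n1.n2");
session.sql("CREATE NAMESPACE IF NOT EXISTS n1.n2.n3");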

Additional context

No response

@Karthilearns Karthilearns added the enhancement (New feature or request) label Dec 12, 2024
@Karthilearns Karthilearns changed the title from "[FEATURE REQUEST] Support for creating namespaces recursively" to "[FEATURE REQUEST] Support for creating nested namespaces recursively" Dec 12, 2024
@jbonofre
Member

You are using Spark with the Iceberg REST client, I guess, right?

So, I guess you are referring to using a namespace name like foo/bar/john, and we should create the namespaces foo, bar, john recursively (in one call), right?

@Karthilearns
Author

@jbonofre yes, I'm using Spark with the Polaris REST catalog.

Yes, this should happen in one call.

@Karthilearns
Author

Karthilearns commented Dec 12, 2024

If this cannot be done, or is out of scope for Polaris's handling of nested namespaces, I would like to know the reasoning. One concern I have is the following:

Say I have an application writing Iceberg tables to a REST-based catalog. With Polaris's way of handling nested namespaces, I cannot migrate my application code to another REST catalog implementation. The migration use case for Iceberg catalogs actually fails here.

@Karthilearns
Author

@jbonofre - I'm happy to contribute a PR if this feature is approved.

@flyrain
Contributor

flyrain commented Dec 13, 2024

Hi @Karthilearns, would you mind sharing the error message?

@Karthilearns
Author

Karthilearns commented Dec 13, 2024

Hi @flyrain,

session.sql("use quickstart_catalog"); session.sql("show namespaces").show(); session.sql("create namespace toplevel.second");

Error:

Exception in thread "main" org.apache.iceberg.exceptions.NoSuchNamespaceException: Namespace does not exist: toplevel
    at org.apache.iceberg.rest.ErrorHandlers$NamespaceErrorHandler.accept(ErrorHandlers.java:173)
    at org.apache.iceberg.rest.ErrorHandlers$NamespaceErrorHandler.accept(ErrorHandlers.java:166)
    at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:211)
    at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:323)
    at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:262)
    at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:368)
    at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112)
    at org.apache.iceberg.rest.RESTSessionCatalog.createNamespace(RESTSessionCatalog.java:538)
    at org.apache.iceberg.catalog.BaseSessionCatalog$AsCatalog.createNamespace(BaseSessionCatalog.java:128)
    at org.apache.iceberg.rest.RESTCatalog.createNamespace(RESTCatalog.java:223)
    at org.apache.iceberg.spark.SparkCatalog.createNamespace(SparkCatalog.java:482)
    at org.apache.spark.sql.execution.datasources.v2.CreateNamespaceExec.run(CreateNamespaceExec.scala:47)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
    at com.striim.PolarisCatalog.main(PolarisCatalog.java:17)

@flyrain
Contributor

flyrain commented Dec 13, 2024

It failed when Polaris tried to check the parent namespace's privileges: https://github.com/polaris-catalog/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java#L248-L248.

Basically, the behavior you want here is to create the parent namespace if it doesn't exist, then create the sub-namespace. If Polaris allowed this behavior, it would break the assumption that one REST API call creates exactly one namespace; see this spec for details: https://github.com/polaris-catalog/polaris/blob/main/spec/rest-catalog-open-api.yaml#L4080-L4080. I think it's more suitable as a client-side change, as this pseudocode shows:

# create namespace n1.n2
if (n1 does not exist) {
   create n1
}
create n2
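
For what it's worth, a client-side version of that idea could look roughly like the following Java sketch, using Iceberg's SupportsNamespaces interface (the helper name createNamespaceRecursively is hypothetical, not an existing API):

import java.util.Arrays;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.SupportsNamespaces;

// Hypothetical client-side helper: create every missing level of a nested
// namespace, outermost first, so n1.n2.n3 works even when n1 and n2 are absent.
static void createNamespaceRecursively(SupportsNamespaces catalog, Namespace namespace) {
    String[] levels = namespace.levels();
    for (int i = 1; i <= levels.length; i++) {
        Namespace prefix = Namespace.of(Arrays.copyOfRange(levels, 0, i));
        if (!catalog.namespaceExists(prefix)) {
            catalog.createNamespace(prefix); // one REST call per missing level
        }
    }
}

// Usage: createNamespaceRecursively(restCatalog, Namespace.of("n1", "n2", "n3"));

This still issues one createNamespace REST call per missing level, so the server-side one-call-one-namespace assumption stays intact; a real implementation would probably also catch AlreadyExistsException to tolerate concurrent creators.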
