Member

@XJDKC XJDKC commented Nov 1, 2025

Currently, RESTCatalog allows users to replace components such as RESTClient, FileIO, AuthManager, and MetricsReporter. However, one dependent component that remains non-injectable is RESTTableOperations.

This PR adds support for injecting custom implementations of table and view operations into RESTCatalog, so users can extend and customize REST catalog behavior more easily. It does not change any existing behavior.

@github-actions github-actions bot added the core label Nov 1, 2025
@XJDKC XJDKC force-pushed the rxing-rest-operations-builder branch from c5a8e9a to d849fce Compare November 1, 2025 16:53
* @param endpoints the set of supported REST endpoints
* @return a new RESTViewOperations instance
*/
default RESTViewOperations createViewOperations(
Contributor

How is FileIO handled for view operations?

Member Author

IIUC, FileIO is not required for view operations, because Iceberg views are logical objects that contain only metadata (SQL definitions, schemas, and versions) and do not read or write any physical files.

When a user runs a query against a view, the query engine expands the view's SQL definition, compiles it into a query plan, and resolves the underlying tables. At that point, the engine loads the actual table objects (which include TableOperations and FileIO) to read the physical data files.
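
To make the distinction above concrete, here is a minimal sketch. All types in it (SketchFileIO, SketchTable, SketchView, ViewResolutionSketch) are hypothetical stand-ins, not the real Iceberg classes, and the view carries a precomputed list of referenced tables where a real engine would parse the SQL. The point it illustrates is only that the view object holds no file access, and that reads go through the tables it resolves to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a file reader; not the real Iceberg FileIO. */
interface SketchFileIO {
  String read(String path);
}

/** Table stand-in: holds a data file location and the FileIO needed to read it. */
final class SketchTable {
  final String dataFile;
  final SketchFileIO io;

  SketchTable(String dataFile, SketchFileIO io) {
    this.dataFile = dataFile;
    this.io = io;
  }
}

/** View stand-in: metadata only (SQL text plus referenced table names), with no FileIO field. */
final class SketchView {
  final String sql;
  final List<String> referencedTables;

  SketchView(String sql, List<String> referencedTables) {
    this.sql = sql;
    this.referencedTables = List.copyOf(referencedTables);
  }
}

final class ViewResolutionSketch {
  private ViewResolutionSketch() {}

  /** Resolves the view's tables and reads each one through that table's own FileIO. */
  static List<String> scan(SketchView view, Map<String, SketchTable> tables) {
    List<String> results = new ArrayList<>();
    for (String name : view.referencedTables) {
      SketchTable table = tables.get(name);
      if (table == null) {
        throw new IllegalArgumentException("Unknown table: " + name);
      }
      // File access happens here, at the table level, never on the view itself.
      results.add(table.io.read(table.dataFile));
    }
    return results;
  }
}
```

In this sketch the view never needs an I/O handle of its own, which matches the reasoning that view operations can be created without a FileIO.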

Member Author

XJDKC commented Nov 6, 2025

cc: @flyrain @stevenzwu @huaxingao Could you pls take a look when you get a chance? Thanks! 🙏

Contributor

@flyrain flyrain left a comment

Thanks @XJDKC for the change. Left some comments.


public RESTCatalog(Function<Map<String, String>, RESTClient> clientBuilder) {
- this(SessionCatalog.SessionContext.createEmpty(), clientBuilder);
+ this(SessionCatalog.SessionContext.createEmpty(), clientBuilder, null, null);
Contributor

Suggested change
- this(SessionCatalog.SessionContext.createEmpty(), clientBuilder, null, null);
+ this(clientBuilder, null, null);

or

Suggested change
- this(SessionCatalog.SessionContext.createEmpty(), clientBuilder, null, null);
+ this(SessionCatalog.SessionContext.createEmpty(), clientBuilder);

We might go with the second one so that no change is needed here.

Member Author

Makes sense, let me revise it.

Comment on lines +77 to +82
public RESTCatalog(
Function<Map<String, String>, RESTClient> clientBuilder,
BiFunction<SessionCatalog.SessionContext, Map<String, String>, FileIO> ioBuilder,
RESTOperationsBuilder operationsBuilder) {
this(SessionCatalog.SessionContext.createEmpty(), clientBuilder, ioBuilder, operationsBuilder);
}
Contributor

Is this constructor necessary if we go with this(SessionCatalog.SessionContext.createEmpty(), clientBuilder); on line 68?

Member Author

RESTSessionCatalog lets us pass in an ioBuilder, but RESTCatalog doesn't, so I added it to the RESTCatalog constructor as well.

* RESTSessionCatalog catalog = new RESTSessionCatalog(clientBuilder, ioBuilder, customBuilder);
* </pre>
*/
public interface RESTOperationsBuilder {
Contributor

It’s more of a factory than a builder. The interface doesn’t progressively build or configure objects. It just creates them directly. The intent and usage align more closely with a Factory or Provider pattern. Should we rename it to xxxFactory?

Member Author

Makes sense, let me revise it!
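
For illustration, the factory shape discussed above might look roughly like the sketch below. Every name in it (SketchTableOperations, SketchOperationsFactory, SketchCatalog) is a hypothetical stand-in rather than the PR's actual API, and falling back to a default factory when null is passed is an assumption made for the example. It shows a single call returning a finished object, with no step-by-step configuration as a builder would have.

```java
import java.util.Map;

/** Hypothetical stand-in for table operations; not the real Iceberg class. */
class SketchTableOperations {
  private final String path;
  private final Map<String, String> headers;

  SketchTableOperations(String path, Map<String, String> headers) {
    this.path = path;
    this.headers = Map.copyOf(headers);
  }

  String path() {
    return path;
  }

  Map<String, String> headers() {
    return headers;
  }
}

/** Factory: one call creates a finished object; nothing is configured incrementally. */
interface SketchOperationsFactory {
  /** Default instance that relies entirely on the interface's default method. */
  SketchOperationsFactory DEFAULT = new SketchOperationsFactory() {};

  default SketchTableOperations createTableOperations(String path, Map<String, String> headers) {
    return new SketchTableOperations(path, headers);
  }
}

/** Catalog stand-in that accepts an injected factory (null falls back to the default; an assumption). */
final class SketchCatalog {
  private final SketchOperationsFactory factory;

  SketchCatalog(SketchOperationsFactory factory) {
    this.factory = factory != null ? factory : SketchOperationsFactory.DEFAULT;
  }

  SketchTableOperations newTableOps(String path) {
    return factory.createTableOperations(path, Map.of());
  }
}
```

A caller who wants custom behavior overrides createTableOperations and passes that factory to the catalog; everyone else passes nothing special and gets the default.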

import org.apache.iceberg.util.LocationUtil;

- class RESTTableOperations implements TableOperations {
+ public class RESTTableOperations implements TableOperations {
Contributor

Do we need this scope change?

Member Author

If users only want to make small adjustments to RESTTableOperations (for example, injecting a custom header), they can simply provide a custom implementation that extends RESTTableOperations, without having to copy the entire class.

This makes it much easier for them to upgrade to newer Iceberg SDK versions without dealing with merge conflicts or duplicated code.

I'm okay with either approach and don't have a strong preference. WDYT?
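
As a rough sketch of the "small adjustment" case described above: BaseTableOperations and HeaderInjectingTableOperations below are hypothetical stand-ins, and the overridable headers() hook is an assumption made for the example. This thread does not show whether RESTTableOperations exposes such a hook. The idea is only that a public, extensible class lets a subclass change one detail and inherit the rest.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a public, extensible operations class. */
class BaseTableOperations {
  private final Map<String, String> baseHeaders;

  BaseTableOperations(Map<String, String> baseHeaders) {
    this.baseHeaders = Map.copyOf(baseHeaders);
  }

  /** Hook that subclasses may override; the default returns the base headers unchanged. */
  protected Map<String, String> headers() {
    return baseHeaders;
  }

  /** Stand-in for issuing a request; returns the headers that would be sent. */
  public Map<String, String> refresh() {
    return headers();
  }
}

/** Small adjustment: add one header and inherit everything else from the base class. */
class HeaderInjectingTableOperations extends BaseTableOperations {
  HeaderInjectingTableOperations(Map<String, String> baseHeaders) {
    super(baseHeaders);
  }

  @Override
  protected Map<String, String> headers() {
    Map<String, String> merged = new HashMap<>(super.headers());
    merged.put("X-Request-Source", "custom-ops"); // illustrative header name
    return merged;
  }
}
```

With a package-private base class, the same change would require copying the whole class, which is the upgrade burden the comment above is trying to avoid.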

import org.apache.iceberg.view.ViewOperations;

- class RESTViewOperations implements ViewOperations {
+ public class RESTViewOperations implements ViewOperations {
Contributor

Do we need this scope change?

Member Author

Same as above
