Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add REST Compatibility Kit #10908

Merged
merged 2 commits into from
Aug 27, 2024

Conversation

danielcweeks
Copy link
Contributor

This PR adds the ability to run the Catalog Table and View tests against a REST Catalog implementation. By default, it will run the tests against a local server using the catalog adaptor to proxy to a JDBC backend catalog.

Catalog configurations can be overridden to point to a different catalog implementation or service running externally.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, if I'm posting comment prematurely. I see that the PR is still a draft, but since I came across it, my initial thoughts (mostly around future reuse of these tests) are below.

build.gradle Show resolved Hide resolved
@BeforeAll
static void beforeClass() throws Exception {
if (RCKUtils.isLocalServer()) {
RCKUtils.startLocalServer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much nicer to use a JUnit5 Extension for managing the server lifecycle. Also, the extension could support pluggable code for non-default servers. This way the test code can be written without any dependency on the server side, while server startup becomes a runtime concern of the test extension.

Copy link
Contributor Author

@danielcweeks danielcweeks Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great, but I think this could be split out as a follow on PR. I'll take a look and see how involved that is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was able to move the local server to an extension, but I'm not sure it's really pluggable from a test perspective, but I also don't think that's necessary. If you take a look at the README, you can disable the local server and configure any external server. The local server just validates all of the catalog tests against the adaptor based implementation which ensure at least the reference implementation complies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx! Annotating leaf test classes should work fine. A test suite targeting another server would be able to define its own leaf test class (extending the base TCK class) and have it annotated with a different extension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be nice to separate actual TCK test code, from helper classes for the "local" server in terms of maven artifacts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few things to separate the tests and local server, but there's always a dependency of some sort (whether it's the JUnit extension or the server classes) on the testFixtures.

The testFixtures are produced as an artifact, so those who want to build client tests against the server can do that (this is what the other projects like pyiceberg/rust do). For running the tests without the test fixtures, I don't have a great solution now, but I'm open to figuring out a way to separate those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With extensions, I'd usually have it inject (as a JUnit5 ParameterResolver) some java object (e.g. URI) or config into test classes for them to use in creating the REST Catalog (client)... Pseudo-code for example:

static void beforeClass(@RESTServerURI URI uri, @RCKConfig Map<String, String> config) throws Exception {
    restCatalog = RCKUtils.initCatalogClient(uri, config);
}

@RESTServerURI would be defined in the TCK. Non-default test extension will be responsible for injecting it for whatever custom server impl. The URI can be part of the config, of course.

The code that uses @RESTServerURI and @RCKConfig does not have to depend on test extensions.

For example, RCK logic is in class RESTCompatibilitySuite (abstract, for reuse) - no references to system props, env, or RESTServerCatalogAdapter. Then:

@ExtendWith(RESTServerExtension.class)
class RESTCompatibilityReferenceTest extends RESTCompatibilitySuite {
 // this class is empty and NOT REUSED outside of the Iceberg repo
}

Loading config from system props and env is handled by RESTServerExtension, which also injects config into RESTCompatibilitySuite.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this in concept, but in practice, I wasn't able to find an approach that works. One of the issues is that the Suite does not work with the @ExtendWith annotation, so extending the suite class does not allow for extension like in your example. This combined with the multiple test classes we need to extend (CatalogTests<RESTCatalog> and ViewCatalogTests<RESTCatalog>) means that we don't have an easy way to make this pluggable/extendable.

I think we probably want to have a set of standardized tests that we package here and allow for reasonable feature flag tuning, but if you want to customize tests beyond what's defined, just create a separate suite of tests.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! I had similar thoughts about the proliferation of REST catalog implementations and wanted to create a set of standardized tests to ensure that these implementations adhere to the spec.

I wrote up the idea here. Would love to hear your thoughts!

open-api/README.md Outdated Show resolved Hide resolved
@Override
protected boolean requiresNamespaceCreate() {
return PropertyUtil.propertyAsBoolean(
restCatalog.properties(), RCKUtils.RCK_REQUIRES_NAMESPACE_CREATE, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
restCatalog.properties(), RCKUtils.RCK_REQUIRES_NAMESPACE_CREATE, false);
restCatalog.properties(), RCKUtils.RCK_REQUIRES_NAMESPACE_CREATE, super.requiresNamespaceCreate());

maybe the default value should be whatever the super class defines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into the issue here and I feel like we should be explicit here about the defaults for the RCK tests as opposed to relying on the defaults. The issue is the defaults don't reflect the error responses that are validated for a REST catalog. It's a little unintuitive from the name (and primarily applies to the RCK_SUPPORTS_SERVERSIDE_RETRY) but the normal catalog errors for conditions (e.g. Cannot Commit) are different than the error responses the come back in the REST errors (e.g. Commit Failed). This causes issues with the validation. I believe this was introduced specifically for this purpose, but isn't explicitly clear from the names.

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall I like this a lot. I haven't tested this against a separate REST server but things look quite promising and I've left a few small/minor comments

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to test against different catalogs but I did a pass over the implementation and it's quite reasonable, thanks for getting this going @danielcweeks !

@danielcweeks danielcweeks force-pushed the rest-compatibility-kit branch 2 times, most recently from e36925a to 22e7102 Compare August 15, 2024 18:34
@danielcweeks danielcweeks force-pushed the rest-compatibility-kit branch from c007d2e to 3e213ca Compare August 15, 2024 22:32
@github-actions github-actions bot added the INFRA label Aug 20, 2024
httpServer.setHandler(context);
httpServer.start();

if(join) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like formatting is off here. I actually wonder whether spotless is being applied to this project

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked and it indeed looks like spotless is only applied to src + test but not to the code in testFixtures, so we might want to look into that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow up with a PR for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra you were right, PR here

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, would be good to get this in so we can iterate on it 👍 Eager to get this to PyIceberg and Iceberg Rust as well, and remove the 3rd party containers

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few small comments but I agree with @Fokko, it would be good to get this in and iterate on it.

build.gradle Show resolved Hide resolved
Comment on lines +46 to +49
The REST Compatibility Kit (RCK) is a Technology Compatibility Kit (TCK) implementation for the
Iceberg REST Specification. This includes a series of tests based on the Java reference
implementation of the REST Catalog that can be executed against any REST server that implements the
spec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe nit-picky, but I do believe RCK (TCK) ought to be based on the spec, not on the reference implementation. The RCK should pass on the reference impl.. but the foundation for it is the spec. Technically, the reference impl. may be incorrect with the respect to the spec (as any piece of software), so the test ought to validate that as well as catalog implementations from other projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the intent is the RCK is based on the Spec and I don't feel this is saying otherwise. The purpose of the reference implementation is to faithfully implement the spec and the tests should validate that behavior. We're using the reference implementation tests as part of this because the "should" achieve the same objective. Creating a separate set of tests for the RCK would either be duplicative or imply that the reference implementation doesn't adhere to the spec (neither of which is good).

so `-Drck.local=false` must be set to point to an external REST server.

```shell
./gradlew :iceberg-open-api:test --tests RESTCompatibilityKitSuite \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example or instructions for running the RCK in a downstream Catalog project?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully follow, but I would think the RCK is run against a downstream catalog project, not necessarily in one. I suppose you could subclass this suite and include that as part of the tests, but I think that probably is a better follow up once someone has a reference example integrated that we can use to validate an approach.

@danielcweeks danielcweeks merged commit c95bf58 into apache:main Aug 27, 2024
48 checks passed
jenbaldwin pushed a commit to Teradata/iceberg that referenced this pull request Sep 17, 2024
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
* Add REST Compatibility Kit

* Publish test artifacts for open api module
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants