Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Kafka REST from scratch #74

Merged
merged 41 commits into from
Oct 11, 2024
Merged

Conversation

rohitsanj
Copy link
Contributor

@rohitsanj rohitsanj commented Oct 8, 2024

TLDR

This PR implements the Topic V3, Cluster V3 Kafka REST APIs from scratch using the Kafka AdminClient and swaps the underlying Kafka REST proxy target for confluent-local connections to point to ide-sidecar's own Kafka REST implementation, rather than to confluent-local's kafka-rest server. This will put the ide-sidecar Kafka REST impl in use from day 1 and give us early feedback to iterate.

Background/Motivation

The ide-sidecar Quarkus application today exposes a wildcard route (what we call the “Kafka REST proxy”) at http://localhost:26636/kafka/v3/clusters/* and uses connection/cluster metadata passed as request headers to determine which Kafka REST server the request must be forwarded to. We want to extend this functionality to allow clients of ide-sidecar to be able to connect to any Kafka cluster, given appropriate configuration context. Based on this (internal) decision log, we decided to implement from scratch a minimal set of Kafka REST endpoints in ide-sidecar, though the major reasons for the current approach are as follows:

  • The existing https://github.com/confluentinc/kafka-rest library is designed to be configured against a single Kafka cluster. We've prototyped embedding the library and integrating it into our embedded proxy mechanism, so it's doable.
  • The blocker is that the existing https://github.com/confluentinc/kafka-rest library uses Guava, which is not compatible with GraalVM. This makes it impossible to embed this library into the sidecar, as we had originally planned. We will asynchronously work with that project to be compatible with GraalVM, but there may be other challenges besides Guava, so there is no estimated timeframe for completion.
  • We are currently using a small subset of the Kafka REST API, so short-term implementing a small portion of the same OpenAPI spec is doable. This PR takes this approach. However, as we need more, we hope to replace the current partial re-implementation with a GraalVM-compatible https://github.com/confluentinc/kafka-rest library.

Description of code changes

  • kafka-rest.openapi.yaml: Kafka REST OpenAPI spec taken from confluentinc/kafka-rest that is used to generate API models and resource interfaces. See pom.xml for the openapi-generator-maven-plugin configuration.
  • We generate API resource interface classes for the following API groups: TopicV3, ClusterV3, PartitionV3. Out of these, only TopicV3 and ClusterV3 have concrete implementations. PartitionV3 will be implemented in a followup PR to simply the scope of this PR.
  • Note that the OpenAPI generated classes are NOT checked in to the ide-sidecar source code but are instead generated at build time from the target/generated-sources/openapi/src/gen/java directory.
  • A Clients abstract class was extracted out of the SchemaRegistryClients class and made to be its parent. The Clients class can be parameterized based on the client instance type that needs to be cached by connection id and client id.
    • The new Clients class caches client instances by cluster id in a Caffeine cache.
    • It allows inheritors to define arbitrary cache policies (see CaffeineSpec).
  • A new class AdminClients extends Clients<org.apache.kafka.clients.admin.AdminClient> is used to cache AdminClient instances.
    • It sets the Caffeine cache specification to expiresAfterAccess=5m, this evicts and closes the AdminClient client instances after 5 minutes of inactivity.
  • The exception mappers related to Kafka REST exceptions construct response models using the kafka-rest OpenAPI generated io.confluent.idesidecar.restapi.kafkarest.model.Error model, as opposed to using ide-sidecar's own io.confluent.idesidecar.restapi.exceptions.Failure.Error model. This is intentional in order to adhere to the kafka-rest OpenAPI spec as much as possible.
  • [Major change] The Kafka REST Proxy target for LOCAL (confluent local) connections was changed from the kafka-rest server (served at http://localhost:8082/v3 in the confluent-local docker container), to ide-sidecar's own kafka-rest implementation served at http://localhost:26636/internal/kafka/v3.
    • Given that the PartitionAPI has not yet been implemented in ide-sidecar's Kafka REST, listing partitions for a confluent-local kafka topic will NOT work temporarily.
    • Motivation behind this change: Swapping the confluent-local proxy target to now point to ide-sidecar's Kafka REST will give us early feedback to harden/iterate on the implementation.
  • A few tests that tested prior implementation path for confluent-local Kafka REST proxy have been removed since they are no longer compatible with current implementation. Local connectivity has been sufficiently tested in other integration tests that actually spin up a confluent-local docker container.

Design considerations

  • Clients of ide-sidecar must only be aware of the http://localhost:26636/kafka/v3 Kafka REST proxy routes. The http://localhost:26636/internal/kafka/v3 endpoints serve as an implementation detail for talking to clusters using the Kafka-native protocol.

Followup items

  • PartitionV3 API implementation (see Implement Partition V3 Kafka REST API #77)
  • Include connection-specific resource names from the internal Kafka REST responses
  • From Stefan's comment, set client.id to something like Quarkus for VS Code sidecar ${quarkus.application.version}

* Add openapi generation and implement basic endpoint

* Implement remaining endpoints

* add api spec

* Add cluster api and integration tests

* fix null authorizedOperations

* Add metadata and related urls to response

* Introduce AdminClients to cache AdminClient by connection id and cluster id

* refactor

* broken wide

* wip: needs refactor

* works again

* use constant var

* Add tests for cluster API

* Support confluent local using internal kafka rest

* revert file changes

* make util cass final

* Add docstrings

* fix

* Reduce AdminClient request timeout to 10 seconds and handle exception

* Use Caffeine to do time-based eviction on AdminClient instances

* remove unused import

* remove TODO

* Put unrelated openapi spec changes back in place

* Add exception mapper for UnsupportedOperationException
@rohitsanj rohitsanj requested a review from a team as a code owner October 8, 2024 22:33
@rhauch
Copy link
Contributor

rhauch commented Oct 9, 2024

Thanks, @rohitsanj. Please correct me if I'm wrong, but after this change we'll have two Kafka REST endpoints:

  1. http://localhost:26636/internal/kafka/v3 -- This is the "embedded" Kafka REST API that will always translate requests into Kafka-native protocol requests submitted directly against the external designated Kafka cluster, and translate the Kafka-native responses back to the RESTful representation. This "embedded" Kafka REST API is not meant to be called by the extension.
  2. http://localhost:26636/kafka/v3 -- This is the existing Kafka REST API that the extension uses to talk to an external Kafka REST API server that is running next to a Kafka cluster. With this change, this proxy Kafka REST API will forward requests either to the external Kafka REST server, or forward (or-redirect?) requests to the internal "embedded" Kafka REST API that will forward requests (over the Kafka-native protocol) to the external Kafka cluster.

(The endpoints for both APIs, like nearly all the sidecar APIs) both require the session token be passed as the authz header.)

Can you confirm that this approach was chosen because it makes implementing the http://localhost:26636/kafka/v3 endpoints straightforward, since they always forward to an external or internal Kafka REST server?

@rohitsanj
Copy link
Contributor Author

rohitsanj commented Oct 10, 2024

I made some slight tweaks/fixes identified when testing this branch's sidecar executable against a debug instance of confluentinc/vscode:

  1. Set content-length only if transfer-encoding is not set. From this section in RFC 7230: HTTP 1.1 Message Syntax and Routing: A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.
  2. Map exceptions of type org.apache.kafka.common.errors.ApiException raised by AdminClient to status code 500 internal server error, with exception message passed through as is.
  3. Added support for accepting and respecting the includeAuthorizedOperations query parameter (notice the camelCase) to the list topics (GET /internal/kafka/v3/clusters/{clusterId}/topics) endpoint. This is used by the VS Code extension to determine the scope of actions a user may perform on a topic. (See this comment in the kafka-rest.openapi.yml file for more context).

I also ended up reverting changes made to ClusterStrategy since I felt the refactor did not bring much benefit.

kafka-rest.openapi.yaml Outdated Show resolved Hide resolved
@flippingbits
Copy link
Contributor

Edit: I managed to fix this by adding a src/main/resources/META-INF/native-image/com.github.ben-manes.caffeine/caffeine/reflect-config.json file as recommended in this very helpful GH comment - ben-manes/caffeine#434 (comment).

@rohitsanj Did you manually create this file or did you use the GraalVM tracing agent?

@rohitsanj
Copy link
Contributor Author

Did you manually create this file or did you use the GraalVM tracing agent?

@flippingbits, I manually created the file based on this Quarkus guide. I considered using the GraalVM reachability metadata repository since com.github.ben-manes.caffeine:caffeine has community-contributed metadata (see exhaustive list here).

@rohitsanj rohitsanj requested review from flippingbits and rhauch and removed request for flippingbits October 10, 2024 18:48
@rohitsanj
Copy link
Contributor Author

@rhauch May I get you to please take another look? Thanks 🙇

Copy link
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @rohitsanj. Let's get @flippingbits approval, too.

Can you also clarify what the impact will be on the UX with just this merged, but before #77 is merged? I'd rather us not get stuck with some functionality broken for more than a day or so.

@rohitsanj
Copy link
Contributor Author

rohitsanj commented Oct 10, 2024

Thank you!

Can you also clarify what the impact will be on the UX with just this merged, but before #77 is merged? I'd rather us not get stuck with some functionality broken for more than a day or so.

The partitions dropdown in the message viewer tabs for confluent local topics will have "All partitions" as the only option with no ability to filter down on topic partitions. The UX/functionality remains intact for CCloud topic message viewer tabs.

(Notice the expected 404 error log in bottom window)

Screenshot 2024-10-10 at 3 07 11 PM

@rohitsanj rohitsanj merged commit b239bbf into main Oct 11, 2024
1 check passed
@rohitsanj rohitsanj deleted the rohitsanj/implement-kafka-rest branch October 11, 2024 07:27
@rohitsanj rohitsanj added the enhancement New feature or request label Nov 23, 2024
@rohitsanj rohitsanj self-assigned this Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request kafka-rest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants