[Schemas] Handle name conflicts #1660

hariso · 2024-06-14T16:18:54Z

Part of #1560.

Currently, our API allows connector developers to specify a schema name to be used. Given that Conduit's schema registry is shared by multiple connectors and pipelines, we need to handle name conflicts.

In other words: different connectors should be allowed to use the same schema names, but that shouldn't have any side effects (such as one connector modifying a schema that belongs to another connector).

This might be useful: Use Schema Contexts in Confluent Platform.

Pull requests:

hariso · 2024-06-17T16:13:29Z

Our team discussed this today. Here are the solutions we discussed and the conclusion.

TL;DR

Conduit will make sure all schema names are unique by adding the connector ID and/or the pipeline ID to the name that a connector developer provides (which is going to be the collection name in most cases).

Long version:

Possible solutions

1. Make every name unique
1a. Connector developers provide a name, which Conduit "makes unique" by adding a prefix/suffix.
1b. Connector developers don't provide a name.
2. Use "namespaces" (each connector gets one)

Discussion:

1. Make every name unique

1a. Connector developers provide a name. Conduit "makes it unique" by adding a prefix/suffix.
Pros: makes the connector code a bit clearer (by showing what a schema refers to).
Cons: the actual name is different. The original name is not valid anymore.

1b. Connector developers don't provide a name. Conduit generates a random/unique name.
Pros: simple implementation.
Cons: the schema registry is not well organized internally. Some organization is possible, in a limited way, by using structured names (e.g. pipeline ID + connector ID + schema name). The actual name is different. The original name is not valid anymore.

2. Use "namespaces" (each connector gets one)

Confluent's Schema Registry has schema contexts, which work more or less like a prefix.
Pros: intuitive way to organize schemas, easier cleanup.
Cons: the franz-go client doesn't support contexts as "first-class citizens". What can currently be done is to change the base URL, but that would mean one client per connector. We might also want to change the client to support schema contexts.
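To make the "change the base URL" workaround concrete, here is a minimal sketch. It assumes the conventions described in Confluent's schema contexts documentation: context names start with a dot, a context-scoped base URL has the form `<registry>/contexts/<context>`, and a qualified subject is written `:<context>:<subject>`. The helper names `contextBaseURL` and `qualifiedSubject` are hypothetical and not part of franz-go or Conduit.

```go
package main

import (
	"fmt"
	"strings"
)

// contextBaseURL returns a registry base URL scoped to a single context.
// A client created with this URL would only see subjects in that context,
// which is why this approach implies one client per connector.
// The "/contexts/<name>" path is an assumption taken from Confluent's docs.
func contextBaseURL(registryURL, context string) string {
	return strings.TrimRight(registryURL, "/") + "/contexts/" + context
}

// qualifiedSubject builds a context-qualified subject name. It is an
// alternative to a per-context base URL: one shared client, with the
// context carried in every subject name.
func qualifiedSubject(context, subject string) string {
	return fmt.Sprintf(":%s:%s", context, subject)
}

func main() {
	// ".pg-source" stands in for a context derived from a connector ID.
	fmt.Println(contextBaseURL("http://localhost:8081/", ".pg-source"))
	fmt.Println(qualifiedSubject(".pg-source", "users"))
}
```

Whether a given schema registry implementation accepts either form would need to be verified against that registry.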

Conclusion

We're choosing 1a for the following reasons:

- It's possible to dictate a structure for the names, which makes it easier to identify which schemas belong to which pipelines, which in turn makes cleanup easier.
- A connector developer's involvement is minimal.

While it does require some care on a connector developer's behalf, because the actual schema name is different, it's still not a big problem, because the parameter name and docs will call it out.
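As an illustration of option 1a, the following sketch builds a structured subject name from the pipeline ID, the connector ID and the name provided by the connector developer. The helper `uniqueSubject`, the `.` separator and the ordering of the parts are all assumptions made for this example; the thread does not specify them.

```go
package main

import (
	"fmt"
	"strings"
)

// sep separates the parts of a generated subject name.
// The separator is an assumption; the discussion does not fix one.
const sep = "."

// uniqueSubject is a hypothetical helper that prefixes the developer-provided
// schema name with the pipeline ID and the connector ID. IDs must not contain
// the separator, otherwise two different ID pairs could produce the same prefix.
func uniqueSubject(pipelineID, connectorID, name string) (string, error) {
	for _, id := range []string{pipelineID, connectorID} {
		if id == "" || strings.Contains(id, sep) {
			return "", fmt.Errorf("invalid ID %q: must be non-empty and must not contain %q", id, sep)
		}
	}
	if name == "" {
		return "", fmt.Errorf("schema name must not be empty")
	}
	return pipelineID + sep + connectorID + sep + name, nil
}

func main() {
	// Two connectors use the same developer-provided name without clashing.
	for _, c := range [][2]string{{"pipeline-1", "pg-source"}, {"pipeline-2", "mysql-source"}} {
		subject, err := uniqueSubject(c[0], c[1], "users")
		if err != nil {
			panic(err)
		}
		fmt.Println(subject)
	}
}
```

Because the pipeline ID comes first, all schemas of a pipeline share a common prefix, which is what makes the cleanup mentioned above easier.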

hariso · 2024-07-17T11:58:21Z

The mentioned solution relies on a connector being able to identify itself (the combination of the pipeline/connector ID and the name that a developer provided guarantees schema subject uniqueness). Tokens can be used for that. Lovro wrote down some thoughts on how to do that: #1701 (comment)

hariso · 2024-07-18T13:16:19Z

@lovromazgon and I were discussing the implementation of this. There are a few points:

- We're not quite happy with schema.Create() returning a schema with a different name, but there's no good solution.
- Eventually, we'd like to organize schemas into contexts.
- There might be cases where a user (not necessarily a connector developer) wants to use an existing schema from an external schema registry. Always prefixing the subject name with the connector ID (as in the proposed solution) makes that impossible, so we're going to make that configurable.
- A user will be able to:
  - specify a custom prefix
  - use the default prefix (connector ID)
  - use no prefix at all (subject name is exactly as in the connector code, e.g. the collection name)
- The above will be made possible through two configuration parameters:
  - one to enable the prefix
  - one to specify the prefix
- The default behavior is to use the connector ID as the prefix.
- Now the plot twist: we'll use a context instead of a prefix, since we plan to organize schemas into contexts in the future.
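The three user-facing choices above map onto two configuration parameters, which can be sketched as follows. The type `SchemaContextConfig`, its field names and the `Resolve` method are hypothetical and only illustrate the decision logic; the actual parameter names are not given in this thread.

```go
package main

import "fmt"

// SchemaContextConfig models the two parameters from the discussion.
// Both field names are assumptions made for this sketch.
type SchemaContextConfig struct {
	Enabled bool   // whether subjects are placed in a context at all
	Name    string // custom context name; empty means "use the connector ID"
}

// Resolve returns the context to use for a connector's schemas.
// An empty result means the subject name is used exactly as given
// in the connector code.
func (c SchemaContextConfig) Resolve(connectorID string) (string, error) {
	if !c.Enabled {
		return "", nil // no context: allows reusing subjects from an external registry
	}
	if c.Name != "" {
		return c.Name, nil // custom context chosen by the user
	}
	if connectorID == "" {
		return "", fmt.Errorf("cannot derive default schema context: connector ID is empty")
	}
	return connectorID, nil // default: the connector ID
}

func main() {
	configs := []SchemaContextConfig{
		{Enabled: true},                 // default behavior
		{Enabled: true, Name: "shared"}, // custom context
		{Enabled: false},                // plain subject names
	}
	for _, cfg := range configs {
		ctx, err := cfg.Resolve("pipeline-1:pg-source")
		if err != nil {
			panic(err)
		}
		fmt.Printf("enabled=%t name=%q -> context=%q\n", cfg.Enabled, cfg.Name, ctx)
	}
}
```

A zero-value Go struct has `Enabled` set to false, so a real implementation would need to apply the documented default (enabled, connector ID) when the parameters are not set.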

hariso mentioned this issue Jun 14, 2024

Schema support #1560

Closed

17 tasks

simonl2002 added this to Conduit Main Jun 14, 2024

github-project-automation bot moved this to Triage in Conduit Main Jun 14, 2024

hariso moved this from Triage to Todo in Conduit Main Jun 14, 2024

lovromazgon mentioned this issue Jul 11, 2024

Forward connector utilities address and token to plugin executable ConduitIO/conduit-connector-protocol#112

Merged

4 tasks

lovromazgon assigned hariso Jul 22, 2024

lovromazgon moved this from Todo to In Progress in Conduit Main Jul 22, 2024

This was referenced Jul 23, 2024

[Schemas] Handle name conflicts #1718

Merged

[Schemas] Handle name conflicts ConduitIO/conduit-connector-sdk#153

Merged

Rename environment variable ConduitIO/conduit-connector-protocol#115

Merged

hariso moved this from In Progress to In Review in Conduit Main Jul 24, 2024

hariso closed this as completed in #1718 Jul 31, 2024

github-project-automation bot moved this from In Review to Done in Conduit Main Jul 31, 2024
