How to push Avro schemas/protocols so that the registry uses references and doesn't embed everything into the definition #5573

Open
lsegv opened this issue Nov 22, 2024 · 3 comments


@lsegv

lsegv commented Nov 22, 2024

I want to build a custom registry image based on Apicurio. The image would preload all my schemas and enforce them (I will disable pushing artifacts from services). There is one big problem, though.

If I simply loop over all my AVSC files (which I generate from AVDL) and upload them one by one, the registry does no processing on them and does not recognise when two imported messages reuse the same type, because each AVSC file inlines everything it uses in order to be self-contained.
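To make the inlining problem concrete, here is a minimal sketch of how a self-contained AVSC could be split so that every named type becomes its own schema and nested definitions are replaced by their full names. The function names (`fullname`, `split_named_types`) and the sample `Order`/`Address` schema are hypothetical, and the sketch ignores aliases and assumes names follow the usual Avro namespace rules.

```python
import json

NAMED_TYPES = {"record", "enum", "fixed"}


def fullname(schema, enclosing_ns):
    """Return the Avro full name of a named type definition."""
    name = schema["name"]
    if "." in name:  # a dotted name already carries its namespace
        return name
    ns = schema.get("namespace", enclosing_ns)
    return f"{ns}.{name}" if ns else name


def split_named_types(schema, enclosing_ns=None, found=None):
    """Replace nested named-type definitions by their full names.

    Returns (rewritten_schema, found); `found` maps each full name to its
    own de-inlined definition, with dependencies inserted first.
    """
    if found is None:
        found = {}
    if isinstance(schema, list):  # union: rewrite every branch
        return [split_named_types(s, enclosing_ns, found)[0] for s in schema], found
    if not isinstance(schema, dict):  # primitive or an existing name reference
        return schema, found
    kind = schema.get("type")
    if kind in NAMED_TYPES:
        name = fullname(schema, enclosing_ns)
        ns = name.rpartition(".")[0] or None
        definition = dict(schema)  # shallow copy, the input is not mutated
        if kind == "record":
            definition["fields"] = [
                {**f, "type": split_named_types(f["type"], ns, found)[0]}
                for f in schema["fields"]
            ]
        found[name] = definition
        return name, found
    if kind == "array":
        items = split_named_types(schema["items"], enclosing_ns, found)[0]
        return {**schema, "items": items}, found
    if kind == "map":
        values = split_named_types(schema["values"], enclosing_ns, found)[0]
        return {**schema, "values": values}, found
    if isinstance(kind, (dict, list)):  # wrapper such as {"type": {...}}
        return {**schema, "type": split_named_types(kind, enclosing_ns, found)[0]}, found
    return schema, found


# Hypothetical self-contained AVSC with an inlined Address record.
order = {
    "type": "record", "name": "Order", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "shipTo", "type": {
            "type": "record", "name": "Address",
            "fields": [{"name": "city", "type": "string"}]}},
        {"name": "billTo", "type": ["null", "com.example.Address"], "default": None},
    ],
}

root, types = split_named_types(order)
print(json.dumps(types[root], indent=2))
```

After the split, `types` holds one entry per named type, and the `Order` entry mentions `Address` only by name. Those names are what would have to be declared as references when each entry is uploaded as a separate artifact.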

I started looking for an API that would let me upload protocol files (AVDL or AVPR), but I found no such thing.

When I use a Kafka producer/consumer example and let it push the definitions on the fly, I see that it registers the messages properly (that is, with references). But the code that does this does a lot of work: it figures out all those references and uploads them to the registry correctly.

The problem is that I can't just let arbitrary Java code run during the build stage (or at least I don't want to). Ideally the registry would allow importing a protocol and then store all those definitions and references properly.

What can I do here other than grabbing the code from the Kafka serializer? Maybe there is an API I don't know about?

@apicurio-bot

apicurio-bot bot commented Nov 22, 2024

Thank you for reporting an issue!

Pinging @jsenko to respond or triage.

@EricWittmann
Member

What are your expectations for how your schemas are laid out locally? Do you have control over that such that you could maintain some extra metadata?

Another place to look for this type of thing is our Maven plugin, especially this part:

https://github.com/Apicurio/apicurio-registry/blob/main/utils/maven-plugin/src/main/java/io/apicurio/registry/maven/RegisterRegistryMojo.java#L130-L138

What do you mean precisely when you say this?

"this image would preload all my schemas enforcing them"

I'd like to better understand your use-case/goals.

There is not currently a way to send a bunch of related stuff to registry all at once and have it automatically figure out the details (with references and all that). It is something we've discussed, but not yet implemented. Could be an opportunity to collaborate on something like that if you are interested.
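Until something like that exists, the client has to work out the upload order and the reference lists itself. The sketch below shows one way to do that for a set of already de-inlined schemas: it finds which other types each schema mentions, sorts them so referenced types come first, and builds one request body per artifact. The body shape is an assumption modelled on the v3 "create artifact" REST call and should be checked against the API docs for your registry version; the `default` group, the version `"1"` and the helper names are hypothetical.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+


def collect_refs(schema, known):
    """Return the names from `known` that `schema` uses as a type."""
    refs = set()

    def walk(node):
        if isinstance(node, str):
            if node in known:
                refs.add(node)
        elif isinstance(node, list):  # union
            for branch in node:
                walk(branch)
        elif isinstance(node, dict):
            for key in ("type", "items", "values"):
                if key in node:
                    walk(node[key])
            for field in node.get("fields", []):
                walk(field["type"])

    walk(schema)
    return refs


def upload_plan(types, group="default", version="1"):
    """Build request bodies in dependency order (referenced types first)."""
    # Drop self-references so recursive records do not look like cycles.
    deps = {name: collect_refs(s, types) - {name} for name, s in types.items()}
    plan = []
    for name in TopologicalSorter(deps).static_order():
        plan.append({
            "artifactId": name,
            "artifactType": "AVRO",
            "firstVersion": {
                "version": version,
                "content": {
                    "content": json.dumps(types[name]),
                    "contentType": "application/json",
                    # Assumed shape: one entry per referenced artifact.
                    "references": [
                        {"groupId": group, "artifactId": ref,
                         "version": version, "name": ref}
                        for ref in sorted(deps[name])
                    ],
                },
            },
        })
    return plan


# Hypothetical de-inlined schemas keyed by full name.
types = {
    "com.example.Order": {
        "type": "record", "name": "Order", "namespace": "com.example",
        "fields": [{"name": "shipTo", "type": "com.example.Address"}],
    },
    "com.example.Address": {
        "type": "record", "name": "Address", "namespace": "com.example",
        "fields": [{"name": "city", "type": "string"}],
    },
}

plan = upload_plan(types)
for body in plan:
    refs = [r["name"] for r in body["firstVersion"]["content"]["references"]]
    print(body["artifactId"], "references", refs)
```

Each body would then be POSTed in list order, so every referenced artifact already exists when the artifact that depends on it is created. Mutually recursive types would make `TopologicalSorter` raise a `CycleError` and need separate handling.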

@lsegv
Author

lsegv commented Nov 25, 2024

Regarding "extra metadata": yes, we have full control over whatever we do with Avro. I assume you mean metadata in the message definitions themselves?

The use case is this: in production we do not want to let different services push their message definitions at will, which means the schema registry has to be in its correct state (with all versions of all schemas) whenever it is running. There are several reasons for that:

  1. We have 400+ message types, and the devs don't want all that registration to happen at runtime; they would rather know that once the SR has started, all schemas are there, ready to be queried.

  2. We have build pipeline tasks that check the Avro schemas committed in the current branch against the "official" schema registry for the target environment; if a task detects a violation of the compatibility requirements, it fails the build with a proper message.

  3. Packing the specific versions of all schemas into the custom Docker image of the schema registry means we know exactly which messages were in use (and what their latest versions were) when that specific release was running.
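As an illustration of the kind of violation the pipeline check in point 2 is meant to catch, here is a deliberately simplified, local sketch of a backward-compatibility check between two versions of a record. It is not how the registry evaluates its compatibility rule: it only looks at top-level fields, knows only the primitive promotions from the Avro spec, ignores aliases, and is conservative (it flags some changes Avro would accept, such as widening a type to a union). The function name and the sample schemas are hypothetical.

```python
# Primitive promotions a reader may apply, per the Avro schema resolution rules.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def backward_violations(old, new):
    """List reasons why `new` (reader) might not read data written with `old`."""
    problems = []
    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        name = field["name"]
        if name not in old_fields:
            # A field the writer never wrote must have a default in the reader.
            if "default" not in field:
                problems.append(f"{name}: added without a default")
            continue
        was, now = old_fields[name]["type"], field["type"]
        promotable = (
            isinstance(was, str) and isinstance(now, str)
            and now in PROMOTIONS.get(was, ())
        )
        if was != now and not promotable:
            problems.append(f"{name}: type changed from {was!r} to {now!r}")
    return problems


# Hypothetical v1 and v2 of the same record.
v1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "qty", "type": "int"},
]}
v2 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "qty", "type": "long"},      # int -> long is an allowed promotion
    {"name": "note", "type": "string"},   # new field, no default
]}

for problem in backward_violations(v1, v2):
    print("incompatible:", problem)
```

A build task would fail when the returned list is non-empty. In a real pipeline the authoritative answer should still come from the registry's own compatibility rule for the target environment.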

It all basically boils down to us wanting deterministic behaviour from the registry. If we are rolling back, we can just grab an older image and know that a service that shouldn't have known about v4 of some message won't magically find it because it was pushed to the registry earlier. We can simply wipe the DB, deploy the specific Docker image and get exactly the same behaviour.

In dev/QA we let devs go wild and push however they like; in prod services can't do that, so messages should be preloaded by the Docker image at startup.

Would be happy to collaborate. I think all of my problems are solved if we can come up with a new REST endpoint that lets us import AVDL or AVPR files. I'll be glad to contribute as well, I just need some pointers on how you'd approach this.
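As a starting point for discussion, here is a rough sketch of the first step such an import would need for an AVPR file: pulling each entry out of the protocol's `types` array as a standalone schema. The detail it illustrates is that a type extracted from a protocol no longer inherits the protocol's namespace, so the namespace has to be written into the definition explicitly. It assumes every named type is declared once at the top level of `types`, which should be verified against real IDL-generated files; the function names and the sample protocol are hypothetical.

```python
import json


def qualify(name, namespace):
    """Return the full name for `name` given its (possibly empty) namespace."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def protocol_types(avpr_text):
    """Map full name -> standalone definition for every type in an AVPR."""
    protocol = json.loads(avpr_text)
    protocol_ns = protocol.get("namespace")
    types = {}
    for definition in protocol.get("types", []):
        standalone = dict(definition)  # shallow copy, the parsed input stays intact
        inherits = "." not in standalone["name"] and "namespace" not in standalone
        if inherits and protocol_ns:
            # Make the inherited protocol namespace explicit.
            standalone["namespace"] = protocol_ns
        full = qualify(standalone["name"], standalone.get("namespace"))
        types[full] = standalone
    return types


# Hypothetical protocol file content.
avpr = """
{
  "protocol": "Shop",
  "namespace": "com.example",
  "types": [
    {"type": "record", "name": "Address",
     "fields": [{"name": "city", "type": "string"}]},
    {"type": "record", "name": "Order",
     "fields": [{"name": "shipTo", "type": "Address"}]}
  ],
  "messages": {}
}
"""

extracted = protocol_types(avpr)
for full_name, schema in extracted.items():
    print(full_name, "->", schema["namespace"])
```

The short name `Address` inside `Order` still resolves correctly, because both standalone definitions now carry `com.example` as their namespace. Working out which entries reference which, and in what order to store them, would be the next step.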

P.S. The unions were not working because we were importing all types directly and not using references; once I let the Kafka clients register into an empty registry, all unions worked fine.
