question: can i read arbitrary avro messages? #134
Comments
I need the same thing, and it looks like #86 outlines a way to enhance gogen-avro to do it. I'm contemplating taking that on, but can't ATM.
This is a little ambiguous, I can see two different scenarios:

1. You need to deserialize records whose schemas aren't known until runtime.
2. You have a known set of schemas and want to deserialize records against them, handling evolution as new versions appear.

For scenario 1, gogen-avro is never going to be appropriate. Something like goavro (https://github.com/linkedin/goavro) will give you the generic, map-based decoding you need. In scenario 2, gogen-avro may be appropriate. gogen-avro is designed to work in two passes: structs are generated from your schemas at build time, and at runtime the writer's schema is compiled against those generated structs to deserialize records.

If your application logic needs to be updated anyways to handle new records/fields, then your development workflow would involve updating the application and generating new structs for the updated schemas at the same time. This would give you type safety when you're building out new features based on new records/fields. Issue #86 specifically is about interacting with the Confluent schema registry as a source of the schemas to compile against at runtime.

I know this is a little abstract - if you can provide more detail or an example of the desired API (something in-between generated structs and a generic map) that would help.
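If the registry is in play, the wire format is small enough to unwrap by hand before the schema lookup. A minimal stdlib sketch of the Confluent framing (one magic byte `0x00`, then a 4-byte big-endian schema ID, then the Avro payload) - the function name here is mine, not gogen-avro's:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ParseConfluentFrame splits a Confluent wire-format message into its
// schema-registry ID and the Avro-encoded payload. The framing is one
// magic byte (0x00) followed by a 4-byte big-endian schema ID.
func ParseConfluentFrame(msg []byte) (schemaID int32, payload []byte, err error) {
	if len(msg) < 5 || msg[0] != 0x00 {
		return 0, nil, fmt.Errorf("not a Confluent-framed message")
	}
	return int32(binary.BigEndian.Uint32(msg[1:5])), msg[5:], nil
}

func main() {
	msg := []byte{0x00, 0x00, 0x00, 0x00, 0x2A, 'h', 'i'}
	id, payload, err := ParseConfluentFrame(msg)
	fmt.Println(id, string(payload), err) // 42 hi <nil>
}
```

The schema ID would then be used to fetch (and cache) the writer schema from the registry before deserializing.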
I'm not the originator of this issue, but I'd like to provide my perspective. I'm not particularly interested in mapping to any arbitrary schema. I'm interested in evolving a schema. I expect the records on a particular topic to be one of the known versions of that schema, and to be able to take a record, determine which version of the schema it maps to, and deserialize it. If it's not a recognized version of that schema, then the reader should not allow it to continue. I think this is doable with the current structure of gogen-avro. I hope that's helpful.
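One common way to "determine which version" a record was written with is to fingerprint each known schema and key a lookup table on the fingerprint. A sketch of the CRC-64-AVRO fingerprint as pseudocoded in the Avro specification - using it this way is my suggestion, not something gogen-avro does for you:

```go
package main

import "fmt"

// avroEmpty is the CRC-64-AVRO seed/polynomial from the Avro specification.
const avroEmpty = uint64(0xc15d213aa4d7a795)

var fpTable [256]uint64

func init() {
	// Build the byte-wise lookup table once, per the spec's pseudocode.
	for i := range fpTable {
		fp := uint64(i)
		for j := 0; j < 8; j++ {
			fp = (fp >> 1) ^ (avroEmpty & -(fp & 1))
		}
		fpTable[i] = fp
	}
}

// Fingerprint64 computes the CRC-64-AVRO fingerprint of a (canonicalized)
// schema; distinct schema versions get distinct fingerprints with high
// probability, so unknown versions can be rejected at the lookup.
func Fingerprint64(buf []byte) uint64 {
	fp := avroEmpty
	for _, b := range buf {
		fp = (fp >> 8) ^ fpTable[byte(fp)^b]
	}
	return fp
}

func main() {
	v1 := Fingerprint64([]byte(`{"type":"record","name":"Event","fields":[]}`))
	v2 := Fingerprint64([]byte(`{"type":"record","name":"Event","fields":[{"name":"id","type":"long"}]}`))
	fmt.Printf("%016x %016x %v\n", v1, v2, v1 != v2)
}
```

Note the spec says fingerprints should be computed over the Parsing Canonical Form of the schema, which is elided here.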
@actgardner thank you for your response! I really appreciate the attention =)

My scenario is definitely scenario 1, except that I'm happy to generate an application tied to a particular schema - the service itself doesn't need to read arbitrary avro at runtime, and I would love to avail myself of the type safety guarantees a well-defined struct brings. I can also use a well-defined struct downstream (in our case one example is writing to Parquet files) in a streamlined and rapid way that I can't with arbitrary maps.

Right now we're exploring generating a .go file outside of the compile step using gogen-avro and storing it with the avro schema in its own little git repo, which we then manually modify to follow a convention.

Does this make sense?
This sounds like the workflow I imagined - keeping the schemas and the generated code in a git repo you use as a submodule. This is probably the simplest way to make sure there's a generated struct for each schema (you can use pre-push hooks, etc. to enforce it as well) that can be shared across projects.
There's a lot of hidden "contracts" within the generated code where renaming or moving things might break it in unexpected ways. Hopefully we can find a solution that doesn't involve manually modifying the generated code.
So if I understand right, the issue is the amount of manual tweaking involved in adapting the archiver service to each new schema? One thing I've been considering is adding code to generate what I would call a "de-multiplexer" for a set of schemas - it would take some framed data (either Single-Object Encoding now, or Confluent Schema Registry framing in the future), match it to the appropriate schema, and return the deserialized record. An example of this in action for Single-Object Encoding:
Would this be suitable? You could then hand the resulting record off to your application logic.
So I think this makes sense: a general-purpose de-multiplexer like that would cover our use case.

I'm less bothered about the Confluent registry than I was 9 days ago, as a colleague of mine has very patiently pointed out we can't rely on it existing, being reachable, or populated, at build time. So I think it's appropriate that we get the schema before compilation one way or another, and not worry too much about where it comes from, at least from the point of view of consumers.

Would be happy to help if I can!
Just a heads up, I've added support for generic data types in the most recent major release. This might serve your needs, let me know: https://github.com/actgardner/gogen-avro/blob/master/README.md#generic-data
Is the support for generic data still in beta, or can we rely on it working? cc @actgardner
I'd like to read arbitrary avro messages from Kafka, where the topic is specified at build time, and the avro schema is pulled from our schema registry.

I thought I'd almost got it by using `compiler.CompileSchemaBytes(schema, schema)`, where `schema` is retrieved from the registry, and then `vm.Eval` to deserialise the message. However I couldn't figure out a way of persuading `vm.Eval` to accept something vague for its target argument. From what I can tell I have to specify a target using `New<RecordType>()`, which is tricky as I don't know `<RecordType>` ahead of time.

To explain more, I could change the generator (let's pretend I could do this) to create `NewRecord()` without using the `<RecordType>` in the name of the function, and then I could call that, and I think I'd have what I need. However this feels wrong and I imagine isn't something that you guys would want in your codebase.

What am I missing? Is there a better way of achieving the above? Or should I just suffer the indignity of using `map[string]interface{}` even though I've a perfectly good schema sitting in the registry?