Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Bug - Adding default value of null doesn't change schema encoder hash #449

Open
TheGreatAbyss opened this issue Sep 10, 2024 · 4 comments

Comments

@TheGreatAbyss
Copy link

Hello, and once again thank you so much for this library and your support of it.

I ran into what may or may not be a bug depending on what is considered a "valid" schema.

I created my initial subject with a union field in the schema, but by mistake I forgot to specify a default value:

...
{"type": ["null", "string"], "logicalType": "UUID", "name": "publication_id"},
...

When a map[interface] value is passed to the encoder but does not include a publication_id it will appropriately throw an error.

Later I realized my "mistake", and I wanted to change the application behavior so it sets a Null value for publication_id instead of throwing an error. I updated the subject and set a new schema_id with the below schema:

...
{"type": ["null", "string"], "logicalType": "UUID", "name": "publication_id", "default": None},
...

My application runs a background thread that calls GetLatestSchemaInfo every two minutes. To my surprise after the schema was updated I was still getting the avro encoding error about missing publication_id. I banged my head for a while trying to figure out what was wrong with my code, and then finally realized that there is an internal cache for the schema encoder.

The problem is that both schemas posted above hash to the same value. If my initial schema was as below it would have worked as expected as this hashes to something else:

...
{"type": "string", "logicalType": "UUID", "name": "publication_id"},
...

Is this a bug? Is there no valid use case for specifying a union of null and string without setting a default?

Thank You!

@nrwiersma
Copy link
Member

Technically speaking, the default does not apply to the encoder, and is used only as a convenience in this library. By spec, the default does not apply to writes, only read compatibility. This is why default is not included in the write hash.

From the spec:

default: A default value for this field, only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time.

@TheGreatAbyss
Copy link
Author

Thanks for your response.

"By spec, the default does not apply to writes, only read compatibility."

I guess I'm a bit confused because it does effect how the library behaves on write. Without default an error is thrown, with default it writes "publication_id":null,

@nrwiersma
Copy link
Member

This is the convenience mechanism I mentioned. While not mandated by spec, the library will use the default for writes if it exists.

@TheGreatAbyss
Copy link
Author

TheGreatAbyss commented Sep 10, 2024

It's not a major issue but if the library uses the default if it exists, then maybe it should be taken into account in the cache if the default changes (or is added in this case) as it does effect the behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants