[pgmq] Validate messages using pg_jsonschema #5

Open
dlight opened this issue Mar 20, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@dlight

dlight commented Mar 20, 2023

I see that jsonschema is already used in the repo (though as far as I can tell it isn't used in pgmq), and I want to point out that pgmq could use pg_jsonschema (which is just a thin wrapper around the jsonschema crate) to prevent malformed messages from being added to the queue.

I think it would be useful if each pgmq deployment could optionally have a constraint on the queue's message jsonb column, using a user-provided JSON schema. It should also be possible to update the constraint with a new JSON schema, but maybe only if the queue is empty.
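As a rough sketch of what such a constraint could look like, the Python helper below builds an `ALTER TABLE` statement around pg_jsonschema's `jsonb_matches_schema(schema, instance)` function. The table name `pgmq_orders`, the constraint name, and the helper functions are assumptions for illustration only, not part of pgmq; the quoting assumes PostgreSQL's default `standard_conforming_strings = on`.

```python
import json


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def add_schema_constraint_sql(
    table: str,
    schema: dict,
    column: str = "message",
    constraint: str = "message_matches_schema",  # hypothetical name
) -> str:
    """Build a statement that rejects rows whose `column` fails `schema`."""
    schema_json = json.dumps(schema, sort_keys=True)
    check = (
        f"jsonb_matches_schema({quote_literal(schema_json)}, "
        f"{quote_ident(column)})"
    )
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ADD CONSTRAINT {quote_ident(constraint)} "
        f"CHECK ({check});"
    )


# Hypothetical queue table and schema, for illustration only.
order_schema = {
    "type": "object",
    "required": ["order_id"],
    "properties": {"order_id": {"type": "integer"}},
}
print(add_schema_constraint_sql("pgmq_orders", order_schema))
```

Updating the schema would then mean dropping this constraint and adding a new one, which PostgreSQL validates against all existing rows unless told otherwise. That is one reason restricting updates to an empty queue keeps things simple.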

The archive is a more delicate matter, because the JSON schema is expected to evolve as new kinds of messages appear, while old messages aren't always expected to follow the new schema. In this case my preferred solution would be to add a "type" column with a string value naming the message type, and have the constraint check the message against the JSON schema that corresponds to that type (this works better if the type is generated by something like obake).
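One possible shape for a type-dependent constraint is a `CASE` expression that selects the schema by the value of the "type" column. The Python sketch below only generates that expression; the type names (`v1`, `v2`), the helper names, and the choice to reject unknown types with `ELSE false` are assumptions, not anything pgmq defines.

```python
import json


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def typed_check_expression(
    schemas: dict,
    type_column: str = "type",
    message_column: str = "message",
) -> str:
    """Build a CHECK expression that picks the schema by message type.

    Rows whose type is not listed fall through to ELSE false and are
    rejected.
    """
    if not schemas:
        raise ValueError("at least one message type is required")
    branches = []
    for type_name in sorted(schemas):
        schema_json = json.dumps(schemas[type_name], sort_keys=True)
        branches.append(
            f"WHEN {quote_literal(type_name)} THEN jsonb_matches_schema("
            f"{quote_literal(schema_json)}, {quote_ident(message_column)})"
        )
    return (
        f"CASE {quote_ident(type_column)} "
        + " ".join(branches)
        + " ELSE false END"
    )


# Hypothetical message types: v2 adds a required "currency" field.
schemas = {
    "v1": {"type": "object", "required": ["order_id"]},
    "v2": {"type": "object", "required": ["order_id", "currency"]},
}
print(typed_check_expression(schemas))
```

Old archive rows keep their original type and stay valid under their original schema, while adding a new message type means adding one more branch to the expression.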

If pausing the queue for new events isn't desirable when updating the JSON schema of the messages, then adding a type column to the queue table may make sense too (with the understanding that new events must be added with the latest type only, while messages of older types are allowed to stay in the queue awaiting processing). This would also let clients avoid attempting to receive a message if they don't support its type.
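The client-side selection described above can be modeled in a few lines. This is an in-memory stand-in only: in a real queue the filter would presumably live in the query that reads messages, and the `QueuedMessage` class, the `next_supported` function, and the type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class QueuedMessage:
    msg_id: int
    msg_type: str  # value of the proposed "type" column
    body: dict


def next_supported(
    queue: Iterable[QueuedMessage], supported: Set[str]
) -> Optional[QueuedMessage]:
    """Return the oldest message whose type this consumer understands."""
    for msg in queue:
        if msg.msg_type in supported:
            return msg
    return None  # nothing this consumer can handle right now


queue = [
    QueuedMessage(1, "v1", {"order_id": 10}),
    QueuedMessage(2, "v2", {"order_id": 11, "currency": "EUR"}),
]

# A consumer that only understands v2 skips message 1 and leaves it
# in the queue for a consumer that still supports v1.
picked = next_supported(queue, {"v2"})
print(picked.msg_id if picked else None)
```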

Well, all of this adds complexity, and not everyone would want to validate their messages in the database. But I think it's worth it anyway.

Another concern is that validating against JSON schemas might be too slow. I don't think it is in practice (jsonschema is very fast), but there is still low-hanging fruit in pg_jsonschema, like this issue about using a cache to avoid re-parsing the JSON schema every time it is checked.
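To illustrate the caching idea in general terms (this is not how pg_jsonschema is implemented), the sketch below memoizes the parsed schema by its text, so repeated checks against the same schema parse it only once. Here `json.loads` stands in for the more expensive step of compiling a schema into a validator, and `compiled_schema` is a hypothetical name.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=128)
def compiled_schema(schema_text: str) -> dict:
    """Parse the schema text once; identical text reuses the cached result."""
    # Stand-in for compiling a validator, which a real implementation
    # would do here instead of a plain parse.
    return json.loads(schema_text)


schema_text = '{"type": "object", "required": ["order_id"]}'

# Simulate three constraint checks that all use the same schema.
for _ in range(3):
    compiled_schema(schema_text)

info = compiled_schema.cache_info()
print(info.misses, info.hits)
```

Keying the cache on the schema text means a changed schema is simply a new cache entry, so no explicit invalidation is needed when a constraint is updated.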

@ChuckHend ChuckHend added the enhancement New feature or request label Mar 21, 2023
@ChuckHend
Member

This is a great idea @dlight. I agree that having it as an option would be really nice. Regarding the evolution of the schema: one way I've seen that handled in other queue-based projects is to version the queues, so if the schema changes, you create a new queue. There is probably a more elegant way to handle that though. We could also implement it as a completely new type of queue, like a ConstrainedQueue?

Do you have a fork or anything that could give us an idea what this might look like?

@ChuckHend ChuckHend transferred this issue from tembo-io/tembo Jul 24, 2023