-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Apply schema validation to all topics in local development #2013
Changes from all commits
b2da267
76f2ded
9480347
13301a0
6d484e3
4b38eab
66a9f7d
2b0f28c
226cd44
cb2d0dd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -1,3 +1,5 @@ | ||||||||
#[cfg(debug_assertions)] | ||||||||
use std::cell::RefCell; | ||||||||
use std::collections::{BTreeMap, HashMap}; | ||||||||
use std::fmt; | ||||||||
use std::sync::Arc; | ||||||||
|
@@ -8,11 +10,16 @@ use relay_statsd::metric; | |||||||
use thiserror::Error; | ||||||||
|
||||||||
use crate::config::{KafkaConfig, KafkaParams, KafkaTopic}; | ||||||||
#[cfg(debug_assertions)] | ||||||||
use crate::producer::schemas::Validator; | ||||||||
use crate::statsd::KafkaHistograms; | ||||||||
|
||||||||
mod utils; | ||||||||
use utils::{CaptureErrorContext, ThreadedProducer}; | ||||||||
|
||||||||
#[cfg(debug_assertions)] | ||||||||
mod schemas; | ||||||||
|
||||||||
/// Kafka producer errors. | ||||||||
#[derive(Error, Debug)] | ||||||||
pub enum ClientError { | ||||||||
|
@@ -36,6 +43,11 @@ pub enum ClientError { | |||||||
#[error("failed to serialize json message")] | ||||||||
InvalidJson(#[source] serde_json::Error), | ||||||||
|
||||||||
/// Failed to run schema validation on message. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can this be conditional?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, it has to be, fixed |
||||||||
#[cfg(debug_assertions)] | ||||||||
#[error("failed to run schema validation on message")] | ||||||||
SchemaValidationFailed(#[source] schemas::SchemaError), | ||||||||
|
||||||||
/// Configuration is wrong and it cannot be used to identify the number of a shard. | ||||||||
#[error("invalid kafka shard")] | ||||||||
InvalidShard, | ||||||||
|
@@ -126,6 +138,8 @@ impl fmt::Debug for ShardedProducer { | |||||||
#[derive(Debug)] | ||||||||
pub struct KafkaClient { | ||||||||
producers: HashMap<KafkaTopic, Producer>, | ||||||||
#[cfg(debug_assertions)] | ||||||||
schema_validator: RefCell<schemas::Validator>, | ||||||||
} | ||||||||
|
||||||||
impl KafkaClient { | ||||||||
|
@@ -142,6 +156,11 @@ impl KafkaClient { | |||||||
message: &impl Message, | ||||||||
) -> Result<(), ClientError> { | ||||||||
let serialized = message.serialize()?; | ||||||||
#[cfg(debug_assertions)] | ||||||||
self.schema_validator | ||||||||
.borrow_mut() | ||||||||
.validate_message_schema(topic, &serialized) | ||||||||
.map_err(ClientError::SchemaValidationFailed)?; | ||||||||
let key = message.key(); | ||||||||
self.send(topic, organization_id, &key, message.variant(), &serialized) | ||||||||
} | ||||||||
|
@@ -273,6 +292,8 @@ impl KafkaClientBuilder { | |||||||
pub fn build(self) -> KafkaClient { | ||||||||
KafkaClient { | ||||||||
producers: self.producers, | ||||||||
#[cfg(debug_assertions)] | ||||||||
schema_validator: Validator::default().into(), | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make sense to hide the
jsonschema
dependency insentry-kafka-schemas
. That is, if we removed this dependency here, andsentry-kafka-schemas
exposed avalidate(topic, message)
function.That way, the library client (Relay) would not have to deal with compiling schemas, and the way that
sentry-kafka-schemas
validates would be completely encapsulated (i.e. you could change the library or even the format it uses for schema definition and validation).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my reply to the local cache comment. For now I would like to avoid that. There's still many unsolved questions around how the workflow in Relay will look like, what performance requirements we have on the entire system, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! Definitely not a blocker.
Makes sense, though this could be solved with cargo features. I.e. there could be a
codegen
feature and avalidation
feature, and clients use either-or. My gut feeling is that 90% of clients that want to perform validation will not care about what schema definition language or validation library is used under the hood, hence the idea to hide it inside the library.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm reconsidering my response now. one of the reasons i was hesitant to change this was because python currently also works the same (it just exposes the raw schema, doesn't prescribe a schema validation library)
but at the same time it would be good to have a vetted recommended library that everybody ought to use (for both python and rust)