feat(vrl): add `parse_dnstap` function #21985
This adds a `parse_dnstap` function, which should produce exactly the same output as the `dnstap` source. While it is possible to parse dnstap data manually using `parse_proto`, this function ensures that the same format is used. It makes it possible to delay dnstap parsing when using the `dnstap` source: the source can be configured to produce raw data, which is then conditionally parsed in transforms further down the line. It also makes it possible to parse dnstap data from other sources.
I didn't add benchmarks for this, since there are already dnstap benchmarks for the parse function and this really just forwards the data to it.
Thanks, this looks much better now.
Co-authored-by: Pavlos Rontidis <[email protected]>
https://github.com/vectordotdev/vector/actions/runs/12293554193/job/34323760812?pr=21985 I think we need to update |
Updated. I just needed to add the new location, right? |
Thanks, I think so yes. Let's see if the check passes now. |
I am not sure how to handle this new error. It seems to be reported because the protos are now treated as 4 different modules instead of the 3 that were available previously. I am not sure how the rest of the proto files are used, but is this a false positive?
Hmm, see bufbuild/buf#567.
Also, https://github.com/vectordotdev/vector/blob/master/.github/workflows/protobuf.yml#L5-L7 is missing two paths. |
The check is failing due to the move. This is the relevant rule: https://buf.build/docs/breaking/rules/#file_no_delete If you cannot figure out a workaround, I am not opposed to reverting the move.
Moving parser back into main module would make the function inaccessible in Would it make more sense to have the |
I have ignored the
I think it is just missing the new one, since it has both |
I am thinking that ignoring third-party protos is the wrong thing to do but also this is not a breaking change for Vector users. It is just a file move, contents are identical. We could remove the ignore and merge this despite the Note: Starting today, I will be traveling for a couple of weeks. So apologies for delays in reviewing. |
Alright, thanks for the heads up. Do you want me to revert the last change then? |
Yes, we can just revert the changes to |
For historical purposes I'll put a bit more discussion here: This function allows larger-scale ingestion of dnstap messages, because the `sample` and `throttle` transforms can now be applied before the dnstap parsing happens. This is important because dnstap parsing is quite heavy, and with hundreds of thousands of events per second, a single Vector node can easily be overwhelmed before an event even reaches a place where transforms can be applied. By ingesting the message as protobuf (and not as "dnstap"), it is possible to sample and throttle the raw, unprocessed data and then selectively expand the messages into full dnstap events after a smaller set has been filtered.

This permits higher-throughput downsampling of events, though of course there is no way to examine which events are chosen and which are discarded until after the sampling is done. If there are critical events that need to be processed at 100%, our suggested model would be to send those from the origin system on a different port number, ingest them as dnstap (as a source) instead of as protobuf, and treat that flow of data entirely separately from the flows that may be downsampled or throttled.
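The pipeline described in this comment could look roughly like the sketch below. It is not the configuration used to test this PR. The component names (`dnstap_raw`, `downsample`, `expand`, `out`), the listen address, and the sample rate are all hypothetical, and the sketch assumes that the `dnstap` source with `raw_data_only: true` emits the frame as a base64 string in a `rawData` field, as the source documentation describes.

```yaml
# Hypothetical sketch: sample raw dnstap frames first, parse only the survivors.
sources:
  dnstap_raw:
    type: dnstap
    mode: tcp
    address: 0.0.0.0:9000   # assumed listen address
    raw_data_only: true     # skip parsing in the source, keep the raw frame

transforms:
  downsample:
    type: sample
    inputs: [dnstap_raw]
    rate: 10                # assumed rate: forward roughly 1 in 10 events

  expand:
    type: remap
    inputs: [downsample]
    source: |
      # Expand the raw frame into a full dnstap event.
      # Assumes the raw frame is in `.rawData`, base64 encoded.
      . = parse_dnstap!(.rawData)

sinks:
  out:
    type: console
    inputs: [expand]
    encoding:
      codec: json
```

A `throttle` transform could sit in the same position as `downsample`. Either way, the expensive parsing step only runs on events that survive the filter.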
It looks like the protobuf compatibility tests are still failing here:
Yes, due to the move as explained here: #21985 (comment). Based on the discussions here, I think we can force merge this if all other checks are OK. |
Aha, I missed that discussion. 👍 |
* feat(vrl): add `parse_dnstap` function

  This adds `parse_dnstap` function, which should produce the exact same output as the `dnstap` source. While it is possible to parse dnstap data manually using `parse_proto`, this ensures that the exact same format is used. This makes it possible to delay dnstap parsing using `dnstap` source, by making it produce raw data and then conditionally parsing it in transforms further down the line. It also makes it possible to parse dnstap data from other sources.
* Add changelog entry
* Move DNSTAP parsing code to `lib/dnstap-parser`
* Add a better error on failed proto compilation for `dnstap-parser`

  Co-authored-by: Pavlos Rontidis <[email protected]>
* Update lib/dnstap-parser/build.rs

  Co-authored-by: Pavlos Rontidis <[email protected]>
* Move anyhow to workspace dependencies
* Update `parse_dnstap` query parse example
* Fix base64 regex pattern for spelling check action
* Update `buf.yaml` with new dnstap proto location
* Conditionally add `dnstap_parser` vrl functions if feature is enabled
* Ignore `dnstap.proto` in protobuf breaking checker
* Revert "Ignore `dnstap.proto` in protobuf breaking checker"

  This reverts commit 2c1aea8.
* Revert "Update `buf.yaml` with new dnstap proto location"

  This reverts commit 9b25854.
* add proto to buf.yaml
* cue fmt
* Fix failing VRL and docs tests for `parse_dnstap`

---------

Co-authored-by: Pavlos Rontidis <[email protected]>
Summary
This adds `parse_dnstap` function, which should produce the exact same output as the `dnstap` source. While it is possible to parse dnstap data manually using `parse_proto`, this ensures that the exact same format is used. This makes it possible to delay dnstap parsing using `dnstap` source, by making it produce raw data and then conditionally parsing it in transforms further down the line. It also makes it possible to parse dnstap data from other sources.

More context: #21985 (comment)
Change Type
Is this a breaking change?
How did you test this PR?
I have built Vector in a container and run it alongside `coredns`. I have connected them using the `dnstap` source in `tcp` mode and made the source produce only raw data. I have used the following vector config:
Does this PR include user facing changes?
Checklist
`Cargo.lock`), please run `dd-rust-license-tool write` to regenerate the license inventory and commit the changes (if any). More details here.