Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(vrl): add parse_dnstap function #21985

Merged
merged 17 commits into from
Jan 2, 2025

Conversation

esensar
Copy link
Contributor

@esensar esensar commented Dec 9, 2024

Summary

This adds parse_dnstap function, which should produce the exact same output as the dnstap source. While it is possible to parse dnstap data manually using parse_proto, this ensures that the exact same format is used. This makes it possible to delay dnstap parsing using dnstap source, by making it produce raw data and then conditionally parsing it in transforms further down the line. It also makes it possible to parse dnstap data from other sources.

More context: #21985 (comment)

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

I have built vector in a container and ran it alongside coredns. I have connected them using dnstap source in tcp mode and made the source produce only raw data. I have used the following vector config:

sources:
  dnstap_tcp:
    type: "dnstap"
    mode: "tcp"
    address: "0.0.0.0:59001"
    raw_data_only: true

transforms:
  dnstap_thingy:
    type: "remap"
    inputs: ["dnstap_tcp"]
    source: |
      . = parse_dnstap!(.rawData)
      .

sinks:
  console:
    inputs: ["dnstap_tcp", "dnstap_thingy"]
    target: "stdout"
    type: "console"
    encoding:
      codec: "json"

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

This adds `parse_dnstap` function, which should produce the exact same output as the `dnstap`
source. While it is possible to parse dnstap data manually using `parse_proto`, this ensures that
the exact same format is used. This makes it possible to delay dnstap parsing using `dnstap` source,
by making it produce raw data and then conditionally parsing it in transforms further down the line.
It also makes it possible to parse dnstap data from other sources.
@esensar esensar requested review from a team as code owners December 9, 2024 09:55
@github-actions github-actions bot added domain: sources Anything related to the Vector's sources domain: transforms Anything related to Vector's transform components domain: external docs Anything related to Vector's external, public documentation labels Dec 9, 2024
@esensar
Copy link
Contributor Author

esensar commented Dec 9, 2024

I didn't add benchmarks for this, since I can see that there are dnstap benchmarks for the parse function and this really just forwards the data to it.

src/conditions/vrl.rs Outdated Show resolved Hide resolved
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks much better now.

lib/dnstap-parser/build.rs Outdated Show resolved Hide resolved
lib/dnstap-parser/Cargo.toml Outdated Show resolved Hide resolved
lib/dnstap-parser/build.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added the domain: vdev Anything related to the vdev tooling label Dec 12, 2024
@pront
Copy link
Member

pront commented Dec 12, 2024

https://github.com/vectordotdev/vector/actions/runs/12293554193/job/34323760812?pr=21985

I think we need to update buf.yaml to include the new proto location.

@esensar
Copy link
Contributor Author

esensar commented Dec 12, 2024

Updated. I just needed to add the new location, right?

@pront
Copy link
Member

pront commented Dec 12, 2024

Updated. I just needed to add the new location, right?

Thanks, I think so yes. Let's see if the check passes now.

@esensar
Copy link
Contributor Author

esensar commented Dec 12, 2024

I am not sure how to handle this new error now. It seems like this is now reporting an issue because this is treated like 4 different modules instead of 3 that were available previously? Not sure in which ways rest of the proto files are used, but is this a false positive?

@pront
Copy link
Member

pront commented Dec 12, 2024

Hmm, see bufbuild/buf#567.
You can try locally with:

buf breaking --against  "https://github.com/vectordotdev/vector.git#branch=master" --debug

Also, https://github.com/vectordotdev/vector/blob/master/.github/workflows/protobuf.yml#L5-L7 is missing two paths.

@pront
Copy link
Member

pront commented Dec 12, 2024

The check is failing due to the move proto/third-party/dnstap.protolib/dnstap-parser/proto/dnstap.proto.

This is the relevant rule: https://buf.build/docs/breaking/rules/#file_no_delete

If you cannot figure out a workaround, I am not opposed to reverting the move.

@esensar
Copy link
Contributor Author

esensar commented Dec 13, 2024

The check is failing due to the move proto/third-party/dnstap.protolib/dnstap-parser/proto/dnstap.proto.

This is the relevant rule: https://buf.build/docs/breaking/rules/#file_no_delete

If you cannot figure out a workaround, I am not opposed to reverting the move.

Moving parser back into main module would make the function inaccessible in vector-vrl/web-playground and vector-vrl/tests. I am not sure if this exact issue should be treated as a breaking change, but then again, I am not sure in which ways the protobuf modules are used, so I can't tell if it is okay.

Would it make more sense to have the protobuf workflow only track vector proto files? Or is it important to track these third-party ones too?

@esensar
Copy link
Contributor Author

esensar commented Dec 13, 2024

I have ignored the dnstap.proto file in the checker. Let me know if this makes sense to you.

Also, https://github.com/vectordotdev/vector/blob/master/.github/workflows/protobuf.yml#L5-L7 is missing two paths.

I think it is just missing the new one, since it has both proto/vector and proto/third-party covered with proto/**, but if ignoring the new one is okay, then we don't need to add it there.

@pront
Copy link
Member

pront commented Dec 13, 2024

I am thinking that ignoring third-party protos is the wrong thing to do but also this is not a breaking change for Vector users. It is just a file move, contents are identical. We could remove the ignore and merge this despite the buf check failing.

Note: Starting today, I will be traveling for a couple of weeks. So apologies for delays in reviewing.

@esensar
Copy link
Contributor Author

esensar commented Dec 13, 2024

I am thinking that ignoring third-party protos is the wrong thing to do but also this is not a breaking change for Vector users. It is just a file move, contents are identical. We could remove the ignore and merge this despite the buf check failing.

Note: Starting today, I will be traveling for a couple of weeks. So apologies for delays in reviewing.

Alright, thanks for the heads up. Do you want me to revert the last change then?

@pront
Copy link
Member

pront commented Dec 13, 2024

Alright, thanks for the heads up. Do you want me to revert the last change then?

Yes, we can just revert the changes to buf.yaml.

@pront pront enabled auto-merge December 29, 2024 19:42
auto-merge was automatically disabled December 30, 2024 12:18

Head branch was pushed to by a user without write access

@johnhtodd
Copy link

For historical purposes I'll put a bit more discussion here: This function allows larger-scale ingestion of dnstap messages, because the "sample" and "throttle" functions can now be used before the dnstap parsing happens. This is important because the dnstap parsing is actually quite heavy, and with hundreds of thousands of events per second, a single Vector node can be easily overwhelmed before the event even passes into a place where functions can be applied. By ingesting the message as protobuf (and not as "dnstap") it is possible to apply sample and throttle functions to the raw, unprocessed data and then selectively expanding the messages into full dnstap events after a smaller set has been filtered. This permits higher-throughput downsampling of events, though of course there is no method to examine what events are chosen and what are discarded until after the sampling is done. If there are critical events that need to be processed at 100%, our suggested model would be to send those from the origin system with a different port number and ingest them as dnstap (as a source) instead of as protobuf, and treat that flow of data entirely differently than others which may be downsampled or throttled.

@pront pront enabled auto-merge January 2, 2025 15:09
@jszwedko
Copy link
Member

jszwedko commented Jan 2, 2025

It looks like the protobuf compatibility tests are still failing here:

Failure: input contained 4 images, whereas against contained 3 images

@pront
Copy link
Member

pront commented Jan 2, 2025

It looks like the protobuf compatibility tests are still failing here:

Failure: input contained 4 images, whereas against contained 3 images

Yes, due to the move as explained here: #21985 (comment).

Based on the discussions here, I think we can force merge this if all other checks are OK.

@jszwedko
Copy link
Member

jszwedko commented Jan 2, 2025

It looks like the protobuf compatibility tests are still failing here:

Failure: input contained 4 images, whereas against contained 3 images

Yes, due to the move as explained here: #21985 (comment).

Based on the discussions here, I think we can force merge this if all other checks are OK.

Aha, I missed that discussion. 👍

@pront pront disabled auto-merge January 2, 2025 20:47
@pront pront merged commit 792aeea into vectordotdev:master Jan 2, 2025
40 of 43 checks passed
pront added a commit that referenced this pull request Jan 2, 2025
* feat(vrl): add `parse_dnstap` function

This adds `parse_dnstap` function, which should produce the exact same output as the `dnstap`
source. While it is possible to parse dnstap data manually using `parse_proto`, this ensures that
the exact same format is used. This makes it possible to delay dnstap parsing using `dnstap` source,
by making it produce raw data and then conditionally parsing it in transforms further down the line.
It also makes it possible to parse dnstap data from other sources.

* Add changelog entry

* Move DNSTAP parsing code to `lib/dnstap-parser`

* Add a better error on failed proto compilation for `dnstap-parser`

Co-authored-by: Pavlos Rontidis <[email protected]>

* Update lib/dnstap-parser/build.rs

Co-authored-by: Pavlos Rontidis <[email protected]>

* Move anyhow to workspace dependencies

* Update `parse_dnstap` query parse example

* Fix base64 regex pattern for spelling check action

* Update `buf.yaml` with new dnstap proto location

* Conditionally add `dnstap_parser` vrl functions if feature is enabled

* Ignore `dnstap.proto` in protobuf breaking checker

* Revert "Ignore `dnstap.proto` in protobuf breaking checker"

This reverts commit 2c1aea8.

* Revert "Update `buf.yaml` with new dnstap proto location"

This reverts commit 9b25854.

* add proto to buf.yaml

* cue fmt

* Fix failing VRL and docs tests for `parse_dnstap`

---------

Co-authored-by: Pavlos Rontidis <[email protected]>
jszwedko added a commit that referenced this pull request Jan 2, 2025
jszwedko added a commit that referenced this pull request Jan 2, 2025
Revert "feat(vrl): add `parse_dnstap` function (#21985)"

This reverts commit 792aeea.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: external docs Anything related to Vector's external, public documentation domain: sources Anything related to the Vector's sources domain: transforms Anything related to Vector's transform components domain: vdev Anything related to the vdev tooling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants