Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to make routes exclusive #17040

Open
zpriddy opened this issue Apr 3, 2023 · 8 comments · May be fixed by #20904 or #21707
Open

Add option to make routes exclusive #17040

zpriddy opened this issue Apr 3, 2023 · 8 comments · May be fixed by #20904 or #21707
Labels
needs: approval Needs review & approval before work can begin. type: feature A value-adding code addition that introduce new functionality.

Comments

@zpriddy
Copy link

zpriddy commented Apr 3, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

I will first start by saying that I think the route documentation needs to be updated to mention that routes are not exclusive and will send events to all matching routes. This is not documented at all. In trying to debug an issue, I ran across #9014 that informed me that the routes are not exclusive.

I do believe that there is a need for being able to make routes exclusive as an option. I think that It would be fairly easy to add a config flag of exclusive_routes that defaults to false but when set to true it would stop looking for more routes on the first match.

Use Cases

There are many times where there are similar events with slight differences that I route to different transforms for remapping, and there are also generic event types that act as a catchall for select formats of events, as well as a default transform for things like JSON or Text events.. Usually I try to write the routes in the order of first match would be the correct route.. But then at times the generic route events can also pick the same event and send it as a duplicate events.

For example lets say im getting GCP logs and I have a transform for vpc_flow and another one for gcp_json_event as well as generic_json, Right now I would either have to make three routers taking the unmatched from each and feeding into the next.. or write match conditions that would exclude the already matched events.. This isnt bad when there are only a few different routes.. But when there are 50+ different routes, those match conditions would get rather lengthy.. Or could be many different routes that are hard to follow..

Adding an exclusive_routes option would allow someone to take responsibility for making sure their routes are in the correct order, and the route transform would stop after sending the event to the first match in the list of routes and not sending them to other transforms that may have matched lower down that would have created similar or duplicate events to the one that they were intending on matching on.

Attempted Solutions

No response

Proposal

I dont know if this is accurate as I am still new to Rust, but I think this solution mat work..

https://github.com/vectordotdev/vector/blob/master/src/transforms/route.rs#L35

impl SyncTransform for Route {
    fn transform(
        &mut self,
        event: Event,
        output: &mut vector_core::transform::TransformOutputsBuf,
    ) {
        let mut check_failed: usize = 0;
        for (output_name, condition) in &self.conditions {
            let (result, event) = condition.check(event.clone());
            if result {
                output.push_named(output_name, event);
                if self.exclusive_routes {
                    break;
                }
            } else {
                check_failed += 1;
            }
        }
        if check_failed == self.conditions.len() {
            output.push_named(UNMATCHED_ROUTE, event);
        }
    }
}

References

No response

Version

0.28.1

@zpriddy zpriddy added the type: feature A value-adding code addition that introduce new functionality. label Apr 3, 2023
@neuronull neuronull added the needs: approval Needs review & approval before work can begin. label Apr 3, 2023
@jszwedko
Copy link
Member

Discussed this briefly today. A couple of thoughts:

  • We could add an algorithm option, rather than a boolean, that would allow it to be extensible to additional routing algorithms in the future. To start we would have match_all and match_one (or something similar)
  • In order to support ordering in the config file we'd need to have the routes be an array rather than a map (supporting both would be useful for compatibility). Alternatively we could add an additional option to each route indicating a "weight" or "priority".

@smitthakkar96
Copy link

smitthakkar96 commented Jul 4, 2023

Another use case would be for someone trying to migrate exclusion filters from Datadog. We create routes with filters and add a sample transform with different rate on each individual routes. When doing exclusion filtering using this approach we need a way to ensure events aren't duplicated one possible way would be to have something like exclusive routes. @jszwedko any updates on this? Is it likely to get prioritised near future? If not are you open to contributions for this issue (since the issue is tagged as needs approval)?

@jszwedko
Copy link
Member

jszwedko commented Jul 5, 2023

Hey!

We are open to contributions here, but given there is some ambiguity about what the configuration should look like, I think it'd be good to settle on that before implementation. Seemingly we'd need to switch the routes from a map of key/value to a list to support ordering so it knows which route to pick if multiple match. The proposal could just be a comment on this issue.

You might already have considered this, but it is possible to workaround the lack of this using a remap transform with conditionals to assign a field that is later used in a route transform. #6731 (comment) is a similar enough example that you can probably imagine what it would look like. Let me know if that doesn't make sense.

@smitthakkar96
Copy link

@jszwedko thanks we will go with the suggested workaround for now, do you think there will be any performance implications of this workaround?

@jszwedko
Copy link
Member

jszwedko commented Aug 2, 2023

@jszwedko thanks we will go with the suggested workaround for now, do you think there will be any performance implications of this workaround?

I don't anticipate that there would be much overhead with the additional remap transform.

@smitthakkar96
Copy link

smitthakkar96 commented Aug 3, 2023

Seemingly we'd need to switch the routes from a map of key/value to a list to support ordering so it knows which route to pick if multiple match

Can we use something like indexmap to preserve insertion order? This way, we can avoid having to introduce any breaking changes or extra configurations such as weight or priority.

We could add an algorithm option, rather than a boolean, that would allow it to be extensible to additional routing algorithms in the future. To start we would have match_all and match_one (or something similar)

I like the idea of introducing a var called routing_algorithm because as you mentioned, this would make the solution more extendible.

Let me know your thoughts

@jszwedko
Copy link
Member

jszwedko commented Aug 3, 2023

Seemingly we'd need to switch the routes from a map of key/value to a list to support ordering so it knows which route to pick if multiple match

Can we use something like indexmap to preserve insertion order? This way, we can avoid having to introduce any breaking changes or extra configurations such as weight or priority.

It is possible, but I hesitate to add ordering guarantees to maps since the configuration formats we have, in particular YAML and JSON, don't currently have a notion of ordering when it comes to maps. Thus it could end up being confusing to users or result in subtle bugs if they are using other tools to generate or transform Vector configs that don't preserve ordering.

Thus I think it'd be better to use an array to define the routes or add a weight parameter to each route that indicates the ordering.

@davidpellcb
Copy link

we'd need to have the routes be an array rather than a map

All of this would be extremely helpful in my current use case which involves dynamically updating thousands of transform files on an interval of ~5m based on per-service volume for dynamic sampling. Currently have to have one router per service with logs having to needlessly pass through so many routers before getting to the one they match instead of just going through a single top-level list on a main router.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs: approval Needs review & approval before work can begin. type: feature A value-adding code addition that introduce new functionality.
Projects
None yet
5 participants