Refactor matching process to be chained processors #1869

wagoodman · 2024-05-20T15:25:17Z

Today the matching process is governed by the VulnerabilityMatcher object, which gives us a single place to control aspects of the matching process. The main FindMatches() function has been decomposed into ever-smaller functions that each deal with a narrower concern of the matching process. That is ultimately good; however, each decomposition is bespoke in terms of what data it has access to and what its return signature is. This means that changes to the matching process may require changing these function signatures, which isn't ideal.

Additionally, there is a common theme of the following return signature:

func somename(...) (remainingMatches *match.Matches, ignoredMatches []match.IgnoredMatch, err error) {}

...or similar variants. This return signature is at risk of growing for every new data element we want to track.

I have two changes I'd like to propose:

1. Add tracking of all kinds of matches to the match.Matches collection (e.g. IgnoredMatches is not tracked within this, nor are any dropped matches, etc.), or add an additional match.Collection that incorporates matches and ignored matches together. Then provide methods for adding, removing, and accessing these match objects. This would allow us to keep a single object in function signatures that produce or change matches, as well as give us a single place to track changes to these matches (such as publishing result counts on the bus to the TUI, leading to accurate counts being displayed).
2. Conform the existing decomposed functions involved in matching to a single function signature and chain these functions together into a common pipeline. Any function that wishes to participate in adding/removing/changing any aspect of matching will need to adhere to the function signature and be placed into the pipeline for processing. Each "processor" function should be as small in scope and responsibility as possible.


willmurphyscode · 2024-08-01T22:00:17Z

Posting some thoughts on how this refactor is going to go. This refactor is primarily about what's going on in https://github.com/anchore/grype/blob/main/grype/vulnerability_matcher.go, which has gotten sort of out of control.

There seem to be basically 5 separate concerns in this file:

1. Building up the matcher configuration
2. Telling the user what's going on during the slow matching process (updating package counts, logging)
3. Gathering evidence (matchers, VEX docs, ignore rules, etc.)
4. Repeatedly partitioning matches and ignored matches based on that evidence
5. Normalizing by CVE is also in there, which is kind of a blend of 2 and 4.

This is a lot. After the refactor, I am hoping it will look more like this:

1. There's a matcher builder in a separate file for building the match config.
2. There's a clean UI interface in a separate file for reporting things to the user.
   1. Secondarily, this should report events, not counts, because the counts always confuse everyone.
3. There's a new file with structs to represent a collection of evidence.
4. There's a new file that executes a pipeline of gathering evidence.
5. There's a new file that executes a reduction from gathered evidence to match decisions.

Some of the transformations, like merging and normalizing (item 5 in the first list) should be moved onto the collection of matches itself (slightly complicated by their needing a data source).
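As a rough illustration of the "gather evidence, then reduce to decisions" split described above, here is a minimal sketch. All names (`Evidence`, `Gatherer`, `Gather`, `Decision`, `Reduce`, and the three example gatherers) are hypothetical and the data is made up; this is not how grype is structured today, only one way the separation could look.

```go
package main

import "fmt"

// Candidate is a simplified stand-in for a raw match from a matcher.
type Candidate struct {
	Package       string
	Vulnerability string
}

// Evidence (hypothetical) holds everything gathered before any decision is made.
type Evidence struct {
	Candidates   []Candidate
	IgnoredVulns map[string]string // vulnerability ID -> reason, from ignore rules
	NotAffected  map[string]bool   // vulnerability ID -> true if a VEX doc says "not affected"
}

// Gatherer is one step of the evidence-gathering pipeline.
type Gatherer func(e *Evidence) error

// Gather runs each gatherer in order against a fresh Evidence value.
func Gather(gatherers ...Gatherer) (*Evidence, error) {
	e := &Evidence{
		IgnoredVulns: map[string]string{},
		NotAffected:  map[string]bool{},
	}
	for _, g := range gatherers {
		if err := g(e); err != nil {
			return nil, err
		}
	}
	return e, nil
}

// Decision records what happened to one candidate and why.
type Decision struct {
	Candidate
	Ignored bool
	Reason  string
}

// Reduce turns gathered evidence into decisions in a single pass, with no I/O.
// Ignore rules take precedence over VEX statements in this sketch.
func Reduce(e *Evidence) []Decision {
	out := make([]Decision, 0, len(e.Candidates))
	for _, c := range e.Candidates {
		d := Decision{Candidate: c}
		if reason, ok := e.IgnoredVulns[c.Vulnerability]; ok {
			d.Ignored, d.Reason = true, reason
		} else if e.NotAffected[c.Vulnerability] {
			d.Ignored, d.Reason = true, "VEX: not affected"
		}
		out = append(out, d)
	}
	return out
}

// The gatherers below use placeholder data in place of real matchers and documents.
func gatherCandidates(e *Evidence) error {
	e.Candidates = append(e.Candidates,
		Candidate{Package: "openssl", Vulnerability: "CVE-XXXX-0001"},
		Candidate{Package: "zlib", Vulnerability: "CVE-XXXX-0002"},
		Candidate{Package: "curl", Vulnerability: "CVE-XXXX-0003"},
	)
	return nil
}

func gatherIgnoreRules(e *Evidence) error {
	e.IgnoredVulns["CVE-XXXX-0002"] = "ignore rule"
	return nil
}

func gatherVEX(e *Evidence) error {
	e.NotAffected["CVE-XXXX-0003"] = true
	return nil
}

func main() {
	e, err := Gather(gatherCandidates, gatherIgnoreRules, gatherVEX)
	if err != nil {
		panic(err)
	}
	for _, d := range Reduce(e) {
		fmt.Printf("%s %s ignored=%t %s\n", d.Package, d.Vulnerability, d.Ignored, d.Reason)
	}
}
```

Because `Reduce` only reads the evidence, the partitioning happens once instead of repeatedly, and each decision carries its reason, which lends itself to reporting events rather than running counts.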

willmurphyscode · 2024-10-04T15:12:57Z

We'll do this during/after Grype DB Schema v6 #2128

wagoodman added the enhancement New feature or request label May 20, 2024

anchoretoolsops added this to OSS May 20, 2024

willmurphyscode self-assigned this May 21, 2024

willmurphyscode moved this to Ready in OSS May 21, 2024

willmurphyscode moved this from Ready to In Progress in OSS May 22, 2024

willmurphyscode moved this from Stalled to Ready in OSS Oct 4, 2024

willmurphyscode removed their assignment Oct 4, 2024

willmurphyscode added this to the DB v6 milestone Oct 4, 2024

wagoodman modified the milestones: DB v6, Grype 1.0 Nov 7, 2024
