Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor matching process to be chained processors #1869

Open
wagoodman opened this issue May 20, 2024 · 2 comments
Open

Refactor matching process to be chained processors #1869

wagoodman opened this issue May 20, 2024 · 2 comments
Labels
enhancement New feature or request
Milestone

Comments

@wagoodman
Copy link
Contributor

Today the matching process is governed by the VulnerabilityMatcher object, which gives us a single place to control aspects of the matching process. The main FindMatches() function has been decomposed into ever-smaller functions that deal with smaller concerns of the matching process, which is ultimately good, however, each decomposition is bespoke in terms of what data it has access to and what the return signature is. This means that changes to the matching process may result in changing these function signatures, which isn't ideal.

Additionally, there is a common theme of the following return signature:

func somename(...) (remainingMatches *match.Matches, ignoredMatches []match.IgnoredMatch, err error) {}

...or similar variants. This return signature is at risk of growing for every new data element we want to track.

I have two changes I'd like to propose:

  • Add tracking of all kinds of matches to the match.Matches collection (e.g. IgnoredMatches is not tracked within this, any dropped matches, etc.) or add an additional match.Collection that incorporates matches and ignored matches together. Then provide methods for adding, removing, and accessing these match objects. This would allow us to keep a single object in function signatures that produce or change matches as well as give us a single place to track changes to these matches (such as publishing result counts on the bus to the TUI, leading to accurate counts being displayed).
  • Form the existing decomposed functions involved in matched to a single function signature and chain these methods together into a common pipeline. Any function that wishes to participate in adding/removing/changing any aspect of matching will need to adhere to the function signature and placed into the pipeline for processing. Each "processor" function should be as small in scope and responsibility as possible.
@wagoodman wagoodman added the enhancement New feature or request label May 20, 2024
@willmurphyscode willmurphyscode self-assigned this May 21, 2024
@willmurphyscode willmurphyscode moved this to Ready in OSS May 21, 2024
@willmurphyscode willmurphyscode moved this from Ready to In Progress in OSS May 22, 2024
@willmurphyscode
Copy link
Contributor

Posting some thoughts on how this refactor is going to go. This is refactor primarily about what's going on in https://github.com/anchore/grype/blob/main/grype/vulnerability_matcher.go, which has gotten sort of out of control.

There seem to be basically 5 separate concerns in this file:

  1. Building up the matcher configuration
  2. Telling the user what's going on during the slow matching process (updating package counts, logging)
  3. Gathering evidence (matchers, VEX docs, ignore rules, etc.)
  4. Repeatedly partitioning matches and ignored matches based on that evidence
  5. Normalizing by CVE is also in there, which is kind of a blend of 2 and 4.

This is a lot. After the refactor, I am hoping it will look more like this:

  1. There's a matcher builder in a separate file for building the match config.
  2. There's a clean UI interface in a separate file for reporting things to the user
    1. Secondarily, this should report events not counts because the counts always confuse everyone
  3. There's a new file with structs to represent a collection of evidence
  4. There's a new file that executes a pipeline of gathering evidence
  5. There's a new file that executes a reduction from gathered evidence to match decisions

Some of the transformations, like merging and normalizing (item 5 in the first list) should be moved onto the collection of matches itself (slightly complicated by their needing a data source).

@willmurphyscode
Copy link
Contributor

We'll do this during/after Grype DB Schema v6 #2128

@willmurphyscode willmurphyscode moved this from Stalled to Ready in OSS Oct 4, 2024
@willmurphyscode willmurphyscode removed their assignment Oct 4, 2024
@willmurphyscode willmurphyscode added this to the DB v6 milestone Oct 4, 2024
@wagoodman wagoodman modified the milestones: DB v6, Grype 1.0 Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Ready
Development

No branches or pull requests

2 participants