Allow scanning sub dir within a larger search context #3213

Open
wagoodman opened this issue Sep 9, 2024 · 1 comment
Labels
enhancement New feature or request

wagoodman commented Sep 9, 2024

What would you like to be added:
We have multiple issues asking to search within a small space while still referencing material outside of that space:

It would be nice to allow for something like this:

syft ./my/dir/project-1 --reference ./my/dir

Here I only want to catalog packages within ./my/dir/project-1, but I want to be able to reference file system material from a specific parent directory, ./my/dir.

In this way, if there is a pom.xml in the project dir but the parent pom defines required properties, we can reference that material to resolve the correct version. This applies to any ecosystem where manifests can reference other manifests in parent directories for correct resolution.
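To make the split between "where packages come from" and "what may be read" concrete, here is a minimal Go sketch. It assumes a hypothetical scopedResolver (not syft's actual file.Resolver) holding a toy in-memory file set, and it uses /my/dir in place of ./my/dir with paths assumed to be cleaned, absolute and slash-separated. Only files under the scan root are catalogable, while any file under the reference root can be read, for example to find a parent pom.xml by walking up directories. Real Maven resolution also honors <relativePath> and repositories; this only shows the nearest-ancestor lookup.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// scopedResolver is a hypothetical resolver (not syft's file.Resolver) that
// separates where packages may be cataloged from where files may be read.
type scopedResolver struct {
	scanRoot      string          // packages are cataloged only from here
	referenceRoot string          // files here may be read for reference
	files         map[string]bool // stand-in for the indexed file tree
}

// within reports whether p is root itself or lies beneath it.
func within(p, root string) bool {
	return p == root || strings.HasPrefix(p, root+"/")
}

// catalogable reports whether an indexed file may produce packages.
func (r scopedResolver) catalogable(p string) bool {
	return r.files[p] && within(p, r.scanRoot)
}

// readable reports whether an indexed file may be consulted for reference.
func (r scopedResolver) readable(p string) bool {
	return r.files[p] && within(p, r.referenceRoot)
}

// parentPOM walks up from the directory containing pomPath and returns the
// nearest ancestor pom.xml, never leaving referenceRoot.
func (r scopedResolver) parentPOM(pomPath string) (string, bool) {
	for dir := path.Dir(path.Dir(pomPath)); within(dir, r.referenceRoot); dir = path.Dir(dir) {
		if candidate := path.Join(dir, "pom.xml"); r.readable(candidate) {
			return candidate, true
		}
		if dir == r.referenceRoot {
			break
		}
	}
	return "", false
}

func main() {
	r := scopedResolver{
		scanRoot:      "/my/dir/project-1",
		referenceRoot: "/my/dir",
		files: map[string]bool{
			"/my/dir/pom.xml":           true,
			"/my/dir/project-1/pom.xml": true,
			"/my/dir/project-2/pom.xml": true,
		},
	}
	parent, ok := r.parentPOM("/my/dir/project-1/pom.xml")
	fmt.Println(parent, ok)                                 // /my/dir/pom.xml true
	fmt.Println(r.catalogable("/my/dir/project-1/pom.xml")) // true
	fmt.Println(r.catalogable("/my/dir/pom.xml"))           // false: reference only
	fmt.Println(r.catalogable("/my/dir/project-2/pom.xml")) // false: outside scan root
}
```

The parent pom is readable but never catalogable, so it can fill in properties without contributing packages of its own.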

Today we have the --exclude flag to tailor the search space, but it doesn't help in a large set of cases (and may result in a cumbersome number of flags or brittle configuration).

@wagoodman
Contributor Author

We talked about a few different implementation options here, both at the user-facing level and at the lower API level.

  1. Incremental include: syft / --include /my/project/dir --include /etc. This essentially implies --exclude '**', which is easier than "start with / and --exclude several directories". This lets us build the index the way we do today, but gives the user a few more knobs to dial.
  2. Same as 1 but with different syntax: syft /my/project/dir /etc, where the common root is inferred from the set of given paths (this affects what "root" will be in the SBOM).
  3. Pair up file globs within catalogers with common known locations (e.g. **/dpkg/status is paired with /var/lib) so that we can ask the catalogers to expand the index before cataloging. This might cause more problems than it solves.
  4. Allow the catalogers to programmatically send requests via the file.Resolver to expand the index and search against subindexes.

1 & 2 are relatively simple to add to syft but are not "automagical": the user might need to know a lot about how our catalogers work today to include just the right set of paths.
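Options 1 and 2 boil down to two small pieces of logic, sketched here in Go. This is illustrative only: included and commonRoot are hypothetical helpers, and --include is the flag proposed above, not an existing syft flag. The first checks whether a path falls under one of the include roots (everything else is treated as excluded); the second infers the deepest directory shared by all given paths, assuming absolute, slash-separated inputs.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// included reports whether p lies under any include root. With an implied
// --exclude '**', these subtrees are the only ones that get indexed.
func included(p string, includes []string) bool {
	p = path.Clean(p)
	for _, inc := range includes {
		inc = path.Clean(inc)
		// Compare whole path segments so /my/project/dirty is not treated
		// as being inside /my/project/dir.
		if p == inc || strings.HasPrefix(p, strings.TrimSuffix(inc, "/")+"/") {
			return true
		}
	}
	return false
}

// commonRoot returns the deepest directory shared by all paths, which under
// option 2 would become the SBOM "root". Paths are assumed to be absolute.
func commonRoot(paths []string) string {
	if len(paths) == 0 {
		return ""
	}
	root := strings.Split(path.Clean(paths[0]), "/")
	for _, p := range paths[1:] {
		parts := strings.Split(path.Clean(p), "/")
		n := 0
		for n < len(root) && n < len(parts) && root[n] == parts[n] {
			n++
		}
		root = root[:n]
	}
	if len(root) <= 1 {
		return "/" // only the leading empty element is shared
	}
	return strings.Join(root, "/")
}

func main() {
	includes := []string{"/my/project/dir", "/etc"}
	fmt.Println(included("/etc/os-release", includes))                // true
	fmt.Println(included("/my/project/dir/go.mod", includes))         // true
	fmt.Println(included("/my/project/dirty/go.mod", includes))       // false
	fmt.Println(included("/var/lib/dpkg/status", includes))           // false
	fmt.Println(commonRoot(includes))                                 // /
	fmt.Println(commonRoot([]string{"/my/project/dir", "/my/other"})) // /my
}
```

Note that syft /my/project/dir /etc infers / as the root, which is exactly the SBOM "root" side effect mentioned in option 2.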

3 introduces a new problem: how would catalogers add paths that are semantically useful without accidentally adding paths that produce packages that otherwise would not have been found? That is, if the user is interested in /home/user/my/project and a cataloger adds /var/lib/dpkg to the index, it would be surprising if their project all of a sudden included system packages. There might be a workable path here, but I don't quite see it yet.
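The surprise described for option 3 is easy to reproduce in a toy model. The Go sketch below assumes a hypothetical knownLocations table (taken from the example of **/dpkg/status paired with /var/lib) and a simplified matcher that only understands a leading **/, not a full glob implementation. It expands a project-only index with the paired location and then shows the dpkg glob matching a system file the user never asked about.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// knownLocations hypothetically pairs a cataloger glob with the directory
// where that glob usually matches on a real system.
var knownLocations = map[string]string{
	"**/dpkg/status": "/var/lib",
}

// matches handles only the "**/literal/suffix" form; a real implementation
// would use a full doublestar glob matcher.
func matches(glob, p string) bool {
	if strings.HasPrefix(glob, "**/") {
		suffix := strings.TrimPrefix(glob, "**/")
		return p == suffix || strings.HasSuffix(p, "/"+suffix)
	}
	return glob == p
}

// expandIndex adds every system file under a paired known location to the
// index and returns the result sorted.
func expandIndex(index, systemFiles []string) []string {
	seen := make(map[string]bool, len(index))
	for _, p := range index {
		seen[p] = true
	}
	for _, root := range knownLocations {
		for _, f := range systemFiles {
			if strings.HasPrefix(f, root+"/") {
				seen[f] = true
			}
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	index := []string{"/home/user/my/project/go.mod"}
	system := []string{"/var/lib/dpkg/status", "/var/lib/dpkg/info/bash.md5sums", "/usr/bin/bash"}
	expanded := expandIndex(index, system)
	for glob := range knownLocations {
		for _, p := range expanded {
			if matches(glob, p) {
				// The user only asked about their project, yet the dpkg glob
				// now matches a system status file: the "surprise" packages.
				fmt.Println(glob, "now matches", p)
			}
		}
	}
}
```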

4 is relatively difficult to implement, but overcomes most of the problems of 3. Take, for example, the dpkg cataloger: it finds packages, but when it searches for md5sum files in the parent dir, the search comes back with no locations. At this point we could smuggle an index builder/requester through file.Resolver that the cataloger could invoke to add the info dir to the resolver. This would be something to the effect of indexer.Request(<Status file location object>.parent() + "/info").
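One way option 4 could look, sketched in Go under stated assumptions: IndexRequester, expandableResolver, lazyResolver, FilesUnder and md5sumFiles are hypothetical names, not syft APIs. The toy resolver keeps a small index backed by a larger file set. The dpkg-like helper computes the info dir next to the status file, and only if the first lookup comes back empty does it request that directory be indexed and retry.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// IndexRequester is a hypothetical hook a resolver could expose so that a
// cataloger can ask for a directory to be indexed on demand.
type IndexRequester interface {
	Request(dir string) error
}

// expandableResolver combines file lookups with index requests.
type expandableResolver interface {
	IndexRequester
	FilesUnder(dir string) []string
}

// lazyResolver is a toy stand-in for a file.Resolver whose index starts
// small and can grow from a larger backing file set.
type lazyResolver struct {
	backing map[string]bool // everything that could be indexed
	index   map[string]bool // what has been indexed so far
}

// Request indexes every backing file beneath dir.
func (r *lazyResolver) Request(dir string) error {
	prefix := path.Clean(dir) + "/"
	added := 0
	for f := range r.backing {
		if strings.HasPrefix(f, prefix) {
			r.index[f] = true
			added++
		}
	}
	if added == 0 {
		return fmt.Errorf("nothing to index under %s", dir)
	}
	return nil
}

// FilesUnder lists indexed files beneath dir in sorted order.
func (r *lazyResolver) FilesUnder(dir string) []string {
	prefix := path.Clean(dir) + "/"
	var out []string
	for f := range r.index {
		if strings.HasPrefix(f, prefix) {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}

// md5sumFiles mirrors the dpkg example: look in the info dir next to the
// status file and, if nothing is indexed there yet, request it and retry.
func md5sumFiles(r expandableResolver, statusPath string) []string {
	infoDir := path.Join(path.Dir(statusPath), "info")
	files := r.FilesUnder(infoDir)
	if len(files) == 0 {
		if err := r.Request(infoDir); err != nil {
			return nil
		}
		files = r.FilesUnder(infoDir)
	}
	var sums []string
	for _, f := range files {
		if strings.HasSuffix(f, ".md5sums") {
			sums = append(sums, f)
		}
	}
	return sums
}

func main() {
	r := &lazyResolver{
		backing: map[string]bool{
			"/var/lib/dpkg/status":            true,
			"/var/lib/dpkg/info/bash.md5sums": true,
			"/var/lib/dpkg/info/bash.list":    true,
		},
		index: map[string]bool{"/var/lib/dpkg/status": true},
	}
	fmt.Println(md5sumFiles(r, "/var/lib/dpkg/status")) // [/var/lib/dpkg/info/bash.md5sums]
}
```

Because the request is scoped to a directory derived from a location the cataloger already holds, the expansion stays targeted rather than pulling in a whole known location up front as in option 3.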

There are several other important considerations with 4:

  • Should index requests affect other catalogers? If so, what about catalogers that have already run and completed? There could be race conditions there.
  • Should index requests be guaranteed never to result in more packages being found? That is, should we adopt that as a convention, or have the indexer enforce it somehow?
  • What about files found by new indexes in relation to the file metadata cataloger? We should probably make certain they are included in those results.
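If the answer to the second question is "have the indexer enforce it", one hedged option is to tag every index entry with its origin. In the Go sketch below (origin, taggedIndex and its methods are hypothetical, not syft APIs), package discovery only sees entries from the user's scan space, while the file metadata cataloger sees everything, including files added by requests. The mutex only makes concurrent adds safe; it does not answer the harder question of catalogers that have already finished.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// origin records why a path is in the index (hypothetical, for illustration).
type origin int

const (
	primary   origin = iota // inside the scan space the user asked for
	requested               // added later through an index request
)

// taggedIndex is a toy index that remembers each entry's origin.
type taggedIndex struct {
	mu      sync.Mutex
	entries map[string]origin
}

func newTaggedIndex() *taggedIndex {
	return &taggedIndex{entries: make(map[string]origin)}
}

// add records p; a primary origin always wins over a requested one.
func (t *taggedIndex) add(p string, o origin) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.entries[p]; !ok || o == primary {
		t.entries[p] = o
	}
}

// packageCandidates returns only primary entries ending in suffix, so index
// requests can never introduce new package-producing files.
func (t *taggedIndex) packageCandidates(suffix string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var out []string
	for p, o := range t.entries {
		if o == primary && strings.HasSuffix(p, suffix) {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

// metadataPaths returns every entry regardless of origin, so the file
// metadata cataloger still reports files that arrived via requests.
func (t *taggedIndex) metadataPaths() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]string, 0, len(t.entries))
	for p := range t.entries {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := newTaggedIndex()
	idx.add("/home/user/my/project/go.mod", primary)
	idx.add("/var/lib/dpkg/status", requested)
	fmt.Println(idx.packageCandidates("/dpkg/status")) // []
	fmt.Println(idx.packageCandidates("go.mod"))       // [/home/user/my/project/go.mod]
	fmt.Println(idx.metadataPaths())                   // [/home/user/my/project/go.mod /var/lib/dpkg/status]
}
```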
