Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider a more cache-friendly db location #65

Closed
lread opened this issue Aug 1, 2024 · 11 comments
Closed

Consider a more cache-friendly db location #65

lread opened this issue Aug 1, 2024 · 11 comments
Labels
documentation Improvements or additions to documentation
Milestone

Comments

@lread
Copy link
Contributor

lread commented Aug 1, 2024

Currently

The default nvd database location is /tmp/db/cache

But...

This isn't terribly CI service-friendly when it comes to caching.

There are big advantages to caching the nvd database so that it can be updated with new entries rather than entirely re-downloaded. Updating:

  • greatly reduces time and resources for the scan on CI
  • saves the NVD data feeds from unnecessary traffic

Options

Some options I've thought of:

Option 1: Do nothing

Not interested in addressing this issue.

Option 2: Document only

Document a way for the user to change the database location.
I do see the potential configuration entry that might be overridden.

I suppose the user would do this via --dependency-check-properties cmd line arg.
(Would have to verify)

Option 3: Allow user to easily override

  1. Perhaps there is merit in promoting this option to its own command line arg?
  2. Or at least letting it be overridden via --clj-watson-properties

Option 4: Default to the default dependency check location

The default location is under the ~/.m2 repo alongside dependency check dep.
Example: /home/lee/.m2/repository/org/owasp/dependency-check-utils/10.0.3/data/9.0
(So, in other words, don't override the default for clj-watson).

  • Pro: More easily cached with .m2 deps (PR would document examples)
  • Pro: Db cache automatically invalidated with each dependency check lib bump, which could be overkill, but it also would ensure we don't run into db compatibility issues with a new db release.
  • Pro: If the user likes this default, they do not have to figure out the syntax to express this dynamic value.

Proposal

I like option 4.
This could be combined with Option 3.2 for relatively easy overriding.
I personally don't think /tmp is a great spot for this db in general - and especially for CI.

Next Steps

I am also happy to explore further.
If you agree we should take action, I volunteer to create a PR.

@seancorfield
Copy link
Contributor

I've often thought it's a bit odd that Watson completely overrides the DependencyCheck defaults instead of doing so selectively. I know it needs to override the ${pom.*} values but there's a maintenance overhead in trying to keep dependency-check.properties basically in sync with the dependencycheck.properties file in DependencyCheck itself.

I think I'd like to dig deeper into this and use the core lib's properties by default and only override things that Watson specifically needs to override and then, yes, option 4 could easily fall out of that and also 3.2 would just work automatically (clj-watson.properties is already intended to be the way to override any of the defaults -- but it is poorly-documented).

I think it would also make life easier to add an option that can specify individual property overrides (multi-option producing a vector).

@seancorfield seancorfield self-assigned this Aug 1, 2024
@seancorfield seancorfield added the needs analysis Further hammock time is required to figure out the best solution label Aug 1, 2024
@lread
Copy link
Contributor Author

lread commented Aug 1, 2024

Ya, agreed. If I understand nvd-clojure correctly it concerns itself only with dependency-check settings it is specifically interested in.

@lread
Copy link
Contributor Author

lread commented Aug 3, 2024

While testing, I was surprised that the entire nvd db was being downloaded again.
Then I remembered that an OS is free to clear out /tmp/.
My Linux box does so on startup.
So, another vote against /tmp/... being a default location; this one is not CI-specific.

@lread lread changed the title Consider CI cache-friendly support for db location Consider a more cache-friendly db location Aug 3, 2024
@seancorfield
Copy link
Contributor

I think this will fall out of the work in #66 and then it'll just need a readme update regarding the default location of the db and how to override it?

@lread
Copy link
Contributor Author

lread commented Aug 16, 2024

I think the fact that clj-watson is overriding a sensible DependencyCheck default for data.directory is a mistake.

I think we should document where the db is saved and how to cache it sensibly on CI.
I've worked this out for pomegranate and clj-yaml, so can help with this.

We could make a short mention of how to override the default db location (via system property or cljwatson.properties), but we could recommend against doing so. Taking inspiration from DependencyCheck docs on the configuration of the data directory:

... This should generally not be changed.

@seancorfield
Copy link
Contributor

Agreed. I notice right now that DependencyCheck has:

data.directory=[JAR]/data/9.0

So that has stayed the same across 9.x and 10.x -- and we need to decide whether to stop overriding it (my preference, since it would then become CI cache-friendly as part of .m2) or keep it as /tmp/db/ and emphasize that it isn't CI cache-friendly but here's how to change it...?

I guess the thing that needs to be verified is whether DependencyCheck will still correctly expand [JAR] if that property is passed in from a caller like clj-watson, as opposed to only handling it within its own dependencycheck.properties file?

@lread
Copy link
Contributor Author

lread commented Aug 16, 2024

I like your preference of clj-watson no longer overriding data.directory.
We could make a very brief mention of this property in the README.

[JAR] expands to more than I had originally guessed. By default when using dependency-check 10.0.3.

[JAR] becomes:

  • ~/.m2/repository/org/owasp/dependency-check-utils/10.0.3

[JAR]/data/9.0 becomes:

  • ~/.m2/repository/org/owasp/dependency-check-utils/10.0.3/data/9.0

So I don't think the 9.0 (whatever it means) is terribly important.

Example of override via system property:

clojure '-J-Ddata.directory=[JAR]/foobar' -J-Dnvd.api.key=<key here> -M:clj-watson scan -p deps.edn

And db is written to:

  • ~/.m2/repository/org/owasp/dependency-check-utils/10.0.3/foobar

@seancorfield
Copy link
Contributor

Ah, very nice! Shall we leave this open as a reminder for the documentation, after #66 is dealt with?

@seancorfield seancorfield removed their assignment Aug 16, 2024
@seancorfield seancorfield added documentation Improvements or additions to documentation and removed needs analysis Further hammock time is required to figure out the best solution labels Aug 16, 2024
@lread
Copy link
Contributor Author

lread commented Aug 16, 2024

We could deal with it directly and close it. I'd have to verify but, probably:

  1. Delete override in dependency-check.properties
  2. Document via changelog and readme

Then #66 could do the fuller and more general work.

Happy to take a crack at it.

@seancorfield seancorfield added this to the 6.0 milestone Aug 17, 2024
@seancorfield
Copy link
Contributor

PR #105 (really PR #106) fixes this I believe since it removes the override of data.directory and documents it.

@seancorfield
Copy link
Contributor

PR #106 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Development

No branches or pull requests

2 participants