Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulns from REST sources may be missed due to timing #2670

Open
michaelkedar opened this issue Sep 25, 2024 · 3 comments
Open

Vulns from REST sources may be missed due to timing #2670

michaelkedar opened this issue Sep 25, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@michaelkedar
Copy link
Member

The importer uses last_update_date to compare both the Last-Modified of the HEAD request and the modified time of the individual records. last_update_date gets set to the time when the importer started processing the source.

This can cause an edge case where we miss a few updates:

00:10 - REST data source updates.
00:15 - Importer runs. Sets last_update_date to 00:15
00:20 - REST data source updates. Adds records with modified dates between 00:10 - 00:20.
00:30 - Importer runs. Processes the vulnerabilities with modified dates > 00:15 (last_update_date).

Here, the importer won't detect the vulnerabilities from the REST source that were modified between 00:10 - 00:15.

@michaelkedar michaelkedar added the bug Something isn't working label Sep 25, 2024
@andrewpollock
Copy link
Contributor

last_update_date gets set to the time when the importer started processing the source

That's at the end of the run, not the start. Does that change your hypothesis at all?

@michaelkedar
Copy link
Member Author

last_update_date gets set to the time when the importer started processing the source

That's at the end of the run, not the start. Does that change your hypothesis at all?

The value is set at the start, but either way, no - there'd be an issue as long as last_import_date is set anywhere between the two REST updates.

@michaelkedar michaelkedar self-assigned this Sep 26, 2024
michaelkedar added a commit that referenced this issue Sep 30, 2024
…#2672)

Use the `Last-Modified` header (or the latest `modified` vuln date if
the header is missing) to set the REST SourceRepository's
`last_update_date`, instead of using the time when the importer ran.

Should alleviate #2670 for well-behaved REST sources, though it still
doesn't handle cases where the source itself adds a record with date
before the previous updated date.

While I was here, I also tidied up the `ignore_last_import_time` logic a
bit (because the long line splitting was annoying).
@michaelkedar
Copy link
Member Author

I've got #2672 on production now, which should help prevent this if the source is well-behaved.

I think we also should be doing some periodic complete reimports of all our sources to catch anything that might be missed for whatever reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants