Increasing timeout ratios for CVE API requests #1393

cmm-lyft · 2024-11-29T16:23:39Z

Summary

Based on semi-automated tests, the response time of https://services.nvd.nist.gov/rest/json/cves/2.0/ has recently increased to as much as 100,000 ms, and the latest executions have raised 503 errors, with the API responding only after 20 to 30 retries at 3-second intervals:

Failed to get CVE data from NIST NVD API 503 : <html><body><h1>503 Service Unavailable</h1> No server is available to handle this request. </body></html>

This PR increases CONNECT_AND_READ_TIMEOUT to 120 seconds and MAX_RETRIES to 30, and adds a sleep on exception, to mitigate the incidence of 503 errors.

The maximum execution time of the request logic (worst-case scenario) would be 7 minutes: 120 seconds of connect timeout, 120 seconds of read timeout, and 30 retries with a sleep of at most 6 seconds each.
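
As a quick check of the 7-minute figure, the breakdown above can be summed directly. The variable names are illustrative, and the sum follows the breakdown as stated (one connect timeout, one read timeout, 30 sleeps of at most 6 seconds); it does not model every retry waiting for the full timeout.

```python
# Values taken from the PR description; the names are illustrative only.
CONNECT_TIMEOUT_S = 120
READ_TIMEOUT_S = 120
MAX_RETRIES = 30
MAX_SLEEP_S = 6

# One connect timeout + one read timeout + the sleeps between retries.
worst_case_s = CONNECT_TIMEOUT_S + READ_TIMEOUT_S + MAX_RETRIES * MAX_SLEEP_S
worst_case_min = worst_case_s / 60
print(worst_case_s, worst_case_min)  # 420 7.0
```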

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

Update/add unit or integration tests.
Include a screenshot showing what the graph looked like before and after your changes.
Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

Update the schema and readme.

If you are implementing a new intel module:

Use the NodeSchema data model.

cmm-lyft · 2024-11-29T16:24:48Z

👀 @achantavy @heryxpc @serge-wq @khanhldt

kledo-lyft · 2024-12-02T17:39:30Z

cartography/intel/cve/feed.py

+MAX_RETRIES = 30
+# Connect and read timeouts of 120 seconds each; see https://requests.readthedocs.io/en/master/user/advanced/#timeouts
+CONNECT_AND_READ_TIMEOUT = (120, 120)


503 is not related to the (connect, read) timeout config, no? Any reason for those changes?

Ideally, I would suggest using an HTTP adapter with backoff. If that's complex to change, at least try to do something similar in code (doubling the sleep time on every attempt).
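
A minimal sketch of the adapter approach, assuming `requests` with urllib3 1.26 or newer. `build_session` and its default values are hypothetical and not taken from this PR; `HTTPAdapter` and `Retry` are the standard classes from `requests` and `urllib3`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries: int = 5, backoff_factor: float = 3.0) -> requests.Session:
    """Return a Session that retries GETs on HTTP 503 with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,  # delay between retries grows exponentially
        status_forcelist=[503],         # retry only on "Service Unavailable"
        allowed_methods=["GET"],        # parameter name requires urllib3 >= 1.26
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

A call such as `build_session().get(url, timeout=(30, 120))` would then retry without any loop in the calling code. The retry count and backoff factor above are placeholders and would need tuning against the NVD API.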

I thought the same, but when running tests I observed that this is the current behavior.
You can try it manually with this link: https://services.nvd.nist.gov/rest/json/cves/2.0/
Execute the request several times from your browser (even without passing credentials or a token), and you will reproduce the 503 error.
The response is very slow, and in some cases (from my tests) it keeps returning the error even after 20+ retries.

I think we can implement a backoff strategy for this one.

I see. It looks like a timeout that occurred on their reverse proxy or load balancer. The configuration we have here is for the client side to cut off the connection, so I doubt it makes a difference.

I tried the curl command and it failed with a 503 fairly quickly; one time it succeeded, waiting for 10 seconds before returning the whole data. Usually the connect timeout doesn't need to be very long, while the read timeout can be longer to wait for the data to come back. Still, you may not really need those changes. If you do want them, I would suggest something like (30, 120) instead.
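
For reference, a small sketch of how a (connect, read) tuple like the suggested (30, 120) is passed to `requests`. `get_cves` is a hypothetical helper, not the function in feed.py, and `session` is assumed to be a `requests.Session` or any object with a compatible `get` method.

```python
# (connect timeout, read timeout) in seconds, as suggested in the review.
CONNECT_AND_READ_TIMEOUT = (30, 120)


def get_cves(session, url, params=None):
    """Issue a GET with separate connect and read timeouts.

    The first value bounds how long to wait for the connection to be
    established; the second bounds the wait for data from the server.
    """
    return session.get(url, params=params, timeout=CONNECT_AND_READ_TIMEOUT)
```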

And yes, retrying with backoff is better, because if everyone keeps bombarding their service, it won't be able to recover.
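
If the adapter route is too invasive, the in-code alternative (doubling the sleep on every failed attempt) could look roughly like this. It is a sketch only: `call_with_backoff`, its parameters, and the defaults are made up for illustration, and `fetch` stands in for whatever function performs the request and raises on a 503.

```python
import time


def call_with_backoff(fetch, max_retries=6, initial_sleep=3.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, doubling the pause after each failure.

    `fetch` is any zero-argument callable that raises on failure.
    `sleep` is injectable so the logic can be exercised without waiting.
    """
    sleep_time = initial_sleep
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(sleep_time)
            sleep_time *= 2  # exponential backoff
```

Passing `sleep=some_list.append` in a test records the pauses instead of actually sleeping, which keeps unit tests fast.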

kledo-lyft · 2024-12-04T20:50:28Z

cartography/intel/cve/feed.py

+            # Exponential backoff
+            sleep_time *= 2
+            time.sleep(sleep_time)


So a total of 3 * 2^5 or 6 * 2^5 seconds when an API key is provided?

That seems low. Do we need a higher number?

why 2^5?
condition at L99 is >= , so it will actually iterate over 6.
3 * 2^6 = 192
6 * 2^6 = 384
I think those times are high enough. 👀

heh, let's try to enumerate it.
it's incremented by 1 before the check and stopped when the value reaches 6

retries variable value -> sleep time
1 -> 3
2 -> 6
3 -> 12
4 -> 24
5 -> 48
6 -> stopped (hit >= MAX_RETRIES)
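
The enumeration above can be reproduced with a few lines. This follows the reviewer's description of the loop (increment first, stop once the counter is >= MAX_RETRIES, double the sleep each time), not necessarily the exact code that was merged; `sleep_schedule` is an illustrative name.

```python
def sleep_schedule(initial_sleep, max_retries=6):
    """Return the sleeps taken before the retry counter hits max_retries."""
    schedule = []
    retries = 0
    sleep_time = initial_sleep
    while True:
        retries += 1                 # incremented before the check
        if retries >= max_retries:   # stops when the value reaches max_retries
            break
        schedule.append(sleep_time)
        sleep_time *= 2              # exponential backoff
    return schedule


print(sleep_schedule(3), sum(sleep_schedule(3)))  # [3, 6, 12, 24, 48] 93
print(sleep_schedule(6), sum(sleep_schedule(6)))  # [6, 12, 24, 48, 96] 186
```

Under these assumptions the total time spent sleeping is 93 seconds with a 3-second base and 186 seconds with a 6-second base.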

cmm-lyft · 2024-12-06T02:58:36Z

👀 @achantavy


kledo-lyft reviewed Dec 2, 2024


cmm-lyft requested a review from kledo-lyft December 4, 2024 20:00

kledo-lyft reviewed Dec 4, 2024


cmm-lyft requested a review from kledo-lyft December 4, 2024 21:36

kledo-lyft approved these changes Dec 4, 2024


achantavy approved these changes Dec 6, 2024


cmm-lyft and others added 8 commits December 6, 2024 11:48

Increasing timeout ratios for CVE API requests

0d2dded

Signed-off-by: cmm-lyft <[email protected]>

update tests

67162c9

Signed-off-by: cmm-lyft <[email protected]>

Adding exponential backoff

e17689d

Signed-off-by: cmm-lyft <[email protected]>

fixing max retries to 8

ad61e4a

Signed-off-by: cmm-lyft <[email protected]>

fixing unit tests

7a8d6cc

Signed-off-by: cmm-lyft <[email protected]>

fixing unit tests

d2a4cbe

Signed-off-by: cmm-lyft <[email protected]>

cmm-lyft force-pushed the time_sleep_nvd_retry branch from 31aa342 to 5309729 December 6, 2024 17:48

Merge branch 'master' into time_sleep_nvd_retry

89b01ea

achantavy merged commit f842546 into cartography-cncf:master Dec 6, 2024
7 checks passed
