-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increasing timeout ratios for CVE API requests #1393
Increasing timeout ratios for CVE API requests #1393
Conversation
cartography/intel/cve/feed.py
Outdated
MAX_RETRIES = 30 | ||
# Connect and read timeouts of 120 seconds each; see https://requests.readthedocs.io/en/master/user/advanced/#timeouts | ||
CONNECT_AND_READ_TIMEOUT = (120, 120) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
503 is not related to (connect, read) timeout config, no? Any reason to do those changes?
ideally, i would suggest to use http adapter with backoff. If that's complex to change, at least try to do similar thing in code (doubling the sleep time every next attempt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same, but executing tests I observed that is the current behavior.
You can manually try with this link: https://services.nvd.nist.gov/rest/json/cves/2.0/
execute the request several times from your browser (even without passing credentials / token), and you will reproduce that 503 error.
the response is super slow, and in certain cases (from my tests) it keeps returning the error even after +20 retries:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can implement a backoff strategy for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It looks like a timeout that occurred on their reverse proxy or load balancer. The configuration we have here is for the client side to cut off the connection, I suspect it make a difference.
I tried the curl
command and it failed with 503 fairly fast, then 1 time it successfully waited for 10 seconds before returning the whole data. Usually connect timeout doesn't need to be very long, while read timeout can be longer to wait for the data to come back. Still maybe you don't really need to do those changes. If you really wish so, I would suggest something like (30, 120) instead.
And yes, backoff retry is better because if everyone keeps bombarding their service won't be able to recover.
# Exponential backoff | ||
sleep_time *= 2 | ||
time.sleep(sleep_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so a total of 3 * 2^5 or 6 * 2^5 seconds when API key is provided?
That seems low. Do we need higher number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 2^5?
condition at L99 is >= , so it will actually iterate over 6.
3 * 2^6 = 192
3 * 2^6 = 384
I think those times are high enough. 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, let's try to enumerate it.
it's incremented by 1 before the check and stopped when the value reaches 6
retries variable value -> sleep time
1 -> 3
2 -> 6
3 -> 12
4 -> 24
5 -> 48
6 -> stopped (hit >= MAX_RETRIES)
Signed-off-by: cmm-lyft <[email protected]>
Signed-off-by: cmm-lyft <[email protected]>
…ion (cartography-cncf#1392) ### Summary Correcting error: ``` File "/code/venvs/venv/lib/python3.10/site-packages/cartography/intel/aws/ec2/network_acls.py", line 72, in transform_network_acl_data 'CidrBlock': rule['CidrBlock'], KeyError: 'CidrBlock' ``` ### Checklist Provide proof that this works (this makes reviews move faster). Please perform one or more of the following: - [ ] Update/add unit or integration tests. - [ ] Include a screenshot showing what the graph looked like before and after your changes. - [ ] Include console log trace showing what happened before and after your changes. If you are changing a node or relationship: - [ ] Update the [schema](https://github.com/lyft/cartography/tree/master/docs/root/modules) and [readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md). If you are implementing a new intel module: - [ ] Use the NodeSchema [data model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node). Signed-off-by: cmm-lyft <[email protected]>
Signed-off-by: cmm-lyft <[email protected]>
Signed-off-by: cmm-lyft <[email protected]>
Signed-off-by: cmm-lyft <[email protected]>
Signed-off-by: cmm-lyft <[email protected]>
### Summary Migrating the ASG sync to use the data model. Nothing else was changed in the sync aside from creating the model and using it during the sync. ### Checklist Provide proof that this works (this makes reviews move faster). Please perform one or more of the following: - [x] Update/add unit or integration tests. - [ ] Include a screenshot showing what the graph looked like before and after your changes. - [ ] Include console log trace showing what happened before and after your changes. If you are changing a node or relationship: - [ ] Update the [schema](https://github.com/lyft/cartography/tree/master/docs/root/modules) and [readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md). If you are implementing a new intel module: - [ ] Use the NodeSchema [data model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node). --------- Signed-off-by: Sergio Franco <[email protected]> Signed-off-by: cmm-lyft <[email protected]>
31aa342
to
5309729
Compare
Summary
Based on tests conducted semi-automated, recently https://services.nvd.nist.gov/rest/json/cves/2.0/ response time has increased up to 100,000 ms, and the latest executions have raised 503 error, responding after 20 to 30 retries with 3 seconds intervals:
Failed to get CVE data from NIST NVD API 503 : <html><body><h1>503 Service Unavailable</h1> No server is available to handle this request. </body></html>
Increasing the
CONNECT_AND_READ_TIMEOUT
to 120 seconds, the MAX_RETRIES to 30 and including sleep time on Exception to mitigate the insidense of 503 error.The max execution time of the request logic (worst case scenario) would be of 7 minutes (120 of request connection, 120 of request timeout, 30 retries with 6 sleep time as max).
Checklist
Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:
If you are changing a node or relationship:
If you are implementing a new intel module: