I think this is fine, though as the number of spiders increases, we might want to increase it further when running commands like `latestreleasedate` or `dryrun`.
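For one-off runs like these, the value could also be overridden on the command line rather than in settings.py. A minimal sketch, assuming these custom commands accept Scrapy's standard `-s` option (the value 32 is only illustrative):

```console
$ scrapy latestreleasedate -s CONCURRENT_REQUESTS=32
$ scrapy dryrun -s CONCURRENT_REQUESTS=32
```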
`CONCURRENT_REQUESTS_PER_DOMAIN`: 2 (default 8)
I think some sources can handle a lot more concurrent requests than this, and it would be nice for those sources to complete faster. We could leave the setting at the default in settings.py and lower it in individual spiders as needed, as sketched below. (Right now we set `CONCURRENT_REQUESTS` to 1 in a few cases.) If some servers can handle more than 8, we can also increase it. We don't need to be super accurate, though.
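As a sketch of the per-spider approach (the spider and URL here are hypothetical; `custom_settings` is Scrapy's standard per-spider override mechanism):

```python
import scrapy


class SlowSourceSpider(scrapy.Spider):
    # Hypothetical spider for a source that can't keep up with the default.
    name = 'slow_source'

    # The project-wide default stays in settings.py; throttle only this source.
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
    }

    def start_requests(self):
        # Hypothetical endpoint, for illustration only.
        yield scrapy.Request('https://example.com/releases.json')

    def parse(self, response):
        yield {'body_length': len(response.body)}
```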
`DOWNLOAD_TIMEOUT`: 360 (6 minutes)
I set this based on the performance of the `scrapy dryrun` command. In short, it needs to be high enough that slow servers can respond, but low enough that a spider doesn't wait a very long time for a non-response.
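For reference, the corresponding lines in settings.py might look like this (a sketch; the values mirror the proposal above, with Scrapy's own defaults noted in comments):

```python
# settings.py (sketch)

# Keep Scrapy's default of 8 project-wide, and lower it per spider via
# custom_settings for sources that can't handle it.
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# High enough for slow servers to respond, low enough that a spider
# doesn't wait very long for a non-response. Scrapy's default is 180.
DOWNLOAD_TIMEOUT = 360  # 6 minutes
```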
Reference: https://docs.scrapy.org/en/latest/topics/settings.html