Review overridden settings in settings.py #461

Open
jpmckinney opened this issue Jul 22, 2020 · 1 comment
Labels
settings Relating to how we configure settings
Milestone

Comments

@jpmckinney
Member

Notably:

  • CONCURRENT_REQUESTS: 32 (default 16)
    • I think this is fine, though as the number of spiders grows, we might want to raise it further for commands like latestreleasedate or dryrun.
  • CONCURRENT_REQUESTS_PER_DOMAIN: 2 (default 8)
    • I think some sources can handle many more concurrent requests than this, and it would be nice for those sources to complete faster. We could leave it at the default in settings.py and lower it in individual spiders as needed. (Right now we set CONCURRENT_REQUESTS to 1 in a few cases.) If some servers can handle more than 8, we can also raise it for them. We don't need to be super accurate, though.
  • DOWNLOAD_TIMEOUT: 360 (6 minutes; default 180)
    • I set this based on the performance of the scrapy dryrun command. In short, it needs to be high enough that slow servers can respond, but low enough that a spider doesn't wait a very long time for a response that never arrives.
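
To make the "default in settings.py, lower it per spider" idea concrete, here is a minimal sketch. It does not import Scrapy: `PROJECT_SETTINGS`, `SlowSourceSpider`, and `effective_settings` are hypothetical names used only to illustrate the precedence, in which a spider's `custom_settings` class attribute overrides project-level values. In a real project the values would be module-level constants in settings.py and the class would subclass `scrapy.Spider`.

```python
# Project-wide values, as they might appear in settings.py (numbers from this issue).
PROJECT_SETTINGS = {
    "CONCURRENT_REQUESTS": 32,            # Scrapy default: 16
    "CONCURRENT_REQUESTS_PER_DOMAIN": 8,  # Scrapy default, instead of the current 2
    "DOWNLOAD_TIMEOUT": 360,              # seconds (6 minutes)
}


class SlowSourceSpider:
    """Hypothetical spider for a source that cannot handle parallel requests."""

    name = "slow_source"
    # Scrapy applies this class attribute on top of the project settings.
    custom_settings = {"CONCURRENT_REQUESTS_PER_DOMAIN": 1}


def effective_settings(project, spider_cls):
    """Illustrate precedence: per-spider custom_settings override project values."""
    merged = dict(project)  # copy, so the project-wide values stay untouched
    merged.update(getattr(spider_cls, "custom_settings", None) or {})
    return merged


settings = effective_settings(PROJECT_SETTINGS, SlowSourceSpider)
```

With this layout, sources that tolerate more load run at the project-wide value, and only the spiders known to be fragile carry an override.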

Reference: https://docs.scrapy.org/en/latest/topics/settings.html

@jpmckinney added the "framework" (Relating to other common functionality) label Jul 22, 2020
@yolile added this to the Priority milestone Mar 3, 2021
@jpmckinney added the "settings" (Relating to how we configure settings) label and removed the "framework" (Relating to other common functionality) label Sep 1, 2021
@jpmckinney
Member Author

Moving from #655: See if some sources support greater concurrency
