Description
I don't use `SCRAPY_SETTINGS_MODULE`, so the code hangs (because of the `sys.exit()` call; see #389, which is directly related).
The ways to configure Scrapy are documented, and `SCRAPY_SETTINGS_MODULE` is just one of many. It's not even the one used by default, nor the one mentioned first in the docs.
I use `scrapy.cfg` to specify the settings module, because then I can use the standalone `scrapy crawl` CLI as well as read the settings from Python code and integrate with Apify. This proves especially useful when debugging the Apify integration, because I can run the same scraper with and without Apify, or with and without any custom code.
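For reference, this is all it takes; a minimal `scrapy.cfg` pointing at the settings module (the same module path as in the workaround below) looks like this:

```ini
[settings]
default = jg.plucker.settings
```

Scrapy then resolves the module from this file, so the same settings are picked up both by the `scrapy crawl` CLI and by code calling `scrapy.utils.project.get_project_settings()`.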
To prevent the program from hanging, I'd now have to remember to always set `SCRAPY_SETTINGS_MODULE` in my environment, but its value is always the same and it's not a secret, so this feels unnecessary. Yes, it's doable with direnv, `.env` files, mise, and so on, but those are advanced practices and not very straightforward. As of now, anyone who starts a vanilla Scrapy project using `scrapy.cfg` will end up with a program that hangs.
As a workaround, I can do the following:

```python
import os

from apify import Actor

# must be set before entering the Actor context,
# otherwise the missing env var triggers the sys.exit()
os.environ["SCRAPY_SETTINGS_MODULE"] = "jg.plucker.settings"

async with Actor:
    logger.info("Hello")
```
Ugly, but it does prevent the `sys.exit()` from firing and hanging the program. But if I want to keep using the CLI, I must also keep `scrapy.cfg` containing the same information.
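If it helps, here is one possible way to keep the value in a single place: delegate the lookup to Scrapy itself. This is only a sketch, and it relies on Scrapy's internal `init_env()` helper (from `scrapy.utils.conf`), which reads the closest `scrapy.cfg` and sets `SCRAPY_SETTINGS_MODULE` accordingly:

```python
import asyncio
import logging

from apify import Actor
from scrapy.utils.conf import init_env

logger = logging.getLogger(__name__)


async def main() -> None:
    # init_env() finds the closest scrapy.cfg and sets
    # SCRAPY_SETTINGS_MODULE from its [settings] section,
    # so the module path is not duplicated in the code
    init_env()
    async with Actor:
        logger.info("Hello")


asyncio.run(main())
```

That said, this still feels like something the integration could do on its own, rather than every user having to know about `init_env()`.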