
Relying on SCRAPY_SETTINGS_MODULE isn't sufficient #401

Closed
@honzajavorek

Description

I don't use SCRAPY_SETTINGS_MODULE, so the code hangs (because of sys.exit(); see #389, which is directly related).

The ways to configure Scrapy are documented, and SCRAPY_SETTINGS_MODULE is only one of them; it's neither the mechanism used by default nor the one mentioned first in the docs.

I use scrapy.cfg to specify the settings module, because then I can both run the standalone scrapy crawl CLI and read the settings from Python code that integrates with Apify. This is especially useful when debugging the Apify integration, because I can run the same scraper with and without Apify, or with and without any custom code.
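For context, the relevant part of such a scrapy.cfg is just a few lines (the module name here matches the workaround below):

[settings]
default = jg.plucker.settings

The scrapy CLI reads this file to find the settings module, so no environment variable is involved.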

To prevent the program from hanging, I'd now have to remember to always set SCRAPY_SETTINGS_MODULE in my environment, but its value is always the same and it's not a secret, so this feels unnecessary. Yes, it's doable with direnv, .env files, mise, and so on, but those are advanced practices and not very straightforward. As of now, anyone who starts a vanilla Scrapy project using scrapy.cfg will end up with a program which hangs.
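Scrapy itself ships a helper for reading scrapy.cfg, so a sketch of what the SDK could do instead of requiring the environment variable might look like this (just a sketch built on scrapy.utils.conf.get_config(), not a tested patch):

from scrapy.utils.conf import get_config

# get_config() locates the closest scrapy.cfg, the same way the scrapy CLI
# does, and returns it as a ConfigParser
settings_module = get_config().get("settings", "default", fallback=None)

If settings_module resolves this way, there's no need to bail out with sys.exit().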

As a workaround, I can do the following:

import os

from apify import Actor

# must be set before entering the Actor context, otherwise sys.exit() fires
os.environ["SCRAPY_SETTINGS_MODULE"] = "jg.plucker.settings"

async with Actor:
    logger.info("Hello")  # logger configured elsewhere in the project

Ugly, but it does prevent sys.exit() from firing and hanging the program. But if I want to keep using the CLI, I must also keep scrapy.cfg containing the same information.
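A slightly less hard-coded variant would be to let Scrapy do the lookup itself. If I read scrapy.utils.project correctly, get_project_settings() falls back to scrapy.cfg and exports SCRAPY_SETTINGS_MODULE as a side effect when the variable is missing (an assumption about current Scrapy behavior, not verified across versions):

from scrapy.utils.project import get_project_settings

# side effect: resolves the module from scrapy.cfg and sets
# SCRAPY_SETTINGS_MODULE if it isn't already in the environment
get_project_settings()

async with Actor:
    logger.info("Hello")

That would keep scrapy.cfg as the single source of truth for both the CLI and the Apify entrypoint.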

Metadata

Labels

t-tooling: Issues with this label are in the ownership of the tooling team.
