
Relying on SCRAPY_SETTINGS_MODULE isn't sufficient #401

Open
honzajavorek opened this issue Feb 13, 2025 · 1 comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@honzajavorek
Contributor

I don't use SCRAPY_SETTINGS_MODULE, so the code hangs (because of the sys.exit() call; see #389, which is directly related).

The ways to configure Scrapy are documented, and SCRAPY_SETTINGS_MODULE is only one of several; it is neither the default nor the first one mentioned in the docs.

I use scrapy.cfg to specify the settings module, because then I can use both the standalone scrapy crawl CLI and read the settings from Python code to integrate with Apify. This proves especially useful when debugging the Apify integration, because I can run the same scraper with and without Apify, or with and without any custom code.
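For context, pointing scrapy.cfg at a settings module is a one-liner (the module path here is my project's; yours will differ):

```ini
[settings]
default = jg.plucker.settings
```

With this in place, scrapy crawl finds the settings without any environment variable being set.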

To prevent the program from hanging, I'd now have to remember to always set SCRAPY_SETTINGS_MODULE in my environment, but its value is always the same and it's not a secret, so this feels unnecessary. Yes, it's doable with direnv, .env files, mise, and so on, but those are advanced practices and not very straightforward. As of now, anyone who starts a vanilla Scrapy project using scrapy.cfg will end up with a program that hangs.

As a workaround, I can do the following:

import os

from apify import Actor

os.environ["SCRAPY_SETTINGS_MODULE"] = "jg.plucker.settings"
async with Actor:
    logger.info("Hello")

Ugly, but it does prevent the sys.exit() from firing and hanging the program. However, if I want to keep using the CLI, I must also keep a scrapy.cfg containing the same information.

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 13, 2025
@honzajavorek
Contributor Author

I think the only way to detect that the code is part of a Scrapy project might be calling get_project_settings() 🤔
