🕷️ Fix spider: Illinois Criminal Justice Information Authority #1127

Merged
SimmonsRitchie merged 1 commit into main from fix-icjia-gql
Jul 16, 2024

@SimmonsRitchie SimmonsRitchie commented Jul 16, 2024

What's this PR do?

Fixes our Illinois Criminal Justice Information Authority spider (aka il_criminal_justice_information).

Why are we doing this?

The spider broke due to changes on the pages it's targeting. The changes in this PR ensure the scraper runs without error.

Steps to manually test

After installing the project using pipenv:

  1. Activate the virtual environment:
     pipenv shell
  2. Run the spider:
     scrapy crawl il_criminal_justice_information -O test_output.csv
  3. Monitor the stdout and ensure that the crawl proceeds without raising any errors. Pay attention to the final status report from scrapy.
  4. Inspect test_output.csv to ensure the data looks valid. I suggest opening a few of the URLs under the source column of test_output.csv and comparing the data for the row with what you see on the page.
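
For the final inspection step, a small helper can pick a few URLs from the source column to spot-check. This is only a sketch: sample_sources is a hypothetical helper, not part of the repo, and it assumes nothing about the CSV beyond the source column mentioned above.

```python
import csv
import random


def sample_sources(csv_path, k=3, seed=0):
    """Return up to k distinct, non-empty values from the 'source' column.

    Hypothetical helper for manual spot checks; not part of the spider.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Collect unique source URLs, skipping rows with a missing or empty value.
        sources = sorted(
            {row.get("source") for row in csv.DictReader(f) if row.get("source")}
        )
    # A seeded RNG makes the sample repeatable between runs.
    rng = random.Random(seed)
    return rng.sample(sources, min(k, len(sources)))


# Example usage after running the crawl:
#   for url in sample_sources("test_output.csv"):
#       print(url)
```

Open each printed URL and compare the page against the matching row in test_output.csv.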

Are there any smells or added technical debt to note?

  • This rebuilt spider requests all meetings from the agency's GraphQL endpoint. It may be possible to optimize the request by filtering out meetings older than 90 days in the query itself, but I haven't experimented with the agency's API enough to determine whether it supports that. The current approach seemed decent enough.
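
As an illustration of the trade-off, a 90-day cutoff can also be applied client-side once all meetings have been fetched. The sketch below is not the code in this PR: drop_old_meetings and the start field are hypothetical, and it assumes each meeting carries a naive ISO 8601 timestamp string.

```python
from datetime import datetime, timedelta


def drop_old_meetings(meetings, now=None, max_age_days=90):
    """Keep meetings whose start is within max_age_days of now, or in the future.

    Assumption: each meeting is a dict with a hypothetical 'start' key holding
    a naive ISO 8601 string such as '2024-07-16T10:00:00'.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [m for m in meetings if datetime.fromisoformat(m["start"]) >= cutoff]


# Example with made-up data:
meetings = [
    {"title": "Old meeting", "start": "2024-01-01T10:00:00"},
    {"title": "Recent meeting", "start": "2024-06-01T10:00:00"},
    {"title": "Upcoming meeting", "start": "2024-08-01T10:00:00"},
]
recent = drop_old_meetings(meetings, now=datetime(2024, 7, 16))
```

Filtering after the fetch costs a larger response but avoids depending on filter arguments the API may not expose.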

@SimmonsRitchie SimmonsRitchie marked this pull request as ready for review July 16, 2024 19:40
@SimmonsRitchie SimmonsRitchie merged commit 7492ac4 into main Jul 16, 2024
2 checks passed
@SimmonsRitchie SimmonsRitchie deleted the fix-icjia-gql branch July 16, 2024 19:41