This repository has been archived by the owner on Jan 6, 2022. It is now read-only.

onion service scanning: check meta tags for onion-location #270

Merged: 8 commits, Oct 6, 2020

@redshiftzero redshiftzero commented Sep 14, 2020

Closes #269. Followup to #262.

This PR:

  • Changes onion-location to a nullable field. The main motivation is to distinguish failed scans from successful ones, both for anyone consuming this data and for the STN Twitter bot (that code is here).
  • Fetches the page content using requests, parses it using lxml, then checks whether the page has an onion-location meta tag.
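
As a rough illustration of the meta tag check described above, here is a minimal sketch. It is not the code from this PR: the PR uses requests and lxml, while this sketch swaps in the standard library's html.parser so it runs without third-party dependencies, and it takes the HTML as a string rather than fetching it. The names OnionLocationMetaParser and find_onion_location are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class OnionLocationMetaParser(HTMLParser):
    """Records the content of the first <meta http-equiv="onion-location"> tag.

    Hypothetical helper for illustration; the PR itself parses with lxml.
    """

    def __init__(self) -> None:
        super().__init__()
        self.onion_location: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.onion_location is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # Attribute values keep their original case, so compare case-insensitively.
        if attr_map.get("http-equiv", "").lower() == "onion-location":
            content = attr_map.get("content", "").strip()
            if content:
                self.onion_location = content


def find_onion_location(html: str) -> Optional[str]:
    """Return the advertised onion URL, or None if no such meta tag exists."""
    parser = OnionLocationMetaParser()
    parser.feed(html)
    return parser.onion_location
```

A caller could treat a non-None return value as "onion service advertised", and reserve a null result in the database for the case where the page could not be fetched at all.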

The easiest way to test once on this branch is to:

  1. Ensure you have the test data added (so you have some sites to test on). Then run a scan: make dev-scan
  2. Visit the API: http://127.0.0.1:8000/api/v1/sites/
  3. Any sites that failed during the scan should have a null onion_available. Among sites scanned successfully, ProPublica should have onion_available as true, and TechCrunch should have onion_available as false.
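
For consumers of this data, the nullable field means three states need to be handled rather than two. The sketch below shows one way a consumer might do that; describe_onion_status is a hypothetical helper, not part of this PR or of the STN Twitter bot, and it assumes the JSON null has already been decoded to Python's None.

```python
from typing import Optional


def describe_onion_status(onion_available: Optional[bool]) -> str:
    """Map the nullable onion_available value to a human-readable status.

    Hypothetical helper: the point is to test for None explicitly, since a
    plain truthiness check would treat a failed scan (None) the same as a
    successful scan that found no onion service (False).
    """
    if onion_available is None:
        return "scan failed"
    if onion_available:
        return "onion service advertised"
    return "no onion service advertised"
```

Checking `is None` first keeps failed scans from being reported as sites without an onion service.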

@redshiftzero redshiftzero changed the title [wip] onion service scanning: check meta tags for onion-location onion service scanning: check meta tags for onion-location Sep 18, 2020
@redshiftzero redshiftzero marked this pull request as ready for review September 18, 2020 20:33

@chigby chigby left a comment

This generally seems great to me. A nice feature to have. I left a comment below about how many scans to run on the various URLs for a site, which you are free to consider and change if desired. Otherwise it's fine to merge as-is once this branch is rebased.

sites/management/commands/scan.py (review comment, outdated and resolved)
@redshiftzero

Thanks for reviewing! The last commit (e8ec6bc) is the new change here; the prior commits are just from rebasing on the latest.

@chigby chigby self-assigned this Oct 5, 2020

chigby commented Oct 5, 2020

@redshiftzero This latest change looks great. I can merge once this is up-to-date with the base branch.

@chigby chigby merged commit d767019 into freedomofpress:develop Oct 6, 2020
Development

Successfully merging this pull request may close these issues.

onion-location: check if onion-location is defined in meta tag