Add exception for blocked sites in search()
#34
Comments
Also worth noting: in the v2
It’d be good to implement this in a way that supports both.
Some more quick notes:
So, I think we probably don’t need to worry too much about other possible exception types or more generic handling based on the
Searching for a blocked URL used to raise a relatively uninformative and generic `WaybackException` error. Now it raises a `BlockedSiteError`. Fixes #34.
`WaybackClient.search()` and `WaybackClient.get_memento()` now raise `BlockedSiteError` any time you request a URL that has been blocked from access (for example, in situations where the Internet Archive has received a takedown notice). These previously would have resulted in different (and more generic, less informative) errors depending on which method you called. Now blocked URLs always cause the same error across this library. Fixes #34.
Looking at @edsu’s very awesome COVID-19 notebook, it turns out CDX searches can return a special error for blocked sites, e.g. http://web.archive.org/cdx/search/cdx?url=https%3A%2F%2Fnationalpost.com%2Fhealth%2Fbio-warfare-experts-question-why-canada-was-sending-lethal-viruses-to-china&from=20191001000000&showResumeKey=true&resolveRevisits=true
Just like we have a custom `BlockedByRobotsError`, we should have another error for this, rather than just raising a not-so-great HTTP error.

In this case, the response code is `403` and there is a header like:

(And the same text as the header in the response body.)

We can probably follow Wayback’s naming and call this `AdministrativeAccessControlException` or `BlockedSiteError`.

It might even make sense to generalize this for any 4xx/5xx response that has an `X-Archive-Wayback-Runtime-Error` header.
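The generalization suggested in this issue could look roughly like the sketch below. It is a minimal illustration, not this library's actual implementation: the exception classes are local stand-ins so the snippet runs on its own, `error_for_response` is a hypothetical helper name, and the `ErrorName: message` layout of the header value is an assumption based on the names mentioned above.

```python
# Stand-in exception classes, defined here only so the sketch is
# self-contained. They are not imported from the real library.
class WaybackException(Exception):
    """Generic error reported by the Wayback Machine."""


class BlockedSiteError(WaybackException):
    """The requested URL has been administratively blocked."""


RUNTIME_ERROR_HEADER = 'X-Archive-Wayback-Runtime-Error'


def error_for_response(status_code, headers):
    """Return an exception for a 4xx/5xx response that carries the
    runtime error header, or None if there is nothing to report.

    `headers` is any plain mapping of header names to values.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    if status_code < 400:
        return None

    # HTTP header names are case-insensitive, so compare in lowercase.
    wanted = RUNTIME_ERROR_HEADER.lower()
    message = next(
        (value for name, value in headers.items() if name.lower() == wanted),
        None,
    )
    if message is None:
        return None

    # Assumed header layout: "<ErrorName>: <human-readable text>".
    error_name = message.split(':', 1)[0].strip()
    if error_name == 'AdministrativeAccessControlException':
        return BlockedSiteError(message)

    # Any other runtime error falls back to the generic exception.
    return WaybackException(message)
```

A caller would check the result after each request and raise it if it is not `None`. That keeps the blocked-site case distinct, and any other response carrying the header still produces a descriptive error.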