fix: use httpx instead of requests to avoid Bandcamp blocking
When using requests/urllib3, Bandcamp responds to all requests with 403 errors. Investigating why, I tried:

- using curl to send the same request: it worked
- writing a tiny Python script to `GET bandcamp.com/` with requests: it failed with 403
- waiting a week to see if it solved itself: no luck
- changing the above-mentioned script to use http.client or httpx: it worked

I think that in this case Bandcamp's Web Application Firewall (WAF) blocks the requests based not on their contents but on an artifact of how urllib3 builds/sends the data, since curl with the exact same headers works.

Instead of trying to identify the exact reason, which is quite hard without any info on Bandcamp's WAF, and then fixing or working around it, I rewrote the very little required HTTP code to use httpx and sidestep the issue.
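The comparison described above could be reproduced with something like the following. This is only a sketch, not the script actually used for the investigation: the function names, the `probe/0.1` User-Agent, and the timeouts are all assumptions made for illustration. `requests` and `httpx` are imported lazily so the helper itself only needs the standard library.

```python
import http.client
from typing import Callable, Dict, Union


def status_via_http_client(host: str, path: str = "/") -> int:
    """GET https://<host><path> with the standard library; return the status code."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        # Hypothetical User-Agent, chosen only for this sketch.
        conn.request("GET", path, headers={"User-Agent": "probe/0.1"})
        return conn.getresponse().status
    finally:
        conn.close()


def status_via_requests(url: str) -> int:
    """Same request through requests (and therefore urllib3)."""
    import requests  # third-party, imported lazily

    return requests.get(url, timeout=10).status_code


def status_via_httpx(url: str) -> int:
    """Same request through httpx."""
    import httpx  # third-party, imported lazily

    # requests follows redirects by default, httpx does not, so align them.
    return httpx.get(url, timeout=10, follow_redirects=True).status_code


def compare(probes: Dict[str, Callable[[], int]]) -> Dict[str, Union[int, str]]:
    """Run each probe and collect its status code, or the error type on failure."""
    results: Dict[str, Union[int, str]] = {}
    for name, probe in probes.items():
        try:
            results[name] = probe()
        except Exception as exc:  # keep going so all clients get compared
            results[name] = f"error: {type(exc).__name__}"
    return results
```

Passing `compare` a dict such as `{"http.client": lambda: status_via_http_client("bandcamp.com"), "requests": lambda: status_via_requests("https://bandcamp.com/"), "httpx": lambda: status_via_httpx("https://bandcamp.com/")}` gives one status per client. Note that `http.client` never follows redirects, so a 3xx there is not a sign of blocking; a 403 from one client only is the interesting case.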
1 parent 0fa90d8, commit ea78c1f
Showing 6 changed files with 122 additions and 177 deletions.
@@ -0,0 +1,21 @@
+from functools import lru_cache
+from html import unescape
+from urllib.parse import urlsplit
+
+from beets import __version__
+import httpx
+
+HTTPError = httpx.HTTPError
+
+USER_AGENT = f"beets/{__version__} +https://beets.io/"
+
+_client = httpx.Client(headers={"User-Agent": USER_AGENT})
+
+@lru_cache(maxsize=None)
+def http_get_text(url: str) -> str:
+    """Return text contents of the url."""
+
+    response = _client.get(url)
+    response.raise_for_status()
+
+    return unescape(response.text)
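To make the behaviour of `http_get_text` easier to see, here is a network-free sketch of the same two ideas: memoising by URL with `lru_cache` and decoding HTML entities with `unescape`. The names `fake_fetch`, `get_text`, and `calls`, the example URL, and the sample markup are hypothetical stand-ins, not part of the commit; `fake_fetch` takes the place of `_client.get(url).text`.

```python
from functools import lru_cache
from html import unescape

calls = []  # records every simulated "network" fetch


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for `_client.get(url).text`; no network involved."""
    calls.append(url)
    return "<title>Tom &amp; Jerry &#39;Live&#39;</title>"


@lru_cache(maxsize=None)
def get_text(url: str) -> str:
    """Fetch once per distinct URL and return the entity-decoded text."""
    return unescape(fake_fetch(url))


first = get_text("https://example.com/album")
second = get_text("https://example.com/album")  # served from the cache

print(first)                  # entities are decoded: & and ' appear literally
print(len(calls))             # the simulated fetch ran only once
print(get_text.cache_info())  # one miss (first call), one hit (second call)
```

With `maxsize=None` the cache is unbounded and lasts for the lifetime of the process, so a page that changes upstream is not re-fetched until `cache_clear()` is called. `lru_cache` does not cache raised exceptions, so a request that fails in `raise_for_status()` is retried on the next call.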