Commit

Caveat on restricted access items added
john-corcoran authored Jul 3, 2021
1 parent 6597ee2 commit 61395ae
Showing 2 changed files with 11 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -49,7 +49,7 @@ The available flags can be viewed using: `python3 ia_downloader.py download --he
- `-r` or `--resume`: if used, interrupted file transfers will be restarted where they left off, rather than being started over from scratch. In testing, Internet Archive connections can be unstable, so this is recommended for large file transfers.
- `-s [int]` or `--split [int]`: if used, the behaviour of downloads will change - instead of multiple files being downloaded simultaneously, only one file will be downloaded at a time, with each file over 10MB split into separate download threads (number of download threads is specified with this flag); each thread will download a separate portion of the file, and the file will be combined when all download threads complete. This may increase per-file download speeds, but will use more temporary storage space as files are downloaded. To avoid overloading Internet Archive servers, only one file will be downloaded at a time if this option is used (i.e. `-t` will be ignored). If using `-r` and the script has been restarted, use the same number of splits passed with this argument as was used during previous script execution. The maximum is `5`; the default is `1` (i.e. no file splitting will be performed).
- `-f [str ... str]` or `--filefilters [str ... str]`: one or more (space separated) file name filters; only files with names that contain any of the provided filter strings (case insensitive) will be downloaded. If multiple filters are provided, the search will be an 'OR' (i.e. only one of the provided strings needs to hit). For example, `-f png jpg` will download all files that contain either `png` or `jpg` in the file name. Individual terms can be wrapped in quotation marks.
- `-c [str] [str]` or `--credentials [str] [str]`: some Internet Archive items contain files that can only be accessed when logged in with an Internet Archive account. An email address and password can be supplied with this argument as two separate strings (email address first, then password - note that passwords containing spaces will need to be wrapped in quotation marks). Note that terminal history on your system may reveal your credentials to other users, and your credentials will be stored in a plaintext file in either `$HOME/.ia` or `$HOME/.config/ia.ini` as per [Internet Archive Python Library guidance](https://archive.org/services/docs/api/internetarchive/api.html#configuration). Credentials will be cached for future uses of this script (i.e. this flag only needs to be used once).
- `-c [str] [str]` or `--credentials [str] [str]`: some Internet Archive items contain files that can only be accessed when logged in with an Internet Archive account. An email address and password can be supplied with this argument as two separate strings (email address first, then password - note that passwords containing spaces will need to be wrapped in quotation marks). Note that terminal history on your system may reveal your credentials to other users, and your credentials will be stored in a plaintext file in either `$HOME/.ia` or `$HOME/.config/ia.ini` as per [Internet Archive Python Library guidance](https://archive.org/services/docs/api/internetarchive/api.html#configuration). Credentials will be cached for future uses of this script (i.e. this flag only needs to be used once). Note that, if the Internet Archive item is [access restricted (e.g. books in the lending program),](https://help.archive.org/hc/en-us/articles/360016398872-Downloading-A-Basic-Guide-) downloads will still not be possible even if credentials are supplied ('403 Forbidden' messages will occur).
- `--hashfile [str]`: output path to write file containing hash metadata (as recorded by Internet Archive). If left unspecified, the hash metadata file will be created in the cache within the logs folder.
- `--cacherefresh`: metadata for Internet Archive items and collections will be cached in the log folder and used if a download is resumed or restarted, or if the `verify` mode is used. When downloading, metadata will be refreshed if the data in the cache is over one week old, or if this flag is used.
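
The quoting note on `-c` can be illustrated with a small sketch. It assumes the flag is declared with `argparse` and `nargs=2`; the parser in `ia_downloader.py` is not part of this diff, so `build_parser` and the metavar names below are hypothetical stand-ins rather than the script's actual code.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the script's parser; only -c is modelled.
    parser = argparse.ArgumentParser(prog="ia_downloader.py")
    parser.add_argument(
        "-c", "--credentials", nargs=2, metavar=("EMAIL", "PASSWORD")
    )
    return parser


# shlex.split tokenises the string the way a POSIX shell would, so the
# quoted password survives as a single argument despite its spaces.
argv = shlex.split('-c user@example.com "my pass word"')
args = build_parser().parse_args(argv)
print(args.credentials)  # ['user@example.com', 'my pass word']
```

Without the quotation marks the password would be split into three tokens, and a two-value flag like this one would reject the extra arguments.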

16 changes: 10 additions & 6 deletions ia_downloader.py
@@ -510,9 +510,11 @@ def file_download(
 status_code = http_error.response.status_code
 if status_code == 403:
     log.warning(
-        "'{}' - 403 forbidden error occurred - an account login may"
-        " be required to access this file (account details can be passed"
-        " using the '-c' flag)".format(ia_file_name)
+        "'{}' - 403 Forbidden error occurred - an account login may be"
+        " required to access this file (account details can be passed using"
+        " the '-c' flag) - note that download may not be possible even when"
+        " logged in, if the file is within a restricted access item (e.g."
+        " books in the lending program)".format(ia_file_name)
     )
 else:
     log.warning(
@@ -563,9 +565,11 @@ def file_download(
 status_code = http_error.response.status_code
 if status_code == 403:
     log.warning(
-        "'{}' - 403 forbidden error occurred - an account login may"
-        " be required to access this file (account details can be passed"
-        " using the '-c' flag)".format(ia_file_name)
+        "'{}' - 403 Forbidden error occurred - an account login may be"
+        " required to access this file (account details can be passed using"
+        " the '-c' flag) - note that download may not be possible even when"
+        " logged in, if the file is within a restricted access item (e.g."
+        " books in the lending program)".format(ia_file_name)
     )
 else:
     log.warning(
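
Both hunks apply the same message change at two call sites in `file_download`. As a rough sketch of the 403 branch in isolation, the snippet below wraps the new message text in a function so it can be exercised without a network call. The names `forbidden_warning` and `warn_on_http_error` and the logger name are hypothetical and do not appear in the commit; the non-403 branch is truncated in the diff and is deliberately not reproduced.

```python
import logging

log = logging.getLogger("ia_downloader_sketch")  # hypothetical logger name


def forbidden_warning(ia_file_name: str) -> str:
    """Build the 403 warning text introduced by this commit."""
    return (
        "'{}' - 403 Forbidden error occurred - an account login may be"
        " required to access this file (account details can be passed using"
        " the '-c' flag) - note that download may not be possible even when"
        " logged in, if the file is within a restricted access item (e.g."
        " books in the lending program)".format(ia_file_name)
    )


def warn_on_http_error(ia_file_name: str, status_code: int) -> bool:
    """Log the 403 hint and return True; return False for other codes."""
    if status_code == 403:
        log.warning(forbidden_warning(ia_file_name))
        return True
    # The diff cuts off the non-403 branch, so it is left out here.
    return False
```

Calling `warn_on_http_error("book.pdf", 403)` logs the extended message, which now tells the user that supplying credentials with `-c` may not help for restricted access items.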
