Dataset rsync instructions
All of the following commands should work in bash on Linux and macOS, as well as in PowerShell on Windows. You must have a working installation of rsync and, for certain commands, Python, Ruby, or Perl.
To download a complete dataset, replace $TREE with the name of the rsync tree you are connecting to (such as ht_text_pd) and $LOCAL_PATH with the path on your local filesystem you want to write the dataset to (for example, /path/to/datasets), then run:
rsync --copy-links --delete --ignore-errors --recursive --times --verbose datasets.hathitrust.org::$TREE $LOCAL_PATH
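If you script the download, you can build the same command as an argument list and run it without a shell. This avoids the differences between bash and PowerShell quoting. The following is a minimal sketch, assuming Python 3.8+. The function names build_rsync_command and mirror_tree are hypothetical and are not part of any HathiTrust tool.

```python
import shlex
import subprocess
import sys


def build_rsync_command(tree, local_path):
    """Return the rsync argument list for mirroring one tree (e.g. "ht_text_pd")."""
    return [
        "rsync",
        "--copy-links",
        "--delete",
        "--ignore-errors",
        "--recursive",
        "--times",
        "--verbose",
        f"datasets.hathitrust.org::{tree}",
        local_path,
    ]


def mirror_tree(tree, local_path):
    # A list argument (no shell) avoids quoting problems, e.g. spaces in local_path.
    subprocess.run(build_rsync_command(tree, local_path), check=True)


if __name__ == "__main__":
    # Only print the command; call mirror_tree(...) to actually start the sync.
    tree = sys.argv[1] if len(sys.argv) > 1 else "ht_text_pd"
    print(shlex.join(build_rsync_command(tree, "/path/to/datasets")))
```

Note that --delete removes local files that no longer exist on the server. Point $LOCAL_PATH at a directory that holds nothing but this dataset.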
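To download only a subset of volumes, first convert a list of volume IDs into a list of paths within the rsync tree. The list, id_list.txt, must be a plain text file containing one HathiTrust volume ID per line, with Unix line endings and no other encoding (no URL escaping, quotes, etc.). You can generate path_list.txt from it with Python, Ruby, or Perl.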
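Certain input problems can make the conversion scripts fail or produce wrong paths. Examples include a stray CRLF ending, a blank line, or a quoted ID. It can help to check id_list.txt first. The following is a heuristic sketch. The check_id_list function and its rules are assumptions drawn from the requirements above, not an official HathiTrust validator.

```python
import sys


def check_id_list(lines):
    """Return (line_number, reason) pairs for lines that look malformed.

    Heuristic rules based on the stated requirements: one ID per line,
    Unix line endings, no URL escaping or quotes.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        if "\r" in raw:
            problems.append((number, "non-Unix (CR/CRLF) line ending"))
        vid = raw.rstrip("\r\n")
        if not vid:
            problems.append((number, "blank line"))
        elif any(c.isspace() for c in vid):
            problems.append((number, "contains whitespace"))
        elif "." not in vid:
            problems.append((number, "no '.' between namespace and ID"))
        elif "%" in vid or any(q in vid for q in "\"'"):
            problems.append((number, "looks URL-escaped or quoted"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps "\r\n" intact so non-Unix endings can be detected.
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        for number, reason in check_id_list(f):
            print(f"line {number}: {reason}")
```

Save the code under any name (check_ids.py is used here only as an example) and run it as python check_ids.py id_list.txt. It prints nothing if every line passes.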
Python: first run pip install pairtree, then save this script as ids_to_ppath.py:
import sys
import pairtree

for line in sys.stdin:
    line = line.strip()
    if line:  # skip blank lines
        n, i = line.split('.', 1)  # namespace, ID within the namespace
        print("/".join([n, 'pairtree_root', pairtree.id2path(i), pairtree.id_encode(i)]))
Then run:
python ids_to_ppath.py < id_list.txt > path_list.txt
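Each path follows the Pairtree conventions. The part of the ID after the first "." is encoded, so that, for example, ":" becomes "+" and "/" becomes "=". The encoded string is then split into two-character directories under pairtree_root, and the encoded ID names the final directory.

The following dependency-free sketch is an alternative to the pairtree package. It is written from the Pairtree specification's character rules, and that reading of the rules is an assumption. Compare a few lines of its output with ids_to_ppath.py before relying on it. The names pairtree_encode and volume_path are illustrative.

```python
import sys

# Characters hex-encoded as ^hh, in addition to any byte outside the
# visible ASCII range 0x21-0x7e (assumed from the Pairtree spec).
HEX_ENCODED = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex encoding.
SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def pairtree_encode(identifier):
    """Encode the part of a volume ID that follows the namespace."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODED:
            out.append(f"^{byte:02x}")
        else:
            out.append(ch)
    return "".join(out).translate(SUBSTITUTIONS)


def volume_path(volume_id):
    """Map a volume ID such as 'mdp.39015012345678' to its path in the tree."""
    namespace, identifier = volume_id.split(".", 1)
    encoded = pairtree_encode(identifier)
    # Two-character directory names; the last one may be a single character.
    shorties = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]
    return "/".join([namespace, "pairtree_root", *shorties, encoded])


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for line in f:
            if line.strip():
                print(volume_path(line.strip()))
```

Run it as python ids_to_ppath_nodeps.py id_list.txt > path_list.txt. The file name is hypothetical.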
Ruby: first run gem install pairtree to install the pairtree gem, then run:
ruby -e 'require "pairtree";ARGF.each {|l|l.chomp!;n,i=l.split(/\./,2);puts "#{n}/pairtree_root/#{Pairtree::Path.id_to_path i}"}' id_list.txt > path_list.txt
Perl: first install the File::Pairtree CPAN module, then run:
perl -MFile::Pairtree -ne 'chomp;($n,$i)=split /\./,$_,2;print "$n/".File::Pairtree::id2ppath($i).File::Pairtree::s2ppchars($i)."\n"' id_list.txt > path_list.txt
Finally, pass the path list to rsync with --files-from, replacing $TREE and $LOCAL_PATH as described above. rsync reads the paths in path_list.txt relative to the root of the tree:
rsync --copy-links --delete --ignore-errors --recursive --times --verbose --files-from=path_list.txt datasets.hathitrust.org::$TREE $LOCAL_PATH
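After a subset sync, you may want to confirm that every entry in path_list.txt arrived. Because --files-from implies rsync's --relative option, each listed path should be recreated under $LOCAL_PATH. The sketch below assumes exactly that layout and assumes every entry is a volume directory. missing_paths is a hypothetical helper.

```python
import os
import sys


def missing_paths(lines, local_path):
    """Return listed paths that do not exist as directories under local_path."""
    missing = []
    for line in lines:
        rel = line.strip()
        # Skip blank lines; each remaining entry is assumed to be a directory.
        if rel and not os.path.isdir(os.path.join(local_path, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file name): python check_download.py path_list.txt /path/to/datasets
    with open(sys.argv[1], encoding="utf-8") as f:
        for rel in missing_paths(f, sys.argv[2]):
            print(f"missing: {rel}")
```

Because --ignore-errors lets rsync continue past failures, re-running the same rsync command for any reported paths is a simple way to retry them.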