Islandora Scraping Utils

This package scrapes data from an existing instance of Islandora and prepares it for ingest into another.

Getting Started

Setup:

pip install -r requirements.txt

Usage

usage: main.py [-h] [-d DEST] {download,prep} ...

positional arguments:
  {download,prep}

optional arguments:
  -h, --help            show this help message and exit
  -d DEST, --dest DEST  The directory to save files in

usage: main.py download [-h] [-c COUNT] [-q QUERY] [-l] url

positional arguments:
  url                   The url of the islandora instance ex: "https://islandnewspapers.ca/islandora"

optional arguments:
  -h, --help            show this help message and exit
  -c COUNT, --count COUNT
                        The number of results to pull from
  -q QUERY, --query QUERY
                        The term to search for when downloading issues
  -l, --light_weight    Download pages without OBJ

usage: main.py prep [-h] [-i {dir,zip}] [-n {zip,marcxml}] [-s SOURCE]

optional arguments:
  -h, --help            show this help message and exit
  -i {dir,zip}, --issues {dir,zip}
                        The format to save the issues in.
  -n {zip,marcxml}, --newspapers {zip,marcxml}
                        The format to save the newspapers in.
  -s SOURCE, --source SOURCE
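
The help text above can be read as an argparse definition. The following is a minimal sketch reconstructed from that output alone, not the project's actual source. The function name build_parser, the "command" attribute that holds the chosen subcommand, and the integer type for COUNT are assumptions made for illustration. The help strings for --source and for the subcommand itself are left out because the output above does not show any.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the CLI shape shown in the help text (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-d", "--dest", help="The directory to save files in")

    # Assumption: the selected subcommand is stored under the name "command".
    subparsers = parser.add_subparsers(dest="command")

    download = subparsers.add_parser("download")
    download.add_argument("url", help="The url of the islandora instance")
    # Assumption: COUNT is an integer; the help text does not state its type.
    download.add_argument("-c", "--count", type=int,
                          help="The number of results to pull from")
    download.add_argument("-q", "--query",
                          help="The term to search for when downloading issues")
    download.add_argument("-l", "--light_weight", action="store_true",
                          help="Download pages without OBJ")

    prep = subparsers.add_parser("prep")
    prep.add_argument("-i", "--issues", choices=["dir", "zip"],
                      help="The format to save the issues in.")
    prep.add_argument("-n", "--newspapers", choices=["zip", "marcxml"],
                      help="The format to save the newspapers in.")
    # The help output shows no description for --source, so none is given here.
    prep.add_argument("-s", "--source")
    return parser


if __name__ == "__main__":
    # Example argument lists; "out" and "harbour" are made-up values.
    parser = build_parser()
    print(parser.parse_args(["-d", "out", "download", "-c", "10", "-q", "harbour",
                             "https://islandnewspapers.ca/islandora"]))
    print(parser.parse_args(["prep", "-i", "zip", "-n", "marcxml", "-s", "out"]))
```

Note that -d/--dest belongs to the top-level parser, so it goes before the subcommand name on the command line, as the first usage line indicates.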

Islandora newspaper base box vagrant

Usage

Requires Vagrant and VirtualBox.
Download the Java 8 JDK from Oracle and place it in the vagrant directory.
Run vagrant up.
To create a reusable package, run vagrant package.
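
As a rough sketch, the steps above could be wrapped in a small Python helper that checks for the required tools before invoking Vagrant. The helper names missing_tools and run_steps are hypothetical and not part of this repository. The sketch assumes Vagrant and VirtualBox expose the executables vagrant and VBoxManage on the PATH. It does not check for the JDK download, because the expected file name is not stated here.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Sequence

# Assumption: these are the executable names the two tools place on the PATH.
REQUIRED_TOOLS = ("vagrant", "VBoxManage")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required executables that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def run_steps(package: bool = False, runner: Callable = subprocess.run) -> None:
    """Run `vagrant up`, then `vagrant package` if a reusable box is wanted."""
    runner(["vagrant", "up"], check=True)
    if package:
        runner(["vagrant", "package"], check=True)


if __name__ == "__main__":
    # Only report on prerequisites here; call run_steps() explicitly to build.
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("Vagrant and VirtualBox found.")
```

Passing the lookup and the runner as parameters keeps the helper easy to try out without actually starting a virtual machine.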
