Harvesting and export of datasets from the ULB Sachsen-Anhalt's OJS/OMP installations to a DSpace-based repository
Automatic publication of the contents of OJS and OMP resources (journals, series, monographs) in a DSpace 6.3 repository. DOI registration of the exported contents via the DSpace repository. The resulting DOI metadata is returned to the OJS and OMP systems and stored there.
In the first step, the Python script journal2saf.py uses the REST API to retrieve all relevant metadata for those resources on a given OMP or OJS server that are marked as published and are to be sent to DSpace. The script only exports resources that it has not exported before.
The script then converts the data extracted from OJS/OMP into the Simple Archive Format (SAF), which DSpace installations use to import data into a standard DSpace collection. The resulting files are saved in a previously defined export folder.
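The selection rule described above can be sketched as follows. This is a minimal illustration and not the code in journal2saf.py: the function name `select_for_export`, the reduced record layout and the set of already exported ids are assumptions, and the status value 3 is assumed to correspond to the OJS/OMP constant for published submissions.

```python
STATUS_PUBLISHED = 3  # assumption: value of the OJS/OMP "published" status


def select_for_export(submissions, exported_ids):
    """Return published submissions whose id has not been exported yet."""
    return [
        s for s in submissions
        if s.get("status") == STATUS_PUBLISHED and s.get("id") not in exported_ids
    ]


# Hypothetical records, shaped like a heavily reduced REST API response.
submissions = [
    {"id": 1, "status": 3},  # published, but already exported
    {"id": 2, "status": 1},  # not published
    {"id": 3, "status": 3},  # published and new
]
print(select_for_export(submissions, exported_ids={1}))
```

Keeping the filter separate from the HTTP request makes the rule easy to test without a running OJS/OMP server.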
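For orientation, an SAF item is a directory that holds a dublin_core.xml file, a contents file listing the bitstreams, and the bitstream files themselves. The sketch below writes one such item. It is not taken from journal2saf.py; the function name `write_saf_item`, the item name and the sample metadata are hypothetical.

```python
from pathlib import Path
import tempfile
import xml.etree.ElementTree as ET


def write_saf_item(export_dir, item_name, metadata, files):
    """Write one SAF item: dublin_core.xml, a contents file and the bitstreams."""
    item_dir = Path(export_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # One <dcvalue> per (element, qualifier) pair.
    root = ET.Element("dublin_core")
    for (element, qualifier), value in metadata.items():
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Bitstreams, plus the contents file that lists one file name per line.
    for name, data in files.items():
        (item_dir / name).write_bytes(data)
    listing = "".join(f"{name}\n" for name in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir


with tempfile.TemporaryDirectory() as tmp:
    item = write_saf_item(
        tmp,
        "item_000",  # hypothetical item directory name
        {("title", "none"): "Example article"},
        {"article.pdf": b"%PDF-1.4"},
    )
    print(sorted(p.name for p in item.iterdir()))
```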
Once in this folder, the newly created SAF-Archive files are then copied/exported to the target DSpace installation using scp into a previously defined folder.
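A copy step of this kind could look roughly like the sketch below, which calls the scp command-line tool through `subprocess`. The function names, user, host and paths are placeholders and not values from this project; the real settings come from the configuration file.

```python
import subprocess


def build_scp_command(local_dir, user, host, remote_dir):
    """Return the scp argument list that copies a folder recursively."""
    return ["scp", "-r", str(local_dir), f"{user}@{host}:{remote_dir}"]


def copy_to_dspace(local_dir, user, host, remote_dir):
    """Run scp; raises CalledProcessError if the copy fails."""
    subprocess.run(build_scp_command(local_dir, user, host, remote_dir), check=True)


# Only the command is printed here; nothing is copied.
print(build_scp_command("./export/item_000", "dspace", "dspace.example.org", "~/exchange/source"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.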
On the DSpace server, a bash script then imports the SAF files automatically and exports a list of all DOIs that the DSpace installation created while processing the new files.
./dspace/bin/journals_import.sh
You may need to change the script to work with your local DSpace instance.
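When adapting the script, it may help to see which arguments a DSpace 6 batch import typically needs. The sketch below only assembles such a command. It is written in Python for illustration and is not the content of journals_import.sh; the DSpace path, e-person, collection handle and folder names are placeholders.

```python
def build_import_command(dspace_bin, eperson, collection, source_dir, mapfile):
    """Return the argument list for a DSpace SAF batch import.

    -a adds new items, -e names the submitting e-person, -c the target
    collection, -s the folder with the SAF items and -m the mapfile that
    DSpace writes for the imported items.
    """
    return [
        dspace_bin, "import", "-a",
        "-e", eperson,
        "-c", collection,
        "-s", source_dir,
        "-m", mapfile,
    ]


# All values below are hypothetical.
command = build_import_command(
    "/dspace/bin/dspace",
    "admin@example.org",
    "123456789/2",
    "/home/dspace/exchange/source",
    "/home/dspace/exchange/map/import.map",
)
print(" ".join(command))
```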
The following directory structure needs to exist on the DSpace server:
~/<exchange_folder>/source
~/<exchange_folder>/doi
~/<exchange_folder>/map
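The three folders can also be created programmatically. The following sketch uses `pathlib`; the function name `ensure_exchange_folders` is hypothetical, and a temporary directory stands in for the real exchange folder.

```python
from pathlib import Path
import tempfile

SUBFOLDERS = ("source", "doi", "map")


def ensure_exchange_folders(exchange_folder):
    """Create source, doi and map below the exchange folder if they are missing."""
    base = Path(exchange_folder)
    created = []
    for name in SUBFOLDERS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    folders = ensure_exchange_folders(Path(tmp) / "exchange")
    print([folder.name for folder in folders])
```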
Every time journal2saf.py is executed, the script checks whether DOIs are available on the DSpace server for SAF files that have already been exported to DSpace. If it finds new DOIs, they are copied onto the OJS/OMP server.
☛ For each resource in a journal (a galley or publicationFormat in OJS/OMP terminology), an external URL can be stored in the urlRemote field.
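The check for new DOIs amounts to comparing two mappings. The sketch below shows the idea only: how journal2saf.py actually records exported items and how the DOI list from DSpace is formatted is not described here, so both dictionaries and the function name `find_new_dois` are assumptions.

```python
def find_new_dois(exported_items, dois_on_server):
    """Return item name -> DOI for exported items that have no stored DOI yet."""
    return {
        name: dois_on_server[name]
        for name, stored_doi in exported_items.items()
        if stored_doi is None and name in dois_on_server
    }


# Hypothetical state: item_000 already has its DOI, item_002 has none yet.
exported = {"item_000": "10.1234/abc", "item_001": None, "item_002": None}
on_server = {"item_000": "10.1234/abc", "item_001": "10.1234/def"}
print(find_new_dois(exported, on_server))
```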
If the conf/config.ini setting "update_remote" is true, the script journal2saf.py ensures that the newly available DOIs are stored in OJS/OMP as the urlRemote attribute for each publication. For this to work properly, the OJS/OMP Plugin SetRemoteUrlPlugin must be previously installed.
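A boolean setting such as update_remote can be read with Python's `configparser`. In the sketch below the section name `general` is an assumption; check conf/config.ini.example for the section the setting actually lives in.

```python
import configparser

# Hypothetical excerpt; the real file is conf/config.ini.
EXAMPLE_CONFIG = """
[general]
update_remote = true
"""


def update_remote_enabled(config_text, section="general"):
    """Return True if update_remote is switched on, False if it is off or missing."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean(section, "update_remote", fallback=False)


print(update_remote_enabled(EXAMPLE_CONFIG))
```

`getboolean` accepts the usual spellings such as true/false, yes/no, on/off and 1/0.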
Make sure you use Python 3.6 or higher. Clone the project and move into the appropriate directory as shown below:
python3 -m venv venv
# windows
venv\Scripts\activate.bat
# other
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
A test should be carried out to ensure the setup has worked:
pytest -v
Create the file conf/config.ini from conf/config.ini.example by renaming it. All values are commented in the file. Values that need to be changed are marked with <>.
In the config_meta.ini file, the metadata available from the OJS/OMP system which should be exported in the corresponding XML files are marked. The schema can be expanded as required as long as values which are valid for the API request are used. Here developing teams should ensure their implemented metadata mapping is conformant with cataloguing standards.
Static values must be enclosed in quotation marks; they are then used as given instead of being read from the OJS/OMP data. The examples used in this project for the OJS and OMP installations are available in the folder ./conf.
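The difference between quoted static values and values read from the OJS/OMP data can be illustrated as follows. This is a sketch of the idea only; the function name `resolve_value` and the sample record are hypothetical, and the project's own parsing may differ.

```python
def resolve_value(spec, record):
    """Return a static value for a quoted spec, otherwise look it up in the record."""
    spec = spec.strip()
    if len(spec) >= 2 and spec[0] == spec[-1] == '"':
        return spec[1:-1]  # static value: strip the quotation marks
    return record.get(spec)  # otherwise treat the spec as a field name


record = {"title": "Example article"}  # hypothetical metadata record
print(resolve_value('"article"', record))
print(resolve_value("title", record))
```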
In some cases, given metadata needs some filtering before being added to DSpace. For more information on how to filter metadata before exporting, see the file ./lib/filters.py.
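As an illustration of what such a filter might do, the sketch below removes HTML markup from a metadata value, for example an abstract. It is not taken from ./lib/filters.py; the function name `strip_html` is hypothetical, and the regular expression is a deliberately simple approach that does not handle every form of HTML.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html(value):
    """Replace tags with spaces, decode entities and collapse whitespace."""
    text = TAG_PATTERN.sub(" ", value)
    text = html.unescape(text)
    return " ".join(text.split())


print(strip_html("<p>An <em>example</em> &amp; more.</p>"))
```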
The script is ideally called by a cronjob.
python journal2saf.py -c ./conf/config.ini -m ./conf/config_meta_ojs.ini
See the LICENSE file.