- Download the CZI mentions dataset.
- Prepare the input data using: `python3 prepare <czi mentions data zip file downloaded in step 1>`
- Ensure that your machine is connected to the internet; the `pipeline.py` script downloads the OpenAIRE data.
- Note that the code is configured to produce a sample output. To produce the full output, change the value of the variable `OPENAIRE_DOI_TO_RORID_INPUT_FILE` in `config.properties`, i.e., comment out line 2 and uncomment line 5. This requires a machine with a large amount of memory (the full output was produced on a machine with 256GB of RAM).
- Execute the pipeline with: `python3 pipeline.py`
- The output file is produced under the `output/` folder.
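
The steps above could be wrapped in a small helper script. The following is only a minimal sketch and not part of the repository: the function names `active_property`, `build_commands`, and `run_all` are hypothetical, and it assumes that `config.properties` uses plain `key=value` lines with `#` (or `!`) marking comments. It reports which value of `OPENAIRE_DOI_TO_RORID_INPUT_FILE` is currently active, so you can confirm whether a sample or a full run is configured before starting.

```python
import subprocess
from pathlib import Path


def active_property(text, key):
    """Return the last uncommented value assigned to `key`, or None.

    Assumes simple `key=value` lines; lines starting with '#' or '!' are comments.
    """
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and commented-out settings
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


def build_commands(czi_zip):
    """Return the two commands from the steps above as argument lists."""
    return [
        ["python3", "prepare", czi_zip],  # prepare the input data
        ["python3", "pipeline.py"],       # run the pipeline (needs internet access)
    ]


def run_all(czi_zip, config_path="config.properties"):
    """Print the active input-file setting, run both steps, then list output/."""
    config = Path(config_path)
    if config.exists():
        setting = active_property(config.read_text(), "OPENAIRE_DOI_TO_RORID_INPUT_FILE")
        print("OPENAIRE_DOI_TO_RORID_INPUT_FILE =", setting)
    for cmd in build_commands(czi_zip):
        subprocess.run(cmd, check=True)  # stop at the first failing step
    for path in sorted(Path("output").iterdir()):
        print(path)
```

Calling `run_all` with the path to the downloaded CZI zip file from the repository root would run both steps in order; `check=True` makes the script stop if the preparation step fails, so the pipeline is not started on incomplete input.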