openaire_x_czi_pipeline


Affiliations: joining OpenAIRE data (DOI to RORid) with CZI mentions data (DOI to repositories)

  1. Download the CZI mentions dataset

  2. Prepare the input data using:

python3 prepare <czi mentions data zip file downloaded in step.1>
  3. Ensure that your machine is connected to the internet; the OpenAIRE data are downloaded by the pipeline.py script

  4. Note that the code is configured to produce a sample output. To produce the full output, change the value of the variable OPENAIRE_DOI_TO_RORID_INPUT_FILE in config.properties, i.e., comment out line 2 and uncomment line 5. This requires a machine with a large amount of memory (the full output was produced on a machine with 256GB of RAM)

  5. Execute the pipeline with:

python3 pipeline.py
  6. The output file is written to the output/ folder
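
For readers unfamiliar with the idea, the join described in the title can be illustrated with a minimal Python sketch. This is not the code in pipeline.py: the function name `join_on_doi`, the in-memory dictionaries, and all sample values below are hypothetical, and the real input and output formats may differ. The sketch only assumes that both datasets are keyed by DOI and that DOIs are compared case-insensitively.

```python
def join_on_doi(doi_to_rors, doi_to_repos):
    """Join two DOI-keyed mappings into (doi, ror_id, repository) rows.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    # DOIs are case-insensitive, so normalize the keys before matching.
    mentions = {doi.lower(): repos for doi, repos in doi_to_repos.items()}

    rows = []
    for doi, ror_ids in doi_to_rors.items():
        key = doi.lower()
        # Keep only DOIs that appear in both datasets (an inner join).
        for repo in mentions.get(key, []):
            for ror_id in ror_ids:
                rows.append((key, ror_id, repo))
    return sorted(rows)


# Made-up sample records, standing in for the two input datasets.
affiliations = {
    "10.1000/ABC": ["ror-id-1"],
    "10.1000/xyz": ["ror-id-2"],
}
mentions = {
    "10.1000/abc": ["repository-a", "repository-b"],
    "10.1000/other": ["repository-c"],
}

for row in join_on_doi(affiliations, mentions):
    print(row)
```

Only the DOI present in both mappings produces rows; DOIs found in just one dataset are dropped. Holding both mappings in memory at once is also one plausible reason a full run needs a large amount of RAM, although the source does not state why.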