Olly Butters edited this page Jan 10, 2022 · 15 revisions

Welcome to the PUMA wiki

PUMA = PUblications Metadata Augmentation

General Overview

In essence, PUMA is a pipeline that takes a list of published academic papers and uses various APIs to augment them with extra metadata, such as citation counts and the geolocation of first authors. With this extra metadata we can generate rich web pages that list the publications (for example grouped by subject, or showing citations). We can also generate datasets for analysis with other software.

The generated web pages can be made publicly available, or can be used locally to explore the data yourself.

Using the bibliographic manager Zotero, we can make sure we start from a clean, structured list of papers.

Pipeline overview

  • Start with a list of publications in Zotero
  • Get the metadata for each publication from doi.org, PubMed and Scopus
  • Merge the metadata
  • Clean the metadata
  • Add some extra metadata to it (citations, geocoding, etc.)
  • Analyse it
  • Make some web pages showing it
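
The merge, clean and augment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PUMA's actual code: the function names (merge_metadata, clean_metadata, augment), the source priority order, the field names and the sample records are all hypothetical, and the fetching steps (Zotero, doi.org, PubMed, Scopus) are replaced by hard-coded dictionaries so the example runs offline.

```python
from typing import Dict

# Assumption: when sources disagree, earlier entries in this list win.
SOURCE_PRIORITY = ["doi.org", "pubmed", "scopus"]


def merge_metadata(records: Dict[str, dict]) -> dict:
    """Merge per-source records, keeping the first non-empty value per field."""
    merged: dict = {}
    for source in SOURCE_PRIORITY:
        for field, value in records.get(source, {}).items():
            if value in (None, "", []):
                continue  # skip empty values so a later source can fill the gap
            merged.setdefault(field, value)
    return merged


def clean_metadata(paper: dict) -> dict:
    """Normalise whitespace in the title and the case of the DOI."""
    cleaned = dict(paper)
    if "title" in cleaned:
        cleaned["title"] = " ".join(cleaned["title"].split())
    if "doi" in cleaned:
        cleaned["doi"] = cleaned["doi"].strip().lower()
    return cleaned


def augment(paper: dict, citation_counts: Dict[str, int]) -> dict:
    """Attach a citation count looked up by DOI (0 if unknown)."""
    augmented = dict(paper)
    augmented["citations"] = citation_counts.get(paper.get("doi"), 0)
    return augmented


# Made-up records standing in for what each API might return for one paper.
sample = {
    "doi.org": {"doi": " 10.1000/EXAMPLE ", "title": "An  example   paper"},
    "pubmed": {"title": "Different title", "pmid": "00000000"},
    "scopus": {"title": "", "subject": "Epidemiology"},
}

paper = augment(clean_metadata(merge_metadata(sample)), {"10.1000/example": 12})
print(paper)
```

Keeping each step as a small function that takes and returns a plain dictionary means the steps can be run and checked one at a time, which mirrors the stage-by-stage layout of the list above.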

Repo config

puma_cache