
Get fulltexts or fulltext URLs of papers matching a search query


jcmolloy/getpapers

 
 


getpapers

Get fulltexts or fulltext URLs of papers matching a search query using the EuropePMC API.

getpapers can fetch article metadata, fulltexts (PDF or XML), and supplementary materials. It's designed for use in content mining, but you may find it useful for quickly acquiring large numbers of papers for reading.

Installation

$ npm install --global getpapers

Usage

Use getpapers --help to see the command-line help:


Usage: getpapers [options]

Options:

  -h, --help              output usage information
  -V, --version           output the version number
  -q, --query <query>     Search query (required)
  -o, --outdir <path>     Output directory (required - will be created if not found)
  -x, --xml               Download fulltext XMLs if available
  -p, --pdf               Download fulltext PDFs if available
  -s, --supp              Download supplementary files if available
  -l, --loglevel <level>  amount of information to log (silent, verbose, info*, data, warn, error, or debug)
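If you call getpapers from a Node.js script rather than from a shell, the sketch below builds the argument list from the options documented above and runs the command with `child_process.spawnSync`. It assumes getpapers is installed globally and is on your PATH on a Unix-like system. `buildArgs` and `runGetpapers` are hypothetical helper names, not part of getpapers. Passing the query as its own argument means the double quotes inside it never go through shell quoting.

```javascript
// Sketch only: assumes the getpapers CLI is installed globally and on PATH.
const { spawnSync } = require("child_process");

// Translate an options object into getpapers command-line arguments,
// using only the flags listed by `getpapers --help`.
function buildArgs({ query, outdir, xml = false, pdf = false, supp = false, loglevel } = {}) {
  if (!query || !outdir) {
    throw new Error("query and outdir are both required");
  }
  const args = ["--query", query, "--outdir", outdir];
  if (xml) args.push("--xml");
  if (pdf) args.push("--pdf");
  if (supp) args.push("--supp");
  if (loglevel) args.push("--loglevel", loglevel);
  return args;
}

// Run getpapers synchronously, streaming its output to this terminal.
// The query is a single argv element, so no shell escaping is needed.
function runGetpapers(options) {
  const result = spawnSync("getpapers", buildArgs(options), { stdio: "inherit" });
  // Exit code, or null if the process could not start or was killed by a signal.
  return result.status;
}
```

For example, `runGetpapers({ query: 'METHODS:"transcriptome assembly"', outdir: "transcriptomes", xml: true })` would be equivalent to running getpapers with `--query`, `--outdir` and `--xml` from the command line.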


Query format

Queries are processed by EuropePMC. In their simplest form, they can be free text, like this:

--query 'brain tumour rnaseq'

But they can also be much more detailed, using the EuropePMC webservice's query language (see Appendix 1 of the EuropePMC reference PDF).

For example, we can restrict our search to only papers that mention 'transcriptome assembly' in the methods:

--query 'METHODS:"transcriptome assembly"'

Or to only papers with a CC-BY license:

--query 'LICENSE:"cc by" OR LICENSE:"cc-by"'

Note that in this case, we combine two restrictions using the logical OR keyword. We can also use AND, and can group operations using brackets:

--query '(LICENSE:"cc by" OR LICENSE:"cc-by") AND METHODS:"transcriptome assembly"'
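When a query is assembled programmatically, for instance from a list of accepted licenses, a few small helpers keep the quoting and bracketing consistent. The functions `field`, `anyOf` and `allOf` are hypothetical and not part of getpapers, which passes the `--query` string to EuropePMC as given. This sketch reproduces the combined query above:

```javascript
// Hypothetical helpers for composing EuropePMC query strings.

// Restrict a term to a field, quoting it unless it is purely alphanumeric,
// e.g. field("METHODS", "transcriptome assembly") gives METHODS:"transcriptome assembly".
// Terms that themselves contain double quotes are not handled.
function field(name, term) {
  const value = /^[A-Za-z0-9]+$/.test(term) ? term : `"${term}"`;
  return `${name}:${value}`;
}

// Join clauses with OR and wrap them in brackets so the group
// can be combined safely with other clauses.
function anyOf(...clauses) {
  return `(${clauses.join(" OR ")})`;
}

// Join clauses with AND.
function allOf(...clauses) {
  return clauses.join(" AND ");
}

const query = allOf(
  anyOf(field("LICENSE", "cc by"), field("LICENSE", "cc-by")),
  field("METHODS", "transcriptome assembly")
);

console.log(query);
// (LICENSE:"cc by" OR LICENSE:"cc-by") AND METHODS:"transcriptome assembly"
```

Quoting anything that is not a single alphanumeric word is a conservative choice: it keeps hyphenated terms such as `cc-by` together as a phrase, matching the hand-written examples.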

A selection of the most useful search fields is explained below.

Restrict search by bibliographic metadata

| Field | Description | Example |
| --- | --- | --- |
| PMCID: | Search for a publication by its PubMed Central ID, where applicable (i.e. available as full text) | PMCID:PMC1287967 |
| TITLE: | Search for a term or terms in publication titles | TITLE:aspirin, TITLE:"protein knowledgebase" |
| ABSTRACT: | Search for a term or terms in publication abstracts | ABSTRACT:malaria, ABSTRACT:"chicken pox" |
| AUTH: | Search for a surname and (optionally) initial(s) in publication author lists | AUTH:einstein, AUTH:"Smith AB" |
| JOURNAL: | Journal title, searchable in either full or abbreviated form | JOURNAL:"biology letters", JOURNAL:"biol lett" |
| LICENSE: | Search for content by its assigned Creative Commons license (where provided) | LICENSE:"cc by" OR LICENSE:"cc-by", LICENSE:cc |

Restrict by article metadata

| Field | Description | Example |
| --- | --- | --- |
| DISEASE: | Search for mined diseases | DISEASE:dysthymias |
| GENE_PROTEIN: | Search for records with mined gene or protein names | GENE_PROTEIN:gng11 |
| GOTERM: | Search for records with mined Gene Ontology (GO) terms | GOTERM:apoptosis |
| CHEM: | Limit your search by MeSH substance | CHEM:propantheline, CHEM:"protein kinases" |
| ORGANISM: | Search for mined organisms | ORGANISM:terebratulide |
| PUB_TYPE: | Limit your search by publication type | PUB_TYPE:review, PUB_TYPE:"retraction of publication" |

Section-level search

| Field | Description | Example |
| --- | --- | --- |
| INTRO: | Find articles with a phrase in the Introduction & Background section | INTRO:"protein interactions" |
| METHODS: | Find articles with a phrase in the Materials & Methods section | METHODS:"yeast two-hybrid" |
| RESULTS: | Find articles with a phrase in the Results section | RESULTS:"in vivo" |
| DISCUSS: | Find articles with a phrase in the Discussion section | DISCUSS:cardiovascular |
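Because queries are processed by EuropePMC, you can check how many papers a query matches before downloading anything by calling the EuropePMC search web service directly. This sketch assumes the RESTful endpoint `https://www.ebi.ac.uk/europepmc/webservices/rest/search` with its `query`, `format` and `pageSize` parameters, and Node.js 18 or later for the built-in `fetch`. `searchUrl` and `search` are hypothetical helpers; nothing here is taken from the getpapers source.

```javascript
// Assumed EuropePMC RESTful search endpoint; check the EuropePMC web
// service documentation before relying on it.
const EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search";

// Build a search URL for a query string. URLSearchParams percent-encodes
// the spaces, quotes, colons and brackets that field queries contain.
function searchUrl(query, { format = "json", pageSize = 25 } = {}) {
  const params = new URLSearchParams({ query, format, pageSize: String(pageSize) });
  return `${EUROPEPMC_SEARCH}?${params.toString()}`;
}

// Fetch one page of results and parse the JSON body
// (requires the global fetch available in Node.js 18+).
async function search(query, options) {
  const response = await fetch(searchUrl(query, options));
  if (!response.ok) {
    throw new Error(`EuropePMC returned HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `search('METHODS:"transcriptome assembly"').then((r) => console.log(r.hitCount))` should print the total number of matches, assuming the JSON response carries a `hitCount` field.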

License

Copyright (c) 2014 Shuttleworth Foundation. Licensed under the MIT license.
