Options

petermr edited this page Mar 16, 2021 · 4 revisions

Commandline options

current option list

Welcome to Pygetpapers. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Add the query you want to search for. Enclose the query in quotes.
  -k LIMIT, --limit LIMIT
                        Add the number of papers you want. Default =100
  -o OUTPUT, --output OUTPUT
                        Add the output directory url. Default is the current working directory
  -v, --onlyquery       Only makes the query and stores the result.
  -p FROMPICKLE, --frompickle FROMPICKLE
                        Reads the picke and makes the xml files. Takes the path to the pickle as the
                        input
  -m, --makepdf         Also makes pdf files for the papers. Works only with --api method.
  -j, --makejson        Also makes json files for the papers. Works only with --api method.
  -c, --makecsv         Also makes csv files for the papers. Works only with --api method.
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires -k or --limit and -q or
                        --query to be given. Takes the path to the pickle as the input
  --api                 Get papers using the official EuropePMC api
  --webscraping         Get papers using the scraping EuropePMC. Also supports getting only research
                        papers, preprints or review papers.
  --onlyresearcharticles
                        Get only research papers (Only works with --webscraping)
  --onlypreprints       Get only preprints (Only works with --webscraping)
  --onlyreviews         Get only review papers (Only works with --webscraping)

style

Avoid conversational style:

Add the query you want to search for. Enclose the query in quotes.

change to:

query string transmitted to repository API. Repository-dependent (see examples). May need nested quoting (platform dependent)

Be precise and concise. Avoid "add the".

Add the output directory url. Default is the current working directory.

This is not a URL - it is a directory. Does it create a new directory?

output directory (Default: current working directory)
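
A minimal sketch of how the concise help strings proposed here might look in an argparse parser. This is not the pygetpapers source: `build_parser` and `ensure_output_dir` are hypothetical names, and creating a missing output directory is an assumption, since the question above is still open.

```python
import argparse
import os
from pathlib import Path


def build_parser():
    """Build a parser whose help strings follow the concise style proposed above."""
    parser = argparse.ArgumentParser(prog="pygetpapers")
    parser.add_argument(
        "-q", "--query",
        help="query string transmitted to repository API (repository-dependent)",
    )
    parser.add_argument(
        "-k", "--limit", type=int, default=100,
        help="maximum number of hits (default: 100)",
    )
    parser.add_argument(
        "-o", "--output", default=os.getcwd(),
        help="output directory (default: current working directory)",
    )
    return parser


def ensure_output_dir(path):
    """Assumption: a missing output directory is created, not rejected."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


args = build_parser().parse_args(["-q", "lantana camara"])
print(args.query, args.limit)
```

Stating the default inside the help string keeps it visible in `-h` output without an extra sentence.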

-q Query

  • What is the format of this query?
  • Does it depend on the target repository?
  • Does it depend on the operating system?
  • What type of quotes are required?
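
The quoting questions above can be illustrated for a POSIX shell with Python's `shlex` module. The command line and query text are hypothetical examples, not documented pygetpapers usage. Windows `cmd.exe` does not treat single quotes as quoting characters, so the outer quoting would differ there.

```python
import shlex

# Hypothetical command line as typed in a POSIX shell: the outer single quotes
# protect the inner double quotes, which mark a phrase for the repository.
command = """pygetpapers -q '"lantana camara"' -k 10"""

# POSIX-style splitting, as sh or bash would do before starting the program.
argv = shlex.split(command)
query = argv[argv.index("-q") + 1]
print(query)  # the inner double quotes survive and reach the program

# Building the same command programmatically: shlex.quote adds the outer quotes.
rebuilt = "pygetpapers -q " + shlex.quote('"lantana camara"') + " -k 10"
print(rebuilt)
```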

Examples

It is essential to give examples. The lack of them was a major problem with getpapers.

-k Limit

maximum number of hits (default: 100)

-v --onlyquery

What does "makes the query" mean? (The user makes the query)

-p FROMPICKLE, --frompickle FROMPICKLE

                    Reads the picke and makes the xml files. Takes the path to the pickle as the
                    input

This option MUST NOT use -p, because getpapers uses -p for --pdf. Be consistent.
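
A clash of this kind could be caught automatically. The sketch below is a hypothetical check, not part of either tool: the pygetpapers flags are copied from the option list on this page, and for getpapers only `-p` for `--pdf` is taken from the note above.

```python
# Short options of the two tools. The dictionary layout is illustrative only.
GETPAPERS_SHORT = {"-p": "--pdf"}
PYGETPAPERS_SHORT = {
    "-q": "--query",
    "-k": "--limit",
    "-o": "--output",
    "-v": "--onlyquery",
    "-p": "--frompickle",
    "-m": "--makepdf",
    "-j": "--makejson",
    "-c": "--makecsv",
    "-u": "--update",
}


def find_conflicts(old, new):
    """Return short flags that both tools use for different long options."""
    return {
        flag: (old[flag], new[flag])
        for flag in old.keys() & new.keys()
        if old[flag] != new[flag]
    }


conflicts = find_conflicts(GETPAPERS_SHORT, PYGETPAPERS_SHORT)
print(conflicts)
```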