Options

petermr edited this page Mar 16, 2021 · 4 revisions

Commandline options

current option list

Welcome to Pygetpapers. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Add the query you want to search for. Enclose the query in quotes.
  -k LIMIT, --limit LIMIT
                        Add the number of papers you want. Default =100
  -o OUTPUT, --output OUTPUT
                        Add the output directory url. Default is the current working directory
  -v, --onlyquery       Only makes the query and stores the result.
  -p FROMPICKLE, --frompickle FROMPICKLE
                        Reads the picke and makes the xml files. Takes the path to the pickle as the
                        input
  -m, --makepdf         Also makes pdf files for the papers. Works only with --api method.
  -j, --makejson        Also makes json files for the papers. Works only with --api method.
  -c, --makecsv         Also makes csv files for the papers. Works only with --api method.
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires -k or --limit and -q or
                        --query to be given. Takes the path to the pickle as the input
  --api                 Get papers using the official EuropePMC api
  --webscraping         Get papers using the scraping EuropePMC. Also supports getting only research
                        papers, preprints or review papers.
  --onlyresearcharticles
                        Get only research papers (Only works with --webscraping)
  --onlypreprints       Get only preprints (Only works with --webscraping)
  --onlyreviews         Get only review papers (Only works with --webscraping)

style

Avoid conversational style:

Add the query you want to search for. Enclose the query in quotes.

change to:

query string transmitted to the repository API. Repository-dependent (see examples). May need nested quoting (platform-dependent).

Be precise and concise. Avoid phrases like "Add the …".

-o --outdir

Add the output directory url. Default is the current working directory.

This is not a URL - it is a directory. Does it create a new directory?

output directory (Default: current working directory)

What is in this directory? (Maybe defer to architecture discussion).
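One way to answer "does it create a new directory?" is to promise that behaviour explicitly. A minimal sketch, assuming the intended behaviour is "default to the current working directory and create the directory if absent" (the function name is hypothetical, not part of pygetpapers):

```python
import os

def resolve_output_dir(path=None):
    """Return the output directory, defaulting to the current working
    directory, and create it if it does not yet exist."""
    outdir = path or os.getcwd()
    os.makedirs(outdir, exist_ok=True)  # no error if it already exists
    return outdir
```

If this is the behaviour, the help text can say so: "output directory, created if absent (default: current working directory)".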

-q Query

  • What is the format of this query?
  • does it depend on the target repository
  • does it depend on operating system?
  • what type of quotes?

Examples

Essential to give examples. This was a major problem with getpapers
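As an illustration of why the quoting questions above matter, here is how a POSIX shell splits a quoted versus an unquoted query (the pygetpapers invocation is hypothetical; Python's `shlex` mimics POSIX word-splitting, and Windows cmd.exe rules differ):

```python
import shlex

# With double quotes the shell delivers the query as ONE argument;
# without them, each word becomes a separate argument and the parser
# would only see the first word.
quoted = shlex.split('pygetpapers -q "essential oil" -k 50')
unquoted = shlex.split('pygetpapers -q essential oil -k 50')
print(quoted)    # ['pygetpapers', '-q', 'essential oil', '-k', '50']
print(unquoted)  # ['pygetpapers', '-q', 'essential', 'oil', '-k', '50']
```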

-k Limit

maximum number of hits (default: 100)

-v --onlyquery

What does "makes the query" mean? (The user makes the query.) Does it mean "submit the query"? If so, what is different from -q with -n? "Stores the result": what result? Where is it stored?

-p FROMPICKLE, --frompickle FROMPICKLE

                    Reads the picke and makes the xml files. Takes the path to the pickle as the
                    input

**MUST change this option from -p as that is used for --pdf by getpapers. Be consistent.**

What is "the picke" (assume "pickle")?

  • Is there one, or many?
  • What is its purpose? What is its content? (You may describe the content later.)
  • What is its name and where is it located?

Does this mean:

Reads a per-project metadata file (FROMPICKLE) and downloads any fulltext XML files not already present
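The suggested rewording could correspond to something like the following sketch. The pickle's structure is an assumption here (a dict mapping paper identifiers to metadata dicts carrying a "downloaded" flag); the actual pygetpapers layout may differ and should be documented:

```python
import pickle

def read_project_metadata(pickle_path):
    """Load the per-project metadata file.
    ASSUMED structure: dict mapping paper id -> metadata dict."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)

def papers_to_download(metadata):
    """Return identifiers whose fulltext XML has not yet been fetched
    (assumes each entry records a 'downloaded' flag)."""
    return [pid for pid, entry in metadata.items()
            if not entry.get("downloaded")]
```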

-m, --makepdf Also makes pdf files for the papers. Works only with --api method.

This should be -p for compatibility.

What does "make pdfs" mean? I think it means "downloads PDFs".

--webscraping

We should not be using Webscraping unless it is consistent with EPMC terms and conditions.

**PLEASE OUTLINE WHAT THIS ACTUALLY DOES AND WHY BEFORE IT IS DEPLOYED.**

--api

This should take the name of the repository as an argument, e.g. --api crossref. I do not understand how it is used at the moment.
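The suggested interface can be sketched with argparse's `choices` mechanism. This is a hypothetical redesign, not the current pygetpapers code, and the repository names are illustrative only:

```python
import argparse

parser = argparse.ArgumentParser(prog="pygetpapers")
# Hypothetical: --api names the target repository rather than acting
# as a bare flag; unknown names are rejected by argparse itself.
parser.add_argument("--api", choices=["eupmc", "crossref"], default="eupmc",
                    help="repository to query (default: eupmc)")
parser.add_argument("-q", "--query", required=True)

args = parser.parse_args(["--api", "crossref", "-q", "terpene"])
print(args.api)  # crossref
```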

-j and -c

-j, --makejson        Also makes json files for the papers. Works only with --api method.
-c, --makecsv         Also makes csv files for the papers. Works only with --api method.

What are these files?

Is this what you mean?

-c record per-document metadata as CSV
-j record per-document metadata as JSON
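If that reading is right, the two options amount to serialising the same per-document metadata in two formats. A sketch, assuming the metadata is a list of flat dicts with shared keys (field names here are illustrative):

```python
import csv
import json

def write_metadata(records, csv_path, json_path):
    """Write one row/object per paper. Assumes `records` is a list of
    flat dicts sharing the same keys (e.g. title, doi, journal)."""
    with open(json_path, "w") as jf:
        json.dump(records, jf, indent=2)
    with open(csv_path, "w", newline="") as cf:
        writer = csv.DictWriter(cf, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
```

Stating this in the help text ("record per-document metadata as CSV/JSON") tells the user what the files contain, which the current wording does not.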

-u UPDATE, --update UPDATE

                        Updates the corpus by downloading new papers. Requires -k or --limit and -q or
                        --query to be given. Takes the path to the pickle as the input

Why does it require -k (this has a default)?