Options
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
Add the query you want to search for. Enclose the query in quotes.
-k LIMIT, --limit LIMIT
Add the number of papers you want. Default =100
-o OUTPUT, --output OUTPUT
Add the output directory url. Default is the current working directory
-v, --onlyquery Only makes the query and stores the result.
-p FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the
input
-m, --makepdf Also makes pdf files for the papers. Works only with --api method.
-j, --makejson Also makes json files for the papers. Works only with --api method.
-c, --makecsv Also makes csv files for the papers. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit and -q or
--query to be given. Takes the path to the pickle as the input
--api Get papers using the official EuropePMC api
--webscraping Get papers using the scraping EuropePMC. Also supports getting only research
papers, preprints or review papers.
--onlyresearcharticles
Get only research papers (Only works with --webscraping)
--onlypreprints Get only preprints (Only works with --webscraping)
--onlyreviews Get only review papers (Only works with --webscraping)
Avoid conversational style:
Add the query you want to search for. Enclose the query in quotes.
Change to:
query string transmitted to the repository API. Repository-dependent (see examples). May need nested quoting (platform-dependent).
Be precise and concise. Avoid "Add the ...".
Add the output directory url. Default is the current working directory.
This is not a URL; it is a directory. Does the program create it if it does not exist?
output directory (Default: current working directory)
What is in this directory? (Maybe defer this to the architecture discussion.)
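One possible answer to "does it create a new directory?", sketched under two assumptions not stated in the help text: a missing directory (and its parents) is created, and omitting -o falls back to the current working directory. The function name resolve_output_dir is hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(output=None):
    """Hypothetical: return the output directory, creating it if needed.

    With no -o/--output value, fall back to the current working directory.
    """
    out = Path(output) if output else Path.cwd()
    out.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out.resolve()


# Usage: create a fresh project directory inside a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    project = resolve_output_dir(Path(tmp) / "lantana")
    print(project.is_dir())  # True
```

Whatever the real behaviour is, the help text should say whether the directory is created or must already exist.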
- What is the format of this query?
- Does it depend on the target repository?
- Does it depend on the operating system?
- What type of quotes are needed?
Examples are essential. Their absence was a major problem with getpapers.
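To illustrate the kind of example needed: the sketch below uses Python's shlex module, which splits a string the way a POSIX shell (bash, zsh) does, to show what the program actually receives when the query contains EuropePMC phrase quotes. The command line and query are made up for illustration. cmd.exe on Windows does not treat single quotes as quoting characters, so this form is POSIX-specific.

```python
import shlex

# Made-up command line as typed in a POSIX shell. The outer single quotes
# protect the inner double quotes, which mark phrases in the search query.
command = """pygetpapers -q '"lantana camara" AND "invasive species"' -k 10 -o lantana"""

argv = shlex.split(command)  # tokenise the way a POSIX shell would
query = argv[argv.index("-q") + 1]
print(query)  # "lantana camara" AND "invasive species"
```

The shell strips the outer quotes, so the string passed on to the repository keeps only the phrase quotes.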
maximum number of hits (default: 100)
What does "makes the query" mean? (The user makes the query.) Does it mean "submits the query"? If so, how does it differ from -q with -n in getpapers?
"Stores the result". What result? Stores it where?
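If -v means "submit the query and keep only the search results", it would roughly correspond to a single call to the EuropePMC REST search endpoint. The sketch below builds such a request without sending it and reads the hit count from a response-shaped dictionary. The sample payload is invented, the helper names are hypothetical, and whether pygetpapers uses exactly these parameters is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public EuropePMC REST search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Hypothetical helper: build (but do not send) a search request."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def summarise(payload):
    """Hypothetical helper: return (hit count, ids) from a parsed response."""
    results = payload.get("resultList", {}).get("result", [])
    return payload.get("hitCount", 0), [r["id"] for r in results]


url = build_search_url('"lantana camara"')
print(url)

# Invented, response-shaped sample; a real response carries many more fields.
sample = {"hitCount": 2, "resultList": {"result": [{"id": "paper-1"}, {"id": "paper-2"}]}}
print(summarise(sample))  # (2, ['paper-1', 'paper-2'])
```

Under this reading, "the result" would be the hit count plus the result list, and the help text should say which file it is written to.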
Reads the picke and makes the xml files. Takes the path to the pickle as the
input
**MUST change this option from -p, as -p is used for --pdf by getpapers. Be consistent.**
What is "the picke" (presumably "pickle")? Is there one, or many? What is its purpose? What does it contain? What is it called and where is it located? (The content can be described later.) Does this mean:
Reads a per-project metadata file (FROMPICKLE) and downloads any
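To make the question concrete, here is one hypothetical shape for such a file: a single pickle per project holding a dictionary of per-paper records, from which the papers still lacking full-text XML can be listed. The record fields, IDs and file name are all invented for illustration; the real content of the pygetpapers pickle is exactly what needs documenting.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical per-project metadata: one record per paper, keyed by an ID.
metadata = {
    "paper-1": {"title": "Example paper A", "xml_downloaded": True},
    "paper-2": {"title": "Example paper B", "xml_downloaded": False},
}


def pending_downloads(pickle_path):
    """Return the IDs whose full-text XML has not been fetched yet."""
    with open(pickle_path, "rb") as fh:
        records = pickle.load(fh)
    return [pid for pid, rec in records.items() if not rec["xml_downloaded"]]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "project_metadata.pickle"  # hypothetical file name
    with open(path, "wb") as fh:
        pickle.dump(metadata, fh)
    print(pending_downloads(path))  # ['paper-2']
```

Note that pickle is a Python-only format, and the Python documentation warns against unpickling untrusted data; both points matter if the file is meant to be shared.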
-m/--makepdf should be -p for compatibility with getpapers.
What does "make pdfs" mean? I think it means "downloads PDFs".
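The two -p comments can be reconciled as follows: -p becomes the short form of --makepdf, matching getpapers, and --frompickle keeps only its long form so the two cannot clash. This is a hypothetical argparse sketch of the proposed option names, not the current pygetpapers parser.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pygetpapers")
    # Proposed: -p downloads PDFs, as it does in getpapers.
    parser.add_argument("-p", "--makepdf", action="store_true",
                        help="download PDFs of the papers")
    # Proposed: long form only, so it cannot collide with -p.
    parser.add_argument("--frompickle", metavar="FROMPICKLE",
                        help="read a per-project metadata file")
    return parser


args = build_parser().parse_args(["-p", "--frompickle", "project.pickle"])
print(args.makepdf, args.frompickle)  # True project.pickle
```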
We should not be using webscraping unless it is consistent with the EPMC terms and conditions.
**PLEASE OUTLINE WHAT THIS ACTUALLY DOES, AND WHY, BEFORE IT IS DEPLOYED.**
This should take the name of the repository as an argument, e.g. --api crossref. I do not understand how it is used at the moment.
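A sketch of --api taking the repository name, written with argparse. The choice list and the fallback to europepmc when --api is given bare are assumptions; crossref is only the example named in this comment, not a confirmed pygetpapers backend.

```python
import argparse

parser = argparse.ArgumentParser(prog="pygetpapers")
parser.add_argument(
    "--api",
    nargs="?",                          # the repository name is optional...
    const="europepmc",                  # ...so a bare --api keeps today's meaning
    choices=["europepmc", "crossref"],  # hypothetical list of repositories
    help="repository to query (bare --api means europepmc)",
)

print(parser.parse_args(["--api", "crossref"]).api)  # crossref
print(parser.parse_args(["--api"]).api)              # europepmc
print(parser.parse_args([]).api)                     # None
```

Using choices also gives the user an immediate error listing the supported repositories when the name is misspelled.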
-j, --makejson Also makes json files for the papers. Works only with --api method.
-c, --makecsv Also makes csv files for the papers. Works only with --api method.
What are these files?
Is this what you mean?
-c record per-document metadata as CSV
-j record per-document metadata as JSON
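If that is the meaning, the two options differ only in serialisation. The sketch below writes the same hypothetical records both ways using the standard library; the field names are illustrative, and whether pygetpapers writes one file per paper or one per project is not stated in the help text.

```python
import csv
import io
import json

# Hypothetical per-document metadata records; field names are illustrative.
records = [
    {"id": "paper-1", "title": "Example paper A", "year": 2020},
    {"id": "paper-2", "title": "Example paper B", "year": 2021},
]


def to_json(rows):
    """-j: all records as one JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """-c: one CSV row per document, header taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(records))
print(to_json(records))
```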
Updates the corpus by downloading new papers. Requires -k or --limit and -q or
--query to be given. Takes the path to the pickle as the input
Why does it require -k (which already has a default)?
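Since -k already defaults to 100, --update arguably needs only -q/--query. A hypothetical argparse sketch of that validation: the option names come from the help text, while the rule itself is the change being proposed here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pygetpapers")
    parser.add_argument("-q", "--query")
    parser.add_argument("-k", "--limit", type=int, default=100,
                        help="maximum number of hits (default: 100)")
    parser.add_argument("-u", "--update", metavar="UPDATE",
                        help="path to the per-project metadata file to extend")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Proposed rule: --update needs a query; -k falls back to its default.
    if args.update and not args.query:
        parser.error("--update requires -q/--query")  # exits with status 2
    return args


args = parse(["-u", "project.pickle", "-q", '"lantana camara"'])
print(args.limit)  # 100
```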