Reports for testers
tlhahsn edited this page Apr 8, 2021 · 15 revisions
Format: name of the tester, test performed, result (if there is an error, open a GitHub issue)
- Check the installation of pygetpapers and its prerequisites for the set-up.
- This command installs the latest version of pygetpapers:
pip install git+git://github.com/petermr/pygetpapers
C:\Users\DELL>pip3 install git+git://github.com/petermr/pygetpapers
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\dell\appdata\local\temp\pip-req-build-6l3rldns
Running command git clone -q git://github.com/petermr/pygetpapers 'C:\Users\DELL\AppData\Local\Temp\pip-req-build-6l3rldns'
Requirement already satisfied: requests in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (3.12.0)
Requirement already satisfied: numpy>=1.16.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (2021.1)
Requirement already satisfied: six>=1.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.3.1) (1.15.0)
Requirement already satisfied: zipfile36 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.1.3)
Requirement already satisfied: distlib in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.3.1)
Requirement already satisfied: pyarrow in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (3.0.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (3.0.4)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2.7)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2020.12.5)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (1.24.3)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: pygetpapers
Attempting uninstall: pygetpapers
Found existing installation: pygetpapers 0.0.1
Uninstalling pygetpapers-0.0.1:
Successfully uninstalled pygetpapers-0.0.1
Running setup.py install for pygetpapers ... done
Successfully installed pygetpapers-0.0.3.1
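To confirm which version ended up installed without reading through the pip log, the package metadata can be queried from Python. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is illustrative and is not part of pygetpapers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pygetpapers"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None
```

Calling `installed_version()` after the install step should return the same version string that pip reports on its "Successfully installed" line.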
- Run
pygetpapers --help
C:\Users\DELL>pygetpapers --help
usage: pygetpapers [-h] [-v] [-q QUERY] [-o OUTPUT] [-x] [-p] [-s] [--references REFERENCES] [-n]
[--citations CITATIONS] [-l LOGLEVEL] [-f LOGFILE] [-k LIMIT] [-r RESTART] [-u UPDATE]
[--onlyquery] [-c] [--synonym]
Welcome to Pygetpapers version 0.0.3.1. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-v, --version output the version number
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-x, --xml download fulltext XMLs if available
-p, --pdf download fulltext PDFs if available
-s, --supp download supplementary files if available
--references REFERENCES
Download references if available. Requires source for references
(AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-n, --noexecute report how many results match the query, but don't actually download anything
--citations CITATIONS
Download citations if available. Requires source for citations
(AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-l LOGLEVEL, --loglevel LOGLEVEL
Provide logging level. Example --log warning <<info,warning,debug,error,critical>>,
default='info'
-f LOGFILE, --logfile LOGFILE
save log to specified file in output directory as well as printing to terminal
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-r RESTART, --restart RESTART
Reads the json and makes the xml files. Takes the path to the json as the input
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Takes the path of metadata json file of the
orignal corpus as the input. Requires -k or --limit (If not provided, default will be used)
and -q or --query (must be provided) to be given. Takes the path to the json as the input.
--onlyquery Saves json file containing the result of the query in storage. The json file can be given to
--restart to download the papers later.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
--synonym Results contain synonyms as well.
- Example query:
pygetpapers -q "Medicinal Activity" -k 10 -o "output" -x -p -c -s
- In this command:
- -x (--xml) downloads fulltext XMLs if available
- -p (--pdf) downloads fulltext PDFs if available
- -s (--supp) downloads supplementary files if available
- -c (--makecsv) stores the per-document metadata as csv (works only with the --api method)
- The command created an "output" folder in the current directory. Inside it, each downloaded paper (up to the limit of 10) has its own folder named after its PMC ID (e.g. PMC7751408).
- Each PMC ID folder contains the eupmc_result JSON file, the fulltext csv, pdf and xml, and the supplementary files.
- In addition, a .csv file with the PMC ID, HTML link, keywords, pdf link, journal title and author info was created.
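The folder check described above can be automated. The sketch below lists which file types each PMC ID folder contains and flags folders that lack one. It assumes that the JSON, XML and PDF files carry the suffixes `.json`, `.xml` and `.pdf`; the function names and the default `expected` tuple are illustrative choices, not part of pygetpapers.

```python
from pathlib import Path


def summarize_output(output_dir):
    """Map each PMC ID folder in `output_dir` to the sorted file suffixes it contains."""
    summary = {}
    for folder in sorted(Path(output_dir).iterdir()):
        # Per-paper folders are named after the PMC ID, e.g. PMC7751408.
        if folder.is_dir() and folder.name.startswith("PMC"):
            # rglob also picks up files in subfolders (e.g. supplementary files).
            suffixes = {p.suffix.lower() for p in folder.rglob("*") if p.is_file()}
            summary[folder.name] = sorted(suffixes)
    return summary


def missing_types(summary, expected=(".json", ".xml", ".pdf")):
    """Return {pmc_id: [missing suffixes]} for folders lacking an expected type."""
    gaps = {}
    for pmc_id, suffixes in summary.items():
        absent = [s for s in expected if s not in suffixes]
        if absent:
            gaps[pmc_id] = absent
    return gaps
```

For the first example query, `missing_types(summarize_output("output"))` returns an empty dictionary when every paper folder holds all three file types.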
- Example query 2:
pygetpapers -q "Medicinal Activity" -k 10 -o "out_test" -x -p -c -s -l "info"
- In this command, -l (--loglevel) sets the logging level: info, warning, debug, error or critical.
C:\Users\DELL>pygetpapers -q "Medicinal Activity" -k 10 -o "out_test" -x -p -c -s -l "info"
INFO: Total Hits are 206841
INFO: Saving XML files to C:\Users\DELL\out_test\*\fulltext.xml
INFO: Made Supplementary files for PMC7822064
INFO: */Wrote xml for PMC7822064/
INFO: Wrote the pdf file for PMC7822064
INFO: Made Supplementary files for PMC7993383
INFO: */Wrote xml for PMC7993383/
INFO: Wrote the pdf file for PMC7993383
INFO: Made Supplementary files for PMC7939573
INFO: */Wrote xml for PMC7939573/
INFO: Wrote the pdf file for PMC7939573
INFO: Made Supplementary files for PMC7833026
INFO: */Wrote xml for PMC7833026/
INFO: Wrote the pdf file for PMC7833026
INFO: Made Supplementary files for PMC7808749
INFO: */Wrote xml for PMC7808749/
INFO: Wrote the pdf file for PMC7808749
INFO: Made Supplementary files for PMC7751408
INFO: */Wrote xml for PMC7751408/
INFO: Wrote the pdf file for PMC7751408
INFO: Made Supplementary files for PMC7889190
INFO: */Wrote xml for PMC7889190/
INFO: Wrote the pdf file for PMC7889190
INFO: Made Supplementary files for PMC7850424
INFO: */Wrote xml for PMC7850424/
INFO: Wrote the pdf file for PMC7850424
INFO: Made Supplementary files for PMC7782983
INFO: */Wrote xml for PMC7782983/
INFO: Wrote the pdf file for PMC7782983
INFO: Made Supplementary files for PMC7782162
INFO: */Wrote xml for PMC7782162/
INFO: Wrote the pdf file for PMC7782162
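Testers who repeat these queries often may prefer to launch them from a script. The sketch below assembles the argument list from the flags documented in the `--help` output and hands it to `subprocess.run`. The function names `build_command` and `run_query` are hypothetical helpers, and the sketch assumes `pygetpapers` is on the PATH.

```python
import subprocess


def build_command(query, limit, output, xml=False, pdf=False,
                  supp=False, makecsv=False, loglevel=None):
    """Assemble a pygetpapers argument list from the flags shown in --help."""
    command = ["pygetpapers", "-q", query, "-k", str(limit), "-o", output]
    if xml:
        command.append("-x")
    if pdf:
        command.append("-p")
    if supp:
        command.append("-s")
    if makecsv:
        command.append("-c")
    if loglevel is not None:
        command += ["-l", loglevel]
    return command


def run_query(*args, **kwargs):
    """Run pygetpapers with the assembled arguments and return the completed process."""
    # check=True raises CalledProcessError if pygetpapers exits with a non-zero status.
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

Example query 2 would then be `run_query("Medicinal Activity", 10, "out_test", xml=True, pdf=True, supp=True, makecsv=True, loglevel="info")`. Passing the arguments as a list avoids shell quoting problems with multi-word queries.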
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\hp pc\appdata\local\temp\pip-req-build-id5w7re6
Requirement already satisfied: requests in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (3.12.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2020.12.5)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2.7)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (3.0.4)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (1.24.3)
Requirement already satisfied: zipfile36 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (3.0.0)
Requirement already satisfied: distlib in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.3.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.3.1) (1.15.0)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: pygetpapers
Attempting uninstall: pygetpapers
Found existing installation: pygetpapers 0.0.1
Uninstalling pygetpapers-0.0.1:
Successfully uninstalled pygetpapers-0.0.1
Running setup.py install for pygetpapers ... done
Successfully installed pygetpapers-0.0.3.1
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'c:\users\hp pc\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
- Update your installation by running
pip install git+git://github.com/petermr/pygetpapers
on your command line; the new version of pygetpapers will be installed.
- Output
C:\Users\vasan>pip install git+git://github.com/petermr/pygetpapers
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\vasan\appdata\local\temp\pip-req-build-7d0v_az0
Requirement already satisfied: requests in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (2.25.1)
Requirement already satisfied: pandas_read_xml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.3.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (3.12.0)
Requirement already satisfied: idna<3,>=2.5 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (1.26.4)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.3.1) (2020.12.5)
Requirement already satisfied: zipfile36 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (3.0.0)
Requirement already satisfied: distlib in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.3.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\vasan\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.3.1) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pygetpapers==0.0.3.1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pygetpapers==0.0.3.1) (2021.1)
Requirement already satisfied: six>=1.5 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.3.1) (1.15.0)
Building wheels for collected packages: pygetpapers
Building wheel for pygetpapers (setup.py) ... done
Created wheel for pygetpapers: filename=pygetpapers-0.0.3.1-py2.py3-none-any.whl size=15228 sha256=f33360e30867278ef54c94a6c44b077a552c1f7ae94d33645d9c7cea0335f2ff
Stored in directory: C:\Users\vasan\AppData\Local\Temp\pip-ephem-wheel-cache-cbi4pcjd\wheels\91\d1\11\341c5b9440e416ab82c2d7b3ce086fb12256db35effd396391
Successfully built pygetpapers
Installing collected packages: pygetpapers
Successfully installed pygetpapers-0.0.3.1
- Output
C:\Users\vasan>pygetpapers --help
usage: pygetpapers [-h] [-v] [-q QUERY] [-o OUTPUT] [-x] [-p] [-s] [--references REFERENCES] [-n]
[--citations CITATIONS] [-l LOGLEVEL] [-f LOGFILE] [-k LIMIT] [-r RESTART] [-u UPDATE]
[--onlyquery] [-c] [--synonym]
Welcome to Pygetpapers version 0.0.3.1. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-v, --version output the version number
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-x, --xml download fulltext XMLs if available
-p, --pdf download fulltext PDFs if available
-s, --supp download supplementary files if available
--references REFERENCES
Download references if available. Requires source for references
(AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-n, --noexecute report how many results match the query, but don't actually download anything
--citations CITATIONS
Download citations if available. Requires source for citations
(AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-l LOGLEVEL, --loglevel LOGLEVEL
Provide logging level. Example --log warning <<info,warning,debug,error,critical>>,
default='info'
-f LOGFILE, --logfile LOGFILE
save log to specified file in output directory as well as printing to terminal
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-r RESTART, --restart RESTART
Reads the json and makes the xml files. Takes the path to the json as the input
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Takes the path of metadata json file of the
orignal corpus as the input. Requires -k or --limit (If not provided, default will be used)
and -q or --query (must be provided) to be given. Takes the path to the json as the input.
--onlyquery Saves json file containing the result of the query in storage. The json file can be given to
--restart to download the papers later.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
--synonym Results contain synonyms as well.
- Query command:
pygetpapers -q "Plant genes" -o "testingfiles" -s -p -c -x -k 10
- A folder named "testingfiles" was created, containing 10 papers. Each paper has its own PMC ID folder, which includes the JSON file, fulltext csv, pdf, xml and supplementary files.
pygetpapers -q "Plant genes" -o "testing_files" -s -p -c -x -k 10 -l "info"
- -l (--loglevel) sets the logging level: info, warning, debug, error or critical.
- Output
C:\Users\vasan>pygetpapers -q "Plant genes" -o "testing_files" -s -p -c -x -k 10 -l "info"
INFO: Total Hits are 325273
WARNING: Keywords not found for paper 1
WARNING: Keywords not found for paper 4
WARNING: html url not found for paper 5
WARNING: Keywords not found for paper 5
WARNING: pdf url not found for paper 5
WARNING: Keywords not found for paper 10
INFO: Saving XML files to C:\Users\vasan\testing_files\*\fulltext.xml
INFO: Made Supplementary files for PMC7736860
INFO: */Wrote xml for PMC7736860/
INFO: Wrote the pdf file for PMC7736860
INFO: Made Supplementary files for PMC6874142
INFO: */Wrote xml for PMC6874142/
INFO: Wrote the pdf file for PMC6874142
INFO: Made Supplementary files for PMC7516213
INFO: */Wrote xml for PMC7516213/
INFO: Wrote the pdf file for PMC7516213
INFO: Made Supplementary files for PMC7383801
INFO: */Wrote xml for PMC7383801/
INFO: Wrote the pdf file for PMC7383801
INFO: Made Supplementary files for PMC7001462
INFO: */Wrote xml for PMC7001462/
INFO: Made Supplementary files for PMC6777021
INFO: */Wrote xml for PMC6777021/
INFO: Wrote the pdf file for PMC6777021
INFO: Made Supplementary files for PMC6296014
INFO: */Wrote xml for PMC6296014/
INFO: Wrote the pdf file for PMC6296014
INFO: Made Supplementary files for PMC5664361
INFO: */Wrote xml for PMC5664361/
INFO: Wrote the pdf file for PMC5664361
INFO: Made Supplementary files for PMC5343966
INFO: */Wrote xml for PMC5343966/
INFO: Wrote the pdf file for PMC5343966
INFO: Made Supplementary files for PMC5596367
INFO: */Wrote xml for PMC5596367/
INFO: Wrote the pdf file for PMC5596367
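Long logs like the one above are easy to misread by eye. The sketch below groups the INFO lines by PMC ID and reports which papers lack a given artifact. The regular expressions are copied from the message formats in the logs on this page and may not match other pygetpapers versions; the function names are illustrative.

```python
import re
from collections import defaultdict

# Message formats taken from the INFO lines in the logs on this page.
PATTERNS = {
    "supplementary": re.compile(r"INFO: Made Supplementary files for (PMC\d+)"),
    "xml": re.compile(r"INFO: \*/Wrote xml for (PMC\d+)/"),
    "pdf": re.compile(r"INFO: Wrote the pdf file for (PMC\d+)"),
}


def artifacts_by_paper(log_text):
    """Return {pmc_id: set of artifact kinds reported as written} for a log."""
    written = defaultdict(set)
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                written[match.group(1)].add(kind)
    return dict(written)


def papers_missing(log_text, kind):
    """List PMC IDs that appear in the log but have no line for `kind`."""
    found = artifacts_by_paper(log_text)
    return sorted(pmc for pmc, kinds in found.items() if kind not in kinds)
```

Applied to the "Plant genes" log above, `papers_missing(log_text, "pdf")` reports PMC7001462, the only paper without a "Wrote the pdf file" line. That paper is fifth in the download order, which appears to match the warning "pdf url not found for paper 5".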
- This command installs the latest version of pygetpapers:
pip install git+git://github.com/petermr/pygetpapers
C:\Users\talha>pip3 install git+git://github.com/petermr/pygetpapers
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\talha\appdata\local\temp\pip-req-build-r70bxipr
Running command git clone -q git://github.com/petermr/pygetpapers 'C:\Users\talha\AppData\Local\Temp\pip-req-build-r70bxipr'
Requirement already satisfied: requests in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (2.25.1)
Requirement already satisfied: pandas_read_xml in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (1.2.3)
Requirement already satisfied: lxml in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\talha\appdata\roaming\python\python39\site-packages (from pygetpapers==0.0.3.1) (3.141.0)
Requirement already satisfied: numpy>=1.16.5 in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas->pygetpapers==0.0.3.1) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas->pygetpapers==0.0.3.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas->pygetpapers==0.0.3.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\talha\appdata\roaming\python\python39\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.3.1) (1.15.0)
Requirement already satisfied: pyarrow in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (3.0.0)
Requirement already satisfied: distlib in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.3.1)
Requirement already satisfied: zipfile36 in c:\users\talha\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pygetpapers==0.0.3.1) (0.1.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\talha\appdata\roaming\python\python39\site-packages (from requests->pygetpapers==0.0.3.1) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in c:\users\talha\appdata\roaming\python\python39\site-packages (from requests->pygetpapers==0.0.3.1) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\talha\appdata\roaming\python\python39\site-packages (from requests->pygetpapers==0.0.3.1) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\talha\appdata\roaming\python\python39\site-packages (from requests->pygetpapers==0.0.3.1) (1.26.4)
- Run
pygetpapers --help
C:\Users\talha>pygetpapers --help
usage: pygetpapers [-h] [-v] [-q QUERY] [-o OUTPUT] [-x] [-p] [-s] [--references REFERENCES] [-n] [--citations CITATIONS] [-l LOGLEVEL] [-f LOGFILE] [-k LIMIT]
[-r RESTART] [-u UPDATE] [--onlyquery] [-c] [--synonym]
Welcome to Pygetpapers version 0.0.3.1. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-v, --version output the version number
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To escape special characters within the quotes,
use backslash. The query to be quoted in either single or double quotes.
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-x, --xml download fulltext XMLs if available
-p, --pdf download fulltext PDFs if available
-s, --supp download supplementary files if available
--references REFERENCES
Download references if available. Requires source for references (AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-n, --noexecute report how many results match the query, but don't actually download anything
--citations CITATIONS
Download citations if available. Requires source for citations (AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
-l LOGLEVEL, --loglevel LOGLEVEL
Provide logging level. Example --log warning <<info,warning,debug,error,critical>>, default='info'
-f LOGFILE, --logfile LOGFILE
save log to specified file in output directory as well as printing to terminal
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-r RESTART, --restart RESTART
Reads the json and makes the xml files. Takes the path to the json as the input
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Takes the path of metadata json file of the orignal corpus as the input. Requires -k or --limit
(If not provided, default will be used) and -q or --query (must be provided) to be given. Takes the path to the json as the input.
--onlyquery Saves json file containing the result of the query in storage. The json file can be given to --restart to download the papers later.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
--synonym Results contain synonyms as well.
- Example query:
pygetpapers -q "Medicinal Activity" -k 10 -o "output" -x -p -c -s
- In this command:
- -x (--xml) downloads fulltext XMLs if available
- -p (--pdf) downloads fulltext PDFs if available
- -s (--supp) downloads supplementary files if available
- -c (--makecsv) stores the per-document metadata as csv (works only with the --api method)
- The command created an "output" folder in the current directory. Inside it, each downloaded paper (up to the limit of 10) has its own folder named after its PMC ID (e.g. PMC7751408).
- Each PMC ID folder contains the eupmc_result JSON file, the fulltext csv, pdf and xml, and the supplementary files.
- In addition, a .csv file with the PMC ID, HTML link, keywords, pdf link, journal title and author info was created.
- Example query 2:
pygetpapers -q "Medicinal Activity" -k 10 -o "out_test" -x -p -c -s -l "info"
- In this command, -l (--loglevel) sets the logging level: info, warning, debug, error or critical.
C:\Users\talha>pygetpapers -q "Medicinal Activity" -k 10 -o "out_test" -x -p -c -s -l "info"
INFO: Total Hits are 206841
INFO: Saving XML files to C:\Users\DELL\out_test\*\fulltext.xml
INFO: Made Supplementary files for PMC7822064
INFO: */Wrote xml for PMC7822064/
INFO: Wrote the pdf file for PMC7822064
INFO: Made Supplementary files for PMC7993383
INFO: */Wrote xml for PMC7993383/
INFO: Wrote the pdf file for PMC7993383
INFO: Made Supplementary files for PMC7939573
INFO: */Wrote xml for PMC7939573/
INFO: Wrote the pdf file for PMC7939573
INFO: Made Supplementary files for PMC7833026
INFO: */Wrote xml for PMC7833026/
INFO: Wrote the pdf file for PMC7833026
INFO: Made Supplementary files for PMC7808749
INFO: */Wrote xml for PMC7808749/
INFO: Wrote the pdf file for PMC7808749
INFO: Made Supplementary files for PMC7751408
INFO: */Wrote xml for PMC7751408/
INFO: Wrote the pdf file for PMC7751408
INFO: Made Supplementary files for PMC7889190
INFO: */Wrote xml for PMC7889190/
INFO: Wrote the pdf file for PMC7889190
INFO: Made Supplementary files for PMC7850424
INFO: */Wrote xml for PMC7850424/
INFO: Wrote the pdf file for PMC7850424
INFO: Made Supplementary files for PMC7782983
INFO: */Wrote xml for PMC7782983/
INFO: Wrote the pdf file for PMC7782983
INFO: Made Supplementary files for PMC7782162
INFO: */Wrote xml for PMC7782162/
INFO: Wrote the pdf file for PMC7782162