Open-source tool in Python for generating a list of relevant articles from Scopus on a user-defined topic

Ildar-Daminov/Python_for_literature_review_in_Scopus

Python_for_literature_review

Understanding the main goal of the code:

The main goal of this Python tool is to generate a list of papers available in Scopus on a given topic, using minimal input data.

Input:

  • a name of a topic (for saving the results)
  • a reference paper via Scopus eid
  • a list of your keywords for your topic of interest

Output:

  • Excel file <topic_name>_outputs.xlsx representing the list of papers in Scopus relevant to the given topic
  • Interactive graph showing the population of papers in html format

Main_idea

Figure 1 - Input and outputs of Python module

Zoom of a population graph:

The interactive graph (shown above, with a zoomed view below) represents the population of papers on a given topic and consists of blue dots and lines. Each blue dot represents an article, and a line between two dots represents a "connection". Here, a connection appears if one of the two papers cites the other.

Figure 2 - Interactive graph produced by the Python code and its zoomed view. Blue dots are articles on a given topic and lines are their connections

List of papers sorted in descending order by the number of "connections"

After processing your query, Python generates an Excel file listing the papers that correspond to your topic, sorted in descending order by their number of connections (within the population graph above).

Figure 3 - An example of the Excel file with papers on the topic of hosting capacity
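The ranking described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function name `rank_by_connections`, the paper identifiers and the choice to count each citing/cited pair once as an undirected connection are all assumptions made for illustration.

```python
from collections import Counter


def rank_by_connections(citations):
    """Return (paper, connection_count) pairs, most connected first.

    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    Hypothetical helper, not part of this repository.
    """
    # Assumption: a connection is undirected and counted once per pair of papers;
    # self-citations are ignored.
    edges = {frozenset(pair) for pair in citations if pair[0] != pair[1]}

    counts = Counter()
    for edge in edges:
        for paper in edge:
            counts[paper] += 1

    # Sort by connection count (descending), then by identifier for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up identifiers: A, C and D cite B; D also cites A.
example = [("A", "B"), ("C", "B"), ("D", "B"), ("D", "A")]
ranking = rank_by_connections(example)
for paper, connections in ranking:
    print(paper, connections)
```

In this toy population, paper B has the most connections and would therefore appear first in the list.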

Note: In addition to this Excel file, Python generates npy files with the list of publications outside Scopus and the papers that returned an error such as 404 (this happens when a paper's record in Scopus is incomplete, e.g. an empty title, author names or abstract). These npy files can be processed further to double-check for relevant papers (this double-checking is not yet included in the current version of the module).

Python workflow: how it works

Python_workflow

Figure 4 - Workflow of how a paper population is reconstructed inside the Python tool

Before getting started:

First of all, you need to ensure access to the Scopus API via pybliometrics:

Install the pybliometrics package

Refer to the site for pybliometrics instructions

Get API keys for accessing Scopus via Elsevier API

  1. To access Scopus via its API, you need to check two things. First, your university needs to be a subscriber (not only to Scopus, but also to its API); second, you need to register API keys at https://dev.elsevier.com/apikey/manage. For each profile, you may register 10 keys.

  2. Add your API keys into config.ini (see instructions)

  3. It may be necessary to change the apikey in config.json (see main folder). Note that a key allows for 5,000 retrieval requests, or 20,000 search requests via the Scopus Search API, per week. Without changing the apikey, its quota may be depleted quickly.
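As a quick sanity check before a long run, the sketch below reads the API keys from configuration text and estimates the weekly retrieval budget from the 5,000-requests-per-key figure quoted above. The `[Authentication]` section and the comma-separated `APIKey` option are assumptions about the file layout, and `read_api_keys` is a hypothetical helper; check the pybliometrics instructions for the layout your version actually uses.

```python
import configparser

# Weekly retrieval quota per key, as quoted in the step above.
RETRIEVAL_REQUESTS_PER_KEY = 5000


def read_api_keys(config_text):
    """Return the API keys found in INI-style configuration text.

    Assumption: keys are stored as a comma-separated list under
    [Authentication] / APIKey. Hypothetical helper for illustration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("Authentication", "APIKey", fallback="")
    return [key.strip() for key in raw.split(",") if key.strip()]


# Placeholder keys only; real keys come from https://dev.elsevier.com/apikey/manage
sample_config = """
[Authentication]
APIKey = first-placeholder-key, second-placeholder-key
"""

keys = read_api_keys(sample_config)
weekly_retrieval_budget = len(keys) * RETRIEVAL_REQUESTS_PER_KEY
print(f"{len(keys)} key(s), about {weekly_retrieval_budget} retrieval requests per week")
```

To check a real file, pass its contents to `read_api_keys`, for example via `pathlib.Path(...).read_text()`.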

Get Started:

Using Poetry to install all necessary packages and run the code

  1. Copy the repository to your computer and open it in your code editor (we use Visual Studio Code).
  2. If you do not have Poetry on your computer, you can install it with pip: run pip install poetry in your terminal.
  3. Once Poetry is installed, type poetry install in the terminal. This creates a virtual environment (folder .venv) where all necessary packages are installed. The installation may take a few minutes, but once it has finished the code should run in the same environment as on our computer.
  4. During the installation, accept that .venv will be created in the same folder where you copied this Python module. Just click yes.
  5. This is usually done automatically, but check that the Python interpreter ('.venv': poetry) is selected. In Visual Studio Code, look at the bottom-right corner.
  6. Open main_test.py in your editor, change the name and reference_paper_eid, and select your keywords, or run the example for the topic self-consumption (for the sake of the example, we intentionally used a long keyword to reduce the number of matching papers and therefore get the results faster).
  7. Before running the code, make sure that you are on the university network (directly or via VPN) so that you can access Scopus. Otherwise you will get a 401 Unauthorized error.
  8. Run main_test.py
  9. After the message <<<< Analysis is finished >>>>, check the results in the Excel file <topic_name>_outputs.xlsx and/or the interactive graph

Quick start:

Once you have installed everything, you can simply change the input data (see Figure 5) and run your case studies. You can find the eid of your reference paper on its Scopus webpage (see the example in Figure 6).

Figure 5 - The only input data to be changed in order to run your case in main_test.py

Figure 6 - How to find the Scopus eid, e.g. 2-s2.0-85101235827.
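Before launching a long run, it can help to check the three inputs. The sketch below is illustrative only: `check_inputs` is a hypothetical helper that is not part of main_test.py, the eid pattern is inferred from the example eid above, and the topic name and keyword are made up.

```python
import re

# Assumption: Scopus eids follow the shape of the example above, "2-s2.0-" plus digits.
EID_PATTERN = re.compile(r"^2-s2\.0-\d+$")


def check_inputs(name, reference_paper_eid, keywords):
    """Validate the three inputs and return them with the expected output file name.

    Hypothetical helper for illustration; the tool itself may validate differently.
    """
    if not name.strip():
        raise ValueError("The topic name must not be empty.")
    if not EID_PATTERN.match(reference_paper_eid):
        raise ValueError(f"Unexpected eid format: {reference_paper_eid!r}")

    cleaned_keywords = [keyword.strip() for keyword in keywords if keyword.strip()]
    if not cleaned_keywords:
        raise ValueError("Provide at least one keyword.")

    return {
        "name": name,
        "reference_paper_eid": reference_paper_eid,
        "keywords": cleaned_keywords,
        # File name pattern taken from the Output section: <topic_name>_outputs.xlsx
        "output_file": f"{name}_outputs.xlsx",
    }


# Made-up topic name and keyword; the eid is only the format example from Figure 6.
inputs = check_inputs(
    name="hosting_capacity",
    reference_paper_eid="2-s2.0-85101235827",
    keywords=["hosting capacity "],
)
print(inputs["output_file"])
```

A malformed eid, such as one copied without the 2-s2.0- prefix, raises a ValueError before any Scopus request is made.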
