RussScholar-Seeker: A Python package for predicting whether a name is Russian
I am aware that this topic may be viewed from a political perspective. Such a reading is absolutely AGAINST my intent.
This project contains a series of programs designed to automatically identify and analyze Russian authors in academic papers. It uses a pre-trained BERT model to predict the geographical attribute of a name, that is, whether a given name is Russian.
The script lets users search the latest 1000 papers from selected conferences (or journals) and uses the model to identify authors who are possibly of Russian background. It outputs the paper title, author names, and DOI. A web version is already deployed online: https://russscholar.online
The core of the project is the BertForSequenceClassification model from the transformers library, trained on a dedicated dataset to distinguish Russian from non-Russian names. We first scrape metadata of academic papers, including titles and author names, from databases like DBLP. Then we run the trained model on the fetched names to automatically identify Russian authors.
https://huggingface.co/Gao-Tianci/RussScholar-Seeker
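The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual prediction.py. It assumes that the Hugging Face repository linked above also ships tokenizer files, that the model has two output classes, and that class index 1 means "Russian"; none of these are confirmed by this README. The helper names (softmax, is_russian, load_classifier, predict_name) are hypothetical.

```python
import math

MODEL_ID = "Gao-Tianci/RussScholar-Seeker"  # repository taken from the link above
RUSSIAN_LABEL = 1  # ASSUMPTION: class index 1 means "Russian"


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_russian(logits, threshold=0.5):
    """Return (decision, probability) for the logits of a single name."""
    prob = softmax(logits)[RUSSIAN_LABEL]
    return prob >= threshold, prob


def load_classifier(model_id=MODEL_ID):
    """Load tokenizer and model.

    transformers is imported here so the helpers above work without it.
    ASSUMPTION: the repository contains tokenizer files as well as weights.
    """
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BertForSequenceClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_name(name, tokenizer, model):
    """Classify one author name and return (decision, probability)."""
    import torch

    inputs = tokenizer(name, return_tensors="pt", truncation=True, max_length=32)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return is_russian(logits)


if __name__ == "__main__":
    tok, mdl = load_classifier()
    print(predict_name("Ivan Petrov", tok, mdl))
```

Keeping the threshold as a parameter makes it easy to trade precision for recall, since a name alone is only a weak signal of a person's background.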
- Data Preparation: First, we collected a set of names labeled as Russian and non-Russian to serve as the dataset for training the model.
- Model Training: We trained the model using BertForSequenceClassification and the collected dataset. During the training process, we adjusted the model parameters to achieve the best predictive performance.
- Data Scraping: We wrote web scraping programs to fetch metadata of academic papers from databases like DBLP.
- Prediction and Analysis: The fetched names were predicted using the trained model to identify Russian authors, and the related information was output.
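The scraping and analysis steps can be sketched as below. This is an illustration under stated assumptions rather than the project's own scraper: it uses DBLP's public publication search API with JSON output, whereas the original programs may parse DBLP's HTML or XML pages instead. The function names (fetch_hits, parse_papers, flag_papers, clean_name) are hypothetical, and the name classifier is passed in as a plain callable so that any predictor can be plugged in.

```python
import re

DBLP_SEARCH_URL = "https://dblp.org/search/publ/api"


def fetch_hits(query, max_hits=1000):
    """Query the DBLP publication search API and return the decoded JSON.

    requests is imported here so the parsing helpers work without it.
    The query string is supplied by the caller.
    """
    import requests

    params = {"q": query, "format": "json", "h": max_hits}
    response = requests.get(DBLP_SEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def clean_name(name):
    """Drop the four-digit homonym suffix DBLP appends, e.g. 'Wei Wang 0001'."""
    return re.sub(r"\s+\d{4}$", "", name).strip()


def parse_papers(payload):
    """Extract title, author names, and DOI from a DBLP search response."""
    hits = payload.get("result", {}).get("hits", {}).get("hit", [])
    papers = []
    for hit in hits:
        info = hit.get("info", {})
        authors = info.get("authors", {}).get("author", [])
        if isinstance(authors, dict):  # a single author is returned as one object
            authors = [authors]
        papers.append({
            "title": info.get("title", ""),
            "authors": [clean_name(a.get("text", "")) for a in authors],
            "doi": info.get("doi"),  # None when DBLP lists no DOI
        })
    return papers


def flag_papers(papers, is_russian_name):
    """Keep papers with at least one author accepted by the classifier."""
    flagged = []
    for paper in papers:
        matches = [n for n in paper["authors"] if is_russian_name(n)]
        if matches:
            flagged.append({**paper, "matched_authors": matches})
    return flagged
```

In use, is_russian_name would wrap the trained model, and each flagged entry carries the title, author names, and DOI that the tool reports.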
Before using this tool, you need to install some necessary Python libraries, including transformers, torch, requests, and beautifulsoup4. The installation command is as follows:
pip install transformers torch requests beautifulsoup4
After that, you can run prediction.py to identify Russian authors. The command looks like this:
python prediction.py
One notable application of this project was the analysis of academic papers from the AAAI 2021 conference, as listed on DBLP (HTML, XML). The goal was to identify papers with Russian authors, showcasing the model's ability to provide insights into the geographical distribution of academic contributions.
The model successfully identified several papers with Russian authors, underlining the global collaboration in the field of Artificial Intelligence. Here are a few highlights from the analysis:
These results not only demonstrate the practical utility of RussScholar-Seeker in analyzing academic contributions but also highlight the diverse international collaboration within the AI research community.
This case study underscores the potential of AI and NLP technologies in enhancing our understanding of academic landscapes. By automating the identification of geographical attributes of authors, we can gain valuable insights into global research trends, collaboration networks, and the geographical distribution of expertise.