Article Popularity Prediction

Live demo (Streamlit app): https://chulinuwu-articlepopularitypre-03-data-visualizationapp-ho1xxi.streamlit.app/

Install required Python modules 📥

pip install -r requirements.txt

00_Web_Scraping 🕸️

This section covers converting raw data files to JSON and scraping data from the Scopus website. The scraper logs in to Scopus with the credentials you define in secret.py.

Steps:

Navigate to the project directory:
```
cd 00_Web_Scraping
code .
```
Convert the raw data files to JSON (update the folder path in convert_file_type.py first):
```
python convert_file_type.py
```
Create a file named secret.py with the following content:
```
_id = "your email"
_pass = "your pass"
```
Run scraping_script.py to scrape data from the Scopus website:
```
cd scobus_scraping
python scraping_script.py
```
(Optional) Concatenate CSV files from scraping (update input file paths in concat_csv.py):
```
python concat_csv.py
```
Clean the CSV file:
```
python clean_csv.py
cd ..
```
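The concatenate and clean steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the contents of concat_csv.py or clean_csv.py, which are not shown here. The function names and the `title` key column are assumptions made for the example.

```python
import csv


def concat_csv(paths, out_path):
    """Concatenate CSV files that share one header; return the number of data rows."""
    if not paths:
        raise ValueError("no input files given")
    rows, fieldnames = [], None
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has a different header")
            rows.extend(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def clean_rows(rows, key="title"):
    """Strip whitespace, drop rows whose key is empty, and drop duplicate keys.

    `key` is a hypothetical column name; use whichever column identifies a paper.
    """
    seen, cleaned = set(), []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        ident = (row.get(key) or "").lower()
        if not ident or ident in seen:
            continue  # skip empty or already-seen records
        seen.add(ident)
        cleaned.append(row)
    return cleaned
```

Comparing keys case-insensitively keeps the first occurrence of each paper, which is a reasonable default when the same article is scraped from several result pages.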

01_Data_Preparation 🧹

This section involves extracting data provided by the professor and combining it with the data scraped from the web.

Steps:

Run the 1_ingest_Ajarn.ipynb notebook to extract the data provided by the professor.
Run the 2_concat_data.ipynb notebook to combine the professor's data with the data scraped from the web.
The combined data will be saved as 2_data_combined.csv.

02_0_Data_Storage 💾

This section covers how to store the combined dataset in a Cassandra database and interact with it using CQL (Cassandra Query Language).

Steps:

Install Cassandra locally or use a cloud-based instance.
Start Cassandra: sudo service cassandra start
Set up the database: cqlsh -f 02_0_Data_Storage/structure.cql
Write the data to Cassandra using PySpark: python -u 02_0_Data_Storage/spark_storage.py

After completing these steps, you can query and use the stored data with PySpark.
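For orientation, writing and reading a DataFrame through the Spark Cassandra Connector typically looks like the sketch below. It is not the contents of spark_storage.py. The keyspace `articles_ks`, the table `articles`, and the host `127.0.0.1` are placeholder assumptions, and the connector package must be available to Spark (for example through `spark.jars.packages`) for the data source to resolve.

```python
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(keyspace, table):
    """Build the option map the connector expects for a table."""
    return {"keyspace": keyspace, "table": table}


def write_to_cassandra(df, keyspace, table):
    """Append a Spark DataFrame to an existing Cassandra table."""
    (df.write
       .format(CASSANDRA_FORMAT)
       .options(**cassandra_options(keyspace, table))
       .mode("append")
       .save())


def read_from_cassandra(spark, keyspace, table):
    """Load a Cassandra table as a Spark DataFrame."""
    return (spark.read
            .format(CASSANDRA_FORMAT)
            .options(**cassandra_options(keyspace, table))
            .load())


def run_example():
    """Hypothetical end-to-end run; names and host are assumptions."""
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("article-storage")
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .getOrCreate())
    df = spark.read.csv("2_data_combined.csv", header=True, inferSchema=True)
    write_to_cassandra(df, "articles_ks", "articles")
    return read_from_cassandra(spark, "articles_ks", "articles")
```

The target table has to exist before the write, which is why the structure.cql step comes first, and the DataFrame column names need to match the table's column names.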

02_Data_Science 🔬

This section uses the 2_data_combined dataset to train a model that predicts the cited-by count of research papers.

Steps:

Run the train_model.ipynb notebook to train the model on the combined dataset and generate predictions for the cited-by count of research papers.
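When judging the notebook's predictions, it can help to compare them against a trivial baseline. The sketch below is not the method used in train_model.ipynb. It only shows, under the assumption that the target is a non-negative integer count, how a hold-out split, a median baseline, and mean absolute error fit together. All function names are illustrative.

```python
import random
from statistics import mean, median


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted counts."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def median_baseline(train_targets, n_predictions):
    """Predict the training median for every test row."""
    return [median(train_targets)] * n_predictions
```

A trained model is only useful if its error on the test rows is clearly lower than the error of `median_baseline` on the same rows. Citation counts are usually heavily skewed, so the median tends to be a harder baseline to beat than the mean.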

03_Data_Visualization 📊

This section involves visualizing the data using Streamlit.

Steps:

From the project root directory, run the app:
```
streamlit run .\03_Data_Visualization\app.py
```
