EECS 486 RMP Webscrape

Using all reviews given to University of Michigan professors on ratemyprofessors.com (RMP) as the corpus, we benchmark the performance of different text-encoding methodologies and machine learning models.

Installation

You should have a Jupyter development environment set up, preferably with a Python 3.11 kernel.

Then run `pip install -r requirements.txt`.

Directory Structure

.

├── Data
│   ├── clean_prof_info.csv
│   ├── clean_ratings.csv
│   ├── data_cleaning.ipynb
│   ├── glove.6B.100d.txt
│   ├── glove.6B.100d.txt.word2vec
│   ├── raw_prof_info.csv
│   └── raw_ratings.csv
├── README.md
├── RateMyProfessorAPI
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── README.md
│   ├── examples
│   │   └── example.py
│   ├── ratemyprofessor
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-38.pyc
│   │   │   ├── professor.cpython-38.pyc
│   │   │   └── school.cpython-38.pyc
│   │   ├── json
│   │   │   ├── header.json
│   │   │   ├── professorquery.json
│   │   │   └── ratingsquery.json
│   │   ├── professor.py
│   │   ├── ratings.json
│   │   ├── ratings_info.json
│   │   ├── sample.py
│   │   └── school.py
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   └── tests
│       └── test.py
├── data_acquisition.py
├── experiment.ipynb
├── html
│   ├── diff.txt
│   ├── html.txt
│   └── profID.txt
├── pipeline.ipynb
├── requirements.txt
├── scraper.py
└── util.py

File Overview

`Data/`

Raw and cleaned datasets. The schema of the cleaned datasets is provided below.
data_cleaning.ipynb: Processing procedures to clean raw data.
glove.6B.100d.txt: Download available at the GloVe website. We used the 6B tokens dataset with 100-dimension vectors.
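
The `glove.6B.100d.txt.word2vec` file in the directory tree is presumably the GloVe file converted to the word2vec text format, although this README does not say how it was produced. The two formats differ only in that word2vec text files begin with a `<vocabulary size> <vector dimension>` header line. A minimal sketch of that conversion follows; the function name `glove_to_word2vec` is hypothetical and the two sample vectors are made up for illustration.

```python
import io


def glove_to_word2vec(glove_lines):
    """Return word2vec-text lines: a '<vocab> <dim>' header, then the vectors.

    Hypothetical helper, not part of this repository.
    """
    # Drop blank lines and trailing newlines; each remaining row is
    # "<word> <v1> <v2> ... <vN>".
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")

    # The dimension is the number of fields after the word itself.
    dim = len(rows[0].split(" ")) - 1
    for row in rows:
        if len(row.split(" ")) - 1 != dim:
            raise ValueError(f"inconsistent vector length in row: {row[:30]!r}")

    return [f"{len(rows)} {dim}"] + rows


# Made-up 3-dimensional vectors standing in for the real 100-dimensional file.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
converted = glove_to_word2vec(sample)
print("\n".join(converted))
```

For the real file, the same function could be fed an open file handle and the result written back out line by line.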

`RateMyProfessorAPI/`

Lightly edited fork of RateMyProfessorAPI; see Acknowledgement.

`html/`

Web scraping artifacts.

`./`

data_acquisition.py: Given a list of profIDs, uses RateMyProfessorAPI to retrieve the relevant information in JSON format.
experiment.ipynb: Benchmarking with different encodings and machine learning models.
scraper.py: Selenium program to retrieve UMich profIDs.
util.py: Project utilities.

Cleaned Dataset Schema

`clean_prof_info.csv`

| Column Name | Data Type | Note |
| --- | --- | --- |
| profID | int | |
| firstName | str | |
| lastName | str | |
| fullName | str | Concatenation of first and last name |
| department | str | Has known defects; not reliable |
| numRatings | int | |
| wouldTakeAgainPct | float | Ranges from 0 to 100; may be missing (NaN) |
| avgDifficulty | float | Ranges from 1 to 5 |
| avgRating | float | Ranges from 1 to 5 |
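
As an illustration of the schema above, the following sketch reads rows in the documented column order and converts each field to its listed type. It assumes that a missing `wouldTakeAgainPct` is stored as an empty cell, which this README does not confirm. The helper names and the two sample rows are invented for the example and are not real data.

```python
import csv
import io
import math

# Made-up rows that follow the documented columns; not taken from the dataset.
SAMPLE = """profID,firstName,lastName,fullName,department,numRatings,wouldTakeAgainPct,avgDifficulty,avgRating
1,Ada,Example,Ada Example,Computer Science,12,83.3,3.1,4.5
2,Bob,Placeholder,Bob Placeholder,Mathematics,2,,2.0,3.0
"""

# Assumption: these spellings mark a missing value in the CSV.
MISSING = {"", "NA", "NaN", "nan"}


def parse_prof_row(row):
    """Convert one CSV row (all strings) to the types listed in the schema."""
    pct = row["wouldTakeAgainPct"]
    return {
        "profID": int(row["profID"]),
        "fullName": row["fullName"],
        "department": row["department"],  # documented as unreliable
        "numRatings": int(row["numRatings"]),
        "wouldTakeAgainPct": math.nan if pct in MISSING else float(pct),
        "avgDifficulty": float(row["avgDifficulty"]),
        "avgRating": float(row["avgRating"]),
    }


def load_prof_info(handle):
    """Parse every row from an open CSV file handle."""
    return [parse_prof_row(row) for row in csv.DictReader(handle)]


profs = load_prof_info(io.StringIO(SAMPLE))
for prof in profs:
    print(prof["fullName"], prof["wouldTakeAgainPct"])
```

Passing `open("Data/clean_prof_info.csv", newline="")` instead of the in-memory sample should work the same way, provided the file's header matches the schema.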

`clean_ratings.csv`

| Column Name | Data Type | Note |
| --- | --- | --- |
| profID | int | |
| class | str | |
| attendanceMandatory | bool | |
| comment | str | |
| date | `pd.datetime` | UTC, accurate to the second |
| difficultyRating | float | Ranges from 1 to 5 |
| grade | str | Letter grades, with +/- |
| helpfulRating | float | Ranges from 1 to 5 |
| isForCredit | bool | |
| isForOnlineClass | bool | |
| ratingTags | list | List of up to 3 tags |
| wouldTakeAgain | bool | |

Earlier reviews may have difficultyRating and helpfulRating values in 0.5 increments. At some point, the site began to allow only integer ratings.
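
Because of this change in scale, it may be useful to flag reviews whose ratings fall on a half step. The sketch below is one way to do that; the function names are hypothetical and the sample rows are made up. Note that a whole-number rating cannot be dated this way, since whole numbers were valid under both scales.

```python
def is_half_step(rating: float) -> bool:
    """Return True when the rating ends in .5, for example 3.5."""
    # Doubling maps x.5 to an odd number and x.0 to an even number.
    return (rating * 2) % 2 == 1


def split_by_scale(rows, column="helpfulRating"):
    """Split rows into (half_step, whole) lists based on one rating column.

    Hypothetical helper. Half-step rows must predate the switch to integer
    ratings; whole-number rows may come from either period.
    """
    half_step, whole = [], []
    for row in rows:
        target = half_step if is_half_step(float(row[column])) else whole
        target.append(row)
    return half_step, whole


# Made-up rows using a column name from the schema above.
rows = [
    {"helpfulRating": "3.5"},
    {"helpfulRating": "4.0"},
    {"helpfulRating": "5"},
]
half_step, whole = split_by_scale(rows)
print(len(half_step), "half-step rows;", len(whole), "whole-number rows")
```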

Acknowledgement

RateMyProfessorAPI was authored by NobelZ, ChrisBryann, and Ozeitis and is distributed under the Apache-2.0 license.

Casper-Guo/EECS-486-RMP-Sentiment-Analysis
