Implementation of various popular Collaborative Filtering algorithms in Python.
Course Assignment for CS F469- Information Retrieval @ BITS Pilani, Hyderabad Campus.
Done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.
- Instructions to run the scripts
- Introduction
- Data
- Directory Structure
- Basic Collaborative Filtering
- Collaborative Filtering with baseline
- SVD
- CUR
- Machine specs
- Results
- Members
Table of contents generated with markdown-toc
Run the following commands for the various implementations of collaborative filtering:
python create_matrices.py
python collaborative_filtering.py
python collaborative_filtering_baseline.py
python SVD.py
python CUR.py
Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The main purpose of this project is to understand how the following collaborative filtering algorithms work:
- Basic Approach
- Baseline Approach
- SVD Decomposition Approach
- CUR Decomposition Approach
- More on Collaborative Filtering.
We used the MovieLens 1M dataset, which can be found here. The dataset contains about 1 million ratings from roughly 6000 users on 4000 movies. Ratings range from 1 to 5; in the rating matrices we build, zeros indicate missing ratings. The data is kept in the folder data.
The data was shuffled and then split to create train(80%) and test(20%) sets.
| | Ratings | Unique users | Unique movies |
|---|---|---|---|
| Overall Dataset | 1000209 | 6040 | 3706 |
| Train Set (80%) | 800167 | 6040 | 3682 |
| Test Set (20%) | 200041 | 6036 | 3462 |
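The shuffle-and-split described above could be reproduced along these lines. This is a minimal sketch; the column names, separator, and random seed are assumptions based on the MovieLens 1M file format, not the repository's exact code:

```python
import pandas as pd

# MovieLens 1M stores ratings as UserID::MovieID::Rating::Timestamp
ratings = pd.read_csv(
    "data/ratings.dat", sep="::", engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)

# Shuffle the rows, then split 80% / 20% into train and test portions
ratings = ratings.sample(frac=1, random_state=42).reset_index(drop=True)
cutoff = int(0.8 * len(ratings))
train_df, test_df = ratings.iloc[:cutoff], ratings.iloc[cutoff:]
```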
recsys_final/
+-- data
| +-- ratings.dat (original data file containing ratings)
+-- temp_data
| +-- movie_map.pkl (movie_map in pickled format)
| +-- sigma.npy (numpy file containing sigma matrix in dense representation)
| +-- test_table.pkl (pickled pandas dataframe containing the test data)
| +-- train.npz (numpy file containing the train matrix in sparse representation)
| +-- U.npy (numpy file containing U matrix in dense representation)
| +-- user_map.pkl (user_map in pickled format)
| +-- V_t.npy (numpy file containing transpose of V matrix in dense representation)
+-- create_matrices.py (python script to read the data, create the train matrix, test dataframe, and user and movie mappings, and save them to disk; see the sketch after this listing)
+-- collaborative_filtering.py (python script to perform collaborative filtering)
+-- collaborative_filtering_baseline.py (python script to perform collaborative filtering with baseline estimates)
+-- CUR.py(python script to perform collaborative filtering using CUR decomposition)
+-- evaluation.py (python script containing functions for evaluation metrics)
+-- SVD_module.py (python script to perform collaborative filtering using SVD)
+-- recsys_utils.py (python script containing functions for loading matrices and mappings)
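A minimal sketch of what create_matrices.py plausibly produces, continuing from the train_df/test_df frames sketched above. The mapping and file names mirror the listing, but the actual implementation may differ:

```python
import pickle
from scipy.sparse import csr_matrix, save_npz

# Map raw MovieLens IDs to contiguous row/column indices
user_map = {u: i for i, u in enumerate(train_df["user_id"].unique())}
movie_map = {m: j for j, m in enumerate(train_df["movie_id"].unique())}

rows = train_df["user_id"].map(user_map).to_numpy()
cols = train_df["movie_id"].map(movie_map).to_numpy()
vals = train_df["rating"].to_numpy(dtype="float64")

# Users x movies rating matrix in sparse form (missing ratings stay 0)
train = csr_matrix((vals, (rows, cols)), shape=(len(user_map), len(movie_map)))

save_npz("temp_data/train.npz", train)
test_df.to_pickle("temp_data/test_table.pkl")
with open("temp_data/user_map.pkl", "wb") as f:
    pickle.dump(user_map, f)
with open("temp_data/movie_map.pkl", "wb") as f:
    pickle.dump(movie_map, f)
```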
Basic Collaborative Filtering: while predicting ratings, the ratings of the 10 most similar users (or items, for the item-item variant) are used.
More on Collaborative Filtering
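A minimal sketch of the user-user variant with cosine similarity and the 10 nearest neighbours. Function and variable names are illustrative, not the repository's API; the item-item variant works the same way on the transposed matrix:

```python
import numpy as np

def predict_user_based(train, user, movie, k=10):
    """Predict the rating of `user` for `movie` from the k most similar users.
    `train` is a dense users x movies matrix with 0 for missing ratings."""
    # Cosine similarity between the target user and every other user
    norms = np.linalg.norm(train, axis=1) * np.linalg.norm(train[user]) + 1e-9
    sims = (train @ train[user]) / norms
    sims[user] = -np.inf                    # exclude the user themselves
    sims[train[:, movie] == 0] = -np.inf    # keep only users who rated `movie`

    # Weighted average of the ratings of the k most similar valid neighbours
    top = np.argsort(sims)[-k:]
    top = top[np.isfinite(sims[top]) & (sims[top] > 0)]
    if top.size == 0:
        return float(train[train > 0].mean())   # fall back to the global mean
    return float(sims[top] @ train[top, movie] / sims[top].sum())
```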
Collaborative Filtering with baseline: the same 10-nearest-neighbour scheme is used, but the weighted ratings are adjusted by baseline estimates (global mean plus user and movie biases).
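A hedged sketch of the baseline estimate b_ui = mu + b_u + b_i used to adjust the neighbourhood prediction. The bias formulas below are simple mean-based estimates; the repository's scripts may compute them differently:

```python
import numpy as np

def baseline_estimates(train):
    """Global mean, per-user bias and per-movie bias from a dense
    users x movies matrix with 0 for missing ratings."""
    mask = train > 0
    mu = train[mask].mean()                                   # global mean rating
    n_u = np.maximum(mask.sum(axis=1), 1)
    n_i = np.maximum(mask.sum(axis=0), 1)
    b_u = (train.sum(axis=1) - mu * mask.sum(axis=1)) / n_u   # user bias
    resid = (train - (mu + b_u[:, None])) * mask
    b_i = resid.sum(axis=0) / n_i                             # movie bias
    return mu, b_u, b_i

# Baseline estimate:  b_ui = mu + b_u[u] + b_i[i]
# Prediction: r_hat(u, i) = b_ui + similarity-weighted average, over the 10
# most similar users v that rated i, of (r(v, i) - b_vi).
```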
SVD: number of singular values retained = 1088 (90% energy).
More on SVD
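The 90% energy criterion means keeping the smallest number of singular values whose squared sum is at least 90% of the total. A minimal sketch of how that cutoff can be chosen (function name illustrative):

```python
import numpy as np

def truncated_svd(train, energy=0.90):
    """SVD of the dense rating matrix, truncated so the retained singular
    values hold at least `energy` of the total squared singular value mass."""
    U, s, V_t = np.linalg.svd(train, full_matrices=False)

    cum = np.cumsum(s ** 2)
    k = int(np.searchsorted(cum, energy * cum[-1]) + 1)   # 1088 here for 90%

    return U[:, :k], np.diag(s[:k]), V_t[:k, :]

# Predicted ratings are read off the rank-k reconstruction: U_k @ Sigma_k @ V_t_k
```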
CUR: number of columns and rows retained = 900.
More on CUR
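A rough sketch of one standard CUR construction (as in Mining of Massive Datasets): sample columns and rows with probability proportional to their squared norms, rescale them, and build U from the pseudo-inverse of the singular values of the intersection matrix W. Sampling without replacement and the names below are simplifications, not necessarily the repository's exact procedure:

```python
import numpy as np

def cur_decomposition(A, k=900, seed=0):
    """Approximate A ~ C @ U @ R with k sampled columns and k sampled rows."""
    rng = np.random.default_rng(seed)
    frob = (A ** 2).sum()
    col_p = (A ** 2).sum(axis=0) / frob        # column selection probabilities
    row_p = (A ** 2).sum(axis=1) / frob        # row selection probabilities

    cols = rng.choice(A.shape[1], size=k, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=k, replace=False, p=row_p)

    # Rescale the sampled columns/rows by 1 / sqrt(k * p)
    C = A[:, cols] / np.sqrt(k * col_p[cols])
    R = A[rows, :] / np.sqrt(k * row_p[rows])[:, None]

    # U = Y (Sigma^+)^2 X^T, where W = X Sigma Y^T is the intersection of the
    # chosen rows and columns
    W = A[np.ix_(rows, cols)]
    X, sigma, Y_t = np.linalg.svd(W, full_matrices=False)
    sigma_plus = np.where(sigma > 1e-10, 1.0 / sigma, 0.0)
    U = Y_t.T @ np.diag(sigma_plus ** 2) @ X.T
    return C, U, R
```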
Processor: i7-7500U
RAM: 16 GB DDR4
OS: Ubuntu 16.04 LTS
| Recommender System Technique | Root Mean Square Error (RMSE) | Precision on top K | Spearman Rank Correlation | Time taken for prediction (secs) |
|---|---|---|---|---|
| Collaborative | 2.033519 (item), 2.1502 (user) | 0.6016 (item), 0.584474 (user) | 0.99999975 (item), 0.99999972 (user) | 211.979 (item), 272.817 (user) |
| Collaborative along with Baseline approach | 0.939036 (item), 1.005434 (user) | 0.62865586 (item), 0.64406025 (user) | 0.999999947 (item), 0.99999939 (user) | 313.3369 (item), 273.2009 (user) |
| SVD | 1.03512426007 | 0.654428981666 | 0.999999999839 | 565.33 |
| SVD with 90% retained energy | 1.03 | 0.6528 | 0.999999999839 | 361.49 |
| CUR | 1.19389972 | 0.900607466 | 0.99999999786 | 53.4029 |
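For reference, hedged sketches of the three metrics reported above. The relevance threshold in precision-at-K and the exact aggregation used by evaluation.py are assumptions:

```python
import numpy as np
from scipy.stats import spearmanr

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def spearman_rank_correlation(y_true, y_pred):
    """Spearman rank correlation between true and predicted ratings."""
    corr, _ = spearmanr(y_true, y_pred)
    return float(corr)

def precision_at_k(y_true, y_pred, k, threshold=3.5):
    """Fraction of the k highest-predicted items whose true rating is at
    least `threshold` (the threshold is an assumed notion of relevance)."""
    top = np.argsort(np.asarray(y_pred))[::-1][:k]
    return float(np.mean(np.asarray(y_true)[top] >= threshold))
```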