GitHub

My Projects

Table of Content

Project	Link	Code
NYC Taxi	Visualization for NYCTaxi	Private
AutoComplete	Visualization for AutoComplete	Github Link
PageRank	Visualization for PageRank	Github Link
Movie Recommendation		Github Link

Project1: NYC Taxi

Description

Cleaned and filtered 1.2 Billion NYC Taxi Trip Data (300GB) and stored them in S3 buckets.
Built and configured data pipeline based on AWS resources automatically byTerraform and Ansible.
Designed a MapReduce program and deployed it on an auto scaling group of EC2 instances to generate the statistics of the taxi trip data.
Scheduled and tracked tasks on each node by SQS, where each task read the input by date range.
Built a Docker image of the MapReduce program and pushed it to ECS repository as an alternative to facilitate the scaling out process and compared the latencies.
Aggregated and stored the output of reducer into DynamoDB and used it as the data source for Bokeh visualization.

Link to Data source

http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Project2: AutoComplete

Description

Constructed a N-Gram library based on Wiki data set, built a Language Model by computing the Gram probabilities to generate (frequency) data and stored them into MySql.
Fetch data from MySql using Jquery and PHP to realize a real time auto completion feature of a searching engine on the web browser.

Project3: PageRank

Description

Implemented a page rank algorithm based on Twitter social network data sets with 11.3 million user profiles and 85 million social relations.
Formulated the relations between different users using transition matrix, calculated each user's rank value through 30 iterations until converge using EMR cluster.
Visualized the social network Graph based on the resulting PageRank matrix through Node.js.

Link to Data source

http://socialcomputing.asu.edu/datasets/Twitter

Project4: Movie Recommendation System

Description

Formulated a user rating matrix and a Co-concurrence matrix based on Netflix raw data set with 480k users, 17k movies and over 100 million ratings.
Merged the two matrices using a Item-based collaborative filtering algorithm to compute the movie recommendation list and deployed the jobs on AWS Hadoop cluster.

Link to Data source

http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
AutoComplete/NGram		AutoComplete/NGram
PageRank		PageRank
RecommenderSystem		RecommenderSystem
.gitignore		.gitignore
README.md		README.md
_config.yml		_config.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

My Projects

Table of Content

Project1: NYC Taxi

Description

Link to Data source

Project2: AutoComplete

Description

Project3: PageRank

Description

Link to Data source

Project4: Movie Recommendation System

Description

Link to Data source

About

Releases

Packages

Languages

qwjlegend/qwjlegend.github.io

Folders and files

Latest commit

History

Repository files navigation

My Projects

Table of Content

Project1: NYC Taxi

Description

Link to Data source

Project2: AutoComplete

Description

Project3: PageRank

Description

Link to Data source

Project4: Movie Recommendation System

Description

Link to Data source

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages