Twitter PageRank

A MapReduce calculation of PageRank applied to the Twitter social graph. The social graph was sampled by a crawler using the Twitter REST API. The crawler performs a breadth-first traversal of the Twitter graph, starting at an arbitrary user node and storing connections in a MySQL database. A simplified PageRank algorithm was then applied to the collected social graph data to calculate rank scores for all users. The simplified algorithm does not include the damping factor, which models a web surfer jumping to a random web page. This omission probably accounts in part for the fact that our implementation converges to a uniform PageRank distribution of zero if allowed to run for too many iterations. A more in-depth analysis would have to include an analogous damping factor in order to converge towards more meaningful values.
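The convergence to zero described above can be illustrated with a small in-memory sketch. This is not the repository's MapReduce job. The function name `pagerank`, the toy graph, and the edge direction (rank flows from a follower to the accounts it follows) are assumptions made for illustration. The sketch also assumes that one source of the decay is users with no outgoing edges in the sampled graph, whose rank is dropped on every pass of the simplified update.

```python
def pagerank(graph, iterations=50, damping=None):
    """Iterate PageRank over a dict mapping each node to the nodes it links to.

    damping=None runs the simplified update with no damping factor.
    Passing a value such as 0.85 adds the random-jump term and redistributes
    the rank of nodes that have no outgoing edges (an assumed fix).
    """
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        incoming = {u: 0.0 for u in nodes}
        dangling = 0.0  # rank held by nodes with no outgoing edges
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = rank[u] / len(targets)
                for v in targets:
                    incoming[v] += share
            else:
                dangling += rank[u]

        if damping is None:
            # Simplified update: rank held by dangling nodes is discarded.
            rank = incoming
        else:
            # Damped update: random jump plus even redistribution of dangling rank.
            rank = {
                u: (1.0 - damping) / n + damping * (incoming[u] + dangling / n)
                for u in nodes
            }
    return rank


# Hypothetical toy graph: "d" has no outgoing edges in the sample.
TOY_GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}

if __name__ == "__main__":
    print("simplified:", pagerank(TOY_GRAPH))
    print("damped:    ", pagerank(TOY_GRAPH, damping=0.85))
```

On this toy graph the simplified update loses rank on every pass until all scores are zero, while the damped update keeps the total rank at one. Whether the same mechanism fully explains the behaviour on the crawled data is not established here.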

Included are R scripts to analyse results, SQL DDL scripts, and a MapReduce job to generate adjacency lists from raw crawler data.
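As a rough picture of the adjacency-list step, the following sketch mimics a map phase and a reduce phase in plain Python. The input format (one tab-separated `follower` and `followee` pair per line) and the names `map_edge`, `reduce_adjacency`, and `build_adjacency_lists` are hypothetical. The actual raw crawler format and job in this repository may differ.

```python
from collections import defaultdict


def map_edge(line):
    """Map step: parse one assumed 'follower<TAB>followee' record into a key/value pair."""
    follower, followee = line.strip().split("\t")
    yield follower, followee


def reduce_adjacency(follower, followees):
    """Reduce step: collapse all values for one key into a sorted, de-duplicated list."""
    return follower, sorted(set(followees))


def build_adjacency_lists(lines):
    """Run map, an in-memory stand-in for the shuffle, and reduce over the input lines."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank records
        for key, value in map_edge(line):
            grouped[key].append(value)
    return dict(reduce_adjacency(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["1\t2", "1\t3", "2\t3", "1\t2"]
    print(build_adjacency_lists(sample))
```

Each resulting entry maps a user to the users it points to, which is the shape an iterative PageRank pass would consume.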


justinkamerman/ripple
