slu1992/ranking

 
 

Rankit

What is rankit?

Rankit aims to provide a more "scientific" way of ranking rankable objects.

We rank objects by giving each one a score, which we call its rating. Traditionally, people generate ratings by calculating the average score, the median, or some other statistically meaningful number. Although this approach is widely accepted, it can be biased in extreme cases. An average is easy to manipulate if the number of scores that can be submitted is unrestricted. One countermeasure is weighting, but this only mitigates the problem rather than solving it.

Rankit provides a variety of ranking methods beyond the simple average. These include well-known sports ranking systems such as the Massey, Colley, Keener, and Elo ranking systems. Some of the methods borrow ideas from PageRank and HITS, and there is also a ranking system that aims to predict score differences.

To further counter ranking manipulation, rankit also includes methods for merging rankings and measures of the distance between different ranking results.
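
To make "distance between ranking results" concrete, here is a minimal sketch of one common measure, the Kendall tau distance, which counts the pairs of items that two rankings place in opposite order. It is written in plain Python and does not use rankit. The function name `kendall_tau_distance` and the second ranking below are illustrative assumptions, not part of rankit's API or of any real result.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the item pairs that the two rankings order differently.

    Both arguments are lists of the same items, best first, without ties.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicates")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")

    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}

    discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is discordant when the two rankings disagree on which item comes first.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


first = ["Miami", "VT", "UVA", "UNC", "Duke"]
second = ["VT", "Miami", "UVA", "Duke", "UNC"]  # hypothetical ranking, for illustration only
print(kendall_tau_distance(first, second))
```

A distance of 0 means the two rankings agree completely, and the maximum, n(n-1)/2 for n items, means one ranking is the exact reverse of the other.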

All the algorithms implemented in rankit are described in Who's #1? The Science of Rating and Ranking. In fact, rankit is essentially an implementation of this book.

Quick start

Suppose we want to generate the Massey ranking of five NCAA American football teams from their scores in the 2005 season (this example is used more than once in the book mentioned above):

import pandas as pd

data = pd.DataFrame({
    "primary": ["Duke", "Duke", "Duke", "Duke", "Miami", "Miami", "Miami", "UNC", "UNC", "UVA"], 
    "secondary": ["Miami", "UNC", "UVA", "VT", "UNC", "UVA", "VT", "UVA", "VT", "VT"],
    "rate1": [7, 21, 7, 0, 34, 25, 27, 7, 3, 14],
    "rate2": [52, 24, 38, 45, 16, 17, 7, 5, 30, 52]
}, columns=["primary", "secondary", "rate1", "rate2"])
data

  primary secondary  rate1  rate2
0    Duke     Miami      7     52
1    Duke       UNC     21     24
2    Duke       UVA      7     38
3    Duke        VT      0     45
4   Miami       UNC     34     16
5   Miami       UVA     25     17
6   Miami        VT     27      7
7     UNC       UVA      7      5
8     UNC        VT      3     30
9     UVA        VT     14     52

from rankit.Table import Table
from rankit.Ranker import MasseyRanker

data = Table(data, ['primary', 'secondary', 'rate1', 'rate2'])
ranker = MasseyRanker(data)
ranker.rank()

    name  rating  rank
0  Miami    18.2     1
1     VT    18.0     2
2    UVA    -3.4     3
3    UNC    -8.0     4
4   Duke   -24.8     5

That's it! All you have to do is prepare the game data as a pandas DataFrame, specify the player columns and score columns, pick a ranker, and rank!
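
For readers curious about what a Massey rating is, the following is a minimal sketch of the textbook Massey least-squares method in plain Python, applied to the same ten games. It is not rankit's implementation: the function name `massey_ratings` and the small Gauss-Jordan solver are assumptions made for illustration. The sketch also assumes that every team is connected to every other team through some chain of games, since otherwise the linear system has no unique solution.

```python
games = [
    ("Duke", "Miami", 7, 52), ("Duke", "UNC", 21, 24), ("Duke", "UVA", 7, 38),
    ("Duke", "VT", 0, 45), ("Miami", "UNC", 34, 16), ("Miami", "UVA", 25, 17),
    ("Miami", "VT", 27, 7), ("UNC", "UVA", 7, 5), ("UNC", "VT", 3, 30),
    ("UVA", "VT", 14, 52),
]


def massey_ratings(games):
    """Solve the Massey system M r = p, with the ratings constrained to sum to zero."""
    teams = sorted({team for game in games for team in game[:2]})
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)

    # Augmented matrix [M | p]: M[i][i] is the number of games team i played,
    # M[i][j] is minus the number of games between i and j, and p[i] is team i's
    # total point differential.
    aug = [[0.0] * (n + 1) for _ in range(n)]
    for first, second, first_pts, second_pts in games:
        i, j = index[first], index[second]
        aug[i][i] += 1
        aug[j][j] += 1
        aug[i][j] -= 1
        aug[j][i] -= 1
        aug[i][n] += first_pts - second_pts
        aug[j][n] += second_pts - first_pts

    # M is singular, so the last equation is replaced by "ratings sum to zero".
    aug[n - 1] = [1.0] * n + [0.0]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]

    return {team: aug[i][n] / aug[i][i] for team, i in index.items()}


ratings = massey_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 1))
```

Rounded to one decimal, this reproduces the ratings in the table above. Because each of the five teams plays every other team exactly once here, the system reduces to each team's total point differential divided by 5, for example 91 / 5 = 18.2 for Miami.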

There are a variety of ranking methods to choose from, but what if you want to merge several ranking results?

from rankit.Ranker import MasseyRanker, ColleyRanker, KeenerRanker, MarkovRanker
from rankit.Merge import borda_count_merge

mergedrank = borda_count_merge([
    MasseyRanker(data).rank(), KeenerRanker(data).rank(), MarkovRanker(data).rank()])
mergedrank

    name  BordaCount  rank
0  Miami          12     1
1     VT           9     2
2    UVA           6     3
3    UNC           3     4
4   Duke           0     5
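
The idea behind a Borda count can be sketched in a few lines of plain Python: in each input ranking, an item earns one point for every item ranked below it, and the points are summed across rankings. This is an illustration of the general technique only. The function `borda_count` is hypothetical, and rankit's `borda_count_merge` may differ in details such as tie handling. The example assumes all three rankers produce the same order, which is what totals of 12, 9, 6, 3, and 0 across three rankings of five teams imply.

```python
def borda_count(rankings):
    """Merge rankings (lists of names, best first) by summing Borda points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            # An item gets one point for each item ranked below it.
            scores[name] = scores.get(name, 0) + (n - 1 - position)
    # Highest total first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


order = ["Miami", "VT", "UVA", "UNC", "Duke"]
merged = borda_count([order, order, order])
for rank, (name, points) in enumerate(merged, start=1):
    print(name, points, rank)
```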

So that's rankit! I hope that with rankit there will be fewer disputes about ranking manipulation, and that people who are not familiar with the science of ranking will benefit from it.

License

MIT Licensed.
