Commit

better dataset description
renecotyfanboy committed Jun 16, 2024
1 parent a49d209 commit b8435c1
Showing 2 changed files with 56 additions and 17 deletions.
docs/dataset/cookbook.md: 60 changes (48 additions, 12 deletions)
````diff
@@ -2,32 +2,68 @@

 Here I'll provide a few examples of how to work with the dataset using [`polars`](https://docs.pola.rs/).

-### Load the dataset
+## Load the dataset

-```python
-import polars as pl
+Load the dataset from [`huggingface`](https://huggingface.co/datasets/renecotyfanboy/leagueData) and display all the available columns.

-df = pl.read_csv("league_dataframe.csv")
+```python
+from datasets import load_dataset

-print(df)
+df = load_dataset("renecotyfanboy/leagueData", split="train").to_polars()
+print(df.columns)
 ```

 ## Find the history of a player

-``` python
+```python
 puuid = 'your_puuid' # (1)!
 historic_of_random_player = df.filter(
     puuid=puuid, is_in_reference_sample=True # (2)!
-).sort(by='gameStartTimestamp')
-
+).sort(by='gameStartTimestamp')
 ```

 1. `b3fhGxFuV-hCD3B5Vvj9nrD--8YwlFACxvAIox_sOq2aNUtmkcsmem8NFufjdZd79L49I9spnh7LQg` is a valid `puuid`.
 2. `is_in_reference_sample=True` keeps only the match history collected initially. The player can also appear in other
 players' matches, but including those in a history analysis would add matches that were not initially selected.

-## Build the win/loss curve
+## Lowest number of games
+Remake games were removed from the dataset, so some players have fewer than 100 games. The following snippet computes the lowest number of games for a single player, which is 85.
+
+```python
+from datasets import load_dataset
+
+columns = ['elo', 'puuid', 'gameStartTimestamp', 'is_in_reference_sample', 'win']
+df = load_dataset("renecotyfanboy/leagueData", split="train").select_columns(columns).to_polars()
+df = df.filter(is_in_reference_sample=True)
+
+number_of_games = []
+
+for puuid in df['puuid'].unique():
+    player = df.filter(puuid=puuid)
+    number_of_games.append(len(player.sort(by='gameStartTimestamp')['win'].to_numpy()))
+
+min(number_of_games)
+```
+
+## History of the Gold III players
+
+Display the win/loss history of the Gold III players as an image, with one row per player and one column per game.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from datasets import load_dataset
+
+columns = ['elo', 'puuid', 'gameStartTimestamp', 'is_in_reference_sample', 'win']
+df = load_dataset("renecotyfanboy/leagueData", split="train").select_columns(columns).to_polars()
+df = df.filter(elo="GOLD_III", is_in_reference_sample=True)
+
+history = []
+
+for puuid in df['puuid'].unique():
+    player = df.filter(puuid=puuid)
+    history.append(player.sort(by='gameStartTimestamp')['win'].to_numpy()[-85:])
+
+plt.matshow(np.asarray(history))
+```

-``` python
-import matplotlib.pyplot as plt
-```
````
docs/dataset/introduction.md: 13 changes (8 additions, 5 deletions)
````diff
@@ -19,9 +19,12 @@ in SoloQ for each of these players.

 1. ![La source](https://risibank.fr/cache/medias/0/14/1420/142061/full.png){ align=left }

-Let's explore a bit our dataset. In the next plot, I show the winrate of players in each division. The winrate is
-computed using the history list of each player.
+The following plot shows the winrate of players in each division.

-```plotly
-{"file_path": "dataset/assets/winrate_over_division.json"}
-```
+<div class="grid cards" markdown>
+
+- <p style='text-align: center;'> **Winrate per division** </p>
+``` plotly
+{"file_path": "dataset/assets/winrate_over_division.json"}
+```
+</div>
````
