Updates example section in the Readme #48

Open
wants to merge 12 commits into
base: main
53 changes: 53 additions & 0 deletions .github/create_documentation_plots.py
@@ -0,0 +1,53 @@
import matplotlib.pyplot as plt
import pandas as pd
from pysankey import sankey

df = pd.read_csv("../pysankey/fruits.txt", sep=" ", names=["true", "predicted"])

colorDict = {
    "apple": "#f71b1b",
    "blueberry": "#1b7ef7",
    "banana": "#f3f71b",
    "lime": "#12e23f",
    "orange": "#f78c1b",
    "kiwi": "#9BD937",
}

labels = list(colorDict.keys())
leftLabels = [label for label in labels if label in df["true"].values]
rightLabels = [label for label in labels if label in df["predicted"].values]

ax = sankey(
    left=df["true"],
    right=df["predicted"],
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12,
)

plt.savefig("img/fruits.png")
plt.close()


# This calculates how often the different combinations of "true" and
# "predicted" co-occur
df = df.groupby(["true", "predicted"]).size().reset_index()
weights = df[0].astype(float)


ax = sankey(
    left=df["true"],
    right=df["predicted"],
    rightWeight=weights,
    leftWeight=weights,
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12,
)


plt.savefig("img/fruits_weighted.png")
Binary file added .github/img/fruits.png
Binary file added .github/img/fruits_weighted.png
2 changes: 2 additions & 0 deletions .github/workflows/ci.yaml
@@ -1,3 +1,5 @@
name: build

on:
push:
branches:
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Jupyter
.ipynb_checkpoints/

#Eclipse/pydev
.project
189 changes: 0 additions & 189 deletions .ipynb_checkpoints/plotFruit-checkpoint.ipynb

This file was deleted.

131 changes: 66 additions & 65 deletions README.md
@@ -4,14 +4,17 @@ Uses matplotlib to create simple <a href="https://en.wikipedia.org/wiki/Sankey_d
Sankey diagrams</a> flowing only from left to right.

[![PyPI version](https://badge.fury.io/py/pySankeyBeta.svg)](https://badge.fury.io/py/pySankeyBeta)
[![Build Status](https://travis-ci.org/Pierre-Sassoulas/pySankey.svg?branch=master)](https://travis-ci.org/Pierre-Sassoulas/pySankey)
[![Build Status](https://github.com/Pierre-Sassoulas/pySankey/actions/workflows/ci.yaml/badge.svg)](https://github.com/Pierre-Sassoulas/pySankey/actions/workflows/ci.yaml)
[![Coverage Status](https://coveralls.io/repos/github/Pierre-Sassoulas/pySankey/badge.svg?branch=master)](https://coveralls.io/github/Pierre-Sassoulas/pySankey?branch=master)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

## Example
## Examples

With fruits.txt :
### Simple expected/predicted example with fruits.txt:

`pysankey` contains a simple expected/predicted dataset called `fruits.txt` which looks
the following:
Comment on lines +16 to +17
Owner

Suggested change
`pysankey` contains a simple expected/predicted dataset called `fruits.txt` which looks
the following:
`pysankey` contains a simple expected/predicted dataset called `fruits.txt` which looks
like the following:


<div>
<table border="1" class="dataframe">
@@ -80,9 +83,13 @@ import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt


df = pd.read_csv(
'pysankey/fruits.txt', sep=' ', names=['true', 'predicted']
'fruits.txt',
sep=' ',
names=['true', 'predicted']
)

colorDict = {
'apple':'#f71b1b',
'blueberry':'#1b7ef7',
@@ -92,83 +99,77 @@ colorDict = {
'kiwi':'#9BD937'
}

labels = list(colorDict.keys())
leftLabels = [label for label in labels if label in df['true'].values]
rightLabels = [label for label in labels if label in df['predicted'].values]

# Create the sankey diagram
ax = sankey(
df['true'], df['predicted'], aspect=20, colorDict=colorDict,
leftLabels=['banana','orange','blueberry','apple','lime'],
rightLabels=['orange','banana','blueberry','apple','lime','kiwi'],
left=df['true'],
right=df['predicted'],
leftLabels=leftLabels,
rightLabels=rightLabels,
colorDict=colorDict,
aspect=20,
fontsize=12
)

plt.show() # to display
plt.savefig('fruit.png', bbox_inches='tight') # to save
```

![Fruity Alchemy](pysankey/fruit.png)
![Fruity Alchemy](.github/img/fruits.png)
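
The `leftLabels` and `rightLabels` lists in the example are derived from `colorDict`, so each side lists only the labels that actually occur in its column, in the order the colors were declared. The following is a minimal sketch of that filtering step in plain Python. The sample values and the helper name `labels_present` are made up for illustration and are not part of `pysankey`:

```python
# Hypothetical stand-in data; the README example reads these columns from fruits.txt.
true_values = ["banana", "apple", "lime", "apple"]
predicted_values = ["banana", "kiwi", "lime", "apple"]

color_dict = {
    "apple": "#f71b1b",
    "blueberry": "#1b7ef7",
    "banana": "#f3f71b",
    "lime": "#12e23f",
    "orange": "#f78c1b",
    "kiwi": "#9BD937",
}


def labels_present(ordered_labels, values):
    """Keep the labels that occur in `values`, preserving the order of `ordered_labels`."""
    present = set(values)
    return [label for label in ordered_labels if label in present]


# Iterating a dict yields its keys in insertion order.
left_labels = labels_present(color_dict, true_values)
right_labels = labels_present(color_dict, predicted_values)

print(left_labels)
print(right_labels)
```

Deriving both lists from one dictionary keeps the label order consistent between the two sides and avoids hand-maintained lists that can drift from the data.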

You could also use weight:
### Plotting preprocessed data using weights

```
,customer,good,revenue
0,John,fruit,5.5
1,Mike,meat,11.0
2,Betty,drinks,7.0
3,Ben,fruit,4.0
4,Betty,bread,2.0
5,John,bread,2.5
6,John,drinks,8.0
7,Ben,bread,2.0
8,Mike,bread,3.5
9,John,meat,13.0
```
However, not always you have or can have the data available in the format mentioned in
the previous example (e.g. if the dataset is too large). In this case, the weights
between the true and predicted labels can also be calculated beforehand and used to
create the sankey diagram. In this example, we continue to work with the data loaded
already in the previous example:
Comment on lines +124 to +128
Owner

Suggested change
However, not always you have or can have the data available in the format mentioned in
the previous example (e.g. if the dataset is too large). In this case, the weights
between the true and predicted labels can also be calculated beforehand and used to
create the sankey diagram. In this example, we continue to work with the data loaded
already in the previous example:
However, the data may not always be available in the format mentioned in
the previous example (for instance, if the dataset is too large). In such cases, the weights
between the true and predicted labels can be calculated in advance and used to
create the Sankey diagram. In this example, we will continue working with the data that was loaded
in the previous example:


```python
import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt

df = pd.read_csv(
'pysankey/customers-goods.csv', sep=',',
names=['id', 'customer', 'good', 'revenue']
)
weight = df['revenue'].values[1:].astype(float)
# Calculate the weights from the fruits dataframe
df = df.groupby(["true", "predicted"]).size().reset_index()
weights = df[0].astype(float)

ax = sankey(
left=df['customer'].values[1:], right=df['good'].values[1:],
rightWeight=weight, leftWeight=weight, aspect=20, fontsize=20
left=df['true'],
right=df['predicted'],
rightWeight=weights,
leftWeight=weights,
leftLabels=leftLabels,
rightLabels=rightLabels,
colorDict=colorDict,
aspect=20,
fontsize=12
)

plt.show() # to display
plt.savefig('customers-goods.png', bbox_inches='tight') # to save
```

![Customer goods](pysankey/customers-goods.png)

Similar to seaborn, you can pass a matplotlib `Axes` to `sankey` function:

```python
import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt

df = pd.read_csv(
'pysankey/fruits.txt',
sep=' ', names=['true', 'predicted']
)
colorDict = {
'apple': '#f71b1b',
'blueberry': '#1b7ef7',
'banana': '#f3f71b',
'lime': '#12e23f',
'orange': '#f78c1b'
}

ax1 = plt.axes()

sankey(
df['true'], df['predicted'], aspect=20, colorDict=colorDict,
fontsize=12, ax=ax1
)

plt.show()
```
![Fruity Alchemy](.github/img/fruits_weighted.png)
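
The pre-aggregation behind the weighted example amounts to counting how often each (true, predicted) pair occurs. As a rough sketch of the same idea without pandas, assuming made-up sample pairs, `collections.Counter` produces one row per distinct pair plus its count:

```python
from collections import Counter

# Hypothetical (true, predicted) observations, one tuple per sample.
pairs = [
    ("apple", "apple"),
    ("apple", "kiwi"),
    ("banana", "banana"),
    ("apple", "apple"),
    ("lime", "lime"),
]

# Count each distinct combination, then sort by (true, predicted)
# so the row order is deterministic.
rows = sorted(Counter(pairs).items())

left = [true for (true, _), _ in rows]
right = [predicted for (_, predicted), _ in rows]
weights = [float(count) for _, count in rows]

print(left)
print(right)
print(weights)
```

The three sequences correspond to what the README passes as `left`, `right`, and `leftWeight`/`rightWeight`. Whether `sankey` accepts plain lists is an assumption not checked here; wrapping them in pandas Series or NumPy arrays would match the inputs used in the README.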

### pysankey function overview

> `sankey(left, right, leftWeight=None, rightWeight=None, colorDict=None, leftLabels=None, rightLabels=None, aspect=4, rightColor=False, fontsize=14, ax=None, color_gradient=False, alphaDict=None)`
>
> **left**, **right** : NumPy array of object labels on the left and right of the
> diagram
>
> **leftWeight**, **rightWeight** : NumPy arrays of the weights of each strip
>
> **colorDict** : Dictionary of colors to use for each label
>
> **leftLabels**, **rightLabels** : order of the left and right labels in the diagram
>
> **aspect** : vertical extent of the diagram in units of horizontal extent
>
> **rightColor** : If true, each strip in the diagram will be colored according to
> its left label
>
> **fontsize** : Fontsize to be used for the labels
>
> **ax** : matplotlib axes to plot on, otherwise uses current axes.

## Important information

Binary file removed pysankey/customers-goods.png
Binary file not shown.
Binary file removed pysankey/fruit.png
Binary file not shown.
Binary file removed pysankey/fruits.png
Binary file not shown.