feat(python!): Use Altair in DataFrame.plot #17995

Merged
merged 50 commits into from
Aug 27, 2024
Changes from 48 commits
9ed8836
feat(python!): Use Altair in DataFrame.plot
MarcoGorelli Aug 1, 2024
00f7413
missing file
MarcoGorelli Aug 1, 2024
f0c806f
use ChannelType
MarcoGorelli Aug 1, 2024
eaafc23
typing
MarcoGorelli Aug 1, 2024
db6d8f7
requirements
MarcoGorelli Aug 1, 2024
f1e5906
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 11, 2024
08e09d4
add histogram example to docstring
MarcoGorelli Aug 11, 2024
c6e7d6b
update user guide
MarcoGorelli Aug 11, 2024
9a92211
formatting
MarcoGorelli Aug 11, 2024
c59780e
cross-version compat
MarcoGorelli Aug 11, 2024
541361b
py38 typing compat
MarcoGorelli Aug 11, 2024
7f51118
py38 typing compat
MarcoGorelli Aug 11, 2024
bb6116c
fix minimum version
MarcoGorelli Aug 11, 2024
f4b42b1
try setting typing extensions minimum
MarcoGorelli Aug 11, 2024
91a19f8
regular pip install to debug :sunglasses:
MarcoGorelli Aug 11, 2024
5c61982
that worked...what if we put uv back but without compile-bytecode?
MarcoGorelli Aug 11, 2024
8a11760
maybe not
MarcoGorelli Aug 11, 2024
f27326d
try putting torch and extra-index-url on the same line
MarcoGorelli Aug 12, 2024
f294a1e
inline torch install
MarcoGorelli Aug 12, 2024
2a1cba0
UV_INDEX_STRATEGY
MarcoGorelli Aug 12, 2024
b029aed
revert requirements-ci.txt change
MarcoGorelli Aug 12, 2024
e19a0b4
need both altair and hvplot in user guide docs
MarcoGorelli Aug 12, 2024
db6b59f
another strategy
MarcoGorelli Aug 12, 2024
e22550b
maybe a bit of separation was all we needed
MarcoGorelli Aug 12, 2024
6ff8e99
what if we install cython
MarcoGorelli Aug 12, 2024
a9861a0
only use extra index url on linux?
MarcoGorelli Aug 12, 2024
8d329e4
include fi
MarcoGorelli Aug 12, 2024
c7a31f0
regular old-fashioned pip install
MarcoGorelli Aug 12, 2024
f3186b5
revert requirements-ci.txt change
MarcoGorelli Aug 12, 2024
dfd25fa
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 13, 2024
5bd4fb4
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 13, 2024
0f5e803
install typing-extensions _before_ the other requirements
MarcoGorelli Aug 13, 2024
df98a2e
minor updates
MarcoGorelli Aug 14, 2024
8d786e1
extra comment
MarcoGorelli Aug 14, 2024
4491d83
remove unused type alias
MarcoGorelli Aug 14, 2024
fb0438d
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 14, 2024
9266adb
lint
MarcoGorelli Aug 14, 2024
615bc9f
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 14, 2024
6f4d85a
:truck: 1.5.0 => 1.6.0
MarcoGorelli Aug 14, 2024
3942e62
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 16, 2024
4bc052f
wip
MarcoGorelli Aug 17, 2024
e043a35
add Series.plot
MarcoGorelli Aug 17, 2024
65fae74
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 18, 2024
ec57fb0
add Series.plot
MarcoGorelli Aug 18, 2024
28ac596
add missing page, add `scatter` as alias
MarcoGorelli Aug 18, 2024
d5167f1
lint
MarcoGorelli Aug 18, 2024
efed5c9
rename, better bar plot example, simplify
MarcoGorelli Aug 18, 2024
381d481
assorted improvements
MarcoGorelli Aug 18, 2024
ea018b5
assorted docs and typing improvements
MarcoGorelli Aug 19, 2024
40a0e31
Merge remote-tracking branch 'upstream/main' into altair
MarcoGorelli Aug 26, 2024
7 changes: 6 additions & 1 deletion .github/workflows/benchmark.yml
@@ -48,7 +48,12 @@ jobs:

- name: Install Python dependencies
working-directory: py-polars
run: uv pip install --compile-bytecode -r requirements-dev.txt -r requirements-ci.txt --verbose
run: |
# Install typing-extensions separately whilst the `--extra-index-url` in `requirements-ci.txt`
# doesn't have an up-to-date typing-extensions, see
# https://github.com/astral-sh/uv/issues/6028#issuecomment-2287232150
uv pip install -U typing-extensions
uv pip install --compile-bytecode -r requirements-dev.txt -r requirements-ci.txt --verbose

- name: Set up Rust
run: rustup show
7 changes: 6 additions & 1 deletion .github/workflows/test-coverage.yml
@@ -103,7 +103,12 @@ jobs:

- name: Install Python dependencies
working-directory: py-polars
run: uv pip install --compile-bytecode -r requirements-dev.txt -r requirements-ci.txt --verbose
run: |
# Install typing-extensions separately whilst the `--extra-index-url` in `requirements-ci.txt`
# doesn't have an up-to-date typing-extensions, see
# https://github.com/astral-sh/uv/issues/6028#issuecomment-2287232150
uv pip install -U typing-extensions
uv pip install --compile-bytecode -r requirements-dev.txt -r requirements-ci.txt --verbose

- name: Set up Rust
run: rustup component add llvm-tools-preview
4 changes: 4 additions & 0 deletions .github/workflows/test-python.yml
@@ -66,6 +66,10 @@ jobs:
- name: Install Python dependencies
run: |
pip install uv
# Install typing-extensions separately whilst the `--extra-index-url` in `requirements-ci.txt`
# doesn't have an up-to-date typing-extensions, see
# https://github.com/astral-sh/uv/issues/6028#issuecomment-2287232150
uv pip install -U typing-extensions
uv pip install --compile-bytecode -r requirements-dev.txt -r requirements-ci.txt --verbose

- name: Set up Rust
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -1,10 +1,11 @@
altair
pandas
pyarrow
graphviz
hvplot
matplotlib
seaborn
plotly
altair
numba
numpy

112 changes: 69 additions & 43 deletions docs/src/python/user-guide/misc/visualization.py
@@ -3,30 +3,33 @@

path = "docs/data/iris.csv"

df = pl.scan_csv(path).group_by("species").agg(pl.col("petal_length").mean()).collect()
df = pl.read_csv(path)

@MarcoGorelli (Collaborator, Author) commented on Aug 11, 2024:

I've replaced the current example with a more colourful one

A couple of screenshots to demonstrate:

Screenshot 2024-08-11 160500

Screenshot 2024-08-11 160525

print(df)
# --8<-- [end:dataframe]

"""
# --8<-- [start:hvplot_show_plot]
df.plot.bar(
x="species",
y="petal_length",
import hvplot.polars
df.hvplot.scatter(
x="sepal_width",
y="sepal_length",
by="species",
width=650,
)
# --8<-- [end:hvplot_show_plot]
"""

# --8<-- [start:hvplot_make_plot]
import hvplot
import hvplot.polars

plot = df.plot.bar(
x="species",
y="petal_length",
plot = df.hvplot.scatter(
x="sepal_width",
y="sepal_length",
by="species",
width=650,
)
hvplot.save(plot, "docs/images/hvplot_bar.html")
with open("docs/images/hvplot_bar.html", "r") as f:
hvplot.save(plot, "docs/images/hvplot_scatter.html")
with open("docs/images/hvplot_scatter.html", "r") as f:
chart_html = f.read()
print(f"{chart_html}")
# --8<-- [end:hvplot_make_plot]
@@ -35,7 +38,12 @@
# --8<-- [start:matplotlib_show_plot]
import matplotlib.pyplot as plt

plt.bar(x=df["species"], height=df["petal_length"])
fig, ax = plt.subplots()
ax.scatter(
x=df["sepal_width"],
y=df["sepal_length"],
c=df["species"].cast(pl.Categorical).to_physical(),
)
# --8<-- [end:matplotlib_show_plot]
"""

@@ -44,34 +52,43 @@

import matplotlib.pyplot as plt

plt.bar(x=df["species"], height=df["petal_length"])
plt.savefig("docs/images/matplotlib_bar.png")
with open("docs/images/matplotlib_bar.png", "rb") as f:
fig, ax = plt.subplots()
ax.scatter(
x=df["sepal_width"],
y=df["sepal_length"],
c=df["species"].cast(pl.Categorical).to_physical(),
)
fig.savefig("docs/images/matplotlib_scatter.png")
with open("docs/images/matplotlib_scatter.png", "rb") as f:
png = base64.b64encode(f.read()).decode()
print(f'<img src="data:image/png;base64, {png}"/>')
# --8<-- [end:matplotlib_make_plot]

"""
# --8<-- [start:seaborn_show_plot]
import seaborn as sns
sns.barplot(
sns.scatterplot(
df,
x="species",
y="petal_length",
x="sepal_width",
y="sepal_length",
hue="species",
)
# --8<-- [end:seaborn_show_plot]
"""

# --8<-- [start:seaborn_make_plot]
import seaborn as sns
import matplotlib.pyplot as plt

sns.barplot(
fig, ax = plt.subplots()
ax = sns.scatterplot(
df,
x="species",
y="petal_length",
x="sepal_width",
y="sepal_length",
hue="species",
)
plt.savefig("docs/images/seaborn_bar.png")
with open("docs/images/seaborn_bar.png", "rb") as f:
fig.savefig("docs/images/seaborn_scatter.png")
with open("docs/images/seaborn_scatter.png", "rb") as f:
png = base64.b64encode(f.read()).decode()
print(f'<img src="data:image/png;base64, {png}"/>')
# --8<-- [end:seaborn_make_plot]
@@ -80,51 +97,60 @@
# --8<-- [start:plotly_show_plot]
import plotly.express as px

px.bar(
px.scatter(
df,
x="species",
y="petal_length",
width=400,
x="sepal_width",
y="sepal_length",
color="species",
width=650,
)
# --8<-- [end:plotly_show_plot]
"""

# --8<-- [start:plotly_make_plot]
import plotly.express as px

fig = px.bar(
fig = px.scatter(
df,
x="species",
y="petal_length",
x="sepal_width",
y="sepal_length",
color="species",
width=650,
)
fig.write_html("docs/images/plotly_bar.html", full_html=False, include_plotlyjs="cdn")
with open("docs/images/plotly_bar.html", "r") as f:
fig.write_html(
"docs/images/plotly_scatter.html", full_html=False, include_plotlyjs="cdn"
)
with open("docs/images/plotly_scatter.html", "r") as f:
chart_html = f.read()
print(f"{chart_html}")
# --8<-- [end:plotly_make_plot]

"""
# --8<-- [start:altair_show_plot]
import altair as alt

alt.Chart(df, width=700).mark_bar().encode(x="species:N", y="petal_length:Q")
(
df.plot.point(
x="sepal_length",
y="sepal_width",
color="species",
)
.properties(width=500)
.configure_scale(zero=False)
)
# --8<-- [end:altair_show_plot]
"""

# --8<-- [start:altair_make_plot]
import altair as alt

chart = (
alt.Chart(df, width=600)
.mark_bar()
.encode(
x="species:N",
y="petal_length:Q",
df.plot.point(
x="sepal_length",
y="sepal_width",
color="species",
)
.properties(width=500)
.configure_scale(zero=False)
)
chart.save("docs/images/altair_bar.html")
with open("docs/images/altair_bar.html", "r") as f:
chart.save("docs/images/altair_scatter.html")
with open("docs/images/altair_scatter.html", "r") as f:
chart_html = f.read()
print(f"{chart_html}")
# --8<-- [end:altair_make_plot]
58 changes: 43 additions & 15 deletions docs/user-guide/misc/visualization.md
@@ -2,17 +2,48 @@

Data in a Polars `DataFrame` can be visualized using common visualization libraries.

We illustrate plotting capabilities using the Iris dataset. We scan a CSV and then do a group-by on the `species` column and get the mean of the `petal_length`.
We illustrate plotting capabilities using the Iris dataset. We read a CSV and then
plot one column against another, colored by yet another column.

{{code_block('user-guide/misc/visualization','dataframe',[])}}

```python exec="on" result="text" session="user-guide/misc/visualization"
--8<-- "python/user-guide/misc/visualization.py:dataframe"
```

## Built-in plotting with hvPlot
## Built-in plotting with Altair

Polars has a `plot` method to create interactive plots using [hvPlot](https://hvplot.holoviz.org/).
Polars has a `plot` method to create plots using [Altair](https://altair-viz.github.io/):

{{code_block('user-guide/misc/visualization','altair_show_plot',[])}}

```python exec="on" session="user-guide/misc/visualization"
--8<-- "python/user-guide/misc/visualization.py:altair_make_plot"
```

This is shorthand for:

```python
import altair as alt

(
alt.Chart(df).mark_point().encode(
x="sepal_length",
y="sepal_width",
color="species",
)
.properties(width=500)
.configure_scale(zero=False)
)
```

and is only provided for convenience, and to signal that Altair is known to work well with
Polars.
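
As a rough illustration of what such a chart boils down to, the sketch below builds a Vega-Lite-style specification by hand using only the standard library. It is not taken from this diff: the helper `point_chart_spec` and the three sample rows are hypothetical, and the dictionary only approximates the JSON that Altair would generate for the chart above.

```python
import json

# A few hand-written rows standing in for the Iris data (illustrative values only).
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "species": "versicolor"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "species": "virginica"},
]


def point_chart_spec(rows, x, y, color, width=500):
    """Hypothetical helper: build a Vega-Lite-style dict for a coloured point chart."""
    return {
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            # Numeric columns map to quantitative channels ...
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
            # ... and the string column to a nominal (categorical) colour channel.
            "color": {"field": color, "type": "nominal"},
        },
        "width": width,
        # Rough counterpart of `.configure_scale(zero=False)`.
        "config": {"scale": {"zero": False}},
    }


spec = point_chart_spec(rows, "sepal_length", "sepal_width", "species")
print(json.dumps(spec["encoding"], indent=2))
```

The point of the sketch is that the chart is declarative: `x`, `y` and `color` only name columns and their roles, and rendering is left to the Vega-Lite runtime.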

## hvPlot

If you import `hvplot.polars`, then it registers a `hvplot`
method which you can use to create interactive plots using [hvPlot](https://hvplot.holoviz.org/).
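
The "registers on import" behaviour can be pictured with a toy accessor pattern. The sketch below is an assumption-laden illustration, not hvPlot's or Polars' code: `Frame`, `register_namespace` and `ToyPlotter` are hypothetical names, loosely modelled on the kind of hook Polars offers through `pl.api.register_dataframe_namespace`.

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical; not the Polars class)."""

    def __init__(self, data):
        self.data = data


def register_namespace(name):
    """Return a decorator that exposes `cls` on Frame as an accessor called `name`."""

    def decorator(cls):
        # Each attribute access builds a fresh accessor bound to that frame.
        setattr(Frame, name, property(lambda self: cls(self)))
        return cls

    return decorator


# In a real library this decorator would run as a side effect of importing the module.
@register_namespace("hvplot")
class ToyPlotter:
    def __init__(self, frame):
        self._frame = frame

    def scatter(self, x, y):
        # A real plotter would return a chart object; a string is enough here.
        return f"scatter of {y} vs {x} over {len(self._frame.data[x])} rows"


frame = Frame({"sepal_width": [3.5, 3.2], "sepal_length": [5.1, 7.0]})
print(frame.hvplot.scatter(x="sepal_width", y="sepal_length"))
```

Until the registering module has been imported, the accessor simply does not exist on the class, which is why the import has to come before the first `df.hvplot` call.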

{{code_block('user-guide/misc/visualization','hvplot_show_plot',[])}}

@@ -22,18 +53,23 @@ Polars has a `plot` method to create interactive plots using [hvPlot](https://hv

## Matplotlib

To create a bar chart we can pass columns of a `DataFrame` directly to Matplotlib as a `Series` for each column. Matplotlib does not have explicit support for Polars objects but Matplotlib can accept a Polars `Series` because it can convert each Series to a numpy array, which is zero-copy for numeric
data without null values.
To create a scatter plot we can pass columns of a `DataFrame` directly to Matplotlib as a `Series` for each column.
Matplotlib does not have explicit support for Polars objects but can accept a Polars `Series` by
converting it to a NumPy array (which is zero-copy for numeric data without null values).

Note that because the column `'species'` isn't numeric, we need to first convert it to numeric values so that
it can be passed as an argument to `c`.
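
To make the conversion concrete, here is a minimal pure-Python sketch of turning string labels into integer codes. The helper `to_codes` is hypothetical and numbers labels in order of first appearance; the codes Polars actually returns from `.cast(pl.Categorical).to_physical()` depend on its categorical encoding and may differ.

```python
def to_codes(values):
    """Hypothetical helper: map each distinct label to an integer, in order of first appearance."""
    codes = {}
    # setdefault returns the existing code, or stores and returns the next unused one.
    return [codes.setdefault(value, len(codes)) for value in values]


species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
print(to_codes(species))
```

Matplotlib then maps these integers onto a colormap, so points sharing a label share a colour.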

{{code_block('user-guide/misc/visualization','matplotlib_show_plot',[])}}

```python exec="on" session="user-guide/misc/visualization"
--8<-- "python/user-guide/misc/visualization.py:matplotlib_make_plot"
```

## Seaborn, Plotly & Altair
## Seaborn and Plotly

[Seaborn](https://seaborn.pydata.org/), [Plotly](https://plotly.com/) & [Altair](https://altair-viz.github.io/) can accept a Polars `DataFrame` by leveraging the [dataframe interchange protocol](https://data-apis.org/dataframe-api/), which offers zero-copy conversion where possible.
[Seaborn](https://seaborn.pydata.org/) and [Plotly](https://plotly.com/) can accept a Polars `DataFrame` by leveraging the [dataframe interchange protocol](https://data-apis.org/dataframe-api/), which offers zero-copy conversion where possible. Note
that the protocol does not support all Polars data types (e.g. `List`) so your mileage may vary here.

### Seaborn

@@ -50,11 +86,3 @@ data without null values.
```python exec="on" session="user-guide/misc/visualization"
--8<-- "python/user-guide/misc/visualization.py:plotly_make_plot"
```

### Altair

{{code_block('user-guide/misc/visualization','altair_show_plot',[])}}

```python exec="on" session="user-guide/misc/visualization"
--8<-- "python/user-guide/misc/visualization.py:altair_make_plot"
```