ENH: speed up DataFrame.plot using LineCollection #61532

Open
Abdelgha-4 opened this issue Jun 1, 2025 · 5 comments
Labels: Performance (memory or execution speed performance), Visualization (plotting)

Comments

@Abdelgha-4

Description:

When plotting line charts with many columns or rows, DataFrame.plot() currently adds one Line2D object per column. This adds significant overhead on large datasets.

Replacing this with a single LineCollection (from matplotlib.collections) can yield substantial speedups. In my benchmarks, plotting via LineCollection was ~2.5× faster on large DataFrames with many columns.

Minimal example:

# Imports and data generation
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

num_rows = 500
num_cols = 2000

test_df = pd.DataFrame(np.random.randn(num_rows, num_cols).cumsum(axis=0))

# Baseline: plain DataFrame.plot (5.6 s)
test_df.plot(legend=False, figsize=(12, 8))
plt.show()

# Optimized version using a single LineCollection (2.2 s)
x = np.arange(len(test_df.index))
# One (num_rows, 2) array of (x, y) points per column
lines = [np.column_stack([x, test_df[col].to_numpy()]) for col in test_df.columns]
# Repeat the default color cycle so that every line gets a color
default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_cycle = list(itertools.islice(itertools.cycle(default_colors), len(lines)))

line_collection = LineCollection(lines, colors=color_cycle)
fig, ax = plt.subplots(figsize=(12, 8))
ax.add_collection(line_collection)
ax.margins(0.05)  # setting margins also requests an autoscale of the view
plt.show()

Note: the ~2.5× speedup is specific to DataFrames with an integer index. For DataFrames with a DatetimeIndex, the actual speedup is ~27× when combined with the workaround here: #61398
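For readers who want to try the LineCollection approach on a DatetimeIndex, here is a minimal sketch. It is not the workaround from #61398, whose details are not shown in this thread. It assumes a timezone-naive DatetimeIndex, converts the index to matplotlib date numbers with `matplotlib.dates.date2num`, and uses a hypothetical helper name, `plot_datetime_frame`. Colors are left at the collection default for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def plot_datetime_frame(df, ax):
    """Hypothetical helper: draw every column of df as one LineCollection."""
    # Assumption: df.index is a timezone-naive DatetimeIndex.
    # Convert the timestamps to matplotlib's float date numbers once.
    x = mdates.date2num(df.index.to_pydatetime())
    y = df.to_numpy(dtype=float)  # shape (n_rows, n_cols)
    # Build an (n_cols, n_rows, 2) array: one (x, y) polyline per column.
    segments = np.stack([np.broadcast_to(x, y.T.shape), y.T], axis=-1)
    collection = LineCollection(segments)
    ax.add_collection(collection)
    ax.xaxis_date()      # label the x axis as dates
    ax.autoscale_view()  # fit the view limits to the collection's data
    return collection
```

Called as `plot_datetime_frame(df, ax)` on an Axes from `plt.subplots()`, this draws all columns in one artist. Whether it reproduces the ~27× figure quoted above depends on the data and on the tick formatting path, so it should be timed rather than assumed.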

Thank you for considering this suggestion!

shadnikn commented Jun 1, 2025

Hello, I'd like to take an opportunity to resolve this issue. Could I get assigned to it?

shadnikn commented Jun 1, 2025

take

arthurlw added the Visualization and Performance labels on Jun 3, 2025

arthurlw (Member) commented Jun 3, 2025

Confirmed on main and in my own testing as well. I am aware that this is related to #61398, but I do think it should be kept open as a separate issue, since the two tackle different performance bottlenecks.

Thanks for the report!

@Abdelgha-4 (Author)

Thank you for your confirmation, @arthurlw!

FYI, I've also explained my POV on why this should be a separate issue here: #61398 (comment)

shadnikn commented Jun 4, 2025

I was reviewing your idea and realized it might be a bit too much for me to handle alone. I thought I could at least come up with some performance benchmarking that can track performance issues with larger datasets. I'll free up the assignment so someone else can take a crack at it, and I'm open to any feedback on the PR.
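As a rough illustration of what such benchmarking could look like, here is a standalone timing sketch. It is not the benchmark from the PR mentioned above and not part of pandas' own benchmark suite. The function names (`make_frame`, `plot_with_pandas`, `plot_with_collection`, `time_plot`) are hypothetical, and the Agg backend is assumed so that rendering cost is measured without opening a window.

```python
import time

import matplotlib
matplotlib.use("Agg")  # headless backend: render to memory, no GUI window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def make_frame(num_rows, num_cols, seed=0):
    """Random-walk DataFrame with an integer index, as in the issue's example."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((num_rows, num_cols)).cumsum(axis=0))


def plot_with_pandas(df):
    """Baseline: one Line2D per column via DataFrame.plot."""
    ax = df.plot(legend=False)
    return ax.figure


def plot_with_collection(df):
    """Alternative: all columns in a single LineCollection."""
    x = np.arange(len(df.index))
    segments = [np.column_stack([x, df[col].to_numpy()]) for col in df.columns]
    fig, ax = plt.subplots()
    ax.add_collection(LineCollection(segments))
    ax.autoscale_view()
    return fig


def time_plot(plot_func, df, repeats=3):
    """Return the best wall-clock time in seconds to build and render a figure."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fig = plot_func(df)
        fig.canvas.draw()  # force rendering so the draw cost is included
        best = min(best, time.perf_counter() - start)
        plt.close(fig)     # release the figure between runs
    return best
```

For example, `time_plot(plot_with_pandas, make_frame(500, 2000))` and `time_plot(plot_with_collection, make_frame(500, 2000))` give a like-for-like comparison at the sizes used in the issue. Absolute timings will vary by machine, backend, and library versions.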

3 participants