ENH: speed up `DataFrame.plot` using `LineCollection` #61532

Abdelgha-4 · 2025-06-01T21:25:38Z

Description:

When plotting line charts with many columns or rows, DataFrame.plot() currently adds one Line2D object per column. This incurs significant overhead in large datasets.

Replacing this with a single LineCollection (from matplotlib.collections) can yield substantial speedups. In my benchmarks, plotting via LineCollection was ~2.5× faster on large DataFrames with many columns.

Minimal example:

# Imports and data generation
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

num_rows = 500
num_cols = 2000

test_df = pd.DataFrame(np.random.randn(num_rows, num_cols).cumsum(axis=0))

# Baseline: plain DataFrame.plot (5.6 secs)
test_df.plot(legend=False, figsize=(12, 8))
plt.show()

# Optimized version using LineCollection (2.2 secs)
x = np.arange(len(test_df.index))
lines = [np.column_stack([x, test_df[col].values]) for col in test_df.columns]
default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_cycle = list(itertools.islice(itertools.cycle(default_colors), len(lines)))

line_collection = LineCollection(lines, colors=color_cycle)
fig, ax = plt.subplots(figsize=(12, 8))
ax.add_collection(line_collection)
ax.margins(0.05)
plt.show()
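To make the comparison above easier to reproduce, here is a minimal timing sketch. It wraps both approaches in functions and measures figure construction plus one render on the non-interactive Agg backend. The function names (`plot_with_line2d`, `plot_with_collection`, `time_plot`) and the smaller frame size are my own assumptions for illustration, not pandas internals, and the numbers it prints will differ from the timings quoted above.

```python
import itertools
import time

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no GUI window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def plot_with_line2d(df):
    """Baseline: DataFrame.plot adds one Line2D per column."""
    ax = df.plot(legend=False, figsize=(12, 8))
    return ax.figure


def plot_with_collection(df):
    """Alternative: one LineCollection holding one segment per column."""
    x = np.arange(len(df.index))
    segments = [np.column_stack([x, df[col].to_numpy()]) for col in df.columns]
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    colors = list(itertools.islice(itertools.cycle(palette), len(segments)))
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.add_collection(LineCollection(segments, colors=colors))
    ax.autoscale_view()  # fit the view limits to the collection's data
    return fig


def time_plot(make_figure, df):
    """Return the seconds spent building and rendering one figure."""
    start = time.perf_counter()
    fig = make_figure(df)
    fig.canvas.draw()  # force a full render so drawing cost is included
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 200)).cumsum(axis=0))
timings = {
    func.__name__: time_plot(func, df)
    for func in (plot_with_line2d, plot_with_collection)
}
print(timings)
```

Forcing `fig.canvas.draw()` matters here: without it, the measurement would only cover artist creation and skip the rendering cost.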

Note: the ~2.5× speedup applies to DataFrames with an integer index. For DataFrames with a DatetimeIndex, the speedup is ~27× when combined with the workaround in #61398.
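For the DatetimeIndex case, one possible way to feed a `LineCollection` is to convert the index to matplotlib date numbers once and then mark the x-axis as a date axis. This is only a sketch under that assumption; it is not the workaround from #61398, and the frame size is arbitrary. No colors are passed, so every line uses the collection's default color.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no GUI window is opened
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=100, freq="D")
df = pd.DataFrame(rng.standard_normal((100, 50)).cumsum(axis=0), index=index)

# Convert the timestamps to matplotlib's float date numbers a single time.
x = mdates.date2num(df.index.to_pydatetime())

# Build an array of shape (n_columns, n_rows, 2): one (x, y) polyline per column.
values = df.to_numpy().T
segments = np.stack([np.broadcast_to(x, values.shape), values], axis=-1)

fig, ax = plt.subplots(figsize=(12, 8))
ax.add_collection(LineCollection(segments))
ax.xaxis_date()      # interpret x values as dates for ticks and labels
ax.autoscale_view()  # fit the view limits to the collection's data
fig.autofmt_xdate()  # rotate the date tick labels
```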

Thank you for considering this suggestion!


shadnikn · 2025-06-01T23:40:17Z

Hello, I'd like to take an opportunity to resolve this issue. Could I get assigned to it?

shadnikn · 2025-06-01T23:48:34Z

take

arthurlw · 2025-06-03T07:09:36Z

Confirmed on main and in my own testing. I am aware that this is related to #61398, but I think it should be kept open as a separate issue, since the two tackle different performance bottlenecks.

Thanks for the report!

Abdelgha-4 · 2025-06-03T21:04:54Z

Thank you for your confirmation @arthurlw !

FYI, I've also explained my POV on why this should be a separate issue here: #61398 (comment)

shadnikn · 2025-06-04T00:23:14Z

I was reviewing your idea and realized it might be a bit too much for me to handle alone. I thought I could at least come up with some performance benchmarks that track performance issues with larger datasets. I'll free up the assignment so someone else can take a crack at it, and I'm open to any feedback on the PR.
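As a rough idea of what such a benchmark could look like, here is a sketch following the asv conventions (`params`, `param_names`, `setup`, `time_*`, `teardown`). The class name, parameter grid, and frame sizes are hypothetical and chosen only for illustration; this is not the content of the linked PR.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no GUI window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


class LargeFramePlotting:
    # Hypothetical parameter grid: number of columns x kind of index.
    params = [[100, 1000], ["int", "datetime"]]
    param_names = ["num_cols", "index_kind"]

    def setup(self, num_cols, index_kind):
        # Build the frame outside the timed method so only plotting is measured.
        num_rows = 200
        if index_kind == "datetime":
            index = pd.date_range("2024-01-01", periods=num_rows, freq="D")
        else:
            index = pd.RangeIndex(num_rows)
        rng = np.random.default_rng(0)
        data = rng.standard_normal((num_rows, num_cols)).cumsum(axis=0)
        self.df = pd.DataFrame(data, index=index)

    def time_plot_lines(self, num_cols, index_kind):
        # Timed section: the line plot as currently produced by DataFrame.plot.
        self.df.plot(legend=False)

    def teardown(self, num_cols, index_kind):
        # Close figures so they do not accumulate across benchmark runs.
        plt.close("all")
```

Parametrizing over the index kind keeps the integer-index and DatetimeIndex cases separate, which matches the point that the two issues cover different bottlenecks.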

Abdelgha-4 mentioned this issue Jun 1, 2025: BUG: Slower DataFrame.plot with DatetimeIndex #61398 (Open)

github-actions bot assigned shadnikn Jun 1, 2025

arthurlw added the Visualization and Performance labels Jun 3, 2025

shadnikn mentioned this issue Jun 4, 2025: ENH: Adding DataFrame plotting benchmarks for large datasets #61546 (Open)
