What makes loading the NumPy array 5x slower than the Python list? #4732

christianrickert · 2024-11-26T18:49:25Z


I've run into an unexpected performance bottleneck with pyo3 and numpy: in short, loading a large 2-dimensional ndarray into a Rust function takes about five times longer than loading the same data as a nested Python list.

# Cargo.toml

[package]
name = "pyo3np"
version = "0.1.0"
edition = "2021"

[lib]
name = "pyo3np"
crate-type = ["cdylib"]

[dependencies]
pyo3 = "0.23.1"

# pyproject.toml

[build-system]
requires = ["maturin>=1.7,<2.0"]
build-backend = "maturin"

[project]
name = "pyo3np"
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Rust",
    "Programming Language :: Python :: Implementation :: CPython",
    "Programming Language :: Python :: Implementation :: PyPy",
]
dynamic = ["version"]

[tool.maturin]
profile = "release"
features = ["pyo3/extension-module"]

// lib.rs

use pyo3::prelude::*;

#[pyfunction]
fn return_vector(_input: Vec<Vec<f64>>) -> PyResult<()> {
    Ok(())
}

#[pymodule]
fn pyo3np(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(return_vector, m)?)?;
    Ok(())
}

# load_data.py

import pyo3np
import numpy as np
import time

# create random data: 10 x 104,857,600 float64 values (about 8.4 GB)
print("Creating data...", flush=True, end='')
np.random.seed(42)
data = np.random.rand(10, 1024 * 1024 * 100)  # append .tolist() to pass a nested list instead

print("done.")

# load random data and time the call
print("Loading data...", flush=True, end='')
start = time.perf_counter()
pyo3np.return_vector(_input=data)
duration = time.perf_counter() - start
print(f"{duration}s.")

It takes about 25 seconds if I pass the ndarray to the return_vector Rust function:

Creating data...done.
Loading data...24.40405297279358s.
Creating data...done.
Loading data...25.127217292785645s.
Creating data...done.
Loading data...25.697571754455566s.

In contrast, it only takes about 6 seconds to load the same data as a Python list:

Creating data...done.
Loading data...5.6855690479278564s.
Creating data...done.
Loading data...5.770709037780762s.
Creating data...done.
Loading data...5.724390983581543s.

However, converting the ndarray to a Python list comes at the cost of making a second copy of the data in memory, even though the Rust function doesn't return anything. I did look at the simple example for rust-numpy, but it adds a significant level of verbosity to the Rust code.
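To put the memory cost of `.tolist()` in numbers, here is a rough estimate. It is a sketch that assumes a 64-bit CPython build, where each float object takes 24 bytes and each list slot holds an 8-byte pointer; the helper names are hypothetical.

```python
import sys


def list_of_floats_bytes(rows: int, cols: int) -> int:
    """Rough CPython estimate for a nested list like the one ndarray.tolist() returns."""
    outer = sys.getsizeof([None] * rows)         # outer list: header + one pointer per row
    inner = rows * sys.getsizeof([None] * cols)  # each row list: header + one pointer per value
    floats = rows * cols * sys.getsizeof(1.0)    # one separate float object per value
    return outer + inner + floats


def packed_float64_bytes(rows: int, cols: int) -> int:
    """Size of the data buffer of a C-contiguous float64 array: 8 bytes per value."""
    return rows * cols * 8


rows, cols = 10, 1024 * 1024 * 100
print(packed_float64_bytes(rows, cols))
print(list_of_floats_bytes(rows, cols))
```

For the array in the question, the packed data is about 8.4 GB, while the nested list needs roughly four times as much, since every value costs a 24-byte float object plus an 8-byte list slot instead of 8 bytes.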

I'm glad that pyo3 works out of the box with both ndarray and Python lists - even without any changes to the Rust code! But is there something I missed that could explain the difference in performance?

Answered by alex

Nov 26, 2024

The reason is that to go from a numpy array to a Vec of f64, a new Python float object (a numpy.float64 scalar) is allocated and then unboxed for each value. With a list, the Python objects already exist.


davidhewitt · 2024-11-26T20:20:02Z

Maintainer

I don't see anything about ndarray in your snippet, so I'm a bit confused. In general anything with 2d arrays you will want to pass them around as numpy / ndarray arrays, not vecs-of-vecs.
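Some background on that advice: a float64 array stores its values in one contiguous block of C doubles and exposes it through Python's buffer protocol, so a consumer can read the values without creating a Python object for each one. The minimal standard-library sketch below uses `array.array` and `memoryview` as a stand-in for a C-ordered 2D ndarray. That substitution is an assumption made so the example runs without NumPy, but NumPy arrays expose the same protocol.

```python
from array import array

rows, cols = 2, 3
# One contiguous block of C doubles, standing in for a C-ordered float64 ndarray.
flat = array("d", [float(i) for i in range(rows * cols)])

# Reinterpret the same memory as a 2-D buffer; no per-value objects are created here.
view = memoryview(flat).cast("B").cast("d", shape=[rows, cols])

print(view.format, view.shape, view.nbytes)  # d (2, 3) 48
print(view[1, 2])                            # 5.0
```

On the Rust side, rust-numpy's `PyReadonlyArray2<f64>` or pyo3's `PyBuffer<f64>` can work with this kind of buffer directly, or copy it out in a single pass. Either approach avoids the per-value objects that a `Vec<Vec<f64>>` parameter requires.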


christianrickert Nov 26, 2024
Author

Thank you, @davidhewitt!

> I don't see anything about ndarray in your snippet, so I'm a bit confused.

# load_data.py

np.random.rand(10, 1024 * 1024 * 100)           # type() returns: <class 'numpy.ndarray'>
np.random.rand(10, 1024 * 1024 * 100).tolist()  # type() returns: <class 'list'>

I didn't want to paste the whole script a second time for a single tolist() call, but I understand that it is too easy to miss. Apologies!

> In general anything with 2d arrays you will want to pass them around as numpy / ndarray arrays, not vecs-of-vecs.

I would really like to continue using Vec<Vec<f64>> on the Rust side if at all possible.

However, there must be a reason (an implementation detail such as repeated type checks or per-element memory allocations) why the conversion to Vec<Vec<f64>> is significantly slower for an ndarray than for a list, and I am currently missing it. Quite frankly, it doesn't make sense to me.

alex · 2024-11-26T23:01:18Z

Collaborator

The reason is that to go from a numpy array to a Vec of f64, a new Python float object (a numpy.float64 scalar) is allocated and then unboxed for each value. With a list, the Python objects already exist.
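This mechanism can be observed without any Rust. In the sketch below, `array.array("d")` stands in for a float64 ndarray. That substitution is an assumption made so the example needs only the standard library, but both containers store raw C doubles, so every index access has to create a new float object. A list, by contrast, hands back the object it already holds. Indexing a NumPy float64 array behaves like the array case, returning a fresh `numpy.float64` scalar on each access. The helper name is hypothetical, and the identity results below hold on CPython:

```python
from array import array

# Raw C doubles, like the data buffer of a float64 ndarray.
packed = array("d", [0.5, 1.5, 2.5])
# A list stores pointers to float objects that already exist.
boxed = [0.5, 1.5, 2.5]


def boxes_on_access(container) -> bool:
    """Return True if each index access hands back a freshly created object."""
    first, second = container[0], container[0]  # both results stay alive for the comparison
    return first is not second


print(boxes_on_access(packed))  # True: a new float object is created per access
print(boxes_on_access(boxed))   # False: the list returns the same object each time
```

Extracting a `Vec<Vec<f64>>` from the array pays that allocation, and the matching deallocation, once per value: roughly a billion times for the array in the question. This is consistent with the gap between the two timings.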



christianrickert Nov 26, 2024
Author

yep, that would do it!

Thank you both @davidhewitt and @alex.
