Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work on performance issues in summary statistics due to using ArrayBase::sum #35

Open
jturner314 opened this issue Mar 31, 2019 · 1 comment
Labels
Enhancement New feature or request On Hold Issues we can't act upon due to external constraints (e.g. missing Rust feature, other libraries..)

Comments

@jturner314
Copy link
Member

jturner314 commented Mar 31, 2019

The summary statistics methods use ArrayBase::sum (directly or indirectly) in anticipation of pairwise summation (rust-ndarray/ndarray#577), which provides improved accuracy over naive summation using fold. However, to do this, some of the methods have unnecessary allocations or other performance issues.

For example, harmonic_mean is implemented like this:

self.map(|x| x.recip()).mean().map(|x| x.recip())

It's implemented this way to take advantage of .mean() (which is implemented in terms of .sum()), but this approach requires a temporary allocation for the result of self.map.

summary_statistics::means::moments has a similar issue:

for k in 2..=order {
    moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
}

It's implemented this way to take advantage of .sum(). However, this implementation requires a temporary allocation for the result of a.map. Additionally, it would probably be faster to make the loop over k be the innermost loop to improve the locality of reference.

We should be able to resolve these issues with a lazy version of map combined with a pairwise summation method on that lazy map. Something like jturner314/nditer would work once it's stable.

[Edit: This issue also appears in the entropy methods.]

@jturner314 jturner314 added Enhancement New feature or request On Hold Issues we can't act upon due to external constraints (e.g. missing Rust feature, other libraries..) labels Mar 31, 2019
@vks
Copy link

vks commented Apr 8, 2019

I'm not sure whether this helps, but the average crate provides constant-memory algorithms to calculate various statistics.

@munckymagik munckymagik mentioned this issue May 28, 2019
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request On Hold Issues we can't act upon due to external constraints (e.g. missing Rust feature, other libraries..)
Projects
None yet
Development

No branches or pull requests

2 participants