Performance of xt::nanmean #201

Open
zhujun98 opened this issue Aug 3, 2019 · 1 comment

@zhujun98
Contributor

zhujun98 commented Aug 3, 2019

In Python I have a 3D NumPy array holding a stack of images, and I would like to compute the NaN-ignoring mean over the stack (axis 0). I tried two different implementations using xtensor-python:

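#include <cmath>
#include <cstddef>

#include <xtensor/xmath.hpp>
#include <xtensor-python/pytensor.hpp>

// Hand-written version: one pass over the data, accumulating the sum and
// the number of non-NaN values for each pixel (j, k).
template<typename T>
inline xt::pytensor<T, 2> nanmeanImages(const xt::pytensor<T, 3>& arr) {
  auto shape = arr.shape();
  auto mean = xt::pytensor<T, 2>({shape[1], shape[2]});

  for (std::size_t j = 0; j < shape[1]; ++j) {
    for (std::size_t k = 0; k < shape[2]; ++k) {
      T count = 0;
      T sum = 0;
      for (std::size_t i = 0; i < shape[0]; ++i) {
        auto v = arr(i, j, k);
        if (!std::isnan(v)) {
          count += T(1);
          sum += v;
        }
      }
      // For a floating-point T, a pixel that is NaN in every image gives 0 / 0 = NaN.
      mean(j, k) = sum / count;
    }
  }

  return mean;
}


// Library version: let xt::nanmean reduce over axis 0 and evaluate immediately.
template<typename T>
inline xt::pytensor<T, 2> nanmeanImagesOld(const xt::pytensor<T, 3>& arr) {
  return xt::nanmean<T>(arr, {0}, xt::evaluation_strategy::immediate);
}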

and benchmarked with the following Python code:

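import time

import numpy as np

# xt_nanmean_images and xt_nanmean_images_old are assumed to be the Python
# bindings of nanmeanImages and nanmeanImagesOld, respectively.
data = np.ones((64, 1024, 1024), dtype=np.float32)
data[::2, ::2, ::2] = np.nan

t0 = time.perf_counter()
ret_cpp = xt_nanmean_images(data)
dt_cpp = time.perf_counter() - t0

t0 = time.perf_counter()
ret_cpp_old = xt_nanmean_images_old(data)
dt_cpp_old = time.perf_counter() - t0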

The result (times in seconds) is

nanmean_images with <class 'numpy.float32'> - dt (cpp): 0.1258, dt (cpp) old: 0.1573

I guess the first one is faster because xt::nanmean uses xt::nansum and xt::count_nonnan, which means looping over the big array twice. Also, xt::count_nonnan is twice as expensive as xt::nansum, for a reason I do not understand. I compiled xtensor with xsimd but did not see any improvement. I am quite new to xsimd, though, and not sure whether I did everything correctly.
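
To make the one-pass versus two-pass point concrete, here is a minimal sketch in plain C++ with no xtensor involved. The helper name `nanmean_axis0` and the flat row-major `std::vector` layout are assumptions for illustration only; this is not how xt::nanmean is implemented. It updates the sum and the count in the same pass, and it puts the image index in the outer loop so that the inner loop reads memory contiguously.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of xtensor): NaN-ignoring mean over axis 0
// of a row-major (n, h, w) stack stored contiguously in `data`.
// Returns h * w means; a pixel that is NaN in every image yields NaN (0 / 0).
template <typename T>
std::vector<T> nanmean_axis0(const std::vector<T>& data, std::size_t n,
                             std::size_t h, std::size_t w) {
  const std::size_t pixels = h * w;
  std::vector<T> sum(pixels, T(0));
  std::vector<T> count(pixels, T(0));

  // Image index outermost: the inner loop walks memory contiguously, and
  // the sum and the count are updated in the same pass over the data.
  for (std::size_t i = 0; i < n; ++i) {
    const T* image = data.data() + i * pixels;
    for (std::size_t p = 0; p < pixels; ++p) {
      const T v = image[p];
      if (!std::isnan(v)) {
        sum[p] += v;
        count[p] += T(1);
      }
    }
  }

  // Turn the per-pixel sums into means in place.
  for (std::size_t p = 0; p < pixels; ++p) {
    sum[p] /= count[p];
  }
  return sum;
}
```

The trade-off is two temporary buffers of h * w elements instead of two scalars per pixel. Whether this is actually faster than the loop order above would have to be measured on the real data.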

I would like to further improve the performance by using TBB. I am not sure whether that is the best way to go and would like to ask for your opinion. Thanks a lot!
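
For illustration, the parallelisation could look roughly like the sketch below. It uses `std::thread` from the standard library as a stand-in for TBB so that it has no external dependency; with TBB, the chunking would presumably be handled by `tbb::parallel_for` over a `tbb::blocked_range` instead. The helper names `nanmean_range` and `nanmean_axis0_parallel`, the flat row-major layout and the `n_threads` parameter are all assumptions made for this example. Each output pixel depends only on its own column of input values, so the workers write to disjoint parts of the output and need no synchronisation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: NaN-ignoring mean over axis 0 for the pixel range
// [begin, end) of a row-major (n, pixels) stack. Writes only out[begin..end).
template <typename T>
void nanmean_range(const T* data, std::size_t n, std::size_t pixels,
                   std::size_t begin, std::size_t end, T* out) {
  for (std::size_t p = begin; p < end; ++p) {
    T sum = 0;
    T count = 0;
    for (std::size_t i = 0; i < n; ++i) {
      const T v = data[i * pixels + p];
      if (!std::isnan(v)) {
        sum += v;
        count += T(1);
      }
    }
    out[p] = sum / count;  // 0 / 0 gives NaN for all-NaN pixels
  }
}

// Hypothetical driver: split the pixels into contiguous chunks, one per thread.
template <typename T>
std::vector<T> nanmean_axis0_parallel(const std::vector<T>& data, std::size_t n,
                                      std::size_t pixels, unsigned n_threads) {
  std::vector<T> out(pixels);
  n_threads = std::max(1u, n_threads);
  const std::size_t chunk = (pixels + n_threads - 1) / n_threads;

  const T* src = data.data();
  T* dst = out.data();
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    const std::size_t begin = std::min(pixels, t * chunk);
    const std::size_t end = std::min(pixels, begin + chunk);
    if (begin == end) break;  // no pixels left for further threads
    // Capture raw pointers and sizes by value; the vectors outlive the threads.
    workers.emplace_back([=] { nanmean_range(src, n, pixels, begin, end, dst); });
  }
  for (auto& w : workers) w.join();
  return out;
}
```

A call such as `nanmean_axis0_parallel(data, 64, 1024 * 1024, std::thread::hardware_concurrency())` would cover the benchmark shape above. The reduction is likely memory-bound, so the speed-up may well be smaller than the number of threads.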

@wolfv
Member

wolfv commented Aug 3, 2019

Hi @zhujun98! Thanks for the bug report.

Indeed there is a problem with the performance of mean.

We have a fix in this PR: xtensor-stack/xtensor#1627

I think I will split the fix for mean out of that PR by the beginning of next week so that it is ready to go, since the PR itself is blocked on some TBB issues that unfortunately were not straightforward to fix.

Cheers!
