Hi everyone,

```python
import numpy as np
import numba as nb
import awkward as ak

n_rep = 100
lists = [np.array([1, 2, 3]), np.array([1, 2])] * n_rep
ak_array = ak.Array(lists)
lists_numba = nb.typed.List(lists)  # numba typed list
```

I want to take the mean with axis=1. Of course, I can do this natively with awkward arrays, but to do it with numba I have to iterate over each np.array in the list, so the function looks something like this:

```python
@nb.njit
def nb_mean(arr):
    means = np.empty(shape=len(arr))
    for i in range(len(arr)):
        means[i] = np.mean(arr[i])
    return means

nb_mean(lists_numba)
```

If I pass the awkward array to this function, numba raises this error:

```
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function mean at 0x7f05dd254b00>) found for signature:
>>> mean(ak.ArrayView(ak.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function 'Numpy_method_redirection.generic': File: numba/core/typing/npydecl.py: Line 379.
With argument(s): '(ak.ArrayView(ak.NumpyArrayType(array(int64, 1d, A), none, {}), None, ()))':
Rejected as the implementation raised a specific error:
TypeError: array does not have a field with key 'mean'
(https://github.com/scikit-hep/awkward-1.0/blob/1.10.1/src/awkward/_connect/_numba/layout.py#L339)
raised from /usr/local/lib/python3.7/dist-packages/awkward/_connect/_numba/layout.py:339
During: resolving callee type: Function(<function mean at 0x7f05dd254b00>)
During: typing of call at <ipython-input-27-1014e122c504> (8)
File "<ipython-input-27-1014e122c504>", line 8:
def nb_mean(arr):
<source elided>
for i in range(len(arr)):
means[i] = np.mean(arr[i])
^
```

The code works if I first convert the awkward array with np.array, as in the following:

```python
@nb.njit
def nb_ak_mean(arr):
    means = np.empty(shape=len(arr))
    for i in range(len(arr)):
        means[i] = np.mean(np.array(arr[i]))  # convert to np.array first, then apply np.mean
    return means
```

So basically, if I try to pass ak_array directly to the …

```python
%timeit ak.mean(ak_array, axis=1)
%timeit nb_ak_mean(ak_array)
%timeit nb_mean(lists_numba)
```

```
1.42 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # native awkward
74 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)  # numba + awkward
7.61 µs ± 41.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # numba + List
```

The approach using numba's typed List seems to be significantly faster than using awkward arrays. Am I doing something wrong?
@fspinna - which version of awkward do you use?

```python
>>> ak.__version__
'2.0.0rc1'
```
Awkward Array (and Numba's) performance can be modelled approximately as

`time = initial_cost + rate*amount_of_work`

In this case, your array is too small for the work-scaling to be properly measured, i.e. the setup costs (`initial_cost`) are dominating the performance. If you change your array to be more like ~1,000,000 elements, then the performance difference between the two Numba-jitted cases is closer to a factor of 4:1. The rest of this difference probably stems from the use of …

Of course, Numba isn't always able to eke out the best performance if you use NumPy operations. A bare loop is usually the fastest solution, and can help avoid these kinds of quirks.

In general, though, don't worry about this kind of performance difference if it's not bottlenecking your workflow. There are always things you can do to improve performance, e.g. handling raggedness explicitly in your kernel by flattening the array and passing in the sublist lengths, but it's not always worth the extra maintenance burden and code complexity.
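One way to check whether `initial_cost` or `rate` dominates a benchmark is to time the same function at several input sizes and fit the two parameters of `time = initial_cost + rate*amount_of_work` by least squares. This is a minimal sketch, not code from the thread: the helpers `fit_cost_model` and `overhead_fraction` are hypothetical names, and the sketch assumes you have already collected per-call timings (for example with `timeit`) for a few array sizes.

```python
import numpy as np


def fit_cost_model(sizes, times):
    """Fit time ~= initial_cost + rate * size by ordinary least squares.

    Hypothetical helper: `sizes` is the amount of work per call (e.g. number of
    elements) and `times` the measured time per call, in any consistent unit.
    """
    sizes = np.asarray(sizes, dtype=float)
    times = np.asarray(times, dtype=float)
    # A degree-1 polyfit returns the coefficients as [slope, intercept]
    rate, initial_cost = np.polyfit(sizes, times, deg=1)
    return initial_cost, rate


def overhead_fraction(initial_cost, rate, size):
    """Share of the predicted time that is fixed setup cost at a given size."""
    total = initial_cost + rate * size
    return initial_cost / total
```

If the fitted `initial_cost` is a large fraction of the total at the array size you benchmarked, the comparison mostly measures call overhead, and a larger input gives a fairer picture of per-element speed.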
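Handling raggedness explicitly, as suggested above, can be sketched as a bare loop over a flat array of values plus the length of each sublist, with no `np.mean` call inside the kernel. This is an illustrative sketch, not code from the thread: `ragged_mean` is a hypothetical name, and it assumes both inputs are plain NumPy arrays. With Awkward Array they could presumably be obtained with `ak.to_numpy(ak.flatten(ak_array))` and `ak.to_numpy(ak.num(ak_array, axis=1))`, but that conversion step is an assumption and is not exercised here. The kernel is compiled with Numba if it is installed; otherwise the same function runs as plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def ragged_mean(flat_values, lengths):
    """Mean of each sublist, given the flattened values and each sublist's length.

    Hypothetical helper; empty sublists produce NaN.
    """
    means = np.empty(len(lengths))
    start = 0
    for i in range(len(lengths)):
        stop = start + lengths[i]
        total = 0.0
        # Bare loop instead of np.mean: accumulate the sum by hand
        for j in range(start, stop):
            total += flat_values[j]
        if stop > start:
            means[i] = total / (stop - start)
        else:
            means[i] = np.nan
        start = stop
    return means


# Usage with the same data as in the question
lists = [np.array([1, 2, 3]), np.array([1, 2])] * 100
flat = np.concatenate(lists)
lengths = np.array([len(x) for x in lists])
means = ragged_mean(flat, lengths)
```

Passing lengths instead of a ragged container means the kernel only ever sees two contiguous NumPy arrays, which Numba handles natively; the trade-off, as the reply notes, is extra bookkeeping in your own code.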