Speeding up Numba JIT-compiled functions that use ArrayBuilder #696

mova · 2021-02-02T09:07:21Z

Hi there!
I'm trying to speed up building an ak.Array, and to that end I'm using Numba together with ak.ArrayBuilder.
In this case, I have to pass an explicit signature. What is the Numba type of the ArrayBuilder?
I tried ak.ArrayBuilder.numba_type without success.

@numba.njit(
    (numba.types.int64, numba.types.float32[:, :, :], ak.ArrayBuilder.numba_type)
)
def map_calo_to_hits(
    eventNumber: int, caloimg: np.ndarray, builder: ak.ArrayBuilder
):
    ...

This results in

    134     (numba.types.int64, numba.types.float32[:, :, :], ak.ArrayBuilder.numba_type)
    135 )
--> 136 def map_calo_to_hits(
    137     eventNumber: int, caloimg: np.ndarray, builder: ak.ArrayBuilder
    138 ):

~/fgsim/.tox/py38/lib/python3.8/site-packages/numba/core/decorators.py in wrapper(func)
    219             with typeinfer.register_dispatcher(disp):
    220                 for sig in sigs:
--> 221                     disp.compile(sig)
    222                 disp.disable_compile()
    223         return disp

~/fgsim/.tox/py38/lib/python3.8/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

~/fgsim/.tox/py38/lib/python3.8/site-packages/numba/core/dispatcher.py in compile(self, sig)
    837         # Use counter to track recursion compilation depth
    838         with self._compiling_counter:
--> 839             args, return_type = sigutils.normalize_signature(sig)
    840             # Don't recompile if signature already exists
    841             existing = self.overloads.get(tuple(args))

~/fgsim/.tox/py38/lib/python3.8/site-packages/numba/core/sigutils.py in normalize_signature(sig)
     44         check_type(return_type)
     45     for ty in args:
---> 46         check_type(ty)
     47 
     48     return args, return_type

~/fgsim/.tox/py38/lib/python3.8/site-packages/numba/core/sigutils.py in check_type(ty)
     38     def check_type(ty):
     39         if not isinstance(ty, types.Type):
---> 40             raise TypeError("invalid type in signature: expected a type "
     41                             "instance, got %r" % (ty,))
     42 

TypeError: invalid type in signature: expected a type instance, got <property object at 0x2b993f4142c0>

jpivarski (Maintainer) · 2021-02-02T15:56:00Z

Here's a way to find out: compile a function and look at its overloaded types.

>>> import awkward as ak
>>> import numba as nb
>>> @nb.njit
... def do_nothing(builder, array):
...   pass
... 
>>> do_nothing.overloads.keys()
odict_keys([])

(Nothing yet because we haven't compiled anything yet.)

>>> do_nothing(ak.ArrayBuilder(), ak.Array([[1, 2, 3], [], [4, 5]]))
>>> do_nothing.overloads.keys()
odict_keys([
    (ak.ArrayBuilderType(None),    # <---- the ArrayBuilder is opaque
     ak.ArrayView(                 # <---- the Array is specialized
         ak.ListArrayType(
             array(int64, 1d, C),
             ak.NumpyArrayType(array(int64, 1d, A), none, {}),
             none, {}),
         None, ())
    )
])

Calling it once compiles the function for the given argument types, and what you see above are the Numba types for the ArrayBuilder and the Array. The ArrayBuilder type is simple: it's opaque to Numba so that it can be filled with any type of data, determined at runtime. The Array type is specialized for the data that it contains so that it can generate faster code to iterate over inputs.

You can also access these types from the objects:

>>> ak.ArrayBuilder().numba_type
ak.ArrayBuilderType(None)

>>> ak.Array([[1, 2, 3], [], [4, 5]]).numba_type
ak.ArrayView(
    ak.ListArrayType(
        array(int64, 1d, C),
        ak.NumpyArrayType(array(int64, 1d, A), none, {}),
        none, {}),
    None, ())
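
This is also why the signature in the question failed. Judging from the error message (got <property object at ...>), numba_type is a property, so reading it from the class gives the property object itself, not a Numba type; it has to be read from an instance. A pure-Python sketch of that behavior, using a hypothetical stand-in class (not Awkward's actual code):

```python
class FakeBuilder:
    """Hypothetical stand-in for a class that exposes a type via a property."""

    @property
    def numba_type(self):
        # Placeholder value; the real property returns a Numba type object.
        return "ArrayBuilderType(None)"


# Accessed on the class, a property evaluates to the descriptor itself.
on_class = FakeBuilder.numba_type

# Accessed on an instance, the getter runs and returns the value.
on_instance = FakeBuilder().numba_type

print(type(on_class).__name__)  # property
print(on_instance)              # ArrayBuilderType(None)
```

So in a signature, the expression would need the parentheses: ak.ArrayBuilder().numba_type, as in the session above.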

The ArrayBuilder's Numba type is defined in the Awkward Array source code, and its only parameter (showing up as None in this example) is for custom behavior. That parameter is really only needed to pass the behavior through to the output: you can't snapshot an ArrayBuilder inside of a JIT'ed function, so there is no way to use custom behaviors there. (Custom behaviors can define Numba specializations for Arrays, however. We're working on some for Vector.) The code to build a type for an Array is spread over the whole ak._connect._numba submodule. Note that this is a hidden submodule because the way Awkward Arrays get JIT'ed is an internal detail, not part of the public API.

That said, knowing the types of ArrayBuilder and Array doesn't help you make anything faster. As shown above, Numba compiles the function the first time it is called, when the full types of its arguments are known. Performing this compilation a millisecond earlier when the function is defined doesn't help anything. Maybe if you're defining the function nested within some other context so it gets redefined (and therefore recompiled) repeatedly, try to put it at global scope so that it gets defined exactly once per Python process. Numba also has cache=True to save the compiled function to avoid recompiling it when the Python process restarts, but that doesn't help if you're deploying a function to remote machines (since it caches it on the hard drive of the machine where it runs). Also, I haven't seen compilations that are slow enough to bother.

If it's runtime you're thinking about, there are things you can do. The first is to try to build the output without ArrayBuilder, if possible. ArrayBuilder is untyped (like Python, it has to discover the types of objects at runtime), and therefore it's slower than compiled code can be when the type is known in advance. Right now, @ianna is working on a TypedArrayBuilder, which is more constrained in that it needs to know its type before being filled, and therefore should provide performance advantages.

For the time being, building an Awkward Array out of ak.layout.* primitives will be faster than letting ArrayBuilder do it, though it's more work to set it up and ensure that it's working. Here's an example of building a jagged array in Numba with ArrayBuilder:

In [1]: import awkward as ak
   ...: import numpy as np
   ...: import numba as nb

In [2]: @nb.njit
   ...: def build_with_ArrayBuilder(builder, num_lists, ave_length):
   ...:     for i in range(num_lists):
   ...:         builder.begin_list()
   ...:         count = np.random.poisson(ave_length)
   ...:         for j in range(count):
   ...:             builder.real(np.random.normal(0, 1))
   ...:         builder.end_list()
   ...:     return builder
   ...: 

In [3]: %%timeit
   ...: 
   ...: array = build_with_ArrayBuilder(ak.ArrayBuilder(), 10000000, 10).snapshot()
   ...: 
   ...: 
8.09 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and here's an example of building it manually, by constructing the same structure out of ak.layout.ListOffsetArray and ak.layout.NumpyArray. Since the sizes of the output arrays have to be known and allocated before they can be filled, the program has to be rearranged to do the pass that fills offsets before the pass that fills content (but you avoid unnecessary guesses at allocation sizes). Note that in this case, Numba is only making flat NumPy arrays, which is what it was designed for.

In [4]: @nb.njit
   ...: def build_manually(num_lists, ave_length):
   ...:     offsets = np.empty(num_lists + 1, np.int64)
   ...:     offsets[0] = 0
   ...:     for i in range(num_lists):
   ...:         count = np.random.poisson(ave_length)
   ...:         offsets[i + 1] = offsets[i] + count
   ...:     content = np.empty(offsets[-1], np.float64)
   ...:     for i in range(num_lists):
   ...:         start = offsets[i]
   ...:         stop = offsets[i + 1]
   ...:         for j in range(start, stop):
   ...:             content[j] = np.random.normal(0, 1)
   ...:     return offsets, content
   ...: 

In [5]: %%timeit
   ...: 
   ...: offsets, content = build_manually(10000000, 10)
   ...: array = ak.Array(
   ...:     ak.layout.ListOffsetArray64(
   ...:         ak.layout.Index64(offsets),
   ...:         ak.layout.NumpyArray(content),
   ...:     ),
   ...: )
   ...: 
   ...: 
3.88 s ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
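
For reference, this is what the offsets and content arrays returned by build_manually encode. The sketch below uses NumPy only and small, made-up list lengths instead of Poisson draws, so that the result is easy to check by eye:

```python
import numpy as np

# Made-up list lengths for illustration: three lists of length 3, 0 and 2.
counts = np.array([3, 0, 2], dtype=np.int64)

# offsets has one more entry than there are lists; list i occupies
# content[offsets[i]:offsets[i + 1]].
offsets = np.zeros(len(counts) + 1, dtype=np.int64)
offsets[1:] = np.cumsum(counts)

# The flat content buffer can be allocated exactly once, because its
# total length is the last offset.
content = np.arange(offsets[-1], dtype=np.float64)

# Recover the individual lists by slicing the flat buffer.
lists = [content[offsets[i]:offsets[i + 1]].tolist() for i in range(len(counts))]

print(offsets.tolist())  # [0, 3, 3, 5]
print(lists)             # [[0.0, 1.0, 2.0], [], [3.0, 4.0]]
```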

The manual version is about a factor of 2 faster in this case, though the random number generation (which adds to both cases) might account for a significant part of the time.

As it turns out, it does: computing the random numbers accounts for nearly all of the time in the second case:

In [8]: @nb.njit
   ...: def just_random_numbers(num_lists, ave_length):
   ...:     out1 = 0     # to make sure LLVM doesn't compile them away
   ...:     out2 = 0.0
   ...:     for i in range(num_lists):
   ...:         out1 = np.random.poisson(ave_length)
   ...:         for j in range(out1):
   ...:             out2 = np.random.normal(0, 1)
   ...:     return out1, out2
   ...: 

In [9]: %%timeit
   ...: 
   ...: just_random_numbers(10000000, 10)
   ...: 
   ...: 
3.42 s ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

so really, building manually would be much faster than the ArrayBuilder if not for the random numbers. If the other parts of your program are significantly faster than generating random numbers, you can gain a lot from manually building the output, rather than using ArrayBuilder. If not, then don't optimize this when something else is your bottleneck.
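
As a rough back-of-the-envelope check, the three timings above can be combined. This assumes that the random-number cost is the same in all three runs and simply adds to the cost of building the output, which is only approximately true:

```python
# Timings reported above, in seconds.
t_array_builder = 8.09  # build_with_ArrayBuilder plus snapshot
t_manual = 3.88         # build_manually plus wrapping in ak.Array
t_random_only = 3.42    # just_random_numbers

# Assumption: the random-number cost is identical in each run and additive.
overhead_array_builder = t_array_builder - t_random_only
overhead_manual = t_manual - t_random_only
ratio = overhead_array_builder / overhead_manual

print(f"ArrayBuilder overhead: {overhead_array_builder:.2f} s")  # 4.67 s
print(f"manual overhead:       {overhead_manual:.2f} s")         # 0.46 s
print(f"ratio:                 {ratio:.1f}")
```

Under that assumption, the array-building part alone is roughly ten times cheaper when done manually, rather than the factor of 2 seen in the total.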

By the way, I'm giving a tutorial on Numba tomorrow: https://github.com/jpivarski-talks/2021-02-03-pyhep-numba-tutorial

1 reply

mova May 10, 2021
Author

Thank you so much, that helped a lot.
Sorry for the very late reply.
