Memory requirement of ak.sum vs np.sum #2480
-
Following some investigations of @ekauffma into memory use, I observed that there seems to be a significant difference between using `ak.sum` and `np.sum`:

    import numpy as np
    import awkward as ak

    x = np.random.random(size=(1000, 1000, 100))
    ak_res = ak.sum(x, axis=1)
    np_res = np.sum(x, axis=1)
    assert np.allclose(ak_res, np_res)

I am running the script above through

    rm -r output.bin
    rm -r memray-flamegraph-output.html
    memray run -o output.bin test.py
    memray flamegraph output.bin
    open memray-flamegraph-output.html

I see a 763 MiB allocation for the array […]. Surprisingly, the addition of […]. I had naively assumed thus far that […]. I am using Python 3.9.16, […]
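For reference, the 763 MiB figure matches the size of the input array itself. A quick check of the arithmetic, assuming 8-byte float64 elements (the dtype that `np.random.random` returns); the variable names are only illustrative:

```python
import math

shape = (1000, 1000, 100)   # shape of x in the script above
itemsize = 8                # bytes per float64 element

# Size of the input array x.
input_bytes = math.prod(shape) * itemsize
input_mib = input_bytes / 2**20

# Size of the reduced result: axis=1 is summed away, leaving shape (1000, 100).
result_bytes = shape[0] * shape[2] * itemsize
result_mib = result_bytes / 2**20

print(f"input:  {input_mib:.1f} MiB")
print(f"result: {result_mib:.3f} MiB")
```

The input accounts for roughly 763 MiB and the result for under 1 MiB, so anything well beyond that in the flamegraph comes from intermediate arrays rather than from the input or the output.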
Replies: 1 comment 2 replies
-
It's the other way around: if `np.sum` is called on an `ak.Array`, it checks to see if `ak.Array` has a `__array_function__`, which it does, and then NumPy calls that instead of its usual `np.sum` operation.

What you're doing here is calling `ak.sum` on a NumPy array, which converts the NumPy array into an Awkward Array (which is not expensive) and then does general variable-length reduction on that Awkward Array (which is expensive). There are more intermediate arrays involved in `ak.sum`, and that will account for more memory.

We could install a check in all of the NumPy functions we NEP18-overload to see if there are no Awkward Arrays in the arguments and then pass it over to the equivalent NumPy function (here). But there's an exception to that: one of the arguments might not be an Awkward Array, but it might be an iterable that should be interpreted as an Awkward Array instead of a NumPy array (e.g. it's a list of variable-length lists), in which case you would not want the NumPy version to take over. So the hardest part of this optimization would be determining if the arguments of …
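The dispatch direction described above, and the kind of check being proposed, can be sketched with a toy class. This is a minimal illustration, not Awkward Array's code: `Wrapped`, `is_ragged`, and `can_hand_off_to_numpy` are hypothetical names, and the raggedness test only looks at Python lists and tuples.

```python
import numpy as np


class Wrapped:
    """Hypothetical stand-in for ak.Array: wraps an ndarray and overrides NumPy functions."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []  # names of NumPy functions routed through the hook

    def __array_function__(self, func, types, args, kwargs):
        # NEP 18: NumPy calls this hook instead of its own implementation.
        self.calls.append(func.__name__)
        unwrapped = [a.data if isinstance(a, Wrapped) else a for a in args]
        return func(*unwrapped, **kwargs)


def is_ragged(obj):
    """True if obj is a nested list/tuple whose sublists differ in shape."""

    def shape(o):
        if not isinstance(o, (list, tuple)):
            return ()  # scalars and ndarrays are treated as leaves
        shapes = {shape(item) for item in o}
        if None in shapes or len(shapes) > 1:
            return None  # ragged somewhere at or below this level
        return (len(o),) + (shapes.pop() if shapes else ())

    return shape(obj) is None


def can_hand_off_to_numpy(*args):
    """Hypothetical fast-path test: no override objects and no ragged iterables."""
    return not any(isinstance(a, Wrapped) or is_ragged(a) for a in args)


x = np.arange(6.0).reshape(2, 3)
w = Wrapped(x)

via_hook = np.sum(w, axis=1)  # NumPy finds __array_function__ and calls it
direct = np.sum(x, axis=1)    # plain ndarray: NumPy's usual code path
```

Here `np.sum(w, axis=1)` goes through `Wrapped.__array_function__`, while `np.sum(x, axis=1)` does not. `can_hand_off_to_numpy(x)` is true, but `can_hand_off_to_numpy([[1, 2], [3]])` is false, which is the variable-length-list exception mentioned above.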