Prevent reducers like ak.sum on records (v1 and v2) #1375
Note that until this gets fixed, users will need some way to add up arbitrary numbers of vectors per list in an Awkward Array. This example code does that with Numba:

```python
import time

import numpy as np
import awkward as ak
import numba as nb
import vector

vector.register_awkward()

fake_o = {"pt": 1.1, "phi": 2.2, "eta": 3.3, "mass": 10}
array = ak.Array(
    [
        [fake_o, fake_o, fake_o],
        [],
        [fake_o, fake_o],
        [fake_o, fake_o, fake_o, fake_o],
    ],
    with_name="Momentum4D",
)

@nb.jit
def add_lists_of_vectors(array):
    output_px = np.empty(len(array))
    output_py = np.empty(len(array))
    output_pz = np.empty(len(array))
    output_E = np.empty(len(array))
    for index, single_list in enumerate(array):
        sum_of_vectors = vector.obj(px=0.0, py=0.0, pz=0.0, E=0.0)
        for input_vector in single_list:
            sum_of_vectors = sum_of_vectors + input_vector
        output_px[index] = sum_of_vectors.px
        output_py[index] = sum_of_vectors.py
        output_pz[index] = sum_of_vectors.pz
        output_E[index] = sum_of_vectors.E
    return output_px, output_py, output_pz, output_E

starttime = time.time()
output_px, output_py, output_pz, output_E = add_lists_of_vectors(array)
print("compilation + runtime:", time.time() - starttime)

output = ak.zip(
    {"px": output_px, "py": output_py, "pz": output_pz, "E": output_E},
    with_name="Momentum4D",
)
print(output)
```

The use of Numba (or some compilation somewhere) is necessary. In some tests, I found that Numba's compilation time is 1.5 seconds, and runtime only begins to dominate when you multiply the length of the `array` by a large factor, but the pure Python runtime per fake-o group (i.e. drop the `@nb.jit`) is much larger. The reason the output is given as NumPy arrays for each component (`output_px`, `output_py`, `output_pz`, `output_E`) is that new Awkward Arrays can't be constructed inside a Numba-compiled function, so the components are zipped into a `Momentum4D` array after the function returns.
We implement this vector sum as a mixin method in coffea vector classes:

```python
def sum(self, axis=-1):
    """Sum an array of vectors elementwise using `x` and `y` components."""
    out = awkward.zip(
        {
            "x": awkward.sum(self.x, axis=axis),
            "y": awkward.sum(self.y, axis=axis),
        },
        with_name="TwoVector",
        highlevel=False,
    )
    return awkward.Array(out, behavior=self.behavior)
```

Unless I am missing something, I don't see why it would need to be done in Numba.
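As an aside, a minimal self-contained sketch of how a method like this can be attached to a record name with `ak.mixin_class` (an illustration, not coffea's actual registration code; the `TwoVector` name and `x`/`y` fields follow the snippet above):

```python
import awkward as ak

@ak.mixin_class(ak.behavior)
class TwoVector:
    def sum(self, axis=-1):
        # reduce each coordinate field separately, then zip the results
        # back into records carrying the same behavior
        out = ak.zip(
            {"x": ak.sum(self.x, axis=axis), "y": ak.sum(self.y, axis=axis)},
            with_name="TwoVector",
            highlevel=False,
        )
        return ak.Array(out, behavior=self.behavior)

vectors = ak.Array(
    [[{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}], [{"x": 5.0, "y": 6.0}]],
    with_name="TwoVector",
)
print(vectors.sum(axis=-1).tolist())
# [{'x': 4.0, 'y': 6.0}, {'x': 5.0, 'y': 6.0}]
```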
You're not missing anything; I just didn't think of that yesterday when I was talking with @kpedro88. What you have here works just as well. This issue is to prevent `ak.sum` from silently doing the wrong thing on records that don't override it.

In your uses, is `axis=-1` the only axis that matters? With

```python
array = [
    [vector_A, vector_B, vector_C],
    [vector_D, None, vector_E],
    [vector_F],
]
```

I guess there are cases when you'd want to get

```python
[vector_A + vector_D + vector_F, vector_B, vector_C + vector_E]
```

(an `axis=0` sum). I had been thinking of having the reducer-overload exclude axes that cross list boundaries like that.
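To make the difference between the two axis choices concrete, here is the same shape with plain numbers standing in for the vectors (illustrative only; reducers skip missing values):

```python
import awkward as ak

# numbers stand in for vector_A ... vector_F
array = ak.Array([[1, 2, 3], [4, None, 5], [6]])

print(ak.sum(array, axis=-1).tolist())  # [6, 9, 6]   one sum per inner list
print(ak.sum(array, axis=0).tolist())   # [11, 2, 8]  position-wise across lists
```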
The other difference is that the Numba solution adds the vectors in their native representation, while the Coffea solution requires converting to Cartesian, correct?
That's true, though I'm pretty sure that all of Vector's `+` operations convert to Cartesian internally anyway.

Nope! I'm wrong! If two vectors both have rho-phi azimuthal coordinates, they will be added in that system and the result will be in rho-phi. So, for instance, these vectors stay in rho-phi:

```python
>>> vector.obj(rho=1.1, phi=0.3) + vector.obj(rho=2.2, phi=-0.4)
vector.obj(rho=3.122793010504687, phi=-0.17108096774473536)
```

But anything else will go to x-y:

```python
>>> vector.obj(rho=1.1, phi=0.3) + vector.obj(x=2.2, y=4.4)
vector.obj(x=3.2508701380381666, y=4.725072227327474)
```

Longitudinal and temporal coordinates always convert to Cartesian before adding. (Currently. Changing that shouldn't break anything downstream if you always get coordinates from properties, rather than field values.)
If you're summing a bunch, then doing the Cartesian conversion once is more efficient anyway.
The two issues described here have been split: the part about adding the ability to override reducers in v2 is now issue #1423.
Version of Awkward Array: HEAD
Description and code to reproduce
@chadfreer just brought this up, but I seem to remember @lgray and/or @nsmith- asking about it a long time ago.
The issue is this: if we have records whose fields represent non-Cartesian data, they should not be naively added (or otherwise "reduced").
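A minimal illustration of the naive behavior this issue describes (values made up; this is how reducers behaved on versions predating the fix — afterward, it raises an error instead):

```python
import awkward as ak

# records whose fields happen to be polar coordinates, with no behavior attached
generic = ak.Array([[{"rho": 1.1, "phi": 0.3}, {"rho": 2.2, "phi": -0.4}]])

# before this fix, reducers descended into record fields, adding rho values
# to rho values and phi values to phi values independently:
print(ak.sum(generic, axis=-1))
# -> [{rho: 3.3, phi: -0.1}], which is not a meaningful vector sum
```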
Knowing that this is a generic array, we can shrug and say, "I guess it would do that. It has no way of knowing what `rho` and `phi` mean." But if it were a VectorAwkward2D, we'd really expect it to take the meaning of `rho` and `phi` into consideration when it adds them, because that's what `+` does:
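For instance, a sketch assuming Vector's `register_awkward()` behaviors (the `Vector2D` name and the values are illustrative):

```python
import awkward as ak
import vector

vector.register_awkward()

a = ak.Array([{"rho": 1.1, "phi": 0.3}], with_name="Vector2D")
b = ak.Array([{"rho": 2.2, "phi": -0.4}], with_name="Vector2D")

# with the behavior attached, + is genuine 2D vector addition,
# interpreting rho and phi as polar coordinates rather than plain fields
print((a + b).tolist())
```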
When either @lgray or @nsmith- brought this up, I said that we couldn't let third-party libraries override reducers like `ak.sum`, because they are implemented (in Awkward 1.x) in C++, where it would be very difficult to apply overrides implemented in Python. But in Awkward v2, the relevant code is Python, and it would be possible.

But there are two issues here:
1. Preventing `+` and not `ak.sum` is dangerously misleading: attempts to `ak.sum` records should be prevented for exactly the same reason as `+` is prevented between non-overridden records. That is, apply the rationale behind "Removing ufunc-broadcasting across record fields" (#457) to reducers, not just ufuncs.
2. In v2, third-party libraries could be allowed to override reducers: a behavior registration would supply a `special_function` (see the sketch after this list), and the `special_function` would be invoked somewhere in here, for ListOffsetArray for example. (Although I thought so while talking with @chadfreer, this can't be implemented through #1126, since this is a behavior that might be discovered deep inside an array, not at the level where a Python object exists.)
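A purely hypothetical sketch of what that registration could look like (the behavior key, the `special_function` signature, and the `"VectorArray2D"` wiring are assumptions for illustration, not an existing Awkward API):

```python
import awkward as ak

def special_function(vectors, axis, keepdims, mask_identity):
    # a library like Vector could implement the real reduction here,
    # e.g. convert to Cartesian components, sum them, and zip the
    # results back into records of the same name
    raise NotImplementedError

# hypothetical lookup key: (reducer name, record/array name)
ak.behavior["sum", "VectorArray2D"] = special_function
```

The reducer implementations would then check for such an entry whenever they encounter named records, instead of unconditionally reducing field-by-field.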
There would have to be some rules on when `special_function` is applied. Overrides for `axis` values shallower than the depth of the matched name (e.g. `"VectorArray2D"`) wouldn't make a lot of sense, since they cross list/record object boundaries (like the `axis=0` case above).

While thinking about expanding capabilities in v2 is fun, item (1) in the above list is the higher-priority item: we need to avoid misleading users into thinking that `ak.sum(geometric_vectors)` does what they might think it does. An error message is better than a wrong answer.

(This issue is a rare bug-feature-policy!)