Is there an operation equivalent to np.argwhere? #3015
Replies: 6 comments 5 replies
-
The best answer to this is probably another question: "what will you do with that index?" Awkward Array has a powerful ragged indexing system that supports structure-preserving integer and boolean indices. We touch on using these indexing arrays in our user guide, and the semantics of this "ragged indexing" are also described in the API reference (for now). For example, if you need to find the indices of even numbers:
import awkward as ak
array = ak.Array(
[
[1, 2, 3],
[],
[4, 5],
[6, 7, 8, 9],
]
)
ix = ak.local_index(array)  # [[0, 1, 2], [], [0, 1], [0, 1, 2, 3]]
ix_even = ix[array % 2 == 0]  # [[1], [], [0], [0, 2]]
This
array_even = array[array % 2 == 0]
-
@agoose77 is right; you might be taking the long way around to do something that a slice could do. But if you need the positions of the matching elements as tuples, you can get them for your two-dimensional example like this:
import numpy as np
import awkward as ak
array = ak.Array(
[
[1, 2, 3],
[],
[4, 5],
[6, 7, 8, 9],
]
)
second = ak.local_index(array)[array % 2 == 0] # [[1], [], [0], [0, 2]], what @agoose77 suggested
first = np.arange(len(second)) # [ 0 , 1, 2 , 3 ]
first, _ = ak.broadcast_arrays(first, second) # [[0], [], [2], [3, 3]]
result = ak.flatten(ak.zip((first, second)))  # [(0, 1), (2, 0), (3, 0), (3, 2)]
This technique would require a different number of steps for each number of dimensions. Also, if you actually wanted lists (contiguous data) instead of tuples (not contiguous), you can do it by concatenating instead of zipping:
ak.flatten(ak.concatenate((first[..., np.newaxis], second[..., np.newaxis]), axis=-1))
# [[0, 1], [2, 0], [3, 0], [3, 2]]
(Which one you want depends on how you're going to use it...)
-
Thank you so much for replying so quickly! Yes, it all depends on what I'm going to do with these indices. Let me explain further. Given the output ROOT file from Delphes in MadGraph5, I want to rebuild fatjets from their constituents. This is basically the same question that was asked on Stack Overflow in 2019, and @jpivarski answered it at the time. But it's still somewhat hard to do step by step:
The questioner John Karkas gave some hints in the reply:
It struck me that there is a subbranch named "fUniqueID". After a few tries, I found that the "refs" of "FatJet.Constituents" store the unique ID that corresponds to a
I need to search
If it's a "one element's location in one array", I would use
import numpy as np
import uproot
import awkward as ak
import vector
vector.register_awkward()
filepath = "../data/pp2wz/Events/run_01_decayed_1/tag_1_delphes_events.root"
events = uproot.open(f"{filepath}:Delphes")
all_ref_ids = events["FatJet.Constituents"].array()["refs"]
all_tracks = events["EFlowTrack.fUniqueID"].array()
all_photons = events["EFlowPhoton.fUniqueID"].array()
all_neutral_hadrons = events["EFlowNeutralHadron.fUniqueID"].array()
all_tracks = ak.zip(
{
"pt": events["EFlowTrack.PT"].array(),
"eta": events["EFlowTrack.Eta"].array(),
"phi": events["EFlowTrack.Phi"].array(),
"mass": events["EFlowTrack.Mass"].array(),
"id": all_tracks,
},
with_name="Momentum4D",
)
all_photons = ak.zip(
{
"pt": events["EFlowPhoton.ET"].array(),
"eta": events["EFlowPhoton.Eta"].array(),
"phi": events["EFlowPhoton.Phi"].array(),
"mass": ak.zeros_like(events["EFlowPhoton.ET"].array()),
"id": all_photons,
},
with_name="Momentum4D",
)
all_neutral_hadrons = ak.zip(
{
"pt": events["EFlowNeutralHadron.ET"].array(),
"eta": events["EFlowNeutralHadron.Eta"].array(),
"phi": events["EFlowNeutralHadron.Phi"].array(),
"mass": ak.zeros_like(events["EFlowNeutralHadron.ET"].array()),
"id": all_neutral_hadrons,
},
with_name="Momentum4D",
)
all_constituents = []
for tracks, photons, neutral_hadrons, ref_ids in zip(
    all_tracks, all_photons, all_neutral_hadrons, all_ref_ids
):
    constituents = []
    for ref_id in ref_ids:
        # Keep the particles whose unique ID is referenced by this fatjet
        matched_tracks = tracks[np.isin(tracks.id, ref_id)]
        matched_photons = photons[np.isin(photons.id, ref_id)]
        matched_neutral_hadrons = neutral_hadrons[np.isin(neutral_hadrons.id, ref_id)]
        # Every reference should be found in exactly one of the three collections
        assert len(ref_id) == (
            len(matched_tracks) + len(matched_photons) + len(matched_neutral_hadrons)
        )
        constituents.append(
            ak.concatenate([matched_tracks, matched_photons, matched_neutral_hadrons])
        )
    all_constituents.append(constituents)
all_constituents = ak.from_iter(all_constituents)
The type can be read as: 100 events, a variable number of jets per event, and a variable number of constituents per jet. Because the constituents of each fatjet are reclustered to rebuild that one fatjet, the extra innermost dimension of the clustered jets always has length 1:
import fastjet as fj
particles = ak.zip(
{
"pt": all_constituents.pt,
"eta": all_constituents.eta,
"phi": all_constituents.phi,
"mass": all_constituents.mass,
},
with_name="Momentum4D",
)
jet_def = fj.JetDefinition(fj.antikt_algorithm, 100.0)
cluster = fj.ClusterSequence(particles, jet_def)
jets = cluster.inclusive_jets()
# To index, for example, the second fatjet in the seventh event
# jets[6, 1, 0]
# the last index is always 0
I know it's still not "perfect": I have to loop twice, once over events and once over fatjets, because this reclusters each fatjet instead of clustering the whole event (I'll post a clustering version that removes one loop). It hasn't been long since I started using
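The nested loops above call np.isin once per fatjet. As a hedged alternative that was not suggested in the thread, you could index each event's unique IDs once in a dictionary and look up every referenced ID directly. The sketch uses plain Python lists with made-up IDs, and positions_by_ref is a hypothetical helper name:

```python
def positions_by_ref(particle_ids, jet_refs):
    """Hypothetical helper: for one event, return the position of each
    referenced unique ID in particle_ids, grouped per jet.
    IDs not present in particle_ids are skipped."""
    # One pass over the event's particles: unique ID -> position in the list
    position = {uid: i for i, uid in enumerate(particle_ids)}
    # One dictionary lookup per referenced ID, keeping the per-jet grouping
    return [[position[uid] for uid in refs if uid in position] for refs in jet_refs]


# Made-up IDs for a single event with two jets
track_ids = [101, 205, 307, 412]
jet_refs = [[307, 101, 999], [412]]  # 999 stands for an ID from another collection

print(positions_by_ref(track_ids, jet_refs))  # [[2, 0], [3]]
```

Unlike an np.isin mask, the positions follow the order of the references rather than the order of the particles. IDs that belong to another collection (photons, neutral hadrons) are skipped, so the lookup would be repeated once per collection.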
-
This is another version of the event loop. It collects all constituents of an event and clusters jets from them:
all_constituents = []
for tracks, photons, neutral_hadrons, ref_ids in zip(
    all_tracks, all_photons, all_neutral_hadrons, all_ref_ids
):
    # Flatten away the per-jet grouping, so all referenced IDs of the event are matched at once
    matched_tracks = tracks[np.isin(tracks.id, ak.flatten(ref_ids))]
    matched_photons = photons[np.isin(photons.id, ak.flatten(ref_ids))]
    matched_neutral_hadrons = neutral_hadrons[
        np.isin(neutral_hadrons.id, ak.flatten(ref_ids))
    ]
    constituents = ak.concatenate(
        [matched_tracks, matched_photons, matched_neutral_hadrons]
    )
    all_constituents.append(constituents)
all_constituents = ak.from_iter(all_constituents)  # list of events -> Awkward Array, as in the previous version
This version doesn't distinguish between the constituents of different jets, so the type is "100 * var ...". Cluster jets as in the previous version, then index, for example, the second fatjet in the seventh event:
-
Thank you so much, @agoose77 and @jpivarski. I've learned a lot from your code and from the discussion! Here's a summary of the different kinds of "in" operations, which may help me figure out which one I need:
The new proposed
The code snippets helped me a lot in seeing how to take advantage of
@nb.njit
def my_isin(array, test_array):
    # For each record, mark the elements of `array` that also appear in `test_array`
    results = []
    for record, test_record in zip(array, test_array):
        mask = np.zeros(len(record), dtype=np.bool_)
        for i in range(len(record)):
            for j in range(len(test_record)):
                if record[i] == test_record[j]:
                    mask[i] = True
                    break
        results.append(mask)
    return results
It assumes that the inputs are stacked by records. Now the summary table is:
Since the first dimension is the total number of records, it essentially is a
I don't care which jet the constituents belong to; I want to cluster jets from all of them, just like the
I also use the magical command
Below is the complete code:
import awkward as ak
import fastjet as fj
import numba as nb
import numpy as np
import uproot
import vector
vector.register_awkward()
filepath = "../data/pp2wz/Events/run_01_decayed_1/tag_1_delphes_events.root"
events = uproot.open(f"{filepath}:Delphes")
all_ref_ids = events["FatJet.Constituents"].array()["refs"]
all_ref_ids = ak.flatten(all_ref_ids, axis=-1) # ---> New: 100 * var * var * int32 -> 100 * var * int32
# flatten the last dimension to drop the per-jet grouping
all_tracks = events["EFlowTrack.fUniqueID"].array()
all_photons = events["EFlowPhoton.fUniqueID"].array()
all_neutral_hadrons = events["EFlowNeutralHadron.fUniqueID"].array()
all_tracks = ak.zip(
{
"pt": events["EFlowTrack.PT"].array(),
"eta": events["EFlowTrack.Eta"].array(),
"phi": events["EFlowTrack.Phi"].array(),
"mass": events["EFlowTrack.Mass"].array(),
"id": all_tracks,
},
with_name="Momentum4D",
)
all_photons = ak.zip(
{
"pt": events["EFlowPhoton.ET"].array(),
"eta": events["EFlowPhoton.Eta"].array(),
"phi": events["EFlowPhoton.Phi"].array(),
"mass": ak.zeros_like(events["EFlowPhoton.ET"].array()),
"id": all_photons,
},
with_name="Momentum4D",
)
all_neutral_hadrons = ak.zip(
{
"pt": events["EFlowNeutralHadron.ET"].array(),
"eta": events["EFlowNeutralHadron.Eta"].array(),
"phi": events["EFlowNeutralHadron.Phi"].array(),
"mass": ak.zeros_like(events["EFlowNeutralHadron.ET"].array()),
"id": all_neutral_hadrons,
},
with_name="Momentum4D",
)
@nb.njit
def my_isin(array, test_array):
    # For each event, mark the particles whose ID appears among that event's referenced IDs
    results = []
    for record, test_record in zip(array, test_array):
        mask = np.zeros(len(record), dtype=np.bool_)
        for i in range(len(record)):
            for j in range(len(test_record)):
                if record[i] == test_record[j]:
                    mask[i] = True
                    break
        results.append(mask)
    return results
matched_tracks = all_tracks[my_isin(all_tracks.id, all_ref_ids)]
matched_photons = all_photons[my_isin(all_photons.id, all_ref_ids)]
matched_neutral_hadrons = all_neutral_hadrons[
my_isin(all_neutral_hadrons.id, all_ref_ids)
]
all_constituents = ak.concatenate(
[matched_tracks, matched_photons, matched_neutral_hadrons], axis=1
)
particles = ak.zip(
{
"pt": all_constituents.pt,
"eta": all_constituents.eta,
"phi": all_constituents.phi,
"mass": all_constituents.mass,
},
with_name="Momentum4D",
)
jet_def = fj.JetDefinition(fj.antikt_algorithm, 1.0)
cluster = fj.ClusterSequence(particles, jet_def)
jets = cluster.inclusive_jets()
print(f"pt: {jets[6, 0].pt}")
print(f"eta: {jets[6, 0].eta}")
print(f"phi: {jets[6, 0].phi}")
print(f"mass: {jets[6, 0].m}")
# pt: 563.0124337033961
# eta: 1.0792125821032512
# phi: 2.0480975146837075
# mass: 178.84373299160538
-
Since
import numba as nb
@nb.njit
def find_1d_in_1d(a, b):
    # For each row, collect the positions in `a` whose value appears in the matching row of `b`
    index_array = []
    for record, test_record in zip(a, b):
        indices = []
        for i in range(len(record)):
            for j in range(len(test_record)):
                if record[i] == test_record[j]:
                    indices.append(i)
                    break
        index_array.append(indices)
    return index_array
In the question's example, it can be used like this:
import awkward as ak
array = ak.Array(
[
[1, 2, 3],
[],
[4, 5],
[6, 7, 8, 9],
]
)
test_array = ak.Array(
[
[0, 2],
[],
[4],
[6, 8],
]
)
print(find_1d_in_1d(array, test_array))
# [[1], [], [0], [0, 2]]
# Index the corresponding elements in array
print(array[find_1d_in_1d(array, test_array)])
# [[2], [], [4], [6, 8]]
And this is the 2D case:
@nb.njit
def find_1d_in_2d(a, b):
    # For each row, and for each sub-list of `b`, collect every position in `a`
    # whose value equals an element of that sub-list
    index_array = []
    for record_a, record_b in zip(a, b):
        indices_per_a = []
        for i in range(len(record_b)):
            indices_per_b = []
            for j in range(len(record_b[i])):
                for k in range(len(record_a)):
                    if record_b[i][j] == record_a[k]:
                        indices_per_b.append(k)
            indices_per_a.append(indices_per_b)
        index_array.append(indices_per_a)
    return index_array
import awkward as ak
array = ak.Array(
[
[1, 2, 3],
[],
[4, 5],
[6, 7, 8, 9],
]
)
test_array = ak.Array(
[
[[0, 2], [1, 2, 3]],
[[]],
[[4]],
[[6, 8]],
]
)
print(find_1d_in_2d(array, test_array))
# [[[1], [0, 1, 2]], [[]], [[0]], [[0, 2]]]
However, after checking the link, I see there's no NumPy-style "fancy indexing" in awkward. If
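The one-dimensional search can also be written without numba by applying np.isin and np.flatnonzero row by row. This sketch works on plain Python lists, and find_1d_in_1d_np is a hypothetical name. It should return the same positions as find_1d_in_1d, because both collect, in order, the positions in each row of a whose value occurs anywhere in the matching row of b:

```python
import numpy as np


def find_1d_in_1d_np(a, b):
    """For each pair of rows (ra, rb), return the positions in ra whose
    value also occurs in rb, in increasing order."""
    result = []
    for ra, rb in zip(a, b):
        # Boolean mask over ra: True where the value appears in rb
        hits = np.isin(np.asarray(ra), np.asarray(rb))
        # Positions of the True entries, like np.argwhere on a 1D mask
        result.append(np.flatnonzero(hits).tolist())
    return result


array = [[1, 2, 3], [], [4, 5], [6, 7, 8, 9]]
test_array = [[0, 2], [], [4], [6, 8]]
print(find_1d_in_1d_np(array, test_array))  # [[1], [], [0], [0, 2]]
```

The result is a list of lists, so it can serve as a ragged integer index in the same way as the output of find_1d_in_1d. The row loop still runs in Python, so for many rows the numba version is likely faster. With Awkward input, converting with ak.to_list first is the conservative option.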
-
Hi, developers of awkward,
I'm wondering whether there is a function like np.argwhere to find the indices of elements.
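For reference, this is what np.argwhere returns for a regular NumPy array, shown alongside the same (row, column) pairs computed for a ragged list of lists in plain Python. The ragged version only specifies the desired result; it is not an efficient implementation:

```python
import numpy as np

# Regular (rectangular) case: np.argwhere returns one row of coordinates per match
x = np.array([[1, 2, 3], [4, 5, 6]])
coords = np.argwhere(x % 2 == 0)
print(coords.tolist())  # [[0, 1], [1, 0], [1, 2]]

# Ragged case with plain Python lists: the (row, column) pairs that
# np.argwhere would report if the rows could live in one rectangular array
rows = [[1, 2, 3], [], [4, 5], [6, 7, 8, 9]]
ragged_coords = [
    (i, j) for i, row in enumerate(rows) for j, v in enumerate(row) if v % 2 == 0
]
print(ragged_coords)  # [(0, 1), (2, 0), (3, 0), (3, 2)]
```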