Version of Awkward Array

2.6.4

Description and code to reproduce

The value of `layout.is_record` changes after slicing with a boolean mask:

    >>> import awkward as ak
    >>> data = ak.Array({"a": [1, 2, 3], "b": [3, 4, 5]})
    >>> data.layout.is_record
    True
    >>> data[data.b == 3].layout.is_record
    False

Is this intended?
Replies: 1 comment
This is not a bug. When you look at the `layout`, you're looking at the low-level memory layout ("how the data are implemented", rather than "what the data mean"). Some of the layout nodes are invisible to high-level types.

    >>> data = ak.Array({"a": [1, 2, 3], "b": [3, 4, 5]})
    >>> data
    <Array [{a: 1, b: 3}, {...}, {a: 3, b: 5}] type='3 * {a: int64, b: int64}'>

The high-level type is an array of records: `3 * {a: int64, b: int64}`.

    >>> data2 = data[data.b == 3]
    >>> data2
    <Array [{a: 1, b: 3}] type='1 * {a: int64, b: int64}'>

The high-level type is still an array of records: `1 * {a: int64, b: int64}`.

But whereas the layout of `data` is a RecordArray of two NumpyArrays,

    >>> data.layout
    <RecordArray is_tuple='false' len='3'>
        <content index='0' field='a'>
            <NumpyArray dtype='int64' len='3'>[1 2 3]</NumpyArray>
        </content>
        <content index='1' field='b'>
            <NumpyArray dtype='int64' len='3'>[3 4 5]</NumpyArray>
        </content>
    </RecordArray>

the layout of `data2` is an IndexedArray that wraps the same, unsliced RecordArray:

    >>> data2.layout
    <IndexedArray len='1'>
        <index><Index dtype='int64' len='1'>
            [0]
        </Index></index>
        <content><RecordArray is_tuple='false' len='3'>
            <content index='0' field='a'>
                <NumpyArray dtype='int64' len='3'>[1 2 3]</NumpyArray>
            </content>
            <content index='1' field='b'>
                <NumpyArray dtype='int64' len='3'>[3 4 5]</NumpyArray>
            </content>
        </RecordArray></content>
    </IndexedArray>

The IndexedArray was introduced by the slice for performance reasons. Slicing a record array doesn't immediately make new arrays of all of its fields; it makes a view of the selected items. Some RecordArrays are very wide, with thousands of fields, and after slicing one you might only be interested in a few of those fields. Slicing every field up front would be wasted effort if you never look at most of them, so our low-level implementation choice is to do this "lazy slice" with an IndexedArray (#261).