RecordArray with duplicated field names cause issues with to_buffers, unpickling and loading from arrow #3247
Comments
I reserve judgement about the round-trip through Arrow or Parquet (it depends on whether those libraries are okay with duplicated field names), but the round-trip through buffers seems very fixable. It already has a Form with distinct form_key values for the two fields:
>>> a = ak.Array({"a": [1, 2, 3]})[["a", "a"]]
>>> form, length, containers = ak.to_buffers(a)
>>> print(form)
{
"class": "RecordArray",
"fields": [
"a",
"a"
],
"contents": [
{
"class": "NumpyArray",
"primitive": "int64",
"form_key": "node1"
},
{
"class": "NumpyArray",
"primitive": "int64",
"form_key": "node2"
}
],
"form_key": "node0"
}
The only problem is that the containers only has one buffer:
>>> containers
{'node1-data': array([1, 2, 3])}
Ideally, it would also have node2-data. It does iterate over both fields named "a" (awkward/src/awkward/contents/recordarray.py, lines 334 to 337 in 1c368f1):
>>> a.layout._fields
['a', 'a']
>>> a.layout._contents
[<NumpyArray dtype='int64' len='3'>[1 2 3]</NumpyArray>,
 <NumpyArray dtype='int64' len='3'>[1 2 3]</NumpyArray>]
The problem appears to be that the Form child is looked up by field name, so both fields named "a" resolve to the same child. The solution to this would be to enumerate over the fields by index and get the right Form child by index:
for index, content in enumerate(self._contents):
    content._to_buffers(
        form.content(index), getkey, container, backend, byteorder
    )
(See definition of Form.content.) But that's exactly the same as the no-fields case above this code, so the whole if-then-else can be removed in favor of the first case: awkward/src/awkward/contents/recordarray.py, lines 328 to 337 in 1c368f1
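The difference between looking up a Form child by name and by index can be illustrated with a toy model. This is only a sketch: ToyForm, to_buffers_by_name and to_buffers_by_index are hypothetical stand-ins written for this example, not Awkward Array's classes, and the assumption that a name lookup returns the first matching child is inferred from the single 'node1-data' key in the output above.

```python
class ToyForm:
    """Hypothetical stand-in for a record Form: field names plus child form keys."""

    def __init__(self, fields, child_keys):
        self.fields = fields
        self.child_keys = child_keys

    def content(self, index_or_field):
        # An int addresses a child by position; a str finds the FIRST matching name.
        if isinstance(index_or_field, int):
            return self.child_keys[index_or_field]
        return self.child_keys[self.fields.index(index_or_field)]


def to_buffers_by_name(form, contents):
    # Name-based lookup: duplicated names collapse onto one form key.
    container = {}
    for field, content in zip(form.fields, contents):
        container[f"{form.content(field)}-data"] = content
    return container


def to_buffers_by_index(form, contents):
    # Index-based lookup: every content gets its own form key.
    container = {}
    for index, content in enumerate(contents):
        container[f"{form.content(index)}-data"] = content
    return container


form = ToyForm(["a", "a"], ["node1", "node2"])
contents = [[1, 2, 3], [1, 2, 3]]

by_name = to_buffers_by_name(form, contents)
by_index = to_buffers_by_index(form, contents)
print(sorted(by_name))   # ['node1-data']
print(sorted(by_index))  # ['node1-data', 'node2-data']
```

In the toy model, the name-based loop writes both contents under the same key, while the index-based loop produces one buffer per content regardless of how the fields are named.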
Version of Awkward Array
2.6.8
Description and code to reproduce
It seems RecordArray allows duplicated field names, e.g. when constructing one via the Layout API.
This can also happen if one (like me, accidentally) repeats a field name when selecting multiple record fields.
Now, such arrays cause issues with to_buffers: one can see that node2-data is missing.

Probably one could just not allow arrays with duplicated field names. I'm not sure if there is any useful application of this. When I discovered this in my code, it was also something I did not intend to do (I just accidentally repeated a field name), so writing this minimal reproducer was already worth it :)
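Until the library handles or rejects duplicates itself, an accidental repeat can be caught on the user side before selecting fields. The following is a minimal sketch using plain dicts; duplicated_fields and select_fields are hypothetical helpers invented for this example and are not part of Awkward Array.

```python
from collections import Counter


def duplicated_fields(fields):
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(fields)
    return [name for name in counts if counts[name] > 1]


def select_fields(record, fields):
    """Hypothetical strict selection: refuse a request that repeats a field name."""
    repeated = duplicated_fields(fields)
    if repeated:
        raise ValueError(f"duplicated field names: {repeated}")
    return {name: record[name] for name in fields}


record = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(select_fields(record, ["a", "b"]))  # both columns, no error
print(duplicated_fields(["a", "a"]))      # ['a']
```

Calling select_fields(record, ["a", "a"]) raises ValueError instead of silently producing a record with two fields named "a".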