
You know, that worked accidentally. We should explore this more in Awkward itself to make sure that it isn't doing misleading things. #218

Closed
jpivarski opened this issue Apr 5, 2023 · 1 comment

Comments

@jpivarski
Collaborator

You know, that worked accidentally. We should explore this more in Awkward itself to make sure that it isn't doing misleading things.

Like, ak.zip's dict case just calls ak.to_layout on each value in the dict.

https://github.com/scikit-hep/awkward/blob/6a24ed0d436bcd158f634d9bd9f6d664fff6bd2b/src/awkward/operations/ak_zip.py#L174-L190

If it sees another dict, that might mean it switches over to ak.from_iter, which wouldn't be what you want if the arrays are large, since ak.from_iter iterates over every element as a Python object. It also wouldn't give the right type: a struct of arrays instead of an array of structs.
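To make the terminology concrete, here is a plain-Python sketch of the two layouts. It uses no Awkward or NumPy, and `to_array_of_structs` is a hypothetical helper. A struct of arrays is one record whose fields are whole columns. An array of structs is a sequence of per-row records, which is what zipping is supposed to produce.

```python
# Pure-Python illustration (no Awkward): the same three rows in two layouts.

# Struct of arrays: one record whose fields are equal-length columns.
struct_of_arrays = {"b": [0, 1, 2], "c": [10, 11, 12]}


def to_array_of_structs(columns):
    """Hypothetical helper: zip equal-length columns into a list of per-row dicts."""
    if not columns:
        return []
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    (n,) = lengths
    return [{name: col[i] for name, col in columns.items()} for i in range(n)]


# Array of structs: a sequence of records, one per row.
array_of_structs = to_array_of_structs(struct_of_arrays)
print(array_of_structs)
# [{'b': 0, 'c': 10}, {'b': 1, 'c': 11}, {'b': 2, 'c': 12}]
```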

Yeah, as I thought:

>>> array = ak.zip({
...     "a": {"b": np.arange(10, dtype=np.int8), "c": np.arange(10, dtype=np.int16)},
...     "d": {"e": np.arange(10, dtype=np.int32), "f": np.arange(10, dtype=np.float32)},
... })
>>> array.show(type=True)
type: {
    a: {
        b: var * int64,
        c: var * int64
    },
    d: {
        e: var * int64,
        f: var * float64
    }
}
{a: {b: [0, 1, 2, 3, 4, ..., 6, 7, 8, 9], c: [0, 1, ..., 9]},
 d: {e: [0, 1, 2, 3, 4, ..., 6, 7, 8, 9], f: [0, 1, ..., 9]}}

whereas

>>> array2 = ak.zip({"b": np.arange(10, dtype=np.int8), "c": np.arange(10, dtype=np.int16)})
>>> array2.show(type=True)
type: 10 * {
    b: int8,
    c: int16
}
[{b: 0, c: 0},
 {b: 1, c: 1},
 {b: 2, c: 2},
 {b: 3, c: 3},
 {b: 4, c: 4},
 {b: 5, c: 5},
 {b: 6, c: 6},
 {b: 7, c: 7},
 {b: 8, c: 8},
 {b: 9, c: 9}]

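All of the integer types turn into int64, and float32 turns into float64, because ak.from_iter treats the values as Python int and float, which loses the original dtype. The nested object also has a different structure: array2 is an array of 10 records (its type starts with 10 *), whereas array is a single record whose fields are variable-length lists (there is no 10 * in its type).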
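The dtype loss can be reproduced without Awkward or NumPy. This sketch uses the standard-library array module as a stand-in for NumPy arrays. `rebuild_from_python_values` is a hypothetical helper, not Awkward's implementation. It is assumed to behave like the fallback described here: once values are iterated as plain Python int and float, the 1-byte and 4-byte widths are gone, so a default 64-bit type is chosen.

```python
from array import array

# Stand-ins for np.arange(10, dtype=np.int8) and np.arange(10, dtype=np.float32),
# using the standard-library array module instead of NumPy.
int8_column = array("b", range(10))                         # "b": 1-byte signed integer
float32_column = array("f", [float(i) for i in range(10)])  # "f": 4-byte C float


def rebuild_from_python_values(values):
    """Hypothetical from_iter-like rebuild (not Awkward's code).

    It only sees plain Python int/float objects, which carry no width,
    so it falls back to 64-bit typecodes.
    """
    values = list(values)
    if all(isinstance(v, int) for v in values):
        return array("q", values)  # "q": signed long long, 8 bytes on common platforms
    return array("d", values)      # "d": C double, 8 bytes on common platforms


for column in (int8_column, float32_column):
    as_python = list(column)  # iterating yields plain Python int / float objects
    rebuilt = rebuild_from_python_values(as_python)
    print(column.typecode, column.itemsize, "->", rebuilt.typecode, rebuilt.itemsize)
# b 1 -> q 8
# f 4 -> d 8
```

The values survive the round trip; only the storage width changes.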

I'll raise this as an issue in Awkward: ak.zip seems to work recursively, but it doesn't, and what it does now is misleading. We can continue the discussion there about what the right behavior is; it's not obvious to me. Treating any expected array-like uniformly with ak.to_layout is good for consistency, but your interpretation (zipping nested dicts recursively) is natural, too.
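The recursive interpretation can be sketched in plain Python. `zip_recursive` is a hypothetical helper. It is not Awkward's API and not a proposal for how ak.zip should behave. It only shows the shape a recursive zip would produce: nested dicts are zipped first, giving 10 records, each with nested a and d records. This is analogous to an Awkward type of the form 10 * {a: {...}, d: {...}}.

```python
def zip_recursive(fields):
    """Hypothetical recursive zip over plain Python data (not ak.zip).

    Nested dicts are zipped first, so each row gets nested records instead of
    one record whose fields are whole columns.
    """
    columns = {
        name: zip_recursive(value) if isinstance(value, dict) else list(value)
        for name, value in fields.items()
    }
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"fields have different lengths: {sorted(lengths)}")
    n = lengths.pop() if lengths else 0
    return [{name: col[i] for name, col in columns.items()} for i in range(n)]


rows = zip_recursive({
    "a": {"b": range(10), "c": range(10)},
    "d": {"e": range(10), "f": [float(i) for i in range(10)]},
})
print(len(rows))  # 10
print(rows[3])    # {'a': {'b': 3, 'c': 3}, 'd': {'e': 3, 'f': 3.0}}
```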

Originally posted by @jpivarski in #213 (comment)

@jpivarski
Collaborator Author

Darn, my mistake.
