Ragged data in Awkward vs. Scipp #1663
-
How does this behave in practice? Do you throw errors when attempting arithmetic operations on coordinate fields? Awkward does not have this distinction at the type level; instead, we allow users to define how records should be added. We recently changed our model so that records do not add together by default, but library authors can opt in by writing an implementation. You can look at the vector library as an example of this, although it's a bit indirect as a consequence of supporting multiple "backends" besides Awkward. The design prototype perhaps shows how vector uses Awkward Array in a trivial sense. I wonder, @jpivarski @ioanaif, whether there are merits to inverting this logic, so that the default case works and behavior authors can opt out? I believe one would do this by setting an
Awkward's CPU kernels are currently single-threaded. We don't use NumPy for many of our "kernels" (the compiled loops that underpin the high-level operations and reductions). We are working towards CUDA support, which would give us local parallelism, although we do not yet have all the required kernels implemented there (see our roadmap). I'm not sure what kind of work we'll be doing on this in the near future (as in, I genuinely don't know); @jpivarski will have an answer. Separately, we also have
-
Hi!
I am working on Scipp, which supports some types of ragged data (see Scipp Documentation on Binned Data for details).
Given the interest in pydata/xarray#4285 for using Awkward arrays with Xarray, which is conceptually very close to what `scipp.DataArray` can do, I thought it is high time to kick off a discussion between us. I think there are a lot of similarities but also differences, and it would be valuable to understand those, which might at some point even lead to support for a common API or functionality. Please also have a look at https://discuss.scientific-python.org/t/ragged-array-summit/465.
For starters, here are the high-level similarities and differences I am aware of (note that I have used Awkward for little more than 10 minutes):
- `ak.layout.ListArray64`. We store an ND-array of start and stop indices, and a "buffer" (I think Awkward calls this "content"?). I managed to convert Scipp binned data into an Awkward list array without copying the buffer/content.
- `scipp.DataArray` and `scipp.Dataset` (similar to `xarray.DataArray`/`Dataset`) as the content. This seems similar to using records in Awkward's content, with slightly more structure. In particular, we distinguish data fields and coordinate fields (and only data fields are operated on in, e.g., arithmetic operations). This does not really matter in terms of data layout and memory handling, i.e., one could imagine swapping out the internals of how things are stored.
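To make the start/stop/buffer layout concrete, here is a minimal pure-Python sketch. `RaggedView` is a hypothetical stand-in written for illustration only; it is neither Scipp's nor Awkward's API, and it uses 1-D index lists where Scipp would hold an ND-array of indices.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RaggedView:
    """Hypothetical list-array layout: index pairs into one flat buffer."""

    starts: Sequence[int]  # start index of each bin within `content`
    stops: Sequence[int]   # one-past-the-end index of each bin
    content: List[float]   # flat buffer shared by all bins

    def __len__(self) -> int:
        return len(self.starts)

    def __getitem__(self, i: int) -> List[float]:
        # Slicing a Python list copies; an array library would return a view.
        return self.content[self.starts[i]:self.stops[i]]

    def to_lists(self) -> List[List[float]]:
        return [self[i] for i in range(len(self))]


buffer = [1.1, 2.2, 3.3, 4.4, 5.5]
bins = RaggedView(starts=[0, 2, 2], stops=[2, 2, 5], content=buffer)

# "Converting" to another container with the same layout only needs to
# pass the three pieces along; the buffer itself is not copied.
other = RaggedView(bins.starts, bins.stops, bins.content)
```

Because both containers agree on the layout, handing over the indices and the buffer is enough, which is the sense in which a conversion can avoid copying the content.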