Replies: 6 comments
-
Actually, all the array transformations are put under an

While possible, it should be clear that this is a big task. A new "layer 3" in Rust or golang would be responsible for defining array types, allocating, tracking ownership, and deleting array memory. When Rust or golang call the "layer 4" functions, the arrays already have to be made: the shared code only fills preallocated arrays.

If Rust or golang have C++ bindings (i.e. not FFI, but something more specific, an equivalent of pybind11 or Cxx.jl), then less work would have to be done. You could put a Rust or golang wrapper over "layer 3" the same way that it's done in Awkward with pybind11. In most languages, though, that's not an option. FFI is the common denominator, not any C++ interface.
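To make that division of labor concrete, here is a minimal C++ sketch of the contract described above: a C-linkage function that only fills an array it is handed, and a caller that owns allocation and cleanup. The names `fill_counts_from_offsets` and `counts_from_offsets` are invented for illustration and are not functions from Awkward; the integer error code is also just an assumption about how such a function might report failure.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical "layer 4"-style function: C linkage so any FFI can call it.
// It never allocates or frees; it only writes into memory the caller made.
extern "C" int fill_counts_from_offsets(std::int64_t* tocounts,
                                        const std::int64_t* fromoffsets,
                                        std::int64_t length) {
  for (std::int64_t i = 0; i < length; i++) {
    std::int64_t count = fromoffsets[i + 1] - fromoffsets[i];
    if (count < 0) {
      return 1;  // nonzero means "offsets are not monotonic"
    }
    tocounts[i] = count;
  }
  return 0;
}

// Hypothetical "layer 3"-style caller: it decides the array type, allocates
// the output, and owns it afterwards. A Rust or golang layer would play this
// role with its own array types.
std::vector<std::int64_t> counts_from_offsets(
    const std::vector<std::int64_t>& offsets) {
  assert(!offsets.empty());  // offsets always has length + 1 entries
  std::vector<std::int64_t> counts(offsets.size() - 1);  // preallocate
  int err = fill_counts_from_offsets(
      counts.data(), offsets.data(),
      static_cast<std::int64_t>(counts.size()));
  if (err != 0) {
    throw std::invalid_argument("offsets must be non-decreasing");
  }
  return counts;
}
```

With this split, the only things crossing the language boundary are raw pointers and lengths, which is what makes FFI sufficient for calling the shared code.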
-
Oh, a couple of other things: while there would be less to write if building a new "layer 2" (on top of C++, equivalent to pybind11 or Cxx.jl) than a new "layer 3" (on top of the

The other thing is that you don't need to wait for Awkward to be ready. It's not ready for general data analysis users, but it's sufficiently well-defined to start building new interfaces. We're at the stage where all the array types have been defined for general use (though a few more will be needed for interface with ROOT files and Arrow buffers) and we're writing functions on all of these types. We can discuss the particulars, but there's no need to wait.
-
@jpivarski This is great news! Thanks for planning this out and giving so much detail. For Rust, I'd probably try to use the new cxx library to do type-safe interop between C++ and Rust.
-
I'd like to see how this works out! To help with your planning, I can point out exactly which parts of C++ memory management we use.

All of the arrays of primitive numbers and booleans, the data directly shared with NumPy, are in

It's often the case that multiple array nodes use the same buffer, so they share the

Integer arrays that the C++ needs to understand to do its work (e.g. the

All leaves of the nested structure are either NumpyArray (intended for Python, but can be used in C++) or RawArray (only intended for C++; a templated and header-only class). These carry

All array nodes are passed around as

Each of these nodes can optionally carry Identities, which are two-dimensional arrays fulfilling a similar role to Pandas indexes. They're optional (the

Each array node also passes through a string → JSON mapping that indicates how it should be interpreted in the high-level interface. For instance, internally, strings are just ListOffsetArrays of 8-bit NumpyArrays, but they carry a parameter

The identities and parameters are the only mutable attributes that the array nodes have. All the rest are immutable, and the data in the arrays themselves should be regarded as immutable. Actually, there's one exception to that: RecordArray will have mutable

To make structures, we build them up with an append-only FillableArray, which is dynamically typed. It discovers the type of the data as it's being filled. Faster filling, for cases where the type is known or partially known in advance, will be handled as special cases that do not go through FillableArray (see root.h for a first example of filling data when it's known to be

I don't know how well this memory model maps onto Rust, however: maybe some kind of
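For anyone trying to picture the layout, here is a rough C++ sketch of the "strings are just ListOffsetArrays of 8-bit NumpyArrays" idea, with a string → JSON parameter map riding along. This is not Awkward's actual class: `SketchListOffsetBytes` and `sketch_from_strings` are hypothetical names, the use of `std::shared_ptr` to share buffers is my assumption, and the parameter map is left empty because the specific parameter is not spelled out in the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node: offsets into a flat buffer of 8-bit values. Both buffers
// are held through shared pointers so several nodes could reuse them.
struct SketchListOffsetBytes {
  std::shared_ptr<const std::vector<std::int64_t>> offsets;
  std::shared_ptr<const std::vector<std::uint8_t>> content;
  // string -> JSON text; tells a high-level layer how to interpret the node.
  std::map<std::string, std::string> parameters;

  // Number of lists (here, strings): one fewer than the number of offsets.
  std::int64_t length() const {
    return static_cast<std::int64_t>(offsets->size()) - 1;
  }

  // Item i spans content[offsets[i] : offsets[i + 1]].
  std::string string_at(std::int64_t i) const {
    assert(0 <= i && i < length());
    std::int64_t start = (*offsets)[i];
    std::int64_t stop = (*offsets)[i + 1];
    return std::string(content->begin() + start, content->begin() + stop);
  }
};

// Hypothetical helper that packs strings into one flat byte buffer plus offsets.
SketchListOffsetBytes sketch_from_strings(
    const std::vector<std::string>& strings) {
  auto offsets = std::make_shared<std::vector<std::int64_t>>();
  auto content = std::make_shared<std::vector<std::uint8_t>>();
  offsets->push_back(0);
  for (const std::string& s : strings) {
    content->insert(content->end(), s.begin(), s.end());
    offsets->push_back(static_cast<std::int64_t>(content->size()));
  }
  return SketchListOffsetBytes{offsets, content, {}};
}
```

The point of the sketch is that the node itself is tiny: two shared handles and a small map. All of the bulk data lives in the flat buffers.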
-
On second thought,

Since these nodes are lightweight handles pointing to large arrays (shared wherever possible), there's no performance advantage to modifying things in place. There can be a conceptual advantage to high-level users ("I just want to add this one field!"), but that mutability can be concentrated in a layer that's easy to debug.
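As a small illustration of why in-place modification buys nothing here, the following C++ sketch adds a field by building a new handle that shares the existing columns. `RecordSketch`, `Column`, and `with_field` are hypothetical names for this example only, and `std::shared_ptr` is an assumed sharing mechanism; none of this is the real RecordArray interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A column is a shared, read-only buffer; copying a Column copies a pointer.
using Column = std::shared_ptr<const std::vector<double>>;

// Hypothetical immutable record node: just a list of (name, handle) pairs.
struct RecordSketch {
  std::vector<std::pair<std::string, Column>> fields;

  // "Adding a field" returns a new node. The existing columns are shared,
  // not copied, so the cost does not depend on how long the arrays are.
  RecordSketch with_field(const std::string& name, Column column) const {
    RecordSketch out = *this;  // copies handles only, never array data
    out.fields.emplace_back(name, std::move(column));
    return out;
  }
};
```

A high-level layer could still offer "add this one field" to users by swapping which `RecordSketch` a variable refers to, which keeps the mutability in one easily debugged place.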
-
This isn't any less relevant, but it's not a to-do item and I'd like to close it to see the list of remaining work more clearly. If you're planning to build other-language versions on top of either the C layer or the C++ layer, you don't have to wait for me. Awkward 1.0 is "doneish" right now (and it will become more "doneish" as time goes on).
-
I'm interested in using this library (when it's ready) from Rust and Golang.
Are there plans for this? I think it would mean directly using the C++ headers, and skipping anything that uses Python.