I'm trying to write a robust, zero-copy deserializer that reads 2D jagged numerical vectors, written to disk according to this specification, into Awkward 2.x arrays. This is what I have written so far:

```python
import awkward as ak
from numpy.typing import NDArray


def _ak_from_buffers(flattened_data: NDArray, cumulative_length: NDArray):
    # should also run ak.is_valid() before returning?
    return ak.from_buffers(
        {
            "class": "ListOffsetArray",
            "offsets": "i64",  # how to get this from cumulative_length.dtype?
            "content": {
                "class": "NumpyArray",
                "primitive": ak.types.numpytype.dtype_to_primitive(flattened_data.dtype),
                "form_key": "node1",
            },
            "form_key": "node0",
        },
        len(cumulative_length) - 1,
        {
            "node0-offsets": cumulative_length,  # how to prepend the zero but avoid copy?
            "node1-data": flattened_data,
        },
    )
```

and I have a couple of questions (also inlined in the code):
Answered by agoose77, Nov 22, 2023
Hi @gipert, interesting discussion!

[...] `int64` will often be copied once the array is manipulated.

[...] `ak.from_buffers` here. You can also directly construct the layout nodes using our layout system. This would mean that you wouldn't need to figure out the index type associated with a dtype, e.g.

```python
import awkward as ak
import numpy as np

# Build the offsets buffer: a leading zero followed by the cumulative lengths.
offsets = np.empty(len(cumulative_length) + 1, dtype=cumulative_length.dtype)
offsets[1:] = cumulative_length
offsets[0] = 0
# Wrap the buffers in layout nodes directly instead of describing them with a form.
layout = ak.contents.ListOffsetArray(
    offsets=ak.index.Index(offsets),
    content=ak.contents.NumpyArray(flattened_data)
)
```

Answer selected by gipert
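To make the role of the offsets buffer concrete, here is a minimal sketch in plain NumPy, independent of Awkward. It assumes that `cumulative_length` stores the running end index of each sublist without a leading zero, as the question's "prepend the zero" comment suggests. The helper names `offsets_from_cumulative_length` and `sublist_views` and the sample data are hypothetical, and the offsets are cast to `int64` here purely for illustration.

```python
import numpy as np


def offsets_from_cumulative_length(cumulative_length: np.ndarray) -> np.ndarray:
    """Prepend a zero to the cumulative lengths (this copies the lengths once)."""
    offsets = np.empty(len(cumulative_length) + 1, dtype=np.int64)
    offsets[0] = 0
    offsets[1:] = cumulative_length  # cast to int64 if the input dtype differs
    return offsets


def sublist_views(flattened_data: np.ndarray, offsets: np.ndarray) -> list:
    """Return one NumPy view per sublist; basic slicing does not copy the data."""
    return [
        flattened_data[start:stop]
        for start, stop in zip(offsets[:-1], offsets[1:])
    ]


# Hypothetical example data: three sublists of lengths 2, 0 and 3.
flattened_data = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
cumulative_length = np.array([2, 2, 5], dtype=np.uint32)

offsets = offsets_from_cumulative_length(cumulative_length)
views = sublist_views(flattened_data, offsets)
print(offsets.tolist())             # [0, 2, 2, 5]
print([v.tolist() for v in views])  # [[1.1, 2.2], [], [3.3, 4.4, 5.5]]
```

With this convention there are `len(offsets) - 1` sublists, one per entry of `cumulative_length`. Only the small offsets buffer is copied; each sublist is a view into `flattened_data`.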