feat: define conversion rules #59

Merged: 16 commits, Jan 25, 2024
14 changes: 5 additions & 9 deletions Project.toml
@@ -4,23 +4,19 @@ authors = ["Jim Pivarski <[email protected]>", "Jerry Ling <jerry.ling@cern
version = "0.1.2"

[deps]
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[weakdeps]
PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"

[extensions]
AwkwardPythonCallExt = "PythonCall"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
JSON = "0.21.4"
julia = "1.9"
Tables = "1.11.1"
julia = "1.9"
PythonCall = "0.9"

[extras]
PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"
Member Author

@Moelf, it could be due to my own setup, but I could not get the PythonCall dependency to load when running from the Python REPL. This is an attempt to fix that.

Member

Before this change, the package was set up so that installing AwkwardArray.jl does NOT imply needing Python. If I were to do this, I would instead make sure that the Python side calls something to install PythonCall.

As this PR implements it, you cannot use this package as a pure-Julia package, which may or may not be what you want.

Member

That's what we discussed today. Although it would be better (more opportunities) for this to be usable as a pure-Julia package, @ianna ran into some problems attempting to do so, and I think its primary value would be for users of both Python and Julia. (In Julia, there are already packages for some of these use cases, such as ArraysOfArrays and StructArrays.)

So the choice between depending on Python optionally and depending on it strictly comes down to a cost-benefit analysis. The benefit of keeping it optional is small but positive; how big is the cost?

Member

The way this PR does it sounds reasonable to me.

Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "PythonCall"]
test = ["Test"]
35 changes: 0 additions & 35 deletions ext/AwkwardPythonCallExt/AwkwardPythonCallExt.jl

This file was deleted.

5 changes: 3 additions & 2 deletions src/AwkwardArray.jl
@@ -6,7 +6,8 @@ import Tables
include("./all_implementations.jl")
include("./tables.jl")

# stub for PythonCall Extention
function convert end
include("./AwkwardPythonCallExt.jl")
using .AwkwardPythonCallExt: convert


end # module AwkwardArray
116 changes: 116 additions & 0 deletions src/AwkwardPythonCallExt.jl
@@ -0,0 +1,116 @@
module AwkwardPythonCallExt
using PythonCall
using JSON
import AwkwardArray

function AwkwardArray.convert(layout::AwkwardArray.Content)::Py
    form, len, containers = AwkwardArray.to_buffers(layout)

    py_buffers = Dict{String,Any}()

    for (key, buffer) in containers
        py_buffers[key] = pyimport("numpy").asarray(buffer, dtype = pyimport("numpy").uint8)
    end

    pyimport("awkward").from_buffers(form, len, py_buffers)
end

function AwkwardArray.convert(array::Py)::AwkwardArray.Content
    form, len, _containers = pyimport("awkward").to_buffers(array)
    containers = pyconvert(Dict, _containers)

    julia_buffers = Dict{String,Vector{UInt8}}()

    for (key, buffer) in containers
        julia_buffers[key] = reinterpret(UInt8, buffer)
    end

    AwkwardArray.from_buffers(
        pyconvert(String, form.to_json()),
        pyconvert(Int, len),
        julia_buffers,
    )
end
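Both convert methods move data as raw byte buffers: to_buffers flattens a layout into a form (JSON), a length, and a dictionary of named buffers, and the receiving side's from_buffers reinterprets each buffer according to the type recorded in the form. The following is a minimal sketch of that byte round trip in plain Base Julia, with no PythonCall or Awkward dependency; the buffer names "node0-offsets" and "node1-data" are illustrative assumptions, not output copied from this PR.

```julia
# Sketch: a typed buffer travels as raw bytes and is reinterpreted on arrival.
# Pure Base Julia; the buffer names below are illustrative, not taken from the PR.

offsets = Int64[0, 3, 3, 5]                 # offsets of a list layout: 3 lists
data    = Float64[1.1, 2.2, 3.3, 4.4, 5.5]  # flat content the lists index into

# Sending side: flatten every typed buffer to bytes, keyed by name.
buffers = Dict{String,Vector{UInt8}}(
    "node0-offsets" => collect(reinterpret(UInt8, offsets)),
    "node1-data"    => collect(reinterpret(UInt8, data)),
)

# Receiving side: reinterpret the bytes with the element type the form declares.
offsets_back = reinterpret(Int64, buffers["node0-offsets"])
data_back    = reinterpret(Float64, buffers["node1-data"])

println(length(buffers["node1-data"]))  # 5 Float64 values x 8 bytes = 40
```

reinterpret itself only creates a view; collect is used here to store plain Vector{UInt8} values, matching the Dict{String,Vector{UInt8}} the PR builds.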

# rule functions
function pyconvert_rule_awkward_array_primitive(::Type{AwkwardArray.PrimitiveArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

jpivarski marked this conversation as resolved.
Member Author

These rule functions should return pyconvert_unconverted() if the conversion was not possible.

Member

jpivarski, Jan 24, 2024

Is that distinct from encountering an Exception and raising it?

For example, in Python, some overloading functions let you return NotImplemented to say, "I'm not handling it; let someone else handle it," whereas raise XYZ is different.

For Awkward Array conversion, there isn't any Python Awkward Array that should yield pyconvert_unconverted() rather than a Julia Awkward Array, or vice versa. But the conversion could encounter an exception in the attempt (e.g. running out of memory, or recognizing that the array doesn't satisfy a validity rule because the Python Awkward Array was hacked together with __new__ or something).

Member Author

These rule functions must return pyconvert_unconverted() if the conversion was not possible.

https://github.com/JuliaPy/PythonCall.jl/blob/13f596d6a7d60ef7bfcee2d538cd895f59826d95/src/Convert/pyconvert.jl#L38C1-L45


function pyconvert_rule_awkward_array_empty(::Type{AwkwardArray.EmptyArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_listoffset(::Type{AwkwardArray.ListOffsetArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_list(::Type{AwkwardArray.ListArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_regular(::Type{AwkwardArray.RegularArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_record(::Type{AwkwardArray.RecordArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_tuple(::Type{AwkwardArray.TupleArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_indexed(::Type{AwkwardArray.IndexedArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_indexedoption(::Type{AwkwardArray.IndexedOptionArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_bytemasked(::Type{AwkwardArray.ByteMaskedArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_bitmasked(::Type{AwkwardArray.BitMaskedArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_unmasked(::Type{AwkwardArray.UnmaskedArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function pyconvert_rule_awkward_array_union(::Type{AwkwardArray.UnionArray}, x::Py)
    array = AwkwardArray.convert(x)
    return PythonCall.pyconvert_return(array)
end

function __init__()
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.PrimitiveArray, pyconvert_rule_awkward_array_primitive, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.EmptyArray, pyconvert_rule_awkward_array_empty, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.ListOffsetArray, pyconvert_rule_awkward_array_listoffset, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.ListArray, pyconvert_rule_awkward_array_list, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.RegularArray, pyconvert_rule_awkward_array_regular, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.RecordArray, pyconvert_rule_awkward_array_record, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.TupleArray, pyconvert_rule_awkward_array_tuple, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.IndexedArray, pyconvert_rule_awkward_array_indexed, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.IndexedOptionArray, pyconvert_rule_awkward_array_indexedoption, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.ByteMaskedArray, pyconvert_rule_awkward_array_bytemasked, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.BitMaskedArray, pyconvert_rule_awkward_array_bitmasked, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.UnmaskedArray, pyconvert_rule_awkward_array_unmasked, PythonCall.PYCONVERT_PRIORITY_ARRAY)
    PythonCall.pyconvert_add_rule("awkward.highlevel:Array", AwkwardArray.UnionArray, pyconvert_rule_awkward_array_union, PythonCall.PYCONVERT_PRIORITY_ARRAY)
end

end # module
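The review thread on the rule functions concerns PythonCall's rule protocol: a rule either produces a value, declines by returning pyconvert_unconverted() so that other rules may be tried, or throws, which aborts the conversion (comparable to returning NotImplemented versus raising in Python). In this module every rule converts unconditionally, so the PrimitiveArray rule applied to a Python array of lists would hand back a ListOffsetArray instead of declining. The sketch below is a toy model of that protocol in plain Julia, not PythonCall's implementation; Unconverted, UNCONVERTED, produce, rule, and tryconvert are hypothetical names, and ListOffset and Primitive stand in for the real AwkwardArray layout types.

```julia
# Toy model (NOT PythonCall's implementation) of the rule protocol.
# A rule returns a value, returns the sentinel UNCONVERTED to decline
# ("let another rule try"), or throws, which aborts the whole conversion.

struct Unconverted end
const UNCONVERTED = Unconverted()

# Hypothetical stand-ins for AwkwardArray layout types.
abstract type Content end
struct ListOffset <: Content end
struct Primitive <: Content end

# Stand-in for AwkwardArray.convert: the result type depends on the data, not on T.
produce(x::Symbol) = x === :lists ? ListOffset() : Primitive()

# A well-behaved rule: convert, then decline if the result is not a T.
function rule(::Type{T}, x) where {T<:Content}
    result = produce(x)
    result isa T || return UNCONVERTED
    return result
end

# Dispatcher: try each rule in order; the first non-sentinel result wins.
function tryconvert(::Type{T}, x, rules) where {T}
    for r in rules
        ans = r(T, x)
        ans isa Unconverted || return ans
    end
    return UNCONVERTED
end
```

Applied to this module, the same idea would presumably mean checking, say, array isa AwkwardArray.PrimitiveArray in each rule and returning PythonCall.pyconvert_unconverted() when the check fails, before calling PythonCall.pyconvert_return.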
168 changes: 168 additions & 0 deletions test/runpytests.jl
@@ -30,3 +30,171 @@ end

    @test array == [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
end

# Test pyconvert Python Awkward Array to Julia Awkward Array
@testset "convert # PrimitiveArray" begin
    layout = pyimport("awkward").contents.NumpyArray(
        pyimport("numpy").array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9], dtype=pyimport("numpy").float64)
    )
    py_array = pyimport("awkward").Array(layout)

    array = pyconvert(AwkwardArray.PrimitiveArray, py_array)
    @test array isa AwkwardArray.PrimitiveArray
end

@testset "convert # EmptyArray" begin
    layout = pyimport("awkward").contents.EmptyArray()
    py_array = pyimport("awkward").Array(layout)

    array = pyconvert(AwkwardArray.EmptyArray, py_array)
    @test array isa AwkwardArray.EmptyArray
end

@testset "convert # ListOffsetArray" begin
    py_array = pyimport("awkward").Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

    array = pyconvert(AwkwardArray.ListOffsetArray, py_array)
    @test array isa AwkwardArray.ListOffsetArray
end

@testset "convert # ListArray" begin
    content = pyimport("awkward").contents.NumpyArray(
        pyimport("numpy").array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9], dtype=pyimport("numpy").float64)
    )
    starts = pyimport("awkward").index.Index64(pyimport("numpy").array([0, 3, 3, 5, 6], dtype=pyimport("numpy").int64))
    stops = pyimport("awkward").index.Index64(pyimport("numpy").array([3, 3, 5, 6, 9], dtype=pyimport("numpy").int64))
    offsets = pyimport("awkward").index.Index64(pyimport("numpy").array([0, 3, 3, 5, 6, 9], dtype=pyimport("numpy").int64))
    layout = pyimport("awkward").contents.ListArray(starts, stops, content)

    py_array = pyimport("awkward").Array(layout)

    array = pyconvert(AwkwardArray.ListArray, py_array)
    @test array isa AwkwardArray.ListArray
end
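The ListArray test describes its lists with separate starts and stops indexes: list i covers content from position starts[i] up to (not including) stops[i], counting from 0. A minimal plain-Julia sketch that decodes the test's buffers by hand, adding 1 to convert to Julia's 1-based indexing:

```julia
# Decode the ListArray from the test by hand (0-based starts/stops, as in Awkward).
content = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
starts  = [0, 3, 3, 5, 6]
stops   = [3, 3, 5, 6, 9]

# List i is content[starts[i]+1 : stops[i]]; start == stop gives an empty list.
lists = [content[s+1:e] for (s, e) in zip(starts, stops)]

# The same lists as a single offsets buffer (ListOffsetArray style):
# starts == offsets[1:end-1] and stops == offsets[2:end].
offsets = [0, 3, 3, 5, 6, 9]
```

Because every stop equals the next start here, the offsets index that the test also builds (but never passes to ListArray) describes exactly the same five lists.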

@testset "convert # RegularArray" begin
    content = pyimport("awkward").contents.NumpyArray(
        pyimport("numpy").array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9], dtype=pyimport("numpy").float64)
    )
    offsets = pyimport("awkward").index.Index64(pyimport("numpy").array([0, 3, 3, 5, 6, 10, 10], dtype=pyimport("numpy").int64))
    listoffsetarray = pyimport("awkward").contents.ListOffsetArray(offsets, content)
    regulararray = pyimport("awkward").contents.RegularArray(listoffsetarray, 2, zeros_length=0)

    py_array = pyimport("awkward").Array(regulararray)

    array = pyconvert(AwkwardArray.RegularArray, py_array)
    @test array isa AwkwardArray.RegularArray
end

@testset "convert # RecordArray" begin
    content1 = pyimport("awkward").contents.NumpyArray(pyimport("numpy").array([1, 2, 3, 4, 5], dtype=pyimport("numpy").int64))
    content2 = pyimport("awkward").contents.NumpyArray(
        pyimport("numpy").array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9], dtype=pyimport("numpy").float64)
    )
    offsets = pyimport("awkward").index.Index64(pyimport("numpy").array([0, 3, 3, 5, 6, 9], dtype=pyimport("numpy").int64))
    listoffsetarray = pyimport("awkward").contents.ListOffsetArray(offsets, content2)
    recordarray = pyimport("awkward").contents.RecordArray(
        [content1, listoffsetarray, content2, content1],
        fields=["one", "two", "2", "wonky"],
    )

    py_array = pyimport("awkward").Array(recordarray)

    array = pyconvert(AwkwardArray.RecordArray, py_array)
    @test array isa AwkwardArray.RecordArray
end

@testset "convert # TupleArray" begin
    tuplearray = pyimport("awkward").contents.RecordArray([pyimport("awkward").contents.NumpyArray(pyimport("numpy").arange(10, dtype=pyimport("numpy").int64))], pybuiltins.None)

    py_array = pyimport("awkward").Array(tuplearray)

    array = pyconvert(AwkwardArray.TupleArray, py_array)
    @test array isa AwkwardArray.TupleArray
end

@testset "convert # IndexedArray" begin
    content = pyimport("awkward").contents.NumpyArray(pyimport("numpy").array([0.0, 1.1, 2.2, 3.3, 4.4], dtype=pyimport("numpy").float64))

    ind = pyimport("numpy").array([2, 2, 0, 3, 4], dtype=pyimport("numpy").int32)
    index = pyimport("awkward").index.Index32(ind)
    indexedarray = pyimport("awkward").contents.IndexedArray(index, content)

    py_array = pyimport("awkward").Array(indexedarray)

    array = pyconvert(AwkwardArray.IndexedArray, py_array)
    @test array isa AwkwardArray.IndexedArray
end

@testset "convert # IndexedOptionArray" begin
    content = pyimport("awkward").contents.NumpyArray(pyimport("numpy").array([0.0, 1.1, 2.2, 3.3, 4.4], dtype=pyimport("numpy").float64))
    index = pyimport("awkward").index.Index64(pyimport("numpy").array([2, 2, 0, -1, 4], dtype=pyimport("numpy").int64))
    indexedoptionarray = pyimport("awkward").contents.IndexedOptionArray(index, content)

    py_array = pyimport("awkward").Array(indexedoptionarray)

    array = pyconvert(AwkwardArray.IndexedOptionArray, py_array)
    @test array isa AwkwardArray.IndexedOptionArray
end
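The two indexed layouts tested above differ in one rule: an IndexedArray selects content[index[i]] for every entry, while an IndexedOptionArray treats a negative index as a missing entry. A minimal plain-Julia sketch that decodes both tests' buffers, using Julia's missing to stand in for Awkward's None:

```julia
content = [0.0, 1.1, 2.2, 3.3, 4.4]

# IndexedArray: every (0-based) index selects an element of content.
indexed = [content[i+1] for i in [2, 2, 0, 3, 4]]

# IndexedOptionArray: a negative index marks a missing entry.
option_index   = [2, 2, 0, -1, 4]
indexed_option = [i < 0 ? missing : content[i+1] for i in option_index]
```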

@testset "convert # ByteMaskedArray" begin
    layout = pyimport("awkward").contents.ByteMaskedArray(
        pyimport("awkward").index.Index8(pyimport("numpy").array([0, 1, 0, 1, 0], dtype=pyimport("numpy").int8)),
        pyimport("awkward").contents.NumpyArray(pyimport("numpy").arange(5, dtype=pyimport("numpy").int64)),
        valid_when=pybuiltins.True,
    )
    py_array = pyimport("awkward").Array(layout)

    array = pyconvert(AwkwardArray.ByteMaskedArray, py_array)
    @test array isa AwkwardArray.ByteMaskedArray
end
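In a ByteMaskedArray, entry i is valid when (mask[i] != 0) equals valid_when, and is None otherwise. A small plain-Julia decoding of the test's buffers, with missing standing in for None:

```julia
mask       = Int8[0, 1, 0, 1, 0]
content    = Int64[0, 1, 2, 3, 4]   # same values as numpy.arange(5) in the test
valid_when = true

# Entry i is kept when (mask[i] != 0) == valid_when, else it becomes missing.
values = [((m != 0) == valid_when) ? c : missing for (m, c) in zip(mask, content)]
```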

@testset "convert # BitMaskedArray" begin
    content = pyimport("awkward").operations.from_iter(
        [[0.0, 1.1, 2.2], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]], highlevel=pybuiltins.False
    )
    mask = pyimport("awkward").index.IndexU8(pyimport("numpy").array([66], dtype=pyimport("numpy").uint8))
    maskedarray = pyimport("awkward").contents.BitMaskedArray(
        mask, content, valid_when=pybuiltins.False, length=4, lsb_order=pybuiltins.True
    )
    py_array = pyimport("awkward").Array(maskedarray)

    array = pyconvert(AwkwardArray.BitMaskedArray, py_array)
    @test array isa AwkwardArray.BitMaskedArray
end
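The BitMaskedArray test packs its mask into bits: the single byte 66 is 0b01000010, and with lsb_order=True, entry i corresponds to bit i of the byte counted from the least significant end. Since valid_when=False, a 0 bit means valid and a 1 bit means None, so only the second of the four lists is masked. A plain-Julia sketch of that decoding; the helper maskbit is a hypothetical name introduced for illustration:

```julia
mask_bytes = UInt8[66]     # 0b01000010
lsb_order  = true
valid_when = false
len        = 4
content = [[0.0, 1.1, 2.2], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]

# Hypothetical helper: read bit i (0-based) of a packed mask.
function maskbit(bytes::Vector{UInt8}, i::Integer; lsb_order::Bool = true)
    byte  = bytes[div(i, 8) + 1]
    shift = lsb_order ? i % 8 : 7 - i % 8
    return (byte >> shift) & 0x01 == 0x01
end

bits = [maskbit(mask_bytes, i; lsb_order = lsb_order) for i in 0:len-1]

# Entry i is valid when its bit equals valid_when; otherwise it is missing.
values = [bits[i] == valid_when ? content[i] : missing for i in 1:len]
```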

@testset "convert # UnmaskedArray" begin
    unmaskedarray = pyimport("awkward").contents.UnmaskedArray(
        pyimport("awkward").contents.NumpyArray(
            pyimport("numpy").array([0.0, 1.1, 2.2, 3.3], dtype=pyimport("numpy").float64)
        )
    )
    py_array = pyimport("awkward").Array(unmaskedarray)

    array = pyconvert(AwkwardArray.UnmaskedArray, py_array)
    @test array isa AwkwardArray.UnmaskedArray
end

@testset "convert # UnionArray" begin
    layout = pyimport("awkward").contents.unionarray.UnionArray(
        pyimport("awkward").index.Index(pyimport("numpy").array([1, 1, 0, 0, 1, 0, 1], dtype=pyimport("numpy").int8)),
        pyimport("awkward").index.Index(pyimport("numpy").array([4, 3, 0, 1, 2, 2, 4, 100], dtype=pyimport("numpy").int64)),
        [
            pyimport("awkward").contents.recordarray.RecordArray(
                [pyimport("awkward").from_iter(["1", "2", "3"], highlevel=pybuiltins.False)], ["nest"]
            ),
            pyimport("awkward").contents.recordarray.RecordArray(
                [
                    pyimport("awkward").contents.numpyarray.NumpyArray(
                        pyimport("numpy").array([1.1, 2.2, 3.3, 4.4, 5.5], dtype=pyimport("numpy").float64)
                    )
                ],
                ["nest"],
            ),
        ],
    )
    py_array = pyimport("awkward").Array(layout)

    array = pyconvert(AwkwardArray.UnionArray, py_array)
    @test array isa AwkwardArray.UnionArray
end
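In the UnionArray test, tags[i] picks which of the two record contents entry i comes from, and index[i] picks the position within that content. The index buffer is longer than tags (it ends in an unused 100); only the first length(tags) entries are read. A plain-Julia sketch that resolves the "nest" field of each entry:

```julia
tags  = Int8[1, 1, 0, 0, 1, 0, 1]
index = Int64[4, 3, 0, 1, 2, 2, 4, 100]   # longer than tags; the trailing 100 is never read

# The "nest" field of each record content, in tag order (tag 0, tag 1).
nest = (["1", "2", "3"], [1.1, 2.2, 3.3, 4.4, 5.5])

# Entry i comes from content tags[i], at (0-based) position index[i].
values = [nest[Int(t) + 1][index[i] + 1] for (i, t) in enumerate(tags)]
```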