Minimal example of Awkward, Arrow, C++ roundtrip #1

Open
matthewfeickert opened this issue Jun 14, 2021 · 7 comments

@matthewfeickert
Collaborator

@lukasheinrich has pointed out that an interesting minimal example would be: awkward -> arrow -> some C++ -> awkward. This would be done with just regular CMake.

@matthewfeickert
Collaborator Author

matthewfeickert commented Jun 15, 2021

At the moment, after a successful build, importing the babel module fails:

$ make debug 
[root@f13972b564bb awkward-arrow-cmake-pybind11]# bash build.sh 
[root@f13972b564bb awkward-arrow-cmake-pybind11]# python -c 'import build.babel'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/feickert/Code/GitHub/AMGLab/awkward-arrow-cmake-pybind11/build/babel.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN5arrow2py12unwrap_arrayEP7_object
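
The mangled name in the error can be decoded by hand to see which function is missing. As a rough sketch (the helper `qualified_name` is hypothetical, and it only handles simple nested names in the Itanium C++ ABI mangling scheme, not templates or substitutions), the following extracts the qualified function name:

```python
def qualified_name(mangled: str) -> str:
    """Decode the nested-name part of a simple Itanium-mangled symbol.

    Only the `_ZN <len><id> ... E` form is handled; templates,
    substitutions, and the argument types after `E` are ignored.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a nested Itanium-mangled name")
    pos = len(prefix)
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        start = pos
        # Each component is a decimal length followed by that many characters.
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported encoding at offset {pos}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


name = qualified_name("_ZN5arrow2py12unwrap_arrayEP7_object")
print(name)  # arrow::py::unwrap_array
```

The trailing `P7_object` encodes a single argument of type pointer to `_object`, which is the struct behind `PyObject`, so the missing symbol is `arrow::py::unwrap_array(PyObject*)`.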

This seems similar to pybind/pybind11#1403, which suggests that we might have a linking problem somewhere in

# awkward+pybind11+arrow
pybind11_add_module(babel babel/python.cpp)
set_target_properties(babel PROPERTIES CXX_VISIBILITY_PRESET hidden)
target_link_libraries(babel PRIVATE ${PYARROW_LIBRARIES})
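
How `PYARROW_LIBRARIES` gets populated is not shown in this thread. One possibility, sketched here under that assumption, is to ask the installed pyarrow for its own include directory, library directories, and library names, then pass those to CMake. The helper name `pyarrow_build_info` is hypothetical; `get_include`, `get_library_dirs`, and `get_libraries` are functions provided by pyarrow.

```python
import importlib.util


def pyarrow_build_info():
    """Return the build settings reported by pyarrow, or None if it is not installed."""
    if importlib.util.find_spec("pyarrow") is None:
        return None
    import pyarrow

    return {
        "include_dir": pyarrow.get_include(),        # directory with the Arrow headers
        "library_dirs": pyarrow.get_library_dirs(),  # directories holding the bundled .so files
        "libraries": pyarrow.get_libraries(),        # library names to link against
    }


info = pyarrow_build_info()
print(info)
```

Taking the link settings from the same pyarrow that is imported at runtime may help keep the build and the runtime pointed at the same copies of the libraries.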

@kratsg
Owner

kratsg commented Jun 15, 2021

For what it's worth...

[root@58f63a73f9c6 awkward-arrow-cmake-pybind11]# nm /usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_python.so.400 | grep unwrap_array
0000000000156838 b _ZN12_GLOBAL__N_1L46__pyx_api_f_7pyarrow_3lib_pyarrow_unwrap_arrayE
000000000007f0a0 T _ZN5arrow2py12unwrap_arrayEP7_object
0000000000066a57 t _ZN5arrow2py12unwrap_arrayEP7_object.cold
000000000007f0a0 t _ZN5arrow2py12unwrap_arrayEP7_object.localalias
000000000007f1d0 T _ZN5arrow2py12unwrap_arrayEP7_objectPSt10shared_ptrINS_5ArrayEE
0000000000066a7c t _ZN5arrow2py12unwrap_arrayEP7_objectPSt10shared_ptrINS_5ArrayEE.cold

it's certainly in that .so.
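
For reading the `nm` listing: an uppercase `T` marks a global symbol in the text section, while lowercase letters such as `t` and `b` mark local symbols. A small sketch that filters such a listing for exported matches (the helper `exported_symbols` is hypothetical, and the sample is copied from the output above):

```python
def exported_symbols(nm_output: str, needle: str):
    """Return names of matching symbols that nm marks as global text ('T')."""
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # e.g. undefined symbols, which have no address column
        _address, kind, name = fields
        if kind == "T" and needle in name:
            found.append(name)
    return found


sample = """\
0000000000156838 b _ZN12_GLOBAL__N_1L46__pyx_api_f_7pyarrow_3lib_pyarrow_unwrap_arrayE
000000000007f0a0 T _ZN5arrow2py12unwrap_arrayEP7_object
0000000000066a57 t _ZN5arrow2py12unwrap_arrayEP7_object.cold
000000000007f0a0 t _ZN5arrow2py12unwrap_arrayEP7_object.localalias
"""
exported = exported_symbols(sample, "unwrap_array")
print(exported)  # ['_ZN5arrow2py12unwrap_arrayEP7_object']
```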

@kratsg
Owner

kratsg commented Jun 15, 2021

[root@9d9fbe543059 awkward-arrow-cmake-pybind11]# find / -name "*libarrow*"
/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_python_flight.so.400
/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_flight.so.400
/usr/local/venv/lib/python3.8/site-packages/pyarrow/includes/libarrow_cuda.pxd
/usr/local/venv/lib/python3.8/site-packages/pyarrow/includes/libarrow.pxd
/usr/local/venv/lib/python3.8/site-packages/pyarrow/includes/libarrow_dataset.pxd
/usr/local/venv/lib/python3.8/site-packages/pyarrow/includes/libarrow_fs.pxd
/usr/local/venv/lib/python3.8/site-packages/pyarrow/includes/libarrow_flight.pxd
/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_python.so.400
/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so.400
/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.400
/usr/lib64/libarrow.so
/usr/lib64/libarrow-glib.so.400
/usr/lib64/libarrow_bundled_dependencies.a
/usr/lib64/libarrow-glib.so.400.1.0
/usr/lib64/libarrow_dataset.a
/usr/lib64/libarrow.so.400.1.0
/usr/lib64/libarrow_dataset.so.400
/usr/lib64/libarrow_dataset.so.400.1.0
/usr/lib64/libarrow.a
/usr/lib64/libarrow_dataset.so
/usr/lib64/libarrow-glib.a
/usr/lib64/libarrow.so.400
/usr/lib64/libarrow-glib.so

It does seem like we have a libarrow.so.400 from the apt-install but a different one from the pip install.
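
To make that kind of duplication easier to spot, the `find` output can be grouped by file name. This is only a sketch: `duplicated_libraries` is a hypothetical helper, and the path list is a subset of the listing above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def duplicated_libraries(paths):
    """Map each file name found in more than one directory to those directories."""
    by_name = defaultdict(set)
    for p in paths:
        path = PurePosixPath(p)
        by_name[path.name].add(str(path.parent))
    return {name: sorted(dirs) for name, dirs in by_name.items() if len(dirs) > 1}


paths = [
    "/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.400",
    "/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_dataset.so.400",
    "/usr/local/venv/lib/python3.8/site-packages/pyarrow/libarrow_python.so.400",
    "/usr/lib64/libarrow.so.400",
    "/usr/lib64/libarrow_dataset.so.400",
    "/usr/lib64/libarrow.so",
]
dups = duplicated_libraries(paths)
for name, dirs in sorted(dups.items()):
    print(name, dirs)
```

In this subset, `libarrow.so.400` and `libarrow_dataset.so.400` each appear in both locations, while `libarrow_python.so.400` exists only inside the pyarrow package directory.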

@kratsg
Owner

kratsg commented Jun 15, 2021

Fixed in #4 and #5.

@lukasheinrich
Collaborator

just dropping some pseudo code of what would be nice

on the python side

import awkward1 as ak
arr = ak.Array([{'pt': 1.0, 'eta': 2.0}, {'pt': 3.0, 'eta': 4.0}])
arrow_arr = ak.to_arrow(arr)  # awkward -> arrow

import pybind_module as pm
calibrated_arrow_arr = pm.calibrate(arrow_arr)  # arrow -> C++ -> arrow
calib_arr = ak.from_arrow(calibrated_arrow_arr)  # arrow -> awkward

on the C++ side

some pybind11 code for pybind_module with
...
void calibrate(arrow::Array ) {
}

.def("calibrate",&calibrate)
...

@lukasheinrich
Collaborator

this already works in the image

>>> awkward.to_arrow(awkward.Array([{'pt': 1.0, 'eta': 2.0}, {'pt': 2, 'eta': 3.}]))
<pyarrow.lib.StructArray object at 0x7f1071eb1460>
-- is_valid: all not null
-- child 0 type: double
  [
    1,
    2
  ]
-- child 1 type: double
  [
    2,
    3
  ]

so we just need to teach babel to pick up this StructArray in C++
