[R] How to pass Arrow objects like Table between C++ and R? #43675

Closed
ajinkya-k opened this issue Aug 13, 2024 · 10 comments
Labels
Component: R Type: usage Issue is a user question

Comments

@ajinkya-k

Describe the usage question you have. Please include as many useful details as possible.

I was curious how I can pass Arrow objects from R to C++ (much like passing R vectors via Rcpp::NumericVector). Here's an example of what I am looking for:

Say I have a function sample_post in R that takes an Arrow table and some parameters:

```r
# R code
sample_post <- function(arrow_tbl, some_vec, params) {
    # do some clean-up
    post_object <- .some_rcpp_fn(arrow_tbl, some_vec, params)

    # do some processing on the post object
    return(post_object)
}
```

For a little more concreteness, let's say .some_rcpp_fn does group_by summaries in each iteration of a loop.
In .some_rcpp_fn, what should the type of the first argument (the one that receives arrow_tbl) be?

NOTE: I would prefer to use Rcpp but am not tied to it; I am okay using something else.

Component(s)

R

@ajinkya-k ajinkya-k added the Type: usage Issue is a user question label Aug 13, 2024
@assignUser
Member

Do you want to operate on the arrow table with libarrow or some custom C++ code (potentially using other libraries)?

@ajinkya-k
Author

I want to use custom C++ code that will need other libraries.

@assignUser
Member

Sorry for the late reply. I am not sure that's possible with cpp11 (which the arrow R package uses), but that's not my speciality. I found this related issue: #36274

@ajinkya-k
Author

So is this an inherent limitation of cpp11?

@assignUser
Member

I don't really know, sorry. Maybe @jonkeane or @paleolimbot can chime in?

@amoeba
Member

amoeba commented Sep 12, 2024

It seems like this should be possible, @ajinkya-k: see #36274 (comment), and let us know if you think you could adapt that code to your use case. Also note the caveats in that thread.

@ajinkya-k
Author

Hi @amoeba, thanks for sharing the thread. As the thread makes clear, there is no guarantee of stability, which means I cannot build a package on it. I was hoping there would be a more stable and permanent way to do this. If not, it might be worth opening a feature request.

I think being able to access the exact same Arrow object from both R and C++ would be very important for enabling more scalable Bayesian analyses that, out of necessity, rely on C++ code. In some of the applications I am thinking of, summary statistics of specific subsets of the data must be computed in C++. This can be achieved very efficiently with filter and group_by + summarize in C++, but the subset of units to be filtered on or grouped differs in every iteration of the MCMC loop. This is why the Arrow object must be available in C++.
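To make the per-iteration computation concrete, here is a library-free C++ sketch. It assumes the group-key and value columns are already reachable as raw buffers (for instance through the C Data Interface) and that each MCMC iteration supplies a fresh row mask. grouped_sum is a hypothetical helper, not an Arrow API; Arrow's own compute functions are a separate option not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper (not an Arrow API): per-group sum of values[i] over the
// rows i with keep[i] != 0. group_ids and values stand in for column buffers
// obtained once (e.g. through the C Data Interface); only keep changes from
// one MCMC iteration to the next, so the columns themselves are never copied.
std::unordered_map<std::int32_t, double> grouped_sum(
    const std::int32_t* group_ids, const double* values,
    const std::uint8_t* keep, std::int64_t n) {
  assert(group_ids != nullptr && values != nullptr && keep != nullptr);
  std::unordered_map<std::int32_t, double> sums;
  for (std::int64_t i = 0; i < n; ++i) {
    if (keep[i]) sums[group_ids[i]] += values[i];  // filter, then group + sum
  }
  return sums;
}
```

For group ids {0, 1, 0, 1} and values {1, 2, 3, 4}, keeping every row gives sums {0: 4, 1: 6}, while the mask {0, 1, 1, 0} gives {0: 3, 1: 2}. Only the mask changes between these calls.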

@amoeba
Member

amoeba commented Sep 13, 2024

The examples given in #36274 should be stable because they use the Arrow C Data Interface, with the help of the nanoarrow package, to pass the arrow::Table between C++ and R. My interpretation of @paleolimbot's comment was that it's specifically passing pointers to arrow::Table objects that's not considered stable. Going through the C Data Interface, by contrast, is stable and is the Arrow project's recommended way of doing this kind of thing.
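Here is a minimal C++ sketch of the consumer side of the C Data Interface. It assumes the R side has already exported a float64 column into ArrowSchema/ArrowArray structs and handed their addresses to C++. One way to do that is with nanoarrow, as in #36274; that glue is not shown. The two struct definitions are copied from the C Data Interface specification. sum_float64, make_float64_array and Float64Holder are hypothetical names for illustration. The toy producer exists only to exercise the consumer.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <stdint.h>  // int64_t etc., as in the specification's header
#include <vector>

// Struct definitions from the Arrow C Data Interface specification. They are
// ABI-stable, so any producer (arrow, nanoarrow, ...) can fill them in.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif  // ARROW_C_DATA_INTERFACE

// Consumer: sum the non-null values of a float64 (format "g") array.
double sum_float64(const ArrowSchema* schema, const ArrowArray* array) {
  if (std::strcmp(schema->format, "g") != 0) {
    throw std::invalid_argument("expected a float64 array (format \"g\")");
  }
  assert(array->n_buffers == 2);
  // buffers[0] is the validity bitmap (null means "no nulls"),
  // buffers[1] holds the values.
  const auto* validity = static_cast<const uint8_t*>(array->buffers[0]);
  const auto* values = static_cast<const double*>(array->buffers[1]);
  double total = 0.0;
  for (int64_t i = 0; i < array->length; ++i) {
    const int64_t j = array->offset + i;  // respect the logical offset
    const bool valid =
        validity == nullptr || ((validity[j / 8] >> (j % 8)) & 1);
    if (valid) total += values[j];
  }
  return total;
}

// Toy producer (hypothetical, for testing only): owns a copy of the data and
// frees it in the release callback, as the specification requires.
struct Float64Holder {
  std::vector<double> values;
  const void* buffers[2];
};

static void release_array(ArrowArray* array) {
  delete static_cast<Float64Holder*>(array->private_data);
  array->release = nullptr;  // mark as released
}

static void release_schema(ArrowSchema* schema) { schema->release = nullptr; }

void make_float64_array(const std::vector<double>& data, ArrowSchema* schema,
                        ArrowArray* array) {
  auto* holder = new Float64Holder{data, {nullptr, nullptr}};
  holder->buffers[1] = holder->values.data();
  *schema = ArrowSchema{"g", "x", nullptr, ARROW_FLAG_NULLABLE, 0,
                        nullptr, nullptr, &release_schema, nullptr};
  *array = ArrowArray{static_cast<int64_t>(data.size()), 0, 0, 2, 0,
                      holder->buffers, nullptr, nullptr, &release_array,
                      holder};
}
```

Only these plain C structs cross the boundary, so the consumer does not depend on the arrow R package's internal C++ classes. That is why this route avoids the stability concern about passing arrow::Table pointers. A whole table would typically travel as a struct-typed array or an ArrowArrayStream with one child per column; this sketch handles a single column.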

@ajinkya-k
Author

Thanks! I will give it a try

@amoeba
Member

amoeba commented Oct 10, 2024

Hi @ajinkya-k, I'm going to close this for now but please feel free to re-open and/or comment here. I'm curious if you were able to get something to work.

@amoeba amoeba closed this as completed Oct 10, 2024