Add bit64 support #590
src/python.cpp
Outdated
vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
for (Py_ssize_t i = 0; i<len; i++) {
long num = PyInt_AsLong(PyList_GetItem(x, i));
if(num > std::numeric_limits<int>::max()) {
Alternatively, we could loop through the array once first to check the max value, then create the right vector type the first time.
I don't think it's a good idea to have the output vector type depend on the runtime value -- I think we need to ensure type stability, and the user has to opt in to requesting bit64 vectors in some way. This is especially true since there are operations that can be performed on 'native' R integers that will not work with a bit64 vector. I don't yet know what this interface should look like, though. My instinct says that there should be some way for users to provide custom converters that accept R objects and return Python objects, and vice versa. Also worth saying: the underlying SEXP type of an |
I don't disagree w.r.t. opting in with some option. I've never built an enterprise-grade R package before, so I'm not sure how best to incorporate that, but I could imagine it being configured by the user after loading the reticulate package |
It's the right call. I am using |
I will take a stab at implementing an option tonight |
Any news on this one? Looking forward to a solution, as this would solve a bunch of problems! |
Added two options which default to
|
Thanks -- I'll try to re-review this soon. |
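For reference, the single pre-scan suggested earlier in this thread could look roughly like the sketch below. This is not code from the PR: `all_fit_in_int` is a hypothetical helper, and a plain `std::vector<std::int64_t>` stands in for the values that would be read from the Python list.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of reticulate): returns true when every
// value lies within the range of a C int, so none would overflow an
// integer vector element.
bool all_fit_in_int(const std::vector<std::int64_t>& values) {
  const std::int64_t lo = std::numeric_limits<int>::min();
  const std::int64_t hi = std::numeric_limits<int>::max();
  for (std::int64_t v : values) {
    if (v < lo || v > hi) return false;  // one out-of-range value is enough
  }
  return true;
}
```

Scanning once up front means the result vector can be allocated with its final type, rather than starting over partway through the copy loop. It does not address the type-stability concern raised above, since the result type would still depend on the data.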
Overall looks good, but I really wonder if we should either always or never convert to integer64. I worry that there will be surprises if this only happens sometimes.
If we really want to allow this to happen automatically, we could allow for the option to be FALSE, TRUE, or "auto" or something like that. I would advocate for just allowing the conversion to always happen or never happen though.
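A rough sketch of what the three-way option described above might look like on the C++ side. Everything here is hypothetical: the PR itself uses boolean options, and the names `Int64Mode`, `parse_int64_mode` and `use_integer64`, as well as the accepted spellings, are assumptions made for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical three-way setting; the PR as written uses a boolean option.
enum class Int64Mode { Never, Always, Auto };

// Parse an option value given as text. The accepted spellings are assumptions.
Int64Mode parse_int64_mode(const std::string& value) {
  if (value == "FALSE") return Int64Mode::Never;
  if (value == "TRUE") return Int64Mode::Always;
  if (value == "auto") return Int64Mode::Auto;
  throw std::invalid_argument("unknown int64 mode: " + value);
}

// Decide whether the result should be an integer64 vector.
// Only Auto consults the data; Never and Always are type-stable.
bool use_integer64(Int64Mode mode, bool all_values_fit_in_int) {
  switch (mode) {
    case Int64Mode::Never:  return false;
    case Int64Mode::Always: return true;
    case Int64Mode::Auto:   return !all_values_fit_in_int;
  }
  return false;  // not reached; keeps compilers quiet
}
```

With this shape, the "always or never" behavior advocated above corresponds to the first two enumerators, and the value-dependent behavior is confined to a mode the user has to ask for explicitly.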
T getConfig(std::string config, T defValue) {
Environment base( "package:base" ) ;
Function getOption = base["getOption"];
SEXP s = getOption(config, defValue);
I believe we could use Rf_GetOption directly here.
else if (scalarType == INTSXP) {
long val = PyLong_AsLong(x);
if((val > std::numeric_limits<int>::max() || val < std::numeric_limits<int>::min()) && convertLongToBit64()) {
I worry a bit that this could be confusing if values were converted to integer64 only "sometimes". That is, I'm not sure if the runtime type should depend on the runtime value. Would it make sense to always make this conversion if the user has opted in?
vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
for (Py_ssize_t i = 0; i<len; i++) {
long num = PyLong_AsLong(PyList_GetItem(x, i));
if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {
Same concern here as above.
if (LENGTH(sexp) == 1) {
double value = REAL(sexp)[0];
return PyFloat_FromDouble(value);
if(isInt64) {
return PyInt_FromLong(reinterpret_cast<long&>(value));
Is it really safe to reinterpret a value as a reference? I've usually seen type punning done as e.g. *(long*)(&value)
This is still technically undefined behavior, though. My understanding is that the only standards-compliant way of performing type punning is with memcpy(). See: https://stackoverflow.com/a/17790026/1342082
(In C++20, we will have bitcast: https://en.cppreference.com/w/cpp/numeric/bit_cast)
See https://godbolt.org/z/VDCcVF for an example of a helper function that could be used for casting here.
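For readers who cannot open the link, a memcpy-based helper along these lines might look like the following. This is a minimal sketch written for this thread, not necessarily the code behind the godbolt link; `bit_cast_memcpy` is a hypothetical name, and it is only similar in spirit to C++20's `std::bit_cast`.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `from` into a
// new object of type To. Unlike reinterpret_cast or *(To*)(&from), this
// does not read an object through a pointer or reference of another type.
template <typename To, typename From>
To bit_cast_memcpy(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value,
                "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value,
                "From must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
```

In the snippet under review, the call would then read something like `bit_cast_memcpy<std::int64_t>(value)`. Using a fixed-width type also avoids relying on the width of `long`, which is 32 bits on 64-bit Windows. Compilers typically optimize the `memcpy` away for small fixed sizes like this.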
@@ -58,6 +58,27 @@ std::wstring s_python_v3;
std::string s_pythonhome;
std::wstring s_pythonhome_v3;

const std::string CONFIG_LONG_AS_BIT64="reticulate.long_as_bit64";
Should these be static const? (since they should have internal linkage)
}

int narrow_array_typenum(PyArray_Descr* descr) {
return narrow_array_typenum(descr->type_num);
return narrow_array_typenum(typenum(descr->type_num));
Suggested change:
- return narrow_array_typenum(typenum(descr->type_num));
+ return narrow_array_typenum(typenum(descr));
if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {
//We need to start over and interpret as 64 bit int
Rcpp::NumericVector nVec(len);
long long* res_ptr = (long long*) dataptr(nVec);
Note that using long long* explicitly requires us to use C++11. This is okay but I think we need to update SystemRequirements to indicate that.
Thanks for the pull request! We require a Contributor License Agreement before we can accept pull requests on RStudio repositories. Would you be willing to fill one out, and send it to the e-mail indicated in the form? (Please also reference this PR in your e-mail.) |
This is really exciting stuff! Unfortunately, I gave it a try and ran into a few painful points:
There are definitely some nice things here, though! Thanks!
EDIT: The crashing was due to #723, I believe and reinstalling
library(reticulate)
options(reticulate.long_as_bit64=TRUE)
options(reticulate.ulong_as_bit64=TRUE)
reticulate::py_run_string('myvar = 2422110596;')
py$myvar
#> [1] 1.196682e-314
#> attr(,"class")
#> [1] "integer64"
bit64::as.integer64(1234)
#> integer64
#> [1] 1234
py$myvar
#> integer64
#> [1] 2422110596
Created on 2020-04-19 by the reprex package (v0.3.0) |
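A note on the first `py$myvar` result above: my reading is that the value itself is intact, and that `1.196682e-314` is simply what the 8 bytes of the 64-bit integer look like when printed as a double, presumably because the integer64 print method is not available until bit64 has been loaded. The sketch below reproduces the number outside of R. It assumes IEEE 754 doubles, and `int64_bits_as_double` is a hypothetical helper, not reticulate code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the bytes of a 64-bit integer as a double,
// which is what happens when integer64 storage is printed as plain numeric.
double int64_bits_as_double(std::int64_t value) {
  static_assert(sizeof(double) == sizeof(std::int64_t),
                "this sketch expects a 64-bit double");
  double out;
  std::memcpy(&out, &value, sizeof(out));
  return out;
}
```

For `2422110596`, the bit pattern has a zero exponent field, so it is a subnormal double equal to 2422110596 × 2^-1074, roughly 1.196682e-314, matching the first output in the reprex.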
Our |
Dear @kevinushey @kdkavanagh, are there any plans to accept this pull request into reticulate? |
Is there any update on this issue? I am starting to use |
Allow for 64bit integers to be passed between Py <-> R using the bit64 package, inspired by https://gallery.rcpp.org/articles/creating-integer64-and-nanotime-vectors/