Add bit64 support #590

kdkavanagh · 2019-09-02T21:52:55Z

Allow for 64bit integers to be passed between Py <-> R using the bit64 package, inspired by https://gallery.rcpp.org/articles/creating-integer64-and-nanotime-vectors/

kdkavanagh · 2019-09-03T01:48:24Z

src/python.cpp

-        vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
+      for (Py_ssize_t i = 0; i<len; i++) {
+        long num = PyInt_AsLong(PyList_GetItem(x, i));
+        if(num > std::numeric_limits<int>::max()) {


Alternatively, could loop thru array once first to check max value, then create the right vector type the first time

kevinushey · 2019-09-03T18:16:13Z

I don't think it's a good idea to have the output vector type depend on the runtime value -- I think we need to ensure type stability, and the user has to opt-in to requesting bit64 vectors in some way. This is especially true since there are operations that can be performed on 'native' R integers that will not work with a bit64 vector.

I don't yet know what this interface should look like, though. My instinct says that there should be some way for users to provide custom converters, that accept R objects and return Python objets, and vice versa.

Also worth saying: the underlying SEXP type of an integer64 vector is REALSXP; we'd likely need to audit a lot of reticulate code to ensure that we're doing the right thing when we receive a REALSXP that happens to be an integer64 vector.

kdkavanagh · 2019-09-03T18:36:56Z

I don't disagree w.r.t opting in with some option. I've never built an enterprise-grade R package before, so not sure how to be incorporate that, but could imagine that being configured by the user after loading the reticulate package

eddelbuettel · 2019-09-03T18:40:04Z

It's the right call. I am using bit64 aka integer64 aka nanotime very carefully at work to transfer full resolution timestamo, and some int64_t index values. One has to be really careful to 'cast corectly' and treat it correctly on the receiving end or else one mistakes it for a standard numeric filled with ... garbage bits. This works, but only with some care. It is probably not the right tool for a general purpose converter.

kdkavanagh · 2019-09-03T18:52:21Z

I will take a stab at implementing an option tonight

OssiLehtinen · 2020-01-13T15:35:39Z

Any news on this one? Looking forward to a solution, as this would solve a bunch of problems!

kdkavanagh · 2020-02-15T19:53:03Z

Added two options which default to FALSE

  options(reticulate.long_as_bit64=TRUE)
  options(reticulate.ulong_as_bit64=TRUE)

kevinushey · 2020-02-25T23:39:04Z

Thanks -- I'll try to re-review this soon.

kevinushey

Overall looks good, but I really wonder if we should either always or never convert to integer64. I worry that there will be surprises if this only happens sometimes.

If we really want to allow this to happen automatically, we could allow for the option to be FALSE, TRUE, or "auto" or something like that. I would advocate for just allowing the conversion to always happen or never happen though.

kevinushey · 2020-03-24T00:29:59Z

src/python.cpp

+T getConfig(std::string config, T defValue) {
+  Environment base( "package:base" ) ;
+  Function getOption = base["getOption"];
+  SEXP s = getOption(config, defValue);


I believe we could use Rf_GetOption directly here.

kevinushey · 2020-03-24T00:32:52Z

src/python.cpp

-
+    else if (scalarType == INTSXP) {
+      long val = PyLong_AsLong(x);
+      if((val > std::numeric_limits<int>::max() || val < std::numeric_limits<int>::min()) && convertLongToBit64()) {


I worry a bit that this could be confusing if values were converted to integer64 only "sometimes". That is, I'm not sure if the runtime type should depend on the runtime value. Would it make sense to always make this conversion if the user has opted in?

kevinushey · 2020-03-24T00:33:11Z

src/python.cpp

-        vec[i] = PyInt_AsLong(PyList_GetItem(x, i));
+      for (Py_ssize_t i = 0; i<len; i++) {
+        long num = PyLong_AsLong(PyList_GetItem(x, i));
+        if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {


Same concern here as above.

kevinushey · 2020-03-24T00:39:42Z

src/python.cpp

    if (LENGTH(sexp) == 1) {
      double value = REAL(sexp)[0];
-      return PyFloat_FromDouble(value);
+      if(isInt64) {
+        return PyInt_FromLong(reinterpret_cast<long&>(value));


Is it really safe to reinterpret a value as a reference? I've usually seen type punning done as e.g.

*(long*)(&value)

This is still technically undefined behavior, though. My understanding is that the only standards-compliant way of performing type punning is with memcpy(). See: https://stackoverflow.com/a/17790026/1342082

(In C++20, we will have bitcast: https://en.cppreference.com/w/cpp/numeric/bit_cast)

See https://godbolt.org/z/VDCcVF for an example of a helper function that could be used for casting here.

kevinushey · 2020-03-24T00:40:28Z

src/python.cpp

@@ -58,6 +58,27 @@ std::wstring s_python_v3;
 std::string s_pythonhome;
 std::wstring s_pythonhome_v3;

+const std::string CONFIG_LONG_AS_BIT64="reticulate.long_as_bit64";


Should these be static const? (since they should have internal linkage)

kevinushey · 2020-03-24T00:45:33Z

src/python.cpp

 }

 int narrow_array_typenum(PyArray_Descr* descr) {
-  return narrow_array_typenum(descr->type_num);
+  return narrow_array_typenum(typenum(descr->type_num));


Suggested change

return narrow_array_typenum(typenum(descr->type_num));

return narrow_array_typenum(typenum(descr));

kevinushey · 2020-03-24T02:37:55Z

src/python.cpp

+        if((num > std::numeric_limits<int>::max() || num < std::numeric_limits<int>::min()) && convertLongToBit64()) {
+          //We need to start over an interpret as 64 bit int
+          Rcpp::NumericVector nVec(len);
+          long long* res_ptr  = (long long*) dataptr(nVec);


Note that using long long* explicitly requires us to use C++11. This is okay but I think we need to update SystemRequirements to indicate that.

kevinushey · 2020-03-24T17:17:57Z

Thanks for the pull request! We require a Contributor License Agreement before we can accept pull requests on RStudio repositories. Would you be willing to fill one out, and send it to the e-mail indicated in the form? (Please also reference this PR in your e-mail.)

colearendt · 2020-04-20T01:40:34Z

This is really exciting stuff! Unfortunately, I gave it a try and ran into a few painful points:

something weird in my environment is causing certain operations to crash the rsession (import pickle, create a big integer, etc.), so I haven't been able to create a reprex. (NOTE: this is happening for the CRAN released version of reticulate as well, so I suspect this problem is elsewhere).
the print method if bit64 has not been loaded/referenced is pretty confusing:

sync_res$items[[1]]$id
[1] 1.196682e-314
attr(,"class")
[1] "integer64"

However, just calling a bit64 function directly fixes:

> bit64::as.integer64(1234)
integer64
[1] 1234
> sync_res$items[[1]]$id
integer64
[1] 2422110596

There are definitely some nice things here, though! Thanks!

EDIT: The crashing was due to #723, I believe and reinstalling Rcpp fixed! 😄 Reprex below:

library(reticulate)
options(reticulate.long_as_bit64=TRUE)
options(reticulate.ulong_as_bit64=TRUE)

reticulate::py_run_string('myvar = 2422110596;')
py$myvar
#> [1] 1.196682e-314
#> attr(,"class")
#> [1] "integer64"
bit64::as.integer64(1234)
#> integer64
#> [1] 1234
py$myvar
#> integer64
#> [1] 2422110596

^{Created on 2020-04-19 by the reprex package (v0.3.0)}

eddelbuettel · 2020-04-20T02:58:51Z

However, just calling a bit64 function directly fixes

Our nanotime package also wraps around bit64, and I have seen this too. Now, it is a little tricky as we generally want an optional component to be truly optional, but generally speaking an if (requireNamespace(...))) test (and load of the package) before code blocks with 64-bit ints seems to help (and I think that is what data.table does too). Making it a globally imported package helps too, but that may be too blunt.

GreenGrassBlueOcean · 2020-09-28T21:27:15Z

Dear @kevinushey @kdkavanagh

Are there any plans to accept this pull request in to reticulate?
This would be very useful for my own r-package as this api often returns large integers which then turn out to be -1.

spittssj · 2021-04-28T22:08:44Z

Is there any update on this issue?

I am starting to use rgee which wraps the Python Google Earth Engine API through reticulate and I ran into this issue with the time field that comes with the ImageCollections of time series satellite data.

kdkavanagh added 2 commits September 2, 2019 16:27

Add bit64 support

91e8747

Update tests and add dep on bit64

4a697bb

kdkavanagh mentioned this pull request Sep 2, 2019

Converting non-32-bit integers from python to R to use bit64 #323

Open

kdkavanagh commented Sep 3, 2019

View reviewed changes

kdkavanagh added 2 commits February 15, 2020 13:43

Add options to configure bit64 usage

791bd66

Merge branch 'master' of https://github.com/rstudio/reticulate

5dfec60

kevinushey mentioned this pull request Mar 3, 2020

int64 type in python, got numeric type in R #729

Open

kevinushey requested changes Mar 24, 2020

View reviewed changes

kevinushey reviewed Mar 24, 2020

View reviewed changes

spittssj mentioned this pull request Apr 28, 2021

rgee has problems with 64-bit integers r-spatial/rgee#150

Closed

3 tasks

Merge branch 'main' into master

4a66b3a

This comment was marked as off-topic.

Sign in to view

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add bit64 support #590

Add bit64 support #590

kdkavanagh commented Sep 2, 2019

kdkavanagh Sep 3, 2019

kevinushey commented Sep 3, 2019

kdkavanagh commented Sep 3, 2019

eddelbuettel commented Sep 3, 2019

kdkavanagh commented Sep 3, 2019

OssiLehtinen commented Jan 13, 2020

kdkavanagh commented Feb 15, 2020

kevinushey commented Feb 25, 2020

kevinushey left a comment

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey Mar 24, 2020

kevinushey commented Mar 24, 2020

colearendt commented Apr 20, 2020 •

edited

Loading

eddelbuettel commented Apr 20, 2020

GreenGrassBlueOcean commented Sep 28, 2020

spittssj commented Apr 28, 2021

This comment was marked as off-topic.

	return narrow_array_typenum(typenum(descr->type_num));
	return narrow_array_typenum(typenum(descr));

Add bit64 support #590

Are you sure you want to change the base?

Add bit64 support #590

Conversation

kdkavanagh commented Sep 2, 2019

Choose a reason for hiding this comment

kevinushey commented Sep 3, 2019

kdkavanagh commented Sep 3, 2019

eddelbuettel commented Sep 3, 2019

kdkavanagh commented Sep 3, 2019

OssiLehtinen commented Jan 13, 2020

kdkavanagh commented Feb 15, 2020

kevinushey commented Feb 25, 2020

kevinushey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevinushey commented Mar 24, 2020

colearendt commented Apr 20, 2020 • edited Loading

eddelbuettel commented Apr 20, 2020

GreenGrassBlueOcean commented Sep 28, 2020

spittssj commented Apr 28, 2021

This comment was marked as off-topic.

colearendt commented Apr 20, 2020 •

edited

Loading