-
-
Notifications
You must be signed in to change notification settings - Fork 218
Description
There is a bug in accessing the elements of a MatrixRow of a large matrix:
b <- Rcpp::sourceCpp(code="
#include \"Rcpp.h\"
//[[Rcpp::export(rng=false)]]
double get_bottom_right(Rcpp::NumericMatrix x) {
auto row = x.row(x.nrow() - 1);
return row[x.ncol() - 1];
}
")
# Okay:
mat1 <- matrix(1:20000 + 0.0, nrow=20000, ncol=1e5)
get_bottom_right(mat1)
## [1] 20000
mat1[20000,1e5]
## [1] 20000
# Incorrect:
mat2 <- matrix(1:30000 + 0.0, nrow=30000, ncol=1e5)
get_bottom_right(mat2)
## [1] 13216
mat2[30000,1e5]
## [1] 30000This is because the row[] indexing is done through the usual offset calculation:
Rcpp/inst/include/Rcpp/vector/MatrixRow.h
Lines 171 to 173 in ae1eccc
| inline reference operator[]( int i ) const { | |
| return parent[ row + i * parent_nrow ] ; | |
| } |
but both i and parent_nrow are (32-bit) ints:
Rcpp/inst/include/Rcpp/vector/MatrixRow.h
Line 206 in ae1eccc
| int parent_nrow ; |
which causes an integer overflow when computing i * parent_nrow for large matrices. This causes us to access some random memory address, leading to the incorrect result observed above. We have also observed segfaults from similar code on other machines, though I don't have the details of those systems.
The fix seems relatively simple: just cast parent_nrow to a R_xlen_t before multiplication. This should cause promotion upon multiplication and avoid any issues, given that Matrix::operator[] already accepts a R_xlen_t.
Seems likely that MatrixColumn will need a similar fix in its constructor:
Rcpp/inst/include/Rcpp/vector/MatrixColumn.h
Line 132 in ae1eccc
| const_start(parent.begin() + i *n) |
Session information
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0 Rcpp_1.0.11
Compiler information
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0