Description
There is a bug in accessing the elements of a MatrixRow
of a large matrix:
b <- Rcpp::sourceCpp(code="
#include \"Rcpp.h\"
//[[Rcpp::export(rng=false)]]
double get_bottom_right(Rcpp::NumericMatrix x) {
auto row = x.row(x.nrow() - 1);
return row[x.ncol() - 1];
}
")
# Okay:
mat1 <- matrix(1:20000 + 0.0, nrow=20000, ncol=1e5)
get_bottom_right(mat1)
## [1] 20000
mat1[20000,1e5]
## [1] 20000
# Incorrect:
mat2 <- matrix(1:30000 + 0.0, nrow=30000, ncol=1e5)
get_bottom_right(mat2)
## [1] 13216
mat2[30000,1e5]
## [1] 30000
This is because the row[]
indexing is done through the usual offset calculation:
Rcpp/inst/include/Rcpp/vector/MatrixRow.h
Lines 171 to 173 in ae1eccc
but both i
and parent_nrow
are (32-bit) int
s:
Rcpp/inst/include/Rcpp/vector/MatrixRow.h
Line 206 in ae1eccc
which causes an integer overflow when computing i * parent_nrow
for large matrices. This causes us to access some random memory address, leading to the incorrect result observed above. We have also observed segfaults from similar code on other machines, though I don't have the details of those systems.
The fix seems relatively simple: just cast parent_nrow
to a R_xlen_t
before multiplication. This should cause promotion upon multiplication and avoid any issues, given that Matrix::operator[]
already accepts a R_xlen_t
.
Seems likely that MatrixColumn
will need a similar fix in its constructor:
Rcpp/inst/include/Rcpp/vector/MatrixColumn.h
Line 132 in ae1eccc
Session information
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0 Rcpp_1.0.11
Compiler information
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0