Skip to content

Integer overflow in MatrixRow offset calculations for large matrices  #1280

Closed
@LTLA

Description

@LTLA

There is a bug in accessing the elements of a MatrixRow of a large matrix:

b <- Rcpp::sourceCpp(code="
#include \"Rcpp.h\"

//[[Rcpp::export(rng=false)]]
double get_bottom_right(Rcpp::NumericMatrix x) {
    auto row = x.row(x.nrow() - 1);
    return row[x.ncol() - 1];
}
")

# Okay:
mat1 <- matrix(1:20000 + 0.0, nrow=20000, ncol=1e5)
get_bottom_right(mat1)
## [1] 20000
mat1[20000,1e5]
## [1] 20000

# Incorrect:
mat2 <- matrix(1:30000 + 0.0, nrow=30000, ncol=1e5)
get_bottom_right(mat2)
## [1] 13216
mat2[30000,1e5]
## [1] 30000

This is because the row[] indexing is done through the usual offset calculation:

inline reference operator[]( int i ) const {
return parent[ row + i * parent_nrow ] ;
}

but both i and parent_nrow are (32-bit) ints:

int parent_nrow ;

which causes an integer overflow when computing i * parent_nrow for large matrices. This causes us to access some random memory address, leading to the incorrect result observed above. We have also observed segfaults from similar code on other machines, though I don't have the details of those systems.

The fix seems relatively simple: just cast parent_nrow to a R_xlen_t before multiplication. This should cause promotion upon multiplication and avoid any issues, given that Matrix::operator[] already accepts a R_xlen_t.

Seems likely that MatrixColumn will need a similar fix in its constructor:

const_start(parent.begin() + i *n)

Session information
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0    Rcpp_1.0.11
Compiler information

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions