combineList: BACKEND Argument not working as intended #119

Open
Apompetti-Cori opened this issue Apr 8, 2023 · 1 comment

Apompetti-Cori commented Apr 8, 2023

# Test bsseq::combineList() with HDF5-backed assays.
# The assays are still loaded into memory after combining.
library(bsseq)
library(HDF5Array)
M <- matrix(0:8, 3, 3)
Cov <- matrix(1:9, 3, 3)
hdf5_M <- writeHDF5Array(M)
hdf5_Cov <- writeHDF5Array(Cov)
hdf5_BS1 <- BSseq(chr = c("chr1", "chr2", "chr1"),
                  pos = c(1, 2, 3),
                  M = hdf5_M,
                  Cov = hdf5_Cov,
                  sampleNames = c("A", "B", "C"))

hdf5_BS1

hdf5_BS2 <- BSseq(chr = c("chr1", "chr1", "chr1"),
                  pos = c(3, 4, 5),
                  M = hdf5_M,
                  Cov = hdf5_Cov,
                  sampleNames = c("D", "E", "F"))

hdf5_BS2

x <- combineList(list(hdf5_BS1, hdf5_BS2), BACKEND = "HDF5Array")
x
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] shiny_1.7.4                 HDF5Array_1.26.0           
 [3] DelayedArray_0.24.0         Matrix_1.5-3               
 [5] plotly_4.10.1               PCAtools_2.10.0            
 [7] ggrepel_0.9.3               lubridate_1.9.2            
 [9] forcats_1.0.0               stringr_1.5.0              
[11] dplyr_1.1.1                 purrr_1.0.1                
[13] readr_2.1.4                 tidyr_1.3.0                
[15] tibble_3.2.1                ggplot2_3.4.1              
[17] tidyverse_2.0.0             MethylResolver_0.1.0       
[19] methylSig_1.10.0            here_1.0.1                 
[21] bsseq_1.34.0                SummarizedExperiment_1.28.0
[23] Biobase_2.58.0              MatrixGenerics_1.10.0      
[25] matrixStats_0.63.0          GenomicRanges_1.50.2       
[27] GenomeInfoDb_1.34.9         IRanges_2.32.0             
[29] S4Vectors_0.36.2            BiocGenerics_0.44.0        
[31] rhdf5_2.42.0               

loaded via a namespace (and not attached):
  [1] snow_0.4-4                plyr_1.8.8               
  [3] lazyeval_0.2.2            splines_4.2.1            
  [5] BiocParallel_1.32.6       digest_0.6.31            
  [7] foreach_1.5.2             htmltools_0.5.5          
  [9] rsconnect_0.8.29          fansi_1.0.4              
 [11] magrittr_2.0.3            memoise_2.0.1            
 [13] BSgenome_1.66.3           ScaledMatrix_1.6.0       
 [15] doParallel_1.0.17         tzdb_0.3.0               
 [17] limma_3.54.2              Metrics_0.1.4            
 [19] Biostrings_2.66.0         R.utils_2.12.2           
 [21] timechange_0.2.0          colorspace_2.1-0         
 [23] xfun_0.38                 crayon_1.5.2             
 [25] RCurl_1.98-1.12           jsonlite_1.8.4           
 [27] iterators_1.0.14          glue_1.6.2               
 [29] polyclip_1.10-4           gtable_0.3.3             
 [31] zlibbioc_1.44.0           XVector_0.38.0           
 [33] BiocSingular_1.14.0       Rhdf5lib_1.20.0          
 [35] DEoptimR_1.0-11           scales_1.2.1             
 [37] Rcpp_1.0.10               viridisLite_0.4.1        
 [39] xtable_1.8-4              DSS_2.46.0               
 [41] dqrng_0.3.0               rsvd_1.0.5               
 [43] htmlwidgets_1.6.2         httr_1.4.5               
 [45] ellipsis_0.3.2            varhandle_2.0.5          
 [47] pkgconfig_2.0.3           XML_3.99-0.14            
 [49] R.methodsS3_1.8.2         farver_2.1.1             
 [51] sass_0.4.5                locfit_1.5-9.7           
 [53] utf8_1.2.3                tidyselect_1.2.0         
 [55] rlang_1.1.0               reshape2_1.4.4           
 [57] later_1.3.0               munsell_0.5.0            
 [59] tools_4.2.1               cachem_1.0.7             
 [61] cli_3.6.1                 generics_0.1.3           
 [63] fastmap_1.1.1             yaml_2.3.7               
 [65] knitr_1.42                robustbase_0.95-1        
 [67] randomForest_4.7-1.1      sparseMatrixStats_1.10.0 
 [69] mime_0.12                 R.oo_1.25.0              
 [71] compiler_4.2.1            rstudioapi_0.14          
 [73] tweenr_2.0.2              job_0.3.0                
 [75] bslib_0.4.2               stringi_1.7.12           
 [77] lattice_0.20-45           permute_0.9-7            
 [79] vctrs_0.6.1               trqwe_0.1                
 [81] pillar_1.9.0              lifecycle_1.0.3          
 [83] rhdf5filters_1.10.0       jquerylib_0.1.4          
 [85] data.table_1.14.8         cowplot_1.1.1            
 [87] bitops_1.0-7              irlba_2.3.5.1            
 [89] httpuv_1.6.9              rtracklayer_1.58.0       
 [91] R6_2.5.1                  BiocIO_1.8.0             
 [93] promises_1.2.0.1          codetools_0.2-19         
 [95] MASS_7.3-58.3             gtools_3.9.4             
 [97] rprojroot_2.0.3           rjson_0.2.21             
 [99] withr_2.5.0               GenomicAlignments_1.34.1 
[101] Rsamtools_2.14.0          GenomeInfoDbData_1.2.9   
[103] parallel_4.2.1            doSNOW_1.0.20            
[105] hms_1.1.3                 grid_4.2.1               
[107] beachmat_2.14.0           DelayedMatrixStats_1.20.0
[109] ggforce_0.4.1             restfulr_0.0.15 

When BACKEND = "HDF5Array" is supplied as an argument to combineList(), the assays of the resulting combined object are still held in memory rather than being HDF5Array-backed.

Here is my output:

> x <- combineList(list(hdf5_BS1, hdf5_BS2), BACKEND = "HDF5Array")
> x
An object of type 'BSseq' with
  5 methylation loci
  6 samples
has not been smoothed
All assays are in-memory

@PeteHaitch (Contributor)

Thanks for your patience while I was on leave.

I can confirm that bsseq::combineList() appears to ignore the BACKEND argument in this case.
I'm not sure exactly why and haven't had time to dig into it further.
A workaround is to call DelayedArray::setAutoRealizationBackend("HDF5Array") before running bsseq::combineList(), as shown in the example below.

suppressPackageStartupMessages(library(bsseq))
suppressPackageStartupMessages(library(HDF5Array))

M <- matrix(0:8, 3, 3)
Cov <- matrix(1:9, 3, 3)
hdf5_M <- writeHDF5Array(M)
hdf5_Cov <- writeHDF5Array(Cov)
hdf5_BS1 <- BSseq(
  chr = c("chr1", "chr2", "chr1"),
  pos = c(1, 2, 3),
  M = hdf5_M,
  Cov = hdf5_Cov,
  sampleNames = c("A", "B", "C"))
hdf5_BS2 <- BSseq(
  chr = c("chr1", "chr1", "chr1"),
  pos = c(3, 4, 5),
  M = hdf5_M,
  Cov = hdf5_Cov,
  sampleNames = c("D", "E", "F"))

# Assay is in-memory despite specifying `BACKEND = "HDF5Array"`
x <- combineList(list(hdf5_BS1, hdf5_BS2), BACKEND = "HDF5Array")
x
#> An object of type 'BSseq' with
#>   5 methylation loci
#>   6 samples
#> has not been smoothed
#> All assays are in-memory
showtree(assay(x))
#> 5x6 integer: DelayedMatrix object
#> └─ 5x6 integer: Set dimnames
#>    └─ 5x6 integer: [seed] matrix object

# Assay is on-disk (as expected)
setAutoRealizationBackend("HDF5Array")
y <- combineList(list(hdf5_BS1, hdf5_BS2))
y
#> An object of type 'BSseq' with
#>   5 methylation loci
#>   6 samples
#> has not been smoothed
#> Some assays are HDF5Array-backed
showtree(assay(y))
#> 5x6 integer: DelayedMatrix object
#> └─ 5x6 integer: Set dimnames
#>    └─ 5x6 integer: [seed] HDF5ArraySeed object
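
Because setAutoRealizationBackend() changes a session-wide setting, it may be worth restoring the previous backend once combineList() has run. The following is only a sketch of one way to do that, assuming the workaround behaves as shown above. with_realization_backend() is a hypothetical helper name introduced here for illustration; it is not part of bsseq, DelayedArray, or HDF5Array.

```r
suppressPackageStartupMessages(library(bsseq))
suppressPackageStartupMessages(library(HDF5Array))

# Hypothetical helper (not provided by any package): evaluate `expr` with a
# temporary auto realization backend, then restore the previous setting.
with_realization_backend <- function(backend, expr) {
  old_backend <- getAutoRealizationBackend()  # NULL means in-memory
  on.exit(setAutoRealizationBackend(old_backend), add = TRUE)
  setAutoRealizationBackend(backend)
  force(expr)  # `expr` is a promise, so it is only evaluated at this point
}

# Same toy objects as in the example above.
hdf5_M <- writeHDF5Array(matrix(0:8, 3, 3))
hdf5_Cov <- writeHDF5Array(matrix(1:9, 3, 3))
hdf5_BS1 <- BSseq(
  chr = c("chr1", "chr2", "chr1"),
  pos = c(1, 2, 3),
  M = hdf5_M,
  Cov = hdf5_Cov,
  sampleNames = c("A", "B", "C"))
hdf5_BS2 <- BSseq(
  chr = c("chr1", "chr1", "chr1"),
  pos = c(3, 4, 5),
  M = hdf5_M,
  Cov = hdf5_Cov,
  sampleNames = c("D", "E", "F"))

backend_before <- getAutoRealizationBackend()

# combineList() runs while the HDF5Array backend is active.
y <- with_realization_backend(
  "HDF5Array",
  combineList(list(hdf5_BS1, hdf5_BS2)))

# Inspect the seed of the combined assay, as in the example above.
showtree(assay(y))

# The session-wide backend is back to whatever it was before the call.
backend_after <- getAutoRealizationBackend()
```

The on.exit() call means the previous backend is restored even if combineList() throws an error, so later code in the same session is not silently switched to on-disk realization.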
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2023-02-13 r83829)
#>  os       macOS Ventura 13.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2023-05-02
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version   date (UTC) lib source
#>  beachmat               2.17.0    2023-04-25 [1] Bioconductor
#>  Biobase              * 2.61.0    2023-04-25 [1] Bioconductor
#>  BiocGenerics         * 0.47.0    2023-04-25 [1] Bioconductor
#>  BiocIO                 1.11.0    2023-04-25 [1] Bioconductor
#>  BiocParallel           1.35.0    2023-04-25 [1] Bioconductor
#>  Biostrings             2.69.0    2023-04-25 [1] Bioconductor
#>  bitops                 1.0-7     2021-04-24 [1] CRAN (R 4.3.0)
#>  BSgenome               1.69.0    2023-04-25 [1] Bioconductor
#>  bsseq                * 1.37.0    2023-04-25 [1] Bioconductor
#>  cli                    3.6.1     2023-03-23 [1] CRAN (R 4.3.0)
#>  codetools              0.2-19    2023-02-01 [1] CRAN (R 4.3.0)
#>  colorspace             2.1-0     2023-01-23 [1] CRAN (R 4.3.0)
#>  crayon                 1.5.2     2022-09-29 [1] CRAN (R 4.3.0)
#>  data.table             1.14.8    2023-02-17 [1] CRAN (R 4.3.0)
#>  DelayedArray         * 0.27.0    2023-04-25 [1] Bioconductor
#>  DelayedMatrixStats     1.23.0    2023-04-25 [1] Bioconductor
#>  digest                 0.6.31    2022-12-11 [1] CRAN (R 4.3.0)
#>  evaluate               0.20      2023-01-17 [1] CRAN (R 4.3.0)
#>  fastmap                1.1.1     2023-02-24 [1] CRAN (R 4.3.0)
#>  fs                     1.6.2     2023-04-25 [1] CRAN (R 4.3.0)
#>  GenomeInfoDb         * 1.37.0    2023-04-25 [1] Bioconductor
#>  GenomeInfoDbData       1.2.10    2023-03-26 [1] Bioconductor
#>  GenomicAlignments      1.37.0    2023-04-25 [1] Bioconductor
#>  GenomicRanges        * 1.53.0    2023-04-25 [1] Bioconductor
#>  glue                   1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
#>  gtools                 3.9.4     2022-11-27 [1] CRAN (R 4.3.0)
#>  HDF5Array            * 1.29.0    2023-04-25 [1] Bioconductor
#>  htmltools              0.5.5     2023-03-23 [1] CRAN (R 4.3.0)
#>  IRanges              * 2.35.0    2023-04-25 [1] Bioconductor
#>  knitr                  1.42      2023-01-25 [1] CRAN (R 4.3.0)
#>  lattice                0.21-8    2023-04-05 [1] CRAN (R 4.3.0)
#>  lifecycle              1.0.3     2022-10-07 [1] CRAN (R 4.3.0)
#>  limma                  3.57.0    2023-04-25 [1] Bioconductor
#>  locfit                 1.5-9.7   2023-01-02 [1] CRAN (R 4.3.0)
#>  Matrix               * 1.5-4     2023-04-04 [1] CRAN (R 4.3.0)
#>  MatrixGenerics       * 1.13.0    2023-04-25 [1] Bioconductor
#>  matrixStats          * 0.63.0    2022-11-18 [1] CRAN (R 4.3.0)
#>  munsell                0.5.0     2018-06-12 [1] CRAN (R 4.3.0)
#>  permute                0.9-7     2022-01-27 [1] CRAN (R 4.3.0)
#>  R.methodsS3            1.8.2     2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo                   1.25.0    2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils                2.12.2    2022-11-11 [1] CRAN (R 4.3.0)
#>  R6                     2.5.1     2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp                   1.0.10    2023-01-22 [1] CRAN (R 4.3.0)
#>  RCurl                  1.98-1.12 2023-03-27 [1] CRAN (R 4.3.0)
#>  reprex                 2.0.2     2022-08-17 [1] CRAN (R 4.3.0)
#>  restfulr               0.0.15    2022-06-16 [1] CRAN (R 4.3.0)
#>  rhdf5                * 2.45.0    2023-04-25 [1] Bioconductor
#>  rhdf5filters           1.13.2    2023-04-30 [1] Bioconductor
#>  Rhdf5lib               1.23.0    2023-04-25 [1] Bioconductor
#>  rjson                  0.2.21    2022-01-09 [1] CRAN (R 4.3.0)
#>  rlang                  1.1.1     2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown              2.21      2023-03-26 [1] CRAN (R 4.3.0)
#>  Rsamtools              2.17.0    2023-04-25 [1] Bioconductor
#>  rstudioapi             0.14      2022-08-22 [1] CRAN (R 4.3.0)
#>  rtracklayer            1.61.0    2023-04-25 [1] Bioconductor
#>  S4Vectors            * 0.39.0    2023-04-25 [1] Bioconductor
#>  scales                 1.2.1     2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo            1.2.2     2021-12-06 [1] CRAN (R 4.3.0)
#>  sparseMatrixStats      1.13.0    2023-04-25 [1] Bioconductor
#>  SummarizedExperiment * 1.31.0    2023-04-25 [1] Bioconductor
#>  withr                  2.5.0     2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun                   0.39      2023-04-20 [1] CRAN (R 4.3.0)
#>  XML                    3.99-0.14 2023-03-19 [1] CRAN (R 4.3.0)
#>  XVector                0.41.0    2023-04-25 [1] Bioconductor
#>  yaml                   2.3.7     2023-01-23 [1] CRAN (R 4.3.0)
#>  zlibbioc               1.47.0    2023-04-25 [1] Bioconductor
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
