Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDF5-backed objects saved with a "prefix" fail to smooth #98

Open
Nick-Eagles opened this issue Oct 12, 2020 · 2 comments
Open

HDF5-backed objects saved with a "prefix" fail to smooth #98

Nick-Eagles opened this issue Oct 12, 2020 · 2 comments

Comments

@Nick-Eagles
Copy link

Hello,

I frequently make use of the prefix argument to saveHDF5SummarizedExperiment from the HDF5Array package when working with bsseq objects. However, HDF5-backed bsseq objects saved with a prefix seem to not correctly smooth with BSmooth. Below I provide a small reproducible example which generates the issue. If you omit the prefix argument in the below call to saveHDF5SummarizedExperiment, the entire code segment works as expected.

library('bsseq')
library('HDF5Array')
library('sessioninfo')

write_dir = 'hdf5_object'

#  Load the memory-backed example bsseq object
data(BS.chr22)

#  Suppose I wish to manipulate an HDF5-backed version of the object. I
#  provide an arbitrarily named prefix; without supplying the 'prefix'
#  argument, the below smoothing code works fine!
bs_disk = saveHDF5SummarizedExperiment(BS.chr22, dir=write_dir, prefix='CpG',
                                       verbose=TRUE)

##Start writing assay 1/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##Realizing block 1/1 ... OK, writing it ... OK
##Finished writing assay 1/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##
##Start writing assay 2/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##Realizing block 1/1 ... OK, writing it ... OK
##Finished writing assay 2/2 to HDF5 file:
##  hdf5_object/CpGassays.h5

Serialize BSseq object to RDS file:
  hdf5_object/CpGse.rds

#  Attempt to smooth the HDF5-backed object 'bs_disk'
BSmooth(bs_disk, verbose=TRUE)

##An object of type 'BSseq' with
##  494728 methylation loci
##  2 samples
##has been smoothed with
##  BSmooth (ns = 70, h = 1000, maxGap = 100000000)
##Some assays are HDF5Array-backed
##Warning message:
##In BSmooth(bs_disk, verbose = TRUE) :
##  'BSseq' was not created with either read.bismark() or
##  HDF5Array::saveHDF5SummarizedExperiment(). BSmooth() is using
##  automatically created HDF5 file(s) (see ?HDF5Array::setHDF5DumpFile) to
##  store smoothing result. You will need to run
##  HDF5Array::saveHDF5SummarizedExperiment() on the returned object if you
##  wish to save the returned object.

#  Show that the on-disk object was indeed not smoothed
getMeth(bs_disk, type='smooth')

##Error in getMeth(bs_disk, type = "smooth") :
##  'type=smooth' requires the object to have been smoothed.

session_info()

##─ Session info ───────────────────────────────────────────────────────────────
## setting  value
## version  R version 4.0.2 Patched (2020-06-24 r78746)
## os       CentOS Linux 7 (Core)
## system   x86_64, linux-gnu
## ui       X11
## language (EN)
## collate  en_US.UTF-8
## ctype    en_US.UTF-8
## tz       US/Eastern
## date     2020-10-12
##
##─ Packages ───────────────────────────────────────────────────────────────────
## package              * version  date       lib source
## assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
## Biobase              * 2.48.0   2020-04-27 [2] Bioconductor
## BiocGenerics         * 0.34.0   2020-04-27 [2] Bioconductor
## BiocParallel           1.22.0   2020-04-27 [2] Bioconductor
## Biostrings             2.56.0   2020-04-27 [2] Bioconductor
## bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
## BSgenome               1.56.0   2020-04-27 [2] Bioconductor
## bsseq                * 1.24.0   2020-04-28 [2] Bioconductor
## cli                    2.0.2    2020-02-28 [2] CRAN (R 4.0.0)
## colorspace             1.4-1    2019-03-18 [2] CRAN (R 4.0.0)
## crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
## data.table             1.12.8   2019-12-09 [2] CRAN (R 4.0.0)
## DelayedArray         * 0.14.0   2020-04-27 [2] Bioconductor
## DelayedMatrixStats     1.10.0   2020-04-27 [2] Bioconductor
## fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
## GenomeInfoDb         * 1.24.0   2020-04-27 [2] Bioconductor
## GenomeInfoDbData       1.2.3    2020-05-18 [2] Bioconductor
## GenomicAlignments      1.24.0   2020-04-27 [2] Bioconductor
## GenomicRanges        * 1.40.0   2020-04-27 [2] Bioconductor
## glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
## gtools                 3.8.2    2020-03-31 [2] CRAN (R 4.0.0)
## HDF5Array            * 1.16.0   2020-04-27 [2] Bioconductor
## IRanges              * 2.22.1   2020-04-28 [2] Bioconductor
## lattice                0.20-41  2020-04-02 [3] CRAN (R 4.0.2)
## lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
## limma                  3.44.1   2020-04-28 [2] Bioconductor
## locfit                 1.5-9.4  2020-03-25 [2] CRAN (R 4.0.0)
## Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.0.2)
## matrixStats          * 0.56.0   2020-03-13 [2] CRAN (R 4.0.0)
## munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.0)
## permute                0.9-5    2019-03-12 [2] CRAN (R 4.0.0)
## R.methodsS3            1.8.0    2020-02-14 [2] CRAN (R 4.0.0)
## R.oo                   1.23.0   2019-11-03 [2] CRAN (R 4.0.0)
## R.utils                2.9.2    2019-12-08 [2] CRAN (R 4.0.0)
## R6                     2.4.1    2019-11-12 [2] CRAN (R 4.0.0)
## Rcpp                   1.0.4.6  2020-04-09 [2] CRAN (R 4.0.0)
## RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
## rhdf5                * 2.32.0   2020-04-27 [2] Bioconductor
## Rhdf5lib               1.10.0   2020-04-27 [2] Bioconductor
## rlang                  0.4.7    2020-07-09 [1] CRAN (R 4.0.2)
## Rsamtools              2.4.0    2020-04-27 [2] Bioconductor
## rtracklayer            1.48.0   2020-04-27 [2] Bioconductor
## S4Vectors            * 0.26.1   2020-05-16 [2] Bioconductor
## scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.0)
## sessioninfo          * 1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
## SummarizedExperiment * 1.18.1   2020-04-30 [2] Bioconductor
## withr                  2.2.0    2020-04-20 [2] CRAN (R 4.0.0)
## XML                    3.99-0.3 2020-01-20 [2] CRAN (R 4.0.0)
## XVector                0.28.0   2020-04-27 [2] Bioconductor
## zlibbioc               1.34.0   2020-04-27 [2] Bioconductor
##
##[1] /users/neagles/R/4.0
##[2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
##[3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library
@PeteHaitch
Copy link
Contributor

Thanks for the report.
I have a suspicion of the cause and will try to fix before the next release (but no promises as my next week or so is pretty packed ...)

@PeteHaitch
Copy link
Contributor

Afraid I haven't found time for this yet and with the next release so close it'll have to wait until the next cycle, sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants