Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mistake in BSseq constructor? #80

Open
rcavalcante opened this issue Feb 2, 2019 · 9 comments
Open

Mistake in BSseq constructor? #80

rcavalcante opened this issue Feb 2, 2019 · 9 comments

Comments

@rcavalcante
Copy link

rcavalcante commented Feb 2, 2019

Hello,

I'm trying to get the code for methylSig in working order with the Bioc 3.8 and devel versions of bsseq and I'm running into a problem I didn't have with previous versions.

Namely,

library(bsseq)

gr =GRanges(
    seqnames = c('chr21','chr21','chr21'),
    ranges = IRanges(start = c(1,3,5), end = c(1,3,5)),
    strand = '*'
)

m = matrix(
    data = c(34,10,0,300,87,434), 
    nrow = 3, ncol = 2,
    dimnames = list(NULL, c('test_1','test_2')))

cov = matrix(
    data = c(34,10,0,300,96,458), 
    nrow = 3, ncol = 2,
    dimnames = list(NULL, c('test_1','test_2')))

pData = data.frame(
    Sample_Name = c('test_1','test_2'),
    Group = factor(1,0),
    row.names = c('test_1','test_2')
)

bs = bsseq::BSseq(gr = gr, M = m, Cov = cov, pData = pData)

Gives the following error despite the rownames of pData being correct relative to the M and Cov parameters:

Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_assays(assays)) : 
  invalid rownames length

I've tracked down the problem to this block of the constructor (lines 91-111 of BSseq-class.R):

    # Process 'sampleNames' and 'pData'.
    if (is.null(sampleNames)) {
        if (is.null(pData)) {
            # BSseq object will have no colnames.
            pData <- S4Vectors:::new_DataFrame(nrows = ncol(M))
        } else {
            # BSseq object will have 'sampleNames' as colnames.
            pData <- DataFrame(row.names = sampleNames)
        }
    } else {
        if (is.null(pData)) {
            # BSseq object will have 'sampleNames' as colnames.
            pData <- DataFrame(row.names = sampleNames)
        } else {
            if (is.null(rownames(pData))) {
                rownames(pData) <- sampleNames
            } else {
                stopifnot(identical(rownames(pData), sampleNames))
            }
        }
    }

The small example I've given above has NULL sampleNames but non-NULL pData, so it tries to run pData <- DataFrame(row.names = sampleNames), which obliterates the pData I've passed and causes the invalid rownames error.

I think, at least. Please let me know if I've misunderstood or you can't reproduce with the example above. Thanks in advance!

Here is my sessionInfo(), I'm running in the Bioc 3.8 release Docker image with updated packages:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [3] BiocParallel_1.16.5         matrixStats_0.54.0         
 [5] Biobase_2.42.0              GenomicRanges_1.34.0       
 [7] GenomeInfoDb_1.18.1         IRanges_2.16.0             
 [9] S4Vectors_0.20.1            BiocGenerics_0.28.0        
[11] methylSig_0.6.0             testthat_2.0.1             
[13] BiocManager_1.30.4         

loaded via a namespace (and not attached):
 [1] DSS_2.30.1                    bitops_1.0-6                 
 [3] fs_1.2.6                      usethis_1.4.0                
 [5] devtools_2.0.1                bit64_0.9-7                  
 [7] bsseq_1.18.0                  progress_1.2.0               
 [9] httr_1.4.0                    rprojroot_1.3-2              
[11] tools_3.5.1                   backports_1.1.3              
[13] R6_2.3.0                      HDF5Array_1.10.1             
[15] DBI_1.0.0                     lazyeval_0.2.1               
[17] colorspace_1.4-0              permute_0.9-4                
[19] withr_2.1.2                   tidyselect_0.2.5             
[21] prettyunits_1.0.2             processx_3.2.1               
[23] bit_1.1-14                    compiler_3.5.1               
[25] cli_1.0.1                     desc_1.2.0                   
[27] rtracklayer_1.42.1            scales_1.0.0                 
[29] readr_1.3.1                   callr_3.1.1                  
[31] stringr_1.3.1                 digest_0.6.18                
[33] Rsamtools_1.34.1              R.utils_2.7.0                
[35] XVector_0.22.0                pkgconfig_2.0.2              
[37] htmltools_0.3.6               sessioninfo_1.1.1            
[39] limma_3.38.3                  BSgenome_1.50.0              
[41] regioneR_1.14.0               rlang_0.3.1                  
[43] rstudioapi_0.9.0              RSQLite_2.1.1                
[45] DelayedMatrixStats_1.4.0      shiny_1.2.0                  
[47] bindr_0.1.1                   gtools_3.8.1                 
[49] R.oo_1.22.0                   dplyr_0.7.8                  
[51] RCurl_1.95-4.11               magrittr_1.5                 
[53] GenomeInfoDbData_1.2.0        Matrix_1.2-15                
[55] Rhdf5lib_1.4.2                Rcpp_1.0.0                   
[57] munsell_0.5.0                 R.methodsS3_1.7.1            
[59] stringi_1.2.4                 yaml_2.2.0                   
[61] zlibbioc_1.28.0               rhdf5_2.26.2                 
[63] pkgbuild_1.0.2                plyr_1.8.4                   
[65] AnnotationHub_2.14.3          grid_3.5.1                   
[67] blob_1.1.1                    promises_1.0.1               
[69] crayon_1.3.4                  lattice_0.20-38              
[71] Biostrings_2.50.2             GenomicFeatures_1.34.3       
[73] hms_0.4.2                     locfit_1.5-9.1               
[75] ps_1.3.0                      pillar_1.3.1                 
[77] boot_1.3-20                   reshape2_1.4.3               
[79] biomaRt_2.38.0                pkgload_1.0.2                
[81] XML_3.98-1.16                 glue_1.3.0                   
[83] annotatr_1.8.0                data.table_1.12.0            
[85] remotes_2.0.2                 httpuv_1.4.5.1               
[87] gtable_0.2.0                  purrr_0.3.0                  
[89] assertthat_0.2.0              ggplot2_3.1.0                
[91] mime_0.6                      xtable_1.8-3                 
[93] later_0.7.5                   tibble_2.0.1                 
[95] GenomicAlignments_1.18.1      AnnotationDbi_1.44.0         
[97] memoise_1.1.0                 bindrcpp_0.2.2               
[99] interactiveDisplayBase_1.20.0
@PeteHaitch
Copy link
Contributor

Thanks for the example. I'll take a look this week and get back to you.

@lcolladotor
Copy link

Hi @PeteHaitch,

I ran into a very similar issue. I believe that the culprit is this: https://github.com/hansenlab/bsseq/blame/master/R/BSseq-class.R#L98 since to get to it, the sampleNames have to be NULL and pData is not NULL. So it looks to me like

bsseq/R/BSseq-class.R

Lines 96 to 99 in 3281f31

} else {
# BSseq object will have 'sampleNames' as colnames.
pData <- DataFrame(row.names = sampleNames)
}
needs to be just a single } (remove the else clause).

Best,
Leo

@lcolladotor
Copy link

Ok, I just tested and it works for me using data from https://github.com/LieberInstitute/brain-epigenomics/blob/master/DMR_acf/compute_DMR_acf.R (under the following test case https://github.com/LieberInstitute/brain-epigenomics/blob/master/DMR_acf/compute_DMR_acf.R#L29).

> packageVersion('bsseq')
[1] ‘1.18.0

@lcolladotor
Copy link

This PR #81 resolves this

@lcolladotor
Copy link

If possible, can you port it also to the RELEASE branch? Thx!

@lcolladotor
Copy link

Hi Pete,

I git cherry-picked my commit to my fork of the RELEASE_3_8 branch and was able to install it with devtools::install_github('lcolladotor/bsseq@RELEASE_3_8'), so there's no hurry from my side about this. My old code now works again ^^

Best,
Leo

@PeteHaitch
Copy link
Contributor

Not forgotten this, just been swamped preparing for conference next week.
Thanks for your patience.

@rcavalcante
Copy link
Author

I completely understand. No problem.

@rcavalcante
Copy link
Author

It appears that one easy fix is to always include both pData and sampleNames.

rcavalcante added a commit to sartorlab/methylSig that referenced this issue Mar 10, 2019
* The issue (hansenlab/bsseq#80) is fixed by including both pData and sampleNames, even though you shouldn't really have to. There's no harm in doing so.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants