Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSseq Constructor fails if rmZeroCov = TRUE #95

Open
mdperry opened this issue Jul 23, 2020 · 1 comment
Open

BSseq Constructor fails if rmZeroCov = TRUE #95

mdperry opened this issue Jul 23, 2020 · 1 comment

Comments

@mdperry
Copy link

mdperry commented Jul 23, 2020

I have installed bsseq version 1.24.4 (as part of my efforts to get a 5hmC pipeline named mint (2017) running). mint requires the BioConductor package named dss, and somewhere along the way dss uses bsseq.

The error message I get from R (version 4.0) is:

Error in validObject(.Object) :
invalid class "RangedSummarizedExperiment" object:
'x@assays' is not parallel to 'x'

I have tracked down the bug to this section of the BSseq constructor:

`# Collapse duplicate loci --------------------------------------------------
is_duplicated <- duplicated(gr)
if (any(is_duplicated)) {
    warning("Detected duplicate loci. Collapsing counts in 'M' and 'Cov' ",
            "at these positions.")
    if (!is.null(coef) || !is.null(se.coef)) {
        stop("Cannot collapse when 'coef' or 'se.coef' are non-NULL.")
    }
    loci <- gr[!is_duplicated]
    ol <- findOverlaps(gr, loci, type = "equal")
    M <- rowsum(x = M, group = subjectHits(ol), reorder = FALSE)
    rownames(M) <- NULL
    Cov <- rowsum(x = Cov, group = subjectHits(ol), reorder = FALSE)
    rownames(Cov) <- NULL
} else {
    loci <- gr
}


# Optionally, remove positions with zero coverage --------------------------

if (rmZeroCov) {
    loci_with_zero_cov <- rowAlls(Cov, value = 0)
    if (any(loci_with_zero_cov)) {
        loci_with_nonzero_cov <- !loci_with_zero_cov
        gr <- gr[loci_with_nonzero_cov]
        M <- M[loci_with_nonzero_cov, , drop = FALSE]
        Cov <- Cov[loci_with_nonzero_cov, , drop = FALSE]
        coef <- coef[loci_with_nonzero_cov, , drop = FALSE]
        se.coef <- se.coef[loci_with_nonzero_cov, , drop = FALSE]
    }
}

# Construct BSseq object ---------------------------------------------------

assays <- SimpleList(M = M, Cov = Cov, coef = coef, se.coef = se.coef)
assays <- assays[!S4Vectors:::sapply_isNULL(assays)]
se <- SummarizedExperiment(
    assays = assays,
    rowRanges = loci,
    colData = pData)
.BSseq(se, trans = trans, parameters = parameters)`

If there are no duplicated loci then the GRanges object gets stored in the variable named loci.
If the rmZeroCov = TRUE then next block of code identifies data rows where all of the values are zero, and tags
them as such. Then the data rows that contain non-zero data are also tagged, and used to filter the original GRanges object, named gr, as well as the methylation matrix named 'M' and the coverage matrix named 'Cov'. So far, so good. However at the next step, while these filtered M and Cov matrices, which are now much smaller, are used to create the new BSseq object, the current version of the code DOES NOT use the smaller filtered GRanges object named gr, but instead sends the relatively larger, unfiltered, GRanges object named 'loci' as the rowRanges parameter.

One fix would be to change the RangedSummarizedExperiment constructor to:
rowRanges = gr

alternatively, higher up you could modify loci (instead of gr):
loci <- loci[loci_with_non_zero_cov]

Thanks

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Jul 24, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants