-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limma mixed models feature #6753
base: master
Are you sure you want to change the base?
Conversation
@@ -4,17 +4,19 @@ process LIMMA_DIFFERENTIAL { | |||
|
|||
conda "${moduleDir}/environment.yml" | |||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | |||
'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' : | |||
'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }" | |||
'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't have any oras protocol in the modules, it doesn't work with NXF_SINGULARITY_CACHEDIR
Can you swittch to https?
'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' : | ||
'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }" | ||
'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' : | ||
'community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }" | |
'nf-core/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }" |
I pulled and pushed to quay.io
So we can change the registry from all container in a simpler way.
We'll be using community.wave.seqera.io
as registry when we do the switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not keen on this @maxulysse, there's no way for someone without privilege to do this and maintain the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few changes I'd like to see here.
I understand the attraction of readr etc, but until we have full Seqera Containers integration with nf-core tooling (which shouldn't be long), extra packages create challenges.
The forking of logic is also a bit ugly. If the logic is sufficiently different to require two scripts it should be two modules- though I'd favour integration if it didn't create excess complexity in the R code.
dependencies: | ||
- bioconda::bioconductor-limma=3.54.0 | ||
- bioconda::bioconductor-edger=4.0.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This multi-package thing still creates difficulties right now- which is why I wrote this module depending on a single Biocontainer.
@@ -4,17 +4,19 @@ process LIMMA_DIFFERENTIAL { | |||
|
|||
conda "${moduleDir}/environment.yml" | |||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | |||
'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' : | |||
'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }" | |||
'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These oras links don't work well with nf-core tooling right now, unfortunately
@@ -23,5 +25,10 @@ process LIMMA_DIFFERENTIAL { | |||
task.ext.when == null || task.ext.when | |||
|
|||
script: | |||
template 'limma_de.R' | |||
if (type == 'rnaseq') { | |||
template 'limma_de_rnaseq.R' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not massively keen on the parallel script.
Either the new thing should be a separate module, or they should be properly integrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How you envision "proper" integration? As single template? I will refactor that if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, if the two scripts share enough logic, they should be one with a simple conditional. If the logic is quite divergent such that doing that adds too much complexity, these should be separate modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I had similar doubts in the one hand this is still limma and differential abundance analysis and on the other site this one is combined with voom and there is lot of differences. So maybe we will create new module named limma/differential-voom? or simply limma/voom - what are your thoughts on the module naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be limma/differential-voom.
As in the comments below, I also think this should be, as much as possible, a thin wrapper around Limma functions only. Otherwise it's just your custom script, rather than a 'standard' Limma module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating this older comment after newer ones below: I don't believe additional logic required for Voom merits its own module. We can add the Voom part in a conditional, and other changes (e.g. duplicateCorrelation) etc, apply equally to non-Voom Limma.
- bioconda::bioconductor-limma=3.54.0 | ||
- bioconda::bioconductor-edger=4.0.16 | ||
- bioconda::bioconductor-ihw=1.28.0 | ||
- bioconda::bioconductor-limma=3.58.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limma is a dependency of edger, you don't need to include it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true however without this we won't be able to control limma version which will be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only pin the primary dependency in the modules repo, unless you have a really compelling reason to do otherwise. We will at some point have lock files that pin all dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as in other comments, I'm not sure we need edgeR here, so we'd only pin Limma)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we noticed that results are slightly different depending on limma versions so we decided to pin it to achieve reproducibility. I will adopt your comment in standalone module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I see your point on this one, apologies. Limma is the primary dependency so it's pinned, but we need edgeR.
dim(DGE) | ||
|
||
# Calculate normalization factors | ||
DGE <- calcNormFactors(DGE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I'm wrong, but this and the DGEList are the only things that need edgeR here, right? voom() will do the normalisation things, and will just take a matrix. Then we don't need the edgeR dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use edgeR dependency to read raw matrix count to create DGEList object which is one of required ways to provide expression data to voom: "counts - a numeric matrix containing raw counts, or an ExpressionSet containing raw counts, or a DGEList object." : https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/voom
So we either need BioBase or edgeR dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, as I say, we don't need edgeR, and that will simplify the software dependencies
Actually, after reminding myself, I take this back, think it probably does need the DGEList etc the way you're doing.
## Generate outputs | ||
# Differential expression table | ||
|
||
if (opt\$IHW_correction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO if we're writing 'limma' modules they should be thin-as-possible wrappers around Limma functions. We shouldn't we embedding lots of additional logic, aside from the boilerplate required to pass inputs to those functions correctly and write the outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive customisations should probably be in local modules in the workflow, and we'd want to discuss to determine if this was part of standard best practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just had some time to review this again, and I think the R part also needs work. There's no need to treat the pairing variable in a special way as far as I can tell, it can be a blocking variable like any other. At that point, the main difference between the two modes is the use of duplicateCorrelation, for which there could just be a separate flag.
Fundamentally, this would integrate much better into the existing differentialabundance logic if started from the existing limma module.
I actually don't think the logic is very different with using Voom, so for basic Voom integration the existing module can just have a conditional in it something like this, about here:
if (!is.null(opt$use_voom) && opt$use_voom) {
# Create a DGEList object for RNA-seq data
dge <- DGEList(counts = intensities.table)
# Normalize counts using TMM
dge <- calcNormFactors(dge, method = "TMM")
# Run voom to transform the data
voom_result <- voom(dge, design)
data_for_fit <- voom_result
} else {
# Use as.matrix for regular microarray analysis
data_for_fit <- as.matrix(intensities.table)
}
All the other logic can be the same, and then we can add in any features you think are missing from there.
# Calculate normalization factors | ||
DGE <- calcNormFactors(DGE) | ||
|
||
if (opt\$analysis_type == "pairwise") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (opt\$analysis_type == "pairwise") { | |
if (opt\$analysis_type == "pairwise") { |
I don't think you even need this extra mode. Sample pairing can be handled as a blocking variable, no? Even if it was needed, you're duplicating a lot of code between the two modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we are duplicating some code. Unfortunately the mixed model integration have large amount of small additions which resulted in multiple if's example can be voom which is called twice. I will try to incorporate voom into original script. I'm not yet fulli convinced about mixedmodel mode.
Just to re-state the above, and after refreshing myself on the Voom logic, I actually don't think Voom needs to be its own module. I think this can be integrated into the existing module. The voom-specific component is a matter of a few lines. All the other changes are not Voom specific. |
I missed your earlier comment I will take a closer look on this one and figure out how to incorporate mixedmodel in the nice clean way. At least thanks to new module we have nf-test almost ready :D |
@pinin4fjords I've extended original limma module with voom and IHW correction. This is still a draft as my results are now significantly different and sitll this have to be sorted out. I wanted to ask you if you fill that this is a right direction. Also any suggestion regarding incorporation of mixed models logic will be welcome :) Thank you in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the ongoing work- this is definitely going the right way.
But I still think the IHW code doesn't belong in a module wrapping standard Limma functions.
@@ -322,18 +357,54 @@ ebayes_args = c( | |||
|
|||
fit2 <- do.call(eBayes, ebayes_args) | |||
|
|||
# Run topTable() to generate a results data frame | |||
if ((!is.null(opt\$use_voom) && opt\$use_voom) && (!is.null(opt\$IHW_correction) && opt\$IHW_correction)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but I don't think the IHW correction belongs in this module.
nf-core modules should be thin wrappers around underlying software. We have some unavoidable boilerplate when we're wrapping library code like this, but we shouldn't be adding to that with our own judgement calls- it makes the code harder to maintain, and harder for users expecting Limma-native functionality to understand.
data_for_fit <- voom_result | ||
|
||
# Write the normalized counts matrix to a TSV file | ||
normalized_counts <- voom_result\$E |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're adding export of normalised values (which I think is a good thing) we should do the same for the microarry (e.g. with normalizeBetweenArrays
), and do the writing outside this conditional.
This extends limma functionality with option to process rna-seq data in pairwise and mixed mode. It will be used in differentialabundance pipeline and in future can be used in other pipelines which are using limma.
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda