Publish a condensed MultiQC report with the relevant statistics #99

Open
abhi18av opened this issue May 17, 2022 · 9 comments · Fixed by #116 · May be fixed by #221
Labels: enhancement (New feature or request), priority

Comments

@abhi18av
Member

The inspiration for this task is the way nf-core uses MultiQC reports to provide a single-file summary of statistics for the various pipeline steps.

@abhi18av
Member Author

Meeting notes 23-Aug-2022:

@TimHHH to share the exact outputs to be added to the MultiQC report, e.g. cohort stats (output of UTILS_COHORT_STATS)

@TimHHH
Collaborator

TimHHH commented Aug 29, 2022

Besides what's already in the MultiQC report, I would add these:

RAW_TOTAL_SEQS
AVERAGE_QUALITY
AVG_INSERT_SIZE
MAPPED_PERCENTAGE
MEAN_COVERAGE
MEDIAN_COVERAGE
PCT_EXC_MAPQ
PCT_EXC_DUPE
PCT_EXC_UNPAIRED
PCT_EXC_BASEQ
PCT_EXC_OVERLAP
PCT_EXC_TOTAL
PCT_1X
PCT_5X
PCT_10X
PCT_30X
NTM_FRACTION

These can go in the General Statistics section of MultiQC. There is a bit of overlap between this list and the already present stats (e.g. RAW_TOTAL_SEQS and M Seq), but let's leave these for now; I would like to see if they concur.
We could also add the full quanttb output, as seen in the .quanttb_cohort_stats.tsv output.
cheers
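
A minimal sketch of how the columns listed above could be fed into the General Statistics table through MultiQC custom content. It assumes the cohort stats file is tab-separated with one row per sample, that the sample-ID column is named `SAMPLE` (a hypothetical name, adjust to the real header), and that MultiQC picks up the result when it is saved with an `_mqc.tsv` suffix. The function name is illustrative and not part of MAGMA.

```python
import csv
import io

# Columns proposed in this comment, in the order listed.
WANTED = [
    "RAW_TOTAL_SEQS", "AVERAGE_QUALITY", "AVG_INSERT_SIZE", "MAPPED_PERCENTAGE",
    "MEAN_COVERAGE", "MEDIAN_COVERAGE", "PCT_EXC_MAPQ", "PCT_EXC_DUPE",
    "PCT_EXC_UNPAIRED", "PCT_EXC_BASEQ", "PCT_EXC_OVERLAP", "PCT_EXC_TOTAL",
    "PCT_1X", "PCT_5X", "PCT_10X", "PCT_30X", "NTM_FRACTION",
]


def to_general_stats(tsv_text, sample_col="SAMPLE", wanted=WANTED):
    """Return MultiQC custom-content text holding only the wanted columns.

    `sample_col` is an assumed column name, not taken from the pipeline.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep only the requested columns that exist in the input, in WANTED order.
    present = [c for c in wanted if c in (reader.fieldnames or [])]
    lines = [
        "# plot_type: 'generalstats'",  # custom-content header comment
        "\t".join(["Sample Name"] + present),
    ]
    for row in reader:
        lines.append("\t".join([row[sample_col]] + [row[c] for c in present]))
    return "\n".join(lines) + "\n"
```

Writing the returned text to something like `cohort_stats_mqc.tsv` in the MultiQC input directory would be one way to try this out; columns missing from the input are silently skipped so the overlap with existing stats can be compared side by side.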

@abhi18av
Member Author

Reopening this one as this is an area we can improve upon.

@abhi18av abhi18av reopened this Sep 27, 2022
@abhi18av abhi18av added this to the v1.0.0 milestone Sep 27, 2022
@abhi18av abhi18av removed this from the v1.0.0 milestone Nov 6, 2022
@abhi18av
Member Author

abhi18av commented Feb 8, 2023

CC @vrennie and @mdediegofuertes regarding the files we need to include in the final MultiQC report

@abhi18av
Member Author

Porting over the email response from Vincent and Miguel


Boxplot: Mapped percentage 
Boxplot: Mean Coverage
Barplot: Samples passed to Cohort (yes/no)
Table: Samples that did not pass and reason that they did not pass
Barplot: Samples with Multiple Infection detected by TBProfiler
Barplot: The distribution of resistance profiles (S/Rif-Mono/MDR/Pre-XDR/XDR)
Barplot: The distribution of lineages (e.g. lin1/lin2/lin3/lin4/etc)
Table: Samples identified as part of a cluster using a 12-SNP threshold
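
For the cluster table in the list above, a rough sketch of how samples could be grouped from pairwise SNP distances. The thread does not say how clusters are defined, so this assumes single-linkage grouping: two samples share a cluster if a chain of pairs, each at most 12 SNPs apart, connects them. The function name and the nested-dict input shape are hypothetical.

```python
from itertools import combinations


def snp_clusters(dist, threshold=12):
    """Group samples linked by pairwise distances <= threshold (single linkage).

    `dist` is a nested dict, dist[a][b] = SNP distance between samples a and b.
    Returns a sorted list of clusters; samples with no close neighbour are omitted.
    """
    parent = {s: s for s in dist}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(sorted(dist), 2):
        if dist[a][b] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in dist:
        groups.setdefault(find(s), []).append(s)
    # Singletons are not reported as clusters; sort for stable output.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

With single linkage, two samples more than 12 SNPs apart can still land in the same cluster through an intermediate sample, which may or may not be the intended definition here.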

@abhi18av abhi18av added this to the v1.2.0 milestone Mar 8, 2024
@abhi18av abhi18av pinned this issue Jul 31, 2024
@Mxrcon
Collaborator

Mxrcon commented Aug 22, 2024

👋 Hey there! From this week onward, I'll start implementing a MultiQC report for the most relevant data from MAGMA.
In the first development iteration, @abhi18av and I decided that the first implementation would focus on the files shown in white in the following image, using custom content and a heatmap.
[Image: magma-multiqc]

After the first deployment, we'll discuss future enhancements as well as possible new plots and figures using MultiQC plugins.

Hope to hear your thoughts on the report once we have a successful implementation!
Kindly,
Davi

@abhi18av
Member Author

Hi @Mxrcon , thanks for taking this forward.

I think that for the first iteration of the MultiQC report we can focus on joint.merged_cohort_stats.tsv, as that particular file is generated even if the main MERGE_WF is skipped via the skip_merge_analysis parameter.

Once that is done, we can focus on the drug resistance reports.

Therefore, for the time being, the roadmap I have in mind is

  1. V1: joint.merged_cohort_stats.tsv + snp_dists.tsv
  2. V2: drug resistance reports (JSONs)
  3. V3: Phylogeny and Clustering analysis
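
For the V1 item, a small sketch of how snp_dists.tsv might be wrapped as a MultiQC custom-content heatmap. It assumes the file is a square tab-separated matrix with sample IDs in the first row and first column, and that the top-left cell holds a tool name or version string that can be replaced. The `id` and `section_name` values and the function name are placeholders, not names used by MAGMA.

```python
def snp_dists_to_heatmap(matrix_text):
    """Prefix a tab-separated distance matrix with MultiQC custom-content headers."""
    rows = [line.split("\t") for line in matrix_text.strip().splitlines()]
    # Assumption: the corner cell is not a sample ID, so a neutral label is safe.
    rows[0][0] = "Sample"
    header = [
        "# id: 'snp_dists_heatmap'",              # placeholder section id
        "# section_name: 'Pairwise SNP distances'",
        "# plot_type: 'heatmap'",
    ]
    return "\n".join(header + ["\t".join(r) for r in rows]) + "\n"
```

Saving the result with an `_mqc.tsv` suffix (for example `snp_dists_mqc.tsv`) is the usual way to have MultiQC detect custom content; whether the heatmap stays readable for large cohorts would need checking on real data.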

@abhi18av
Member Author

@Mxrcon, I think the smaller milestone might be just to implement MULTIQC within this chunk:

MAGMA/main.nf, line 52 in ee4fece:

    if (params.only_validate_fastqs) {

This way, anyone who is doing a pre-flight check for quality control of the samples can have shareable results.

As soon as this is done, we merge this PR to master and continue other improvements separately.

What do you think?

@Mxrcon
Collaborator

Mxrcon commented Jan 17, 2025

Hmm, I agree we could start with FastQC plus this simple task. I can start moving the work to this workflow as I progress through the testing phase.
