Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cnvkit.py coverage does not honor -q option #913

Open
ishida-md opened this issue Sep 13, 2024 · 5 comments
Open

cnvkit.py coverage does not honor -q option #913

ishida-md opened this issue Sep 13, 2024 · 5 comments

Comments

@ishida-md
Copy link

Hi, I am running the latest cnvkit image on the docker hub.
I get spurious calls where many reads with mapq = 0. I tried cnvkit.py coverage with the -q option, but it does not do anything. Is this a bug?

Here is the snippet:
singularity exec -e --env OMP_NUM_THREADS=1 --bind /home cnvkit_0.9.11.sif cnvkit.py coverage -o mapq0.cnn test.bam xgen.t750.bed

singularity exec -e --env OMP_NUM_THREADS=1 --bind /home cnvkit_0.9.11.sif cnvkit.py coverage -o mapq60.cnn -q 60 test.markdup.bam xgen.t750.bed

mapq0.cnn and mapq60.cnn are identical although I see a lot of mapq = 0 reads on IGV.

@ishida-md ishida-md changed the title coverage does not honor -q option cnvkit.py coverage does not honor -q option Sep 13, 2024
@etal
Copy link
Owner

etal commented Sep 22, 2024

Could be bit rot. Some experimentation showed that even low-quality mappings provided a slightly better copy number signal than removing those reads, so filtering on quality isn't recommended anymore. But MAPQ of 0 is a special case implying unmapped or ambiguously mapped reads. I wouldn't recommend -q 60 (even if it worked) but -q 10 might be appropriate here.

  1. Do the results look better if you remove the MAPQ=0 read alignments from your BAM file before analyzing with CNVkit?
  2. Do you know the source of the MAPQ=0 read alignments? Are they mostly secondary mappings in repetitive parts of the genome?

@ishida-md
Copy link
Author

Thank you for your response. I chose MAPQ=60 in the snippet as an extreme example. I have tried pre-filtering with the minimum MAPQ=20 and saw some differences.

  1. Pre-filtering BAMs yields slightly better-looking results in some samples.

unfiltered
INF13-1

filtered
INF13-1 mapq20

  1. Spurious calls from the unfiltered BAMs tend to come from MAPQ=0 reads that map to genes with paralogues. If I pre-filter the BAMs, these calls are gone.

chr1:13,446,216-13,450,648 (hg19)
PRAMEF13
image

chr1:104,291,000-104,299,000 (hg19)
AMY1C
image

@etal
Copy link
Owner

etal commented Sep 22, 2024

Related: #912

@etal
Copy link
Owner

etal commented Sep 22, 2024

Likely fixed in #912. Can you try the latest development version and see if it works for you now?

@ishida-md
Copy link
Author

Yes, cnvkit.py coverage -q is now working. Thank you!
Could you consider adding -q option to cnvkit.py batch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants