Valid SNPIDs missing in vcf file #221

Open
matomol opened this issue Sep 18, 2024 · 3 comments

matomol commented Sep 18, 2024

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

latest

Workflow Execution

Command line (Local)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

    nextflow run epi2me-labs/wf-human-variation \
    --out_dir ${VAR_DIR} \
    -w ${WORK_DIR} \
    --bam $ALN_DIR \
    --ref ${REFERENCE[$ARG1]} \
    --sample_name ${ARG0} \
    --bed $BED_REFERENCE/${REF_TYPE[$ARG1]}/hg38bed.bed \
    --bam_min_coverage 5 \
    --snp \
    --sv \
    --mod \
    --phased \
    --cnv \
    --str \
    -profile standard

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

I analyzed the snp.vcf file and found that, although the variants are correctly annotated, the SNPID is missing. This is particularly true for SNPIDs with higher numbers, so I assume that an outdated SNP reference database is being used.

Relevant log output

Here are some examples.
The following SNPs are correctly annotated and have the correct SNPID attached:
snpid 	alleles 	reference 	alternatives
5930 	(A, G) 	    A 	        (G,)
5927 	(A, G) 	    A 	        (G,)

The following variants are correctly annotated, but the proper SNPID is missing:
rs45508991, rs72658861, rs11669576

Application activity log entry

There is nothing unusual. The output is of a normal basecalling.

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

no

matomol commented Sep 18, 2024

Sorry, a typo: not basecalling but variant calling, of course.

vlshesketh (Contributor) commented

Hi @matomol, apologies for the delay in responding. Please can you provide a bit more information so I can assist you better - by 'snpid', do you mean the dbSNP identifier? We perform annotation with SnpEff as follows: first to add basic annotations, and then to annotate using ClinVar. The ClinVar VCF we use is out of date so we are in the process of updating that, but there won't be any dbSNP/rsIDs in the output VCFs as we are not using this dataset to annotate.
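
Since the output VCFs carry no rsIDs, one possible workaround is to fill the ID column after the workflow has finished. The sketch below is not part of wf-human-variation; it assumes a hypothetical in-memory lookup keyed by (chromosome, position, REF, ALT) that would be built from a dbSNP VCF on the same reference build, and the function name `add_rsids` is made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical lookup type: (chrom, pos, ref, alt) -> rsID.
# In practice it would be built from a dbSNP VCF matching the reference build.
Key = Tuple[str, int, str, str]


def add_rsids(vcf_lines: Iterable[str], lookup: Dict[Key, str]) -> Iterator[str]:
    """Yield VCF lines with rsIDs appended to the ID column where known."""
    for line in vcf_lines:
        # Header lines (## metadata and the #CHROM line) pass through unchanged.
        if line.startswith("#"):
            yield line
            continue
        end = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep any IDs already present ("." means no ID).
        ids = [] if fields[2] == "." else fields[2].split(";")
        # A record may list several ALT alleles separated by commas.
        for allele in alt.split(","):
            rsid = lookup.get((chrom, pos, ref, allele))
            if rsid and rsid not in ids:
                ids.append(rsid)
        fields[2] = ";".join(ids) if ids else "."
        yield "\t".join(fields) + end


# Usage with made-up placeholder coordinates and IDs (not real dbSNP data):
example = ["#CHROM\tPOS\tID\tREF\tALT\n", "chrX\t100\t.\tG\tA\n"]
placeholder_lookup = {("chrX", 100, "G", "A"): "rs0000001"}
annotated = list(add_rsids(example, placeholder_lookup))
```

The match is exact on position and alleles, so it only works when both files use the same contig naming and variant representation. For whole-genome data a dedicated tool such as `bcftools annotate` is likely the more practical route.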

@matomol
Copy link
Author

matomol commented Oct 15, 2024

Please find below the summary that I prepared for just one gene, LDLR. Similar statistics will be prepared for all the genes on the Illumina panel once we have lifted it over successfully.

> The ClinVar VCF we use is out of date so we are in the process of updating that

Well, maybe that will solve most of the problem.

In the region tested, only the following two SNPs are correctly annotated and have an SNPID attached:

snpid alleles reference alternatives
5930 (A, G) A (G,)
5927 (A, G) A (G,)

The following SNPs were correctly found by both Illumina and Nanopore, but only Illumina attached the correct SNPID.

rs11669576
ANN = ('A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_000527.5|protein_coding|8/18|c.1171G>A|p.Ala391Thr|1257/5173|1171/2583|391/860||', 'A|missense_variant|MODERATE|LDLR|LDLR|transcript|XM_011528010.2|protein_coding|8/17|c.1171G>A|p.Ala391Thr|1288/5126|1171/2505|391/834||', 'A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195798.2|protein_coding|8/18|c.1171G>A|p.Ala391Thr|1257/5167|1171/2577|391/858||', 'A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195799.2|protein_coding|7/17|c.1048G>A|p.Ala350Thr|1134/5050|1048/2460|350/819||', 'A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195800.2|protein_coding|6/16|c.667G>A|p.Ala223Thr|753/4669|667/2079|223/692||', 'A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195803.2|protein_coding|7/16|c.790G>A|p.Ala264Thr|876/4639|790/2049|264/682||', 'A|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|NR_106946.1|pseudogene||n.-1850G>A|||||1850|', 'A|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|unassigned_transcript_3212|miRNA||n.-1855G>A|||||1855|', 'A|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|unassigned_transcript_3213|miRNA||n.-1887G>A|||||1887|', 'A|non_coding_transcript_exon_variant|MODIFIER|LDLR|LDLR|transcript|XR_001753685.2|pseudogene|8/18|n.1288G>A||||||', 'A|non_coding_transcript_exon_variant|MODIFIER|LDLR|LDLR|transcript|XR_001753686.2|pseudogene|8/17|n.1288G>A||||||')

rs72658861
ANN = ('C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|NM_000527.5|protein_coding|7/17|c.1061-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|XM_011528010.2|protein_coding|7/16|c.1061-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|XR_001753685.2|pseudogene|7/17|n.1178-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|XR_001753686.2|pseudogene|7/16|n.1178-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|NM_001195798.2|protein_coding|7/17|c.1061-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|NM_001195799.2|protein_coding|6/16|c.938-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|NM_001195800.2|protein_coding|5/15|c.557-8T>C||||||', 'C|splice_region_variant&intron_variant|LOW|LDLR|LDLR|transcript|NM_001195803.2|protein_coding|6/15|c.680-8T>C||||||', 'C|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|NR_106946.1|pseudogene||n.-1968T>C|||||1968|', 'C|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|unassigned_transcript_3212|miRNA||n.-1973T>C|||||1973|', 'C|upstream_gene_variant|MODIFIER|MIR6886|MIR6886|transcript|unassigned_transcript_3213|miRNA||n.-2005T>C|||||2005|')

rs45508991
ANN = ('T|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_000527.5|protein_coding|15/18|c.2177C>T|p.Thr726Ile|2263/5173|2177/2583|726/860||', 'T|missense_variant|MODERATE|LDLR|LDLR|transcript|XM_011528010.2|protein_coding|15/17|c.2177C>T|p.Thr726Ile|2294/5126|2177/2505|726/834||', 'T|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195798.2|protein_coding|15/18|c.2177C>T|p.Thr726Ile|2263/5167|2177/2577|726/858||', 'T|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195799.2|protein_coding|14/17|c.2054C>T|p.Thr685Ile|2140/5050|2054/2460|685/819||', 'T|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195800.2|protein_coding|13/16|c.1673C>T|p.Thr558Ile|1759/4669|1673/2079|558/692||', 'T|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_001195803.2|protein_coding|13/16|c.1643C>T|p.Thr548Ile|1729/4639|1643/2049|548/682||', 'T|non_coding_transcript_exon_variant|MODIFIER|LDLR|LDLR|transcript|XR_001753685.2|pseudogene|15/18|n.2511C>T||||||', 'T|non_coding_transcript_exon_variant|MODIFIER|LDLR|LDLR|transcript|XR_001753686.2|pseudogene|14/17|n.2154C>T||||||')
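
When rsIDs are absent, the ANN strings quoted above can still be matched against another call set through their HGVS notation. The following is a minimal sketch, assuming the standard 16-field SnpEff ANN layout; the key names in `ANN_FIELDS` and both helper functions are illustrative choices, not part of SnpEff or the workflow.

```python
# Field order of a SnpEff ANN entry; the key names are illustrative.
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(entry):
    """Split one pipe-delimited ANN entry into a dict of named fields."""
    values = entry.split("|")
    # Pad short entries so every key is present.
    values += [""] * (len(ANN_FIELDS) - len(values))
    return dict(zip(ANN_FIELDS, values))


def hgvs_for_transcript(ann_entries, transcript_id):
    """Return (HGVS.c, HGVS.p) for the given transcript, or None if absent."""
    for entry in ann_entries:
        record = parse_ann(entry)
        if record["feature_id"] == transcript_id:
            return record["hgvs_c"], record["hgvs_p"]
    return None


# First ANN entry quoted above for rs11669576:
ann = (
    "A|missense_variant|MODERATE|LDLR|LDLR|transcript|NM_000527.5|"
    "protein_coding|8/18|c.1171G>A|p.Ala391Thr|1257/5173|1171/2583|391/860||",
)
print(hgvs_for_transcript(ann, "NM_000527.5"))  # ('c.1171G>A', 'p.Ala391Thr')
```

Keying on transcript plus HGVS.c (for example NM_000527.5:c.1171G>A) gives an identifier that can be compared across the Illumina and Nanopore results without relying on the ID column.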

Lastly, the following SNP was not detected by Nanopore sequencing:

rs2738442
NM_000527.5:c.1060+7C>A
