Please update prepare_annovar_user.pl for clinvar_20240902.vcf.gz #254

Open
xiucz opened this issue Sep 6, 2024 · 13 comments

Comments

@xiucz

xiucz commented Sep 6, 2024

Dear Prof. Wang!

I am trying to update the ClinVar database for ANNOVAR, but prepare_annovar_user.pl gives this error:

Error: invalid record found in avinputfile: <ALLELEID=3416616;CLNHGVS=NC_000001.10:g.2488139G>A;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=TNFRSF14:8764|TNFRSF14-AS1:115110;MC=SO:0001587|nonsense;ONC=Likely_oncogenic;ONCDISDB=Human_Phenotype_Ontology:HP:0002664,Human_Phenotype_Ontology:HP:0003008,Human_Phenotype_Ontology:HP:0006741,MONDO:MONDO:0005070,MeSH:D009369,MedGen:C0027651;ONCDN=Neoplasm;ONCREVSTAT=criteria_provided,_single_submitter;ORIGIN=2
> at ..//prepare_annovar_user.pl line 301, <FH> line 9862.

Not every record has a CLNDN= tag. Could you update prepare_annovar_user.pl to handle clinvar_20240902.vcf.gz?
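
To see how many records are affected, a quick check along these lines may help. It is only a sketch: the function name `count_missing_clndn` is hypothetical, and it assumes a standard tab-delimited VCF on standard input with the INFO field in column 8.

```shell
# Hypothetical helper (not part of ANNOVAR): count VCF data lines whose
# INFO field (column 8) carries no CLNDN= tag. Reads the VCF from stdin.
count_missing_clndn() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $8 !~ /(^|;)CLNDN=/ { n++ }            # CLNDNINCL= does not match CLNDN=
    END { print n + 0 }                    # print 0 when nothing is missing
  '
}
```

For example, `zcat clinvar_20240902.vcf.gz | count_missing_clndn` would print the number of such records.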

Best,
xiucz

@ritayim2

I also encountered the same issue. Can someone advise how to fix it?

@kaichop
Contributor

kaichop commented Sep 23, 2024

The latest version of ClinVar adds two additional interpretations, ONC (oncogenicity) and SCI (somatic clinical impact). As a result, there will be more columns in the output file.

I have updated the prepare_annovar_user.pl file at http://www.openbioinformatics.org/annovar/download/prepare_annovar_user.pl

To run index_annovar.pl, you also need the comment file at http://www.openbioinformatics.org/annovar/download/comment_clinvar_20240917.txt

ClinVar also introduced "included variants" in the new version. It is a bit confusing, but the explanation seems to be "Included variants: Classifications in ClinVar may be made for a single variant or a set of variants, such as a haplotype. Variants classified only as part of a set of variants (i.e. no direct classification for the variant itself) are considered "included" variants. The VCF files include both variants with a direct classification and included variants. Included variants do not have an associated disease (CLNDN, CLNDISDB) or a classification (CLNSIG). Instead, there are three INFO tags specific to included variants - CLNDNINCL, CLNDISDBINCL, and CLNSIGINCL (see below)." Based on this description, included variants should not be in the ClinVar database in ANNOVAR, so they will be excluded.
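
The rule described above could be approximated with a filter like the one below. This is a sketch under stated assumptions, not the logic in prepare_annovar_user.pl: the function name `drop_included_variants` is hypothetical, and a record is treated as "included only" when its INFO field has CLNSIGINCL= but none of CLNSIG=, ONC= or SCI= (the SCI tag name is assumed from the description above).

```shell
# Hypothetical filter (not ANNOVAR's implementation): read a VCF on stdin
# and drop records that look like "included" variants only.
drop_included_variants() {
  awk -F'\t' '
    /^#/ { print; next }                              # keep header lines
    {
      info   = $8
      direct = (info ~ /(^|;)(CLNSIG|ONC|SCI)=/)      # has a direct classification
      incl   = (info ~ /(^|;)CLNSIGINCL=/)            # classified as part of a set
      if (incl && !direct) next                       # included-only: skip
      print
    }
  '
}
```

Records with both a direct classification and an INCL tag are kept, since the quoted description only excludes variants that have no classification of their own.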

@xiucz
Author

xiucz commented Sep 24, 2024

@kaichop

It appears that the bug is still present. Can you reproduce it? Thanks a lot.

version=20240902
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_${version}.vcf.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_${version}.vcf.gz.tbi
~/vt/vt decompose clinvar_${version}.vcf.gz -o temp.split.vcf
perl ..//prepare_annovar_user.pl_20240924 -dbtype clinvar_preprocess2 temp.split.vcf -out temp.split2.vcf

The error:

Error: invalid record found in avinputfile: <ALLELEID=3416616;CLNHGVS=NC_000001.10:g.2488139G>A;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=TNFRSF14:8764|TNFRSF14-AS1:115110;MC=SO:0001587|nonsense;ONC=Likely_oncogenic;ONCDISDB=Human_Phenotype_Ontology:HP:0002664,Human_Phenotype_Ontology:HP:0003008,Human_Phenotype_Ontology:HP:0006741,MONDO:MONDO:0005070,MeSH:D009369,MedGen:C0027651;ONCDN=Neoplasm;ONCREVSTAT=criteria_provided,_single_submitter;ORIGIN=2
> at ..//prepare_annovar_user.pl_20240924 line 301, <FH> line 9862.

@kaichop
Contributor

kaichop commented Sep 24, 2024 via email

@xiucz
Author

xiucz commented Sep 26, 2024

It works now. Please keep this issue open for a while for newcomers who might encounter the same problem. Thank you.

@kaichop
Contributor

kaichop commented Sep 27, 2024 via email

@ritayim2

ritayim2 commented Oct 19, 2024

Dear Kai @kaichop,
Thanks a lot for the update. I tried the new script on COSMIC100, curated from Cosmic_GenomeScreensMutant_Vcf_v100_GRCh38.tar and Cosmic_GenomeScreensMutant_Tsv_v100_GRCh38.tar. For your information, the output file hg38_cosmic100.txt has 29837298 lines, as in the tutorial. However, I still encounter a similar error and hope you or someone else can advise:
prefield not defined (chr9 5022158 5022158 T A exonic JAK2 . nonsynonymous_SNV JAK2:NM_001322195:exon2:c.T171A:p.F57L,JAK2:NM_001322196:exon2:c.T171A:p.F57L,JAK2:NM_001322194:exon3:c.T171A:p.F57L,JAK2:NM_004972:exon3:c.T171A:p.F57L 9p24.1 . . . . . . . . . . . . . 0.313 0.13879 T 0.122 0.35710 T 0.012 0.16265 B 0.006 0.12133 B 0.000077 0.52346 D 0.137700 0.999386 0.46865 D 1.525 0.38595 L 0.87 0.46412 T -2.43 0.53258 N 0.549 0.57518 -1.0824 0.06890 T 0.065 0.26772 T 9 0.16422781 0.30730 T 0.008676 0.22927 T 0.127 0.34888 0.628 0.76366 0.5410184844 0.53754 0.4539470442581493 0.45312 0.213371676131 0.23852 0.609011411667 0.54175 T 0.616051 0.87899 D -0.299865 0.08672 T -0.404083 0.32910 T 0.0968818226698108 0.12010 T 0.79482 0.43725 T 0.4743418 0.65556 0.18350086 0.41343 0.4743418 0.65557 0.18350086 0.41342 -6.855 0.52967 T 0.3691061075986174 0.46481 0.877 0.81319 P .\x3b. .\x3b. 1.575725 0.20171 14.62 0.87117516596749178 0.17000 0.79197 0.39199 D AEFDBI 0.154525 0.27957 N -0.494471416308901 0.22260 1.187424 -0.380256747560047 0.25586 1.40798 0.891253416725197 0.25855 0.653281 0.48532 0 0.653731 0.59785 0 0.547309 0.15389 0 0.669 0.65921 0 . . 5.57 3.18 0.35615 1.125000 0.30946 . . 0.609000 0.47794 1.000000 0.71638 1.000000 0.68203 0.643000 0.32503 0.1318:0.1432:0.0:0.7249 5.189 0.14524 698 0.58074 FERM_domain|Band_4.1_domain\x3bFERM_domain|Band_4.1_domain . . . . . . . . . . . . 1.883e-05 0 0 0.0003 0 0 0 0 1.368e-06 1.368e-06 1.361e-06 1.375e-06 5.038e-05 2.3e-07 9e-08 8.35e-06 3.12e-06 0 0 0 5.038e-05 0 0 0 0 0 . . . 0.002959 . . chr9 5022158 . T A . PASS ADP=539;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AF 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/1:255:539:539:282:257:47.68%:8.9184E-96:35:32:98:184:100:157:0.476809 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. with field=408 and prefield=390 at /d1/software/annovar/table_annovar.pl line 186, line 9.

@kaichop
Contributor

kaichop commented Oct 19, 2024 via email

@kaichop
Contributor

kaichop commented Oct 19, 2024 via email

@ritayim2

Thank you for your fast response. I did successfully generate hg38_cosmic100.txt with the same number of lines as in your instructions. However, when I run ANNOVAR to annotate my VCF, I get an error that terminates the process prematurely, saying that one of the variants in my VCF has "prefield not defined".
However, I don't see this error when I re-run the same VCF with my old hg38_cosmic98_coding.txt.

@kaichop
Contributor

kaichop commented Oct 19, 2024 via email

@ritayim2

Yes, you are correct: I have a problem annotating one variant line. I just retried using only refGene and COSMIC, and I still get the error. The many columns come from the FORMAT field: it is a multi-sample VCF with around 15 subfields.

Interestingly, I noticed that the line after this failing line is a variant that has an entry in COSMIC100, and its information has spilled into other columns (I have uploaded the output here for your reference).
dummy.hg38_multianno.txt
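
One way to look for rows like this is to compare column counts across a tab-delimited file, whether that is the multianno output or the database file itself. The helpers below are a sketch with hypothetical names, and a column-count mismatch is only a guess at what is behind the "prefield not defined" message, not a confirmed cause.

```shell
# Hypothetical helpers (not part of ANNOVAR). Both read tab-delimited text on stdin.

# Print "<number of columns><TAB><number of rows>" for each distinct width.
column_count_histogram() {
  awk -F'\t' '{ c[NF]++ } END { for (n in c) print n "\t" c[n] }' | sort -n
}

# Print "<line number><TAB><number of columns>" for every row whose width
# differs from that of the first row.
rows_with_odd_width() {
  awk -F'\t' 'NR == 1 { want = NF } NF != want { print NR "\t" NF }'
}
```

For example, `column_count_histogram < hg38_cosmic100.txt` should print a single line if every row has the same width. Note that `rows_with_odd_width` uses the first row as the reference, so a header with a different width from the data rows would flag every data row; the histogram is more robust in that case.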

@lincj1994

Hi @xiucz,
I encountered a similar error (#264). Could you help resolve it or share your code?
Many thanks!
