Please update prepare_annovar_user.pl for clinvar_20240902.vcf.gz #254
Comments
I also encounter the same issue. Can someone advise how to fix it? |
The latest version of ClinVar adds two additional interpretations, ONC (oncogenicity) and SCI (somatic clinical impact). As a result, there will be more columns in the output file. I have updated the prepare_annovar_user.pl file at http://www.openbioinformatics.org/annovar/download/prepare_annovar_user.pl
To run index_annovar.pl, you also need the comment file at http://www.openbioinformatics.org/annovar/download/comment_clinvar_20240917.txt
ClinVar also introduced "included variants" in the new version. It is a bit confusing, but the explanation seems to be:
"Included variants: Classifications in ClinVar may be made for a single variant or a set of variants, such as a haplotype. Variants classified only as part of a set of variants (i.e. no direct classification for the variant itself) are considered "included" variants. The VCF files include both variants with a direct classification and included variants. Included variants do not have an associated disease (CLNDN, CLNDISDB) or a classification (CLNSIG). Instead, there are three INFO tags specific to included variants - CLNDNINCL, CLNDISDBINCL, and CLNSIGINCL (see below)."
Based on this description, included variants should not be in the ClinVar database in ANNOVAR, and they will not be included.
|
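To make the "included variants" distinction concrete, here is a minimal Python sketch that decides, from a VCF INFO string alone, whether a record carries only the *INCL tags. It is an illustration of the rule quoted above, not the logic inside prepare_annovar_user.pl; the function names and the two example INFO strings are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'A=1;B=2;FLAG' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flags without '=' get an empty string
    return fields


def is_included_only(info: str) -> bool:
    """True when a record has *INCL tags but no direct CLNSIG classification."""
    tags = parse_info(info)
    incl_tags = ("CLNSIGINCL", "CLNDNINCL", "CLNDISDBINCL")
    has_incl = any(tag in tags for tag in incl_tags)
    return has_incl and "CLNSIG" not in tags


# Hypothetical INFO strings, made up for illustration only.
direct = "ALLELEID=1;CLNDN=Example_disease;CLNSIG=Pathogenic"
included = "ALLELEID=2;CLNDNINCL=Example_disease;CLNSIGINCL=1:Pathogenic"

print(is_included_only(direct))    # False
print(is_included_only(included))  # True
```

Matching on exact tag names matters here, because CLNDN is a prefix of CLNDNINCL and a plain substring search would confuse the two.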
It appears that the bug is still present. Can you reproduce it? Thanks a lot.
Error,
|
I remember that ClinVar no longer uses multiallelic variants, so this step is no longer needed. I just ran 'vt decompose' myself and confirmed this (0 multiallelic variants).
Therefore, you can just run this directly:
prepare_annovar_user.pl -dbtype clinvar2 *input*.vcf -out hg38_clinvar_20240917_raw.txt
I will update the documentation to reflect this.
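The "0 multiallelic variants" check can also be approximated without vt. The sketch below is only an assumption about what such a check might look like: it counts VCF data lines whose ALT column lists more than one allele. The function name and the demo records are hypothetical.

```python
from typing import Iterable


def count_multiallelic(lines: Iterable[str]) -> int:
    """Count VCF data lines whose ALT column (5th field) has several alleles."""
    count = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and "," in cols[4]:
            count += 1
    return count


# Tiny in-memory demo; the records are made up.
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tG\tA\t.\t.\tALLELEID=1\n",
    "1\t200\t.\tC\tT,G\t.\t.\tALLELEID=2\n",
]
print(count_multiallelic(demo))  # 1
```

For a real download, an open text handle can be passed instead of the list, for example one obtained from `gzip.open(path, "rt")` for a .vcf.gz file. A result of 0 would be consistent with the observation that the decompose step is no longer needed.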
…On Tue, Sep 24, 2024 at 2:12 AM xiucz ***@***.***> wrote:
@kaichop <https://github.com/kaichop>
It appears that the bug is still present. Can you reproduce it? Thanks a lot.
version=20240902
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_${version}.vcf.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_${version}.vcf.gz.tbi
~/vt/vt decompose clinvar_${version}.vcf.gz -o temp.split.vcf
perl ..//prepare_annovar_user.pl_20240924 -dbtype clinvar_preprocess2 temp.split.vcf -out temp.split2.vcf
Error,
Error: invalid record found in avinputfile: <ALLELEID=3416616;CLNHGVS=NC_000001.10:g.2488139G>A;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=TNFRSF14:8764|TNFRSF14-AS1:115110;MC=SO:0001587|nonsense;ONC=Likely_oncogenic;ONCDISDB=Human_Phenotype_Ontology:HP:0002664,Human_Phenotype_Ontology:HP:0003008,Human_Phenotype_Ontology:HP:0006741,MONDO:MONDO:0005070,MeSH:D009369,MedGen:C0027651;ONCDN=Neoplasm;ONCREVSTAT=criteria_provided,_single_submitter;ORIGIN=2
> at ..//prepare_annovar_user.pl_20240924 line 301, <FH> line 9862.
|
It works now. Please keep this issue open for a while for newcomers who might encounter the same problem. Thank you. |
Thank you for letting me know.
|
Dear Kai @kaichop, |
Did you follow instructions exactly?
https://annovar.openbioinformatics.org/en/latest/user-guide/filter/
tar xvf Cosmic_GenomeScreensMutant_Vcf_v100_GRCh38.tar
tar xvf Cosmic_GenomeScreensMutant_Tsv_v100_GRCh38.tar
gunzip Cosmic_GenomeScreensMutant_v100_GRCh38.vcf.gz
gunzip Cosmic_GenomeScreensMutant_v100_GRCh38.tsv.gz
tar xvf Cosmic_NonCodingVariants_Tsv_v100_GRCh38.tar
tar xvf Cosmic_NonCodingVariants_Vcf_v100_GRCh38.tar
gunzip Cosmic_NonCodingVariants_v100_GRCh38.vcf.gz
gunzip Cosmic_NonCodingVariants_v100_GRCh38.tsv.gz
echo -e '#Chr\tStart\tEnd\tRef\tAlt\tCOSMIC100' > hg38_cosmic100_raw.txt
prepare_annovar_user.pl -dbtype cosmic Cosmic_GenomeScreensMutant_v100_GRCh38.tsv -vcf Cosmic_GenomeScreensMutant_v100_GRCh38.vcf >> hg38_cosmic100_raw.txt
prepare_annovar_user.pl -dbtype cosmic Cosmic_NonCodingVariants_v100_GRCh38.tsv -vcf Cosmic_NonCodingVariants_v100_GRCh38.vcf >> hg38_cosmic100_raw.txt
index_annovar.pl hg38_cosmic100_raw.txt -outfile hg38_cosmic100.txt
wc -l hg38_cosmic100.txt
29837298 hg38_cosmic100.txt
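Beyond the `wc -l` line count, it may be worth confirming that every row of the generated table has the same number of tab-separated fields as the six-column header written by the `echo` command. The following is a rough Python sketch of such a check; the function name is hypothetical and the two data rows are invented placeholders, not real COSMIC entries.

```python
from collections import Counter
from typing import Iterable


def field_count_summary(lines: Iterable[str]) -> Counter:
    """Tally how many tab-separated fields each non-empty line has."""
    tally = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            tally[len(line.split("\t"))] += 1
    return tally


# Header as in the echo command above; data rows are placeholders.
demo = [
    "#Chr\tStart\tEnd\tRef\tAlt\tCOSMIC100\n",
    "9\t5022158\t5022158\tT\tA\tEXAMPLE_VALUE\n",
    "9\t5022159\t5022159\tG\tC\n",  # deliberately short row
]
summary = field_count_summary(demo)
print(dict(summary))  # {6: 2, 5: 1}
```

On a well-formed file the summary should contain a single key (6 for this header); any other key points at rows worth inspecting. Passing an open file handle instead of the list lets the same function stream a large file.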
…On Sat, Oct 19, 2024 at 11:20 AM Rita Lok Hay Yim ***@***.***> wrote:
I also encounter the same issue. Can someone advise how to fix it?
Sorry, I tried the new script file for COSMIC100 but I still encounter a
similar error:
prefield not defined (chr9 5022158 5022158 T A exonic JAK2 .
nonsynonymous_SNV
JAK2:NM_001322195:exon2:c.T171A:p.F57L,JAK2:NM_001322196:exon2:c.T171A:p.F57L,JAK2:NM_001322194:exon3:c.T171A:p.F57L,JAK2:NM_004972:exon3:c.T171A:p.F57L
9p24.1 . . . . . . . . . . . . . 0.313 0.13879 T 0.122 0.35710 T 0.012
0.16265 B 0.006 0.12133 B 0.000077 0.52346 D 0.137700 0.999386 0.46865 D
1.525 0.38595 L 0.87 0.46412 T -2.43 0.53258 N 0.549 0.57518 -1.0824
0.06890 T 0.065 0.26772 T 9 0.16422781 0.30730 T 0.008676 0.22927 T 0.127
0.34888 0.628 0.76366 0.5410184844 0.53754 0.4539470442581493 0.45312
0.213371676131 0.23852 0.609011411667 0.54175 T 0.616051 0.87899 D
-0.299865 0.08672 T -0.404083 0.32910 T 0.0968818226698108 0.12010 T
0.79482 0.43725 T 0.4743418 0.65556 0.18350086 0.41343 0.4743418 0.65557
0.18350086 0.41342 -6.855 0.52967 T 0.3691061075986174 0.46481 0.877
0.81319 P .\x3b. .\x3b. 1.575725 0.20171 14.62 0.87117516596749178 0.17000
0.79197 0.39199 D AEFDBI 0.154525 0.27957 N -0.494471416308901 0.22260
1.187424 -0.380256747560047 0.25586 1.40798 0.891253416725197 0.25855
0.653281 0.48532 0 0.653731 0.59785 0 0.547309 0.15389 0 0.669 0.65921 0 .
. 5.57 3.18 0.35615 1.125000 0.30946 . . 0.609000 0.47794 1.000000 0.71638
1.000000 0.68203 0.643000 0.32503 0.1318:0.1432:0.0:0.7249 5.189 0.14524
698 0.58074 FERM_domain|Band_4.1_domain\x3bFERM_domain|Band_4.1_domain . .
. . . . . . . . . . 1.883e-05 0 0 0.0003 0 0 0 0 1.368e-06 1.368e-06
1.361e-06 1.375e-06 5.038e-05 2.3e-07 9e-08 8.35e-06 3.12e-06 0 0 0
5.038e-05 0 0 0 0 0 . . . 0.002959 . . chr9 5022158 . T A . PASS
ADP=539;WT=0;HET=1;HOM=0;NC=0
GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AF
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/1:255:539:539:282:257:47.68%:8.9184E-96:35:32:98:184:100:157:0.476809
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:.
0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/0:.:.:.:.:.:.:.:.:.:.:.:.:.:. with
field=408 and prefield=390 at /d1/software/annovar/table_annovar.pl line
186, line 9.
|
What do you mean by "I still encounter similar error"? Didn't you already
generate the correct file with the correct number of lines?
…On Sat, Oct 19, 2024 at 11:25 AM Rita Lok Hay Yim ***@***.***> wrote:
Dear Kai @kaichop <https://github.com/kaichop>,
Thanks a lot for updating. I tried the new script file for COSMIC100
created from Cosmic_GenomeScreensMutant_Vcf_v100_GRCh38.tar and
Cosmic_GenomeScreensMutant_Tsv_v100_GRCh38.tar. For your information, the
output file hg38_cosmic100.txt has 29837298 lines, as in the tutorial.
However, I still encounter a similar error and hope you or someone can
advise on the issue: the same "prefield not defined" error for the chr9
5022158 T A (JAK2) variant quoted in full above, ending "with field=408 and
prefield=390 at /d1/software/annovar/table_annovar.pl line 186, line 9."
|
Thank you for your fast response. I did successfully generate hg38_cosmic100.txt with the same number of lines as in your instructions. It is only when I run ANNOVAR to annotate my VCF that I get an error which causes premature termination of the process, saying that one of the variants in my VCF has "prefield not defined". |
So do you mean you have a problem when annotating a mutation "chr9
5022158", not that you have a problem generating the COSMIC annotation
database using prepare_annovar_user?
Can you try using only the COSMIC database in the annotation (it seems that
you have many columns in the output) to isolate the problem to this
particular mutation and this particular annotation dbtype?
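One way to follow this suggestion is to cut the input down to the single failing variant before re-running the annotation with only the COSMIC database. The Python sketch below keeps the VCF header lines plus any data line at a given chromosome and position. It is an illustrative helper under the assumption that the input is a standard tab-separated VCF; the function name and the demo records are hypothetical.

```python
from typing import Iterable, List


def extract_variant(lines: Iterable[str], chrom: str, pos: int) -> List[str]:
    """Keep VCF header lines plus data lines matching chrom and pos."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # headers are always retained
            continue
        cols = line.split("\t", 2)
        if len(cols) >= 2 and cols[0] == chrom and cols[1] == str(pos):
            kept.append(line)
    return kept


# Made-up miniature VCF; only the coordinates come from the thread.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr9\t5022158\t.\tT\tA\t.\tPASS\tADP=539\n",
    "chr9\t5022200\t.\tG\tC\t.\tPASS\tADP=100\n",
]
subset = extract_variant(demo, "chr9", 5022158)
print(len(subset))  # 3
```

Writing `subset` to a new file gives a one-variant VCF. If the error persists with that file and only the COSMIC database, the problem is likely tied to this variant and this database rather than to the other annotation columns.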
…On Sat, Oct 19, 2024 at 11:34 AM Rita Lok Hay Yim ***@***.***> wrote:
Thank you for your fast response. I did successfully generate
hg38_cosmic100.txt with the same number of lines as in your instructions.
It is only when I run ANNOVAR to annotate my VCF that I get an error which
causes premature termination of the process, saying that one of the
variants in my VCF has "prefield not defined".
However, I don't see this error when I re-run the same VCF with my old
hg38_cosmic98_coding.txt
|
Yes, you are correct: I have a problem annotating one variant line. I just retried using only refGene and COSMIC and I still get the error. The many columns come from the FORMAT field, as this is a multi-sample VCF with around 15 subfields. Interestingly, I notice that the line after the failing line is a variant that has an entry in COSMIC100, and its information has spread into other columns (I have uploaded it here for your reference). |
Dear Prof. Wang!
I am trying to update ANNOVAR, but it gives an error:
Not every record has a CLNDN= tag.
Could you update prepare_annovar_user.pl to handle clinvar_20240902.vcf.gz?
Best,
xiucz
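To see how many records in a given ClinVar release actually lack CLNDN, a short script can tally INFO keys directly. This is a minimal Python sketch written under the assumption of a standard eight-column VCF; it is not part of ANNOVAR, the function name is hypothetical, and the demo records are invented (the ONC and ONCDN tag names are taken from the error message earlier in this thread).

```python
from typing import Iterable, Tuple


def count_tag(lines: Iterable[str], tag: str) -> Tuple[int, int]:
    """Return (records with tag, records without tag) over VCF data lines."""
    with_tag = without_tag = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        info = cols[7] if len(cols) > 7 else ""
        # Compare exact key names so CLNDNINCL is not mistaken for CLNDN.
        keys = {item.split("=", 1)[0] for item in info.split(";")}
        if tag in keys:
            with_tag += 1
        else:
            without_tag += 1
    return with_tag, without_tag


# Invented records: one germline, one oncogenicity-only, one included.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tG\tA\t.\t.\tALLELEID=1;CLNDN=Example_disease;CLNSIG=Pathogenic\n",
    "1\t200\t.\tG\tA\t.\t.\tALLELEID=2;ONC=Likely_oncogenic;ONCDN=Neoplasm\n",
    "1\t300\t.\tC\tT\t.\t.\tALLELEID=3;CLNDNINCL=Example_disease\n",
]
print(count_tag(demo, "CLNDN"))  # (1, 2)
```

A nonzero second number on a real file would match the "Not every record has a CLNDN= tag" message, since records with only ONC or *INCL tags carry no CLNDN.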