Skip to content

Commit

Permalink
Merge pull request #92 from leexgh/mutation-assessor-v4
Browse files Browse the repository at this point in the history
Mutation assessor v4
  • Loading branch information
leexgh authored Aug 21, 2024
2 parents 1a12a21 + 67a106f commit cf2941c
Show file tree
Hide file tree
Showing 4 changed files with 28 additions and 17 deletions.
2 changes: 1 addition & 1 deletion data/grch37_ensembl92/export/annotation_version.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,6 @@ HGNC 2023-10 mirrored hgnc The resource for approved human gene nomenclature. Ge
Cancer Hotspots v2 mirrored cancer_hotspots A resource for statistically significant mutations in cancer https://www.cancerhotspots.org
3D Hotspots v2 mirrored 3d_hotspots A resource for statistically significant mutations clustering in 3d protein structures in cancer https://www.3dhotspots.org/
reVUE https://github.com/knowledgesystems/reVUE-data/blob/main/VUEs.json mirrored revue A Repository for Variants with Unexpected Effects (VUE) in Cancer https://www.cancerrevue.org/
Mutation Assessor v3 mirrored mutation_assessor Mutation Assessor predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. http://mutationassessor.org/r3/
Mutation Assessor v4 mirrored mutation_assessor Mutation Assessor predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. http://mutationassessor.org/r3/
My Variant Info Includes many annotation sources, see https://docs.myvariant.info/en/latest/doc/data.html external my_variant_info MyVariant.info provides simple-to-use REST web services to query/retrieve variant annotation data, aggregated from many popular data resources. https://myvariant.info
ClinVar 20230722 mirrored clinvar ClinVar aggregates information about genomic variation and its relationship to human health. https://www.ncbi.nlm.nih.gov/clinvar/
8 changes: 8 additions & 0 deletions data/grch38_ensembl92/input/annotation_version.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
name version type id description url
VEP grch38 mirrored vep VEP determines the effect of your variants(SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. https://useast.ensembl.org/info/docs/tools/vep/index.html
HGNC 2023-10 mirrored hgnc The resource for approved human gene nomenclature. Genome Nexus uses HGNC gene symbols in annotation http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/
Cancer Hotspots v2 mirrored cancer_hotspots A resource for statistically significant mutations in cancer https://www.cancerhotspots.org
3D Hotspots v2 mirrored 3d_hotspots A resource for statistically significant mutations clustering in 3d protein structures in cancer https://www.3dhotspots.org/
Mutation Assessor v4 mirrored mutation_assessor Mutation Assessor predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. http://mutationassessor.org/r3/
My Variant Info Includes many annotation sources, see https://docs.myvariant.info/en/latest/doc/data.html external my_variant_info MyVariant.info provides simple-to-use REST web services to query/retrieve variant annotation data, aggregated from many popular data resources. https://myvariant.info
ClinVar 20230722 mirrored clinvar ClinVar aggregates information about genomic variation and its relationship to human health. https://www.ncbi.nlm.nih.gov/clinvar/
1 change: 1 addition & 0 deletions data/grch38_ensembl95/export/annotation_version.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,6 @@ VEP grch38 mirrored vep VEP determines the effect of your variants(SNPs, inserti
HGNC 2023-10 mirrored hgnc The resource for approved human gene nomenclature. Genome Nexus uses HGNC gene symbols in annotation http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/
Cancer Hotspots v2 mirrored cancer_hotspots A resource for statistically significant mutations in cancer https://www.cancerhotspots.org
3D Hotspots v2 mirrored 3d_hotspots A resource for statistically significant mutations clustering in 3d protein structures in cancer https://www.3dhotspots.org/
Mutation Assessor v4 mirrored mutation_assessor Mutation Assessor predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. http://mutationassessor.org/r3/
My Variant Info Includes many annotation sources, see https://docs.myvariant.info/en/latest/doc/data.html external my_variant_info MyVariant.info provides simple-to-use REST web services to query/retrieve variant annotation data, aggregated from many popular data resources. https://myvariant.info
ClinVar 20230722 mirrored clinvar ClinVar aggregates information about genomic variation and its relationship to human health. https://www.ncbi.nlm.nih.gov/clinvar/
34 changes: 18 additions & 16 deletions scripts/import_mongo.sh
Original file line number Diff line number Diff line change
Expand Up @@ -52,22 +52,24 @@ elif [[ ${REF_ENSEMBL_VERSION} == *"grch38"* ]]; then
fi

# import mutation assessor
if [[ ${REF_ENSEMBL_VERSION} == *"grch37"* && ${MUTATIONASSESSOR} == true ]]; then
echo "Downloading Mutation assessor data"

curl http://mutationassessor.org/r3/MA_scores_rel3_hg19_full.tar.gz -o ${DIR}/../data/common_input/MA_scores_rel3_hg19_full.tar.gz

echo "Extracting Mutation assessor data"
tar -xvf ${DIR}/../data/common_input/MA_scores_rel3_hg19_full.tar.gz
rm ${DIR}/../data/common_input/MA_scores_rel3_hg19_full.tar.gz

echo "Transforming Mutation assessor data"
sed -i -e 's/"Mutation","RefGenome variant","Gene","Uniprot","Info","Uniprot variant","Func. Impact","FI score"/_id,rgaa,gene,uprot,info,var,F_impact,F_score/g' MA_scores_rel3_hg19_full/MA_scores_rel3_hg19_chr*
sed -i -e 's/hg19,//g' MA_scores_rel3_hg19_full/MA_scores_rel3_hg19_chr*

echo "Importing Mutation assessor data"
for filename in MA_scores_rel3_hg19_full/*.csv; do import mutation_assessor.annotation $filename '--type csv --headerline' && rm $filename; done
fi
echo "Downloading Mutation assessor data"
# Data source: https://drive.google.com/file/d/1V6r65xJFF5fJ7b9JHwqkvCe8wWDrIBhd/view.
# The copy is stored in S3 bucket: ttps://genome-nexus-static-data.s3.amazonaws.com/mutationassessor4_for_genome_nexus.tsv.xz
curl https://genome-nexus-static-data.s3.amazonaws.com/mutationassessor4_for_genome_nexus.tsv.xz -o ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv.xz
echo "Download completed."

echo "Extracting Mutation assessor data"
unxz -k ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv.xz

echo "Transforming Mutation assessor data"
sed -i '' 's/uniprotId\tSV\thgvspShort\tF_score\tF_impact\tMSA\tMAV/uniprotId\tsv\thgvspShort\tf_score\tf_impact\tmsa\tmav/' ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv
awk -F'\t' 'BEGIN{OFS="\t"} NR==1{print "_id",$0; next} {print $1","$3,$0}' ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv > processed_mutaiton_assessor_tsv_file.tsv
rm ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv.xz
rm ${DIR}/../data/common_input/mutationassessor4_for_genome_nexus.tsv

echo "Importing Mutation assessor data"
import mutation_assessor.annotation processed_mutaiton_assessor_tsv_file.tsv "--type tsv --headerline"
rm processed_mutaiton_assessor_tsv_file.tsv

# import annotation sources version
import version ${DIR}/../data/${REF_ENSEMBL_VERSION}/export/annotation_version.txt '--drop --type tsv --headerline'

0 comments on commit cf2941c

Please sign in to comment.