neoantigen prediction: HG38 reference

v1.2

Song Cao

Pipeline for detecting neoantigen from snvs and indels

vcf file for snvs with columns: chromosome, start position, ref allele, alt allele, gene hugo symbol, HGSV short, is it somatic or germline mutation. Filename: .snp.vcf

 1       113854971       C       G       PTPN22  p.E207Q Somatic

 1       113900168       C       A       AP4B1   p.A284S Somatic

 1       117623561       C       G       FAM46C  p.I231M Somatic

vcf file for indels with the same columns. Filename .indel.vcf

 3       161086280       T       -       B3GALNT1        p.T159Pfs*8     Somatic

RNA-Seq bam or fastaq file for HLA type

All three input files should be in one folder. One set of files per sample

Command to run:

perl neoscan.pl --rdir --log --bamfq --bed --rna --refdir --step <step_number>

    <rdir> = full path of the folder holding files for this sequence run

    <log> = full path of the folder saving log files

    <bamfq> = 1, input is bam; 0, input is fastq: default 1

    <rna> =1, input data is rna, otherwise is dna

    <bed> = bed file for annotation: ensembl: /gscmnt/gc2518/dinglab/scao/db/ensembl38.85/proteome-first.bed

     refseq: /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29/proteome.bed

    <refdir> = ref directory: /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29

    <step_number> run this pipeline step by step. (running the whole pipeline if step number is 0)

Help:

perl neoscan.pl

Copy the tool to a folder. This command will create neoscan/ folder in your current folder:

git clone https://github.com/ding-lab/neoscan.git

This is required to be able to do git checkout, Needed just once:

LSF_DOCKER_PRESERVE_ENVIRONMENT=true bsub -Is -R "select[mem>15000] rusage[mem=15000]" -M 32000000 -q docker-interactive -a "docker(scao/dailybox)" /bin/bash

Switch version to v1.2:

git checkout v1.2

Make conda environment:

conda create -n neoantigen optitype=1.3.1 python=2.7

Copy the path to the OptiPathPipeline.py to the neoscan.pl

which OptiTypePipeline.py and copy the output into the neoscan.pl script

After you prepare vcf files, run this:

perl /gscmnt/gc2524/dinglab/akarpova/software/neoscan/neoscan.pl --rdir /gscmnt/gc2524/dinglab/akarpova/cptac3/CCRCC_neoscan/samples --log /gscmnt/gc2524/dinglab/akarpova/cptac3 --bamfq 1 --bed /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29/proteome.bed --rna 1 --refdir /gscmnt/gc2518/dinglab/scao/db/refseq_hg38_june29 --step 1

Then change --step 2/3/4/5

Don't run step6, step5 will give the final results

SAMPLE.neo.snv.summary

SAMPLE.neo.indel.summary

Then you may want to filter out peptides found in human cells in general. Just grep every single peptide in this database /gscmnt/gc2518/dinglab/scao/db/ensembl38.85/Homo_sapiens.GRCh38.pep.all.fa.cleaned.fa

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
extract_novel_sequence.pl		extract_novel_sequence.pl
extract_reads_supporting_ref.pl		extract_reads_supporting_ref.pl
extract_support_reads_from_novel_sequence.pl		extract_support_reads_from_novel_sequence.pl
fasta_seq_for_indel_using_refseq_bed.pl		fasta_seq_for_indel_using_refseq_bed.pl
fasta_seq_for_snv_using_refseq_bed.pl		fasta_seq_for_snv_using_refseq_bed.pl
filter_reads_in_cdna.pl		filter_reads_in_cdna.pl
find_first_diff_pos_mut_wt_indel.pl		find_first_diff_pos_mut_wt_indel.pl
find_first_diff_pos_mut_wt_snv.pl		find_first_diff_pos_mut_wt_snv.pl
generate_mut_peptide_indel.pl		generate_mut_peptide_indel.pl
generate_mut_peptide_snv.pl		generate_mut_peptide_snv.pl
generate_report_summary.pl		generate_report_summary.pl
get_min_result.pl		get_min_result.pl
get_perferct_mapped_reads.pl		get_perferct_mapped_reads.pl
guess-encoded.py		guess-encoded.py
neoscan.pl		neoscan.pl
neoscan.v1.1.pl		neoscan.v1.1.pl
parseHLAresult.pl		parseHLAresult.pl
parseNetMHC4result.pl		parseNetMHC4result.pl
protein_seq_for_indel_using_refseq_bed.pl		protein_seq_for_indel_using_refseq_bed.pl
protein_seq_for_snv_using_refseq_bed.pl		protein_seq_for_snv_using_refseq_bed.pl
protein_seq_for_snv_using_refseq_bed_orig.pl		protein_seq_for_snv_using_refseq_bed_orig.pl
q20_filter.pl		q20_filter.pl
quality_sanger.table		quality_sanger.table
quality_sanger.table.2col.tsv		quality_sanger.table.2col.tsv
remove_duplicate_mut_peptide_snv.pl		remove_duplicate_mut_peptide_snv.pl
reportSummary.pl		reportSummary.pl
reportSummary_ns.pl		reportSummary_ns.pl
runNetMHC4.py		runNetMHC4.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

neoantigen prediction: HG38 reference

v1.2

Song Cao

About

Releases

Packages

Languages

caosgc33d/neoscan

Folders and files

Latest commit

History

Repository files navigation

neoantigen prediction: HG38 reference

v1.2

Song Cao

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages