Commit
Showing 280 changed files with 76,945 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
##fileformat=VCFv4.2 | ||
##source=cuteSV-2.1.1 | ||
##fileDate=2024-08-13 17:27:39 2-UTC | ||
##contig=<ID=chr17,length=7843138> | ||
##contig=<ID=chr18,length=8823530> | ||
##contig=<ID=hpv16,length=7904> | ||
##ALT=<ID=INS,Description="Insertion of novel sequence relative to the reference"> | ||
##ALT=<ID=DEL,Description="Deletion relative to the reference"> | ||
##ALT=<ID=DUP,Description="Region of elevated copy number relative to the reference"> | ||
##ALT=<ID=INV,Description="Inversion of reference sequence"> | ||
##ALT=<ID=BND,Description="Breakend of translocation"> | ||
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variant"> | ||
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variant"> | ||
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant"> | ||
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles"> | ||
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record"> | ||
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants"> | ||
##INFO=<ID=CILEN,Number=2,Type=Integer,Description="Confidence interval around inserted/deleted material between breakends"> | ||
##INFO=<ID=RE,Number=1,Type=Integer,Description="Number of read support this record"> | ||
##INFO=<ID=STRAND,Number=A,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)"> | ||
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Supporting read names of SVs (comma separated)"> | ||
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency."> | ||
##FILTER=<ID=q5,Description="Quality below 5"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# High-quality reference reads"> | ||
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# High-quality variant reads"> | ||
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="# Genotype quality"> | ||
##CommandLine="cuteSV --threads 2 --sample sample001normal --max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5 --min_support 3 --min_mapq 20 --min_size 30 --max_size -1 --report_readid --genotype sample001normal_minimap2_mdtagged_sorted.bam hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz sample001normal_cutesv.vcf sample001normal_cutesv_output/" | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal |
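The header above mixes free-form meta lines (`##fileformat`, `##CommandLine`) with structured ones (`##INFO=<...>`, `##FORMAT=<...>`, `##ALT=<...>`). As a minimal sketch of how the structured lines could be read, the helper below splits one line into its kind and its key/value fields. `parse_meta_line` is a hypothetical name, not part of cuteSV or any VCF library, and the sketch assumes the trailing ` | ||` diff-table residue has already been stripped from the line.

```python
import re

# Matches structured meta lines such as ##INFO=<...>; free-form lines like
# ##fileformat=VCFv4.2 have no "<" after "=" and are deliberately skipped.
META_RE = re.compile(r'^##(?P<kind>\w+)=<(?P<body>.*)>$')

# key=value pairs; a value is either a double-quoted string (which may
# contain commas) or a run of characters up to the next comma.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_line(line):
    """Return (kind, fields) for a structured meta line, else None.

    Hypothetical helper for illustration only.
    """
    match = META_RE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for key, value in PAIR_RE.findall(match.group("body")):
        # Drop the surrounding quotes of quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return match.group("kind"), fields
```

Handling quoted values separately matters here because descriptions such as the one for `STRAND` contain commas that a plain `split(",")` would cut apart.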
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##FILTER=<ID=RefCall,Description="Genotyping model thinks this site is reference."> | ||
##FILTER=<ID=LowQual,Description="Confidence in this variant being real is below calling threshold."> | ||
##FILTER=<ID=NoCall,Description="Site has depth=0 resulting in no call."> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position (for use with symbolic alleles)"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Conditional genotype quality"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth"> | ||
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block."> | ||
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth for each allele"> | ||
##FORMAT=<ID=VAF,Number=A,Type=Float,Description="Variant allele fractions."> | ||
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype likelihoods rounded to the closest integer"> | ||
##FORMAT=<ID=MED_DP,Number=1,Type=Integer,Description="Median DP observed within the GVCF block rounded to the nearest integer."> | ||
##DeepVariant_version=1.6.0 | ||
##contig=<ID=chr17,length=7843138> | ||
##contig=<ID=chr18,length=8823530> | ||
##contig=<ID=hpv16,length=7904> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
##fileformat=VCFv4.2 | ||
##fileDate=2024-08-15T05:16:25.545Z | ||
##source=pbsv 2.9.0 (commit v2.9.0-2-gce1559a) | ||
##PG="pbsv call --num-threads 2 --ccs --call-min-reads-per-strand-all-samples 0 --call-min-read-perc-one-sample 10 --call-min-reads-all-samples 3 --call-min-reads-one-sample 3 /Users/leework/Documents/Research/projects/project_nexus/nexus/test/data/fasta/hg38_chr17_1-8M_chr18_1-9M_hpv16.fa sample001normal_pbsv.svsig.gz sample001normal_pbsv.vcf" | ||
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant described in this record"> | ||
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles"> | ||
##INFO=<ID=SVANN,Number=.,Type=String,Description="Repeat annotation of structural variant"> | ||
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants"> | ||
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends"> | ||
##INFO=<ID=MATEDIST,Number=1,Type=Integer,Description="Distance to the mate breakend for mates on the same contig"> | ||
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation"> | ||
##ALT=<ID=INV,Description="Inversion"> | ||
##ALT=<ID=DUP,Description="Duplication"> | ||
##FILTER=<ID=Decoy,Description="Variant involves a decoy sequence"> | ||
##FILTER=<ID=NearReferenceGap,Description="Variant is near (< 1000 bp) from a gap (run of >= 50 Ns) in the reference assembly"> | ||
##FILTER=<ID=NearContigEnd,Description="Variant is near (< 1000 bp) from the end of a contig"> | ||
##FILTER=<ID=InsufficientStrandEvidence,Description="Variant has insufficient number of reads per strand (< 0)."> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth per allele"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position for this sample"> | ||
##FORMAT=<ID=SAC,Number=.,Type=Integer,Description="Number of reads on the forward and reverse strand supporting each allele including reference"> | ||
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events"> | ||
##INFO=<ID=NotFullySpanned,Number=0,Type=Flag,Description="Duplication variant does not have any fully spanning reads"> | ||
##reference=file:///Users/leework/Documents/Research/projects/project_nexus/nexus/test/data/fasta/hg38_chr17_1-8M_chr18_1-9M_hpv16.fa | ||
##contig=<ID=chr17,length=7843138> | ||
##contig=<ID=chr18,length=8823530> | ||
##contig=<ID=hpv16,length=7904> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
##fileformat=VCFv4.2 | ||
##source=Sniffles2_2.3.3 | ||
##command="/opt/conda/bin/sniffles --input sample001normal_minimap2_mdtagged_sorted.bam --vcf sample001normal_sniffles2.vcf --snf sample001normal_sniffles2.snf --sample-id sample001normal --reference hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz --threads 2 --minsupport 3 --minsvlen 30 --mapq 20 --output-rnames" | ||
##fileDate="2024/08/15 06:09:52" | ||
##contig=<ID=chr17,length=7843138> | ||
##contig=<ID=chr18,length=8823530> | ||
##contig=<ID=hpv16,length=7904> | ||
##ALT=<ID=INS,Description="Insertion"> | ||
##ALT=<ID=DEL,Description="Deletion"> | ||
##ALT=<ID=DUP,Description="Duplication"> | ||
##ALT=<ID=INV,Description="Inversion"> | ||
##ALT=<ID=BND,Description="Breakend; Translocation"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality"> | ||
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="Number of reference reads"> | ||
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of variant reads"> | ||
##FORMAT=<ID=ID,Number=1,Type=String,Description="Individual sample SV ID for multi-sample output"> | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##FILTER=<ID=GT,Description="Genotype filter"> | ||
##FILTER=<ID=SUPPORT_MIN,Description="Minimum read support filter"> | ||
##FILTER=<ID=STDEV_POS,Description="SV Breakpoint standard deviation filter"> | ||
##FILTER=<ID=STDEV_LEN,Description="SV length standard deviation filter"> | ||
##FILTER=<ID=COV_MIN,Description="Minimum coverage filter"> | ||
##FILTER=<ID=COV_CHANGE,Description="Coverage change filter"> | ||
##FILTER=<ID=COV_CHANGE_FRAC,Description="Coverage fractional change filter"> | ||
##FILTER=<ID=MOSAIC_AF,Description="Mosaic maximum allele frequency filter"> | ||
##FILTER=<ID=ALN_NM,Description="Length adjusted mismatch filter"> | ||
##FILTER=<ID=STRAND,Description="Strand support filter"> | ||
##FILTER=<ID=SVLEN_MIN,Description="SV length filter"> | ||
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints"> | ||
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints"> | ||
##INFO=<ID=MOSAIC,Number=0,Type=Flag,Description="Structural variation classified as putative mosaic"> | ||
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation"> | ||
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation"> | ||
##INFO=<ID=CHR2,Number=1,Type=String,Description="Mate chromsome for BND SVs"> | ||
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting the structural variation"> | ||
##INFO=<ID=SUPPORT_INLINE,Number=1,Type=Integer,Description="Number of reads supporting an INS/DEL SV (non-split events only)"> | ||
##INFO=<ID=SUPPORT_LONG,Number=1,Type=Integer,Description="Number of soft-clipped reads putatively supporting the long insertion SV"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation"> | ||
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position"> | ||
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length"> | ||
##INFO=<ID=COVERAGE,Number=.,Type=Float,Description="Coverages near upstream, start, center, end, downstream of structural variation"> | ||
##INFO=<ID=STRAND,Number=1,Type=String,Description="Strands of supporting reads for structural variant"> | ||
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count, summed up over all samples"> | ||
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="List of read support for all samples"> | ||
##INFO=<ID=CONSENSUS_SUPPORT,Number=1,Type=Integer,Description="Number of reads that support the generated insertion (INS) consensus sequence"> | ||
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Names of supporting reads (if enabled with --output-rnames)"> | ||
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency"> | ||
##INFO=<ID=NM,Number=.,Type=Float,Description="Mean number of query alignment length adjusted mismatches of supporting reads"> | ||
##INFO=<ID=PHASE,Number=.,Type=String,Description="Phasing information derived from supporting reads, represented as list of: HAPLOTYPE,PHASESET,HAPLOTYPE_SUPPORT,PHASESET_SUPPORT,HAPLOTYPE_FILTER,PHASESET_FILTER"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
##fileformat=VCFv4.2 | ||
##fileDate=2024-08-15|03:10:49PM|UTC|+0000 | ||
##source=SVIM-v2.0.0 | ||
##contig=<ID=chr17,length=7843138> | ||
##contig=<ID=chr18,length=8823530> | ||
##contig=<ID=hpv16,length=7904> | ||
##ALT=<ID=DEL,Description="Deletion"> | ||
##ALT=<ID=INV,Description="Inversion"> | ||
##ALT=<ID=DUP,Description="Duplication"> | ||
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication"> | ||
##ALT=<ID=DUP:INT,Description="Interspersed Duplication"> | ||
##ALT=<ID=INS,Description="Insertion"> | ||
##ALT=<ID=BND,Description="Breakend"> | ||
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant"> | ||
##INFO=<ID=CUTPASTE,Number=0,Type=Flag,Description="Genomic origin of interspersed duplication seems to be deleted"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record"> | ||
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles"> | ||
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting this variant"> | ||
##INFO=<ID=STD_SPAN,Number=1,Type=Float,Description="Standard deviation in span of merged SV signatures"> | ||
##INFO=<ID=STD_POS,Number=1,Type=Float,Description="Standard deviation in position of merged SV signatures"> | ||
##INFO=<ID=STD_POS1,Number=1,Type=Float,Description="Standard deviation of breakend 1 position"> | ||
##INFO=<ID=STD_POS2,Number=1,Type=Float,Description="Standard deviation of breakend 2 position"> | ||
##INFO=<ID=SEQS,Number=.,Type=String,Description="Insertion sequences from all supporting reads"> | ||
##INFO=<ID=READS,Number=.,Type=String,Description="Names of all supporting reads"> | ||
##INFO=<ID=ZMWS,Number=1,Type=Integer,Description="Number of supporting ZMWs (PacBio only)"> | ||
##FILTER=<ID=hom_ref,Description="Genotype is homozygous reference"> | ||
##FILTER=<ID=not_fully_covered,Description="Tandem duplication is not fully covered by a single read"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth"> | ||
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth for each allele"> | ||
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number of tandem duplication (e.g. 2 for one additional copy)"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal |
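The headers above declare read support under different INFO keys: cuteSV uses `RE`, while Sniffles2 and SVIM use `SUPPORT` (the pbsv header declares no such INFO key and reports per-allele depth in `FORMAT/AD` instead). A rough sketch of how a downstream script might read that value uniformly follows. The function names are hypothetical, and the sketch assumes it is given the raw INFO column of a single record.

```python
# INFO keys that carry read support in the cuteSV (RE) and
# Sniffles2 / SVIM (SUPPORT) headers shown above.
SUPPORT_KEYS = ("RE", "SUPPORT")


def parse_info(info):
    """Split a VCF INFO column into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def read_support(info):
    """Return the supporting-read count as an int, or None if absent.

    Hypothetical helper: it only knows the two keys listed above.
    """
    fields = parse_info(info)
    for key in SUPPORT_KEYS:
        value = fields.get(key)
        if value is not None and value is not True:
            return int(value)
    return None
```

For pbsv records this returns `None`, so a caller would need to fall back to the sample's `AD` field.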
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
2024-08-15 15:43:39,641 [INFO] ****************** Start SVision-pro, version 1.8 ******************** | ||
2024-08-15 15:43:39,642 [INFO] Command: /opt/conda/bin/SVision-pro --target_path sample001tumor_minimap2_mdtagged_sorted.bam --base_path sample001normal_minimap2_mdtagged_sorted.bam --genome_path hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz --model_path model_liteunet_256_8_16_32_32_32.pth --out_path sample001tumor_svisionpro_outputs/ --sample_name sample001tumor --process_num 2 --detect_mode somatic --preset hifi --min_supp 3 --min_mapq 20 --min_sv_size 30 --max_sv_size 1000000 --device cpu | ||
2024-08-15 15:43:39,644 [INFO] ****************** Step1 collecting and plotting ********************** | ||
2024-08-15 15:43:40,233 [INFO] Collecting sample001tumor chr18_0_10000000, 0 candidate events found. Time cost 0s | ||
2024-08-15 15:43:40,328 [INFO] Collecting sample001tumor hpv16_0_10000000, 0 candidate events found. Time cost 0s | ||
2024-08-15 15:43:40,762 [INFO] Collecting sample001tumor chr17_0_10000000, 5 candidate events found. Time cost 0s | ||
2024-08-15 15:43:40,854 [INFO] Collecting Time Cost 1s | ||
2024-08-15 15:43:40,855 [INFO] ****************** Step2 lite-Unet predicting ************************** | ||
2024-08-15 15:43:42,446 [INFO] Predicting sample001tumor chr17_0_10000000, 5 events pass lite-Unet module. Time cost: 1s | ||
2024-08-15 15:43:42,452 [INFO] Predicting sample001tumor chr18_0_10000000, 0 events pass lite-Unet module. Time cost: 0s | ||
2024-08-15 15:43:42,457 [INFO] Predicting sample001tumor hpv16_0_10000000, 0 events pass lite-Unet module. Time cost: 0s | ||
2024-08-15 15:43:42,507 [INFO] Predicting Time Cost 1s | ||
2024-08-15 15:43:42,508 [INFO] ****************** Step3 Outputting to VCF **************************** | ||
2024-08-15 15:43:42,520 [INFO] Outputting sample001tumor to /Users/leework/Documents/Research/projects/project_nexus/data/processed/work/long_read_dna_variant_calling_svisionpro/e4/3c76343c84a62c1e4445c18ad98bc3/sample001tumor_svisionpro_outputs/sample001tumor.svision_pro_v1.8.s3.vcf. Time cost: 0s | ||
2024-08-15 15:43:42,562 [INFO] ****************** Step4 Finish *********************************** | ||
2024-08-15 15:43:42,563 [INFO] SVision-pro v1.8 successfully finished. Time Cost 2.924899s |
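The Step1 lines of the log above report candidate events per region. As a small sketch, assuming the log keeps exactly the line layout shown here (which may differ in other SVision-pro versions), the counts could be collected like this. `candidate_events` is a hypothetical helper name.

```python
import re

# Matches Step1 lines such as:
#   ... [INFO] Collecting sample001tumor chr17_0_10000000, 5 candidate events found. ...
# The summary line "Collecting Time Cost 1s" has no comma-separated region
# and therefore does not match.
COLLECT_RE = re.compile(
    r"\[INFO\] Collecting (?P<sample>\S+) (?P<region>\S+), "
    r"(?P<count>\d+) candidate events found"
)


def candidate_events(log_lines):
    """Map each region label to its candidate-event count."""
    counts = {}
    for line in log_lines:
        match = COLLECT_RE.search(line)
        if match:
            counts[match.group("region")] = int(match.group("count"))
    return counts
```

Applied to the log above, only `chr17_0_10000000` has a non-zero count, which agrees with the Step2 line reporting 5 events passing the lite-Unet module on chr17.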