Commit

add test data
andyjslee committed Aug 16, 2024
1 parent dca2636 commit 41c5b69
Showing 280 changed files with 76,945 additions and 0 deletions.
11,241 changes: 11,241 additions & 0 deletions test/data/vcf/known_sites_hg38_chr17.vcf

Large diffs are not rendered by default.

Binary file added test/data/vcf/known_sites_hg38_chr17.vcf.idx
Binary file not shown.
Binary file added test/data/vcf/known_sites_hg38_chr21_chr22.vcf.gz
Binary file not shown.
31 changes: 31 additions & 0 deletions test/data/vcf/sample001normal_cutesv.vcf
@@ -0,0 +1,31 @@
##fileformat=VCFv4.2
##source=cuteSV-2.1.1
##fileDate=2024-08-13 17:27:39 2-UTC
##contig=<ID=chr17,length=7843138>
##contig=<ID=chr18,length=8823530>
##contig=<ID=hpv16,length=7904>
##ALT=<ID=INS,Description="Insertion of novel sequence relative to the reference">
##ALT=<ID=DEL,Description="Deletion relative to the reference">
##ALT=<ID=DUP,Description="Region of elevated copy number relative to the reference">
##ALT=<ID=INV,Description="Inversion of reference sequence">
##ALT=<ID=BND,Description="Breakend of translocation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variant">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=CILEN,Number=2,Type=Integer,Description="Confidence interval around inserted/deleted material between breakends">
##INFO=<ID=RE,Number=1,Type=Integer,Description="Number of read support this record">
##INFO=<ID=STRAND,Number=A,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)">
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Supporting read names of SVs (comma separated)">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency.">
##FILTER=<ID=q5,Description="Quality below 5">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# High-quality reference reads">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# High-quality variant reads">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="# Phred-scaled genotype likelihoods rounded to the closest integer">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="# Genotype quality">
##CommandLine="cuteSV --threads 2 --sample sample001normal --max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5 --min_support 3 --min_mapq 20 --min_size 30 --max_size -1 --report_readid --genotype sample001normal_minimap2_mdtagged_sorted.bam hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz sample001normal_cutesv.vcf sample001normal_cutesv_output/"
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal
688 changes: 688 additions & 0 deletions test/data/vcf/sample001normal_deepvariant.gvcf

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions test/data/vcf/sample001normal_deepvariant.vcf
@@ -0,0 +1,19 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=RefCall,Description="Genotyping model thinks this site is reference.">
##FILTER=<ID=LowQual,Description="Confidence in this variant being real is below calling threshold.">
##FILTER=<ID=NoCall,Description="Site has depth=0 resulting in no call.">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position (for use with symbolic alleles)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Conditional genotype quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block.">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth for each allele">
##FORMAT=<ID=VAF,Number=A,Type=Float,Description="Variant allele fractions.">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype likelihoods rounded to the closest integer">
##FORMAT=<ID=MED_DP,Number=1,Type=Integer,Description="Median DP observed within the GVCF block rounded to the nearest integer.">
##DeepVariant_version=1.6.0
##contig=<ID=chr17,length=7843138>
##contig=<ID=chr18,length=8823530>
##contig=<ID=hpv16,length=7904>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal
29 changes: 29 additions & 0 deletions test/data/vcf/sample001normal_pbsv.vcf
@@ -0,0 +1,29 @@
##fileformat=VCFv4.2
##fileDate=2024-08-15T05:16:25.545Z
##source=pbsv 2.9.0 (commit v2.9.0-2-gce1559a)
##PG="pbsv call --num-threads 2 --ccs --call-min-reads-per-strand-all-samples 0 --call-min-read-perc-one-sample 10 --call-min-reads-all-samples 3 --call-min-reads-one-sample 3 /Users/leework/Documents/Research/projects/project_nexus/nexus/test/data/fasta/hg38_chr17_1-8M_chr18_1-9M_hpv16.fa sample001normal_pbsv.svsig.gz sample001normal_pbsv.vcf"
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant described in this record">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVANN,Number=.,Type=String,Description="Repeat annotation of structural variant">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=MATEDIST,Number=1,Type=Integer,Description="Distance to the mate breakend for mates on the same contig">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=Decoy,Description="Variant involves a decoy sequence">
##FILTER=<ID=NearReferenceGap,Description="Variant is near (< 1000 bp) from a gap (run of >= 50 Ns) in the reference assembly">
##FILTER=<ID=NearContigEnd,Description="Variant is near (< 1000 bp) from the end of a contig">
##FILTER=<ID=InsufficientStrandEvidence,Description="Variant has insufficient number of reads per strand (< 0).">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth per allele">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position for this sample">
##FORMAT=<ID=SAC,Number=.,Type=Integer,Description="Number of reads on the forward and reverse strand supporting each allele including reference">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##INFO=<ID=NotFullySpanned,Number=0,Type=Flag,Description="Duplication variant does not have any fully spanning reads">
##reference=file:///Users/leework/Documents/Research/projects/project_nexus/nexus/test/data/fasta/hg38_chr17_1-8M_chr18_1-9M_hpv16.fa
##contig=<ID=chr17,length=7843138>
##contig=<ID=chr18,length=8823530>
##contig=<ID=hpv16,length=7904>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal
Binary file added test/data/vcf/sample001normal_pbsv.vcf.gz
Binary file not shown.
Binary file added test/data/vcf/sample001normal_pbsv.vcf.gz.tbi
Binary file not shown.
51 changes: 51 additions & 0 deletions test/data/vcf/sample001normal_sniffles2.vcf
@@ -0,0 +1,51 @@
##fileformat=VCFv4.2
##source=Sniffles2_2.3.3
##command="/opt/conda/bin/sniffles --input sample001normal_minimap2_mdtagged_sorted.bam --vcf sample001normal_sniffles2.vcf --snf sample001normal_sniffles2.snf --sample-id sample001normal --reference hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz --threads 2 --minsupport 3 --minsvlen 30 --mapq 20 --output-rnames"
##fileDate="2024/08/15 06:09:52"
##contig=<ID=chr17,length=7843138>
##contig=<ID=chr18,length=8823530>
##contig=<ID=hpv16,length=7904>
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Breakend; Translocation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="Number of reference reads">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of variant reads">
##FORMAT=<ID=ID,Number=1,Type=String,Description="Individual sample SV ID for multi-sample output">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=GT,Description="Genotype filter">
##FILTER=<ID=SUPPORT_MIN,Description="Minimum read support filter">
##FILTER=<ID=STDEV_POS,Description="SV Breakpoint standard deviation filter">
##FILTER=<ID=STDEV_LEN,Description="SV length standard deviation filter">
##FILTER=<ID=COV_MIN,Description="Minimum coverage filter">
##FILTER=<ID=COV_CHANGE,Description="Coverage change filter">
##FILTER=<ID=COV_CHANGE_FRAC,Description="Coverage fractional change filter">
##FILTER=<ID=MOSAIC_AF,Description="Mosaic maximum allele frequency filter">
##FILTER=<ID=ALN_NM,Description="Length adjusted mismatch filter">
##FILTER=<ID=STRAND,Description="Strand support filter">
##FILTER=<ID=SVLEN_MIN,Description="SV length filter">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">
##INFO=<ID=MOSAIC,Number=0,Type=Flag,Description="Structural variation classified as putative mosaic">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Mate chromsome for BND SVs">
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting the structural variation">
##INFO=<ID=SUPPORT_INLINE,Number=1,Type=Integer,Description="Number of reads supporting an INS/DEL SV (non-split events only)">
##INFO=<ID=SUPPORT_LONG,Number=1,Type=Integer,Description="Number of soft-clipped reads putatively supporting the long insertion SV">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position">
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length">
##INFO=<ID=COVERAGE,Number=.,Type=Float,Description="Coverages near upstream, start, center, end, downstream of structural variation">
##INFO=<ID=STRAND,Number=1,Type=String,Description="Strands of supporting reads for structural variant">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count, summed up over all samples">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="List of read support for all samples">
##INFO=<ID=CONSENSUS_SUPPORT,Number=1,Type=Integer,Description="Number of reads that support the generated insertion (INS) consensus sequence">
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Names of supporting reads (if enabled with --output-rnames)">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=NM,Number=.,Type=Float,Description="Mean number of query alignment length adjusted mismatches of supporting reads">
##INFO=<ID=PHASE,Number=.,Type=String,Description="Phasing information derived from supporting reads, represented as list of: HAPLOTYPE,PHASESET,HAPLOTYPE_SUPPORT,PHASESET_SUPPORT,HAPLOTYPE_FILTER,PHASESET_FILTER">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal
32 changes: 32 additions & 0 deletions test/data/vcf/sample001normal_svim.vcf
@@ -0,0 +1,32 @@
##fileformat=VCFv4.2
##fileDate=2024-08-15|03:10:49PM|UTC|+0000
##source=SVIM-v2.0.0
##contig=<ID=chr17,length=7843138>
##contig=<ID=chr18,length=8823530>
##contig=<ID=hpv16,length=7904>
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##ALT=<ID=DUP:INT,Description="Interspersed Duplication">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=BND,Description="Breakend">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=CUTPASTE,Number=0,Type=Flag,Description="Genomic origin of interspersed duplication seems to be deleted">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting this variant">
##INFO=<ID=STD_SPAN,Number=1,Type=Float,Description="Standard deviation in span of merged SV signatures">
##INFO=<ID=STD_POS,Number=1,Type=Float,Description="Standard deviation in position of merged SV signatures">
##INFO=<ID=STD_POS1,Number=1,Type=Float,Description="Standard deviation of breakend 1 position">
##INFO=<ID=STD_POS2,Number=1,Type=Float,Description="Standard deviation of breakend 2 position">
##INFO=<ID=SEQS,Number=.,Type=String,Description="Insertion sequences from all supporting reads">
##INFO=<ID=READS,Number=.,Type=String,Description="Names of all supporting reads">
##INFO=<ID=ZMWS,Number=1,Type=Integer,Description="Number of supporting ZMWs (PacBio only)">
##FILTER=<ID=hom_ref,Description="Genotype is homozygous reference">
##FILTER=<ID=not_fully_covered,Description="Tandem duplication is not fully covered by a single read">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Read depth for each allele">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number of tandem duplication (e.g. 2 for one additional copy)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal
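Note on using these header-only fixtures: the cuteSV, DeepVariant, pbsv, Sniffles2 and SVIM files above all stop at the #CHROM line and declare the same three contigs. A minimal sketch of how a test might read such a header into a dictionary is shown here. The names parse_header and example_header are hypothetical and not part of this repository, and the parser only handles structured "##KEY=<...>" lines; free-form lines such as ##fileformat or ##CommandLine are skipped.

```python
import re

# Structured meta lines look like: ##INFO=<ID=END,Number=1,...>
_META_RE = re.compile(r'^##(?P<kind>\w+)=<(?P<body>.*)>$')
# key=value pairs; a quoted value may itself contain commas or '='
_PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_header(lines):
    """Return (meta, samples) from an iterable of VCF header lines.

    meta maps a kind such as 'INFO' or 'contig' to {ID: {key: value}}.
    Hypothetical helper for illustration only.
    """
    meta = {}
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed; anything after FORMAT is a sample name.
            samples = line.split()[9:]
            break
        match = _META_RE.match(line)
        if match is None:
            continue  # unstructured line, e.g. ##fileformat=VCFv4.2
        fields = {k: v.strip('"') for k, v in _PAIR_RE.findall(match.group("body"))}
        if "ID" in fields:
            meta.setdefault(match.group("kind"), {})[fields["ID"]] = fields
    return meta, samples


# A few lines copied from the cuteSV fixture above.
example_header = [
    '##fileformat=VCFv4.2',
    '##contig=<ID=chr17,length=7843138>',
    '##contig=<ID=chr18,length=8823530>',
    '##contig=<ID=hpv16,length=7904>',
    '##INFO=<ID=STRAND,Number=A,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)">',
    '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample001normal',
]
meta, samples = parse_header(example_header)
contig_lengths = {name: int(f["length"]) for name, f in meta["contig"].items()}
```

Comparing contig_lengths across the caller outputs is one cheap way to confirm that every fixture was produced against the same reference. For anything beyond a header check, an established VCF library is likely a better choice than this regex sketch.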
Binary file added test/data/vcf/sample001normal_whatshap_phased.vcf.gz
Binary file not shown.
16 changes: 16 additions & 0 deletions test/data/vcf/sample001tumor.svision_pro_v1.8.s3.log
@@ -0,0 +1,16 @@
2024-08-15 15:43:39,641 [INFO] ****************** Start SVision-pro, version 1.8 ********************
2024-08-15 15:43:39,642 [INFO] Command: /opt/conda/bin/SVision-pro --target_path sample001tumor_minimap2_mdtagged_sorted.bam --base_path sample001normal_minimap2_mdtagged_sorted.bam --genome_path hg38_chr17_1-8M_chr18_1-9M_hpv16.fa.gz --model_path model_liteunet_256_8_16_32_32_32.pth --out_path sample001tumor_svisionpro_outputs/ --sample_name sample001tumor --process_num 2 --detect_mode somatic --preset hifi --min_supp 3 --min_mapq 20 --min_sv_size 30 --max_sv_size 1000000 --device cpu
2024-08-15 15:43:39,644 [INFO] ****************** Step1 collecting and plotting **********************
2024-08-15 15:43:40,233 [INFO] Collecting sample001tumor chr18_0_10000000, 0 candidate events found. Time cost 0s
2024-08-15 15:43:40,328 [INFO] Collecting sample001tumor hpv16_0_10000000, 0 candidate events found. Time cost 0s
2024-08-15 15:43:40,762 [INFO] Collecting sample001tumor chr17_0_10000000, 5 candidate events found. Time cost 0s
2024-08-15 15:43:40,854 [INFO] Collecting Time Cost 1s
2024-08-15 15:43:40,855 [INFO] ****************** Step2 lite-Unet predicting **************************
2024-08-15 15:43:42,446 [INFO] Predicting sample001tumor chr17_0_10000000, 5 events pass lite-Unet module. Time cost: 1s
2024-08-15 15:43:42,452 [INFO] Predicting sample001tumor chr18_0_10000000, 0 events pass lite-Unet module. Time cost: 0s
2024-08-15 15:43:42,457 [INFO] Predicting sample001tumor hpv16_0_10000000, 0 events pass lite-Unet module. Time cost: 0s
2024-08-15 15:43:42,507 [INFO] Predicting Time Cost 1s
2024-08-15 15:43:42,508 [INFO] ****************** Step3 Outputting to VCF ****************************
2024-08-15 15:43:42,520 [INFO] Outputting sample001tumor to /Users/leework/Documents/Research/projects/project_nexus/data/processed/work/long_read_dna_variant_calling_svisionpro/e4/3c76343c84a62c1e4445c18ad98bc3/sample001tumor_svisionpro_outputs/sample001tumor.svision_pro_v1.8.s3.vcf. Time cost: 0s
2024-08-15 15:43:42,562 [INFO] ****************** Step4 Finish ***********************************
2024-08-15 15:43:42,563 [INFO] SVision-pro v1.8 successfully finished. Time Cost 2.924899s
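Note on the log fixture: the SVision-pro log above reports one "Collecting" line per contig window with the number of candidate events found. A minimal sketch of how a test could pull those counts out of the log follows. The function name candidate_counts is hypothetical, and the regular expression is inferred only from the lines in this file, so other SVision-pro versions may format them differently.

```python
import re

# Inferred from lines such as:
#   "... [INFO] Collecting sample001tumor chr17_0_10000000, 5 candidate events found. ..."
_COLLECT_RE = re.compile(
    r"Collecting (?P<sample>\S+) (?P<contig>\w+?)_(?P<start>\d+)_(?P<end>\d+), "
    r"(?P<count>\d+) candidate events found"
)


def candidate_counts(log_lines):
    """Map contig name to the candidate event count reported in Step1.

    Hypothetical helper; lines that do not match (including the
    "Collecting Time Cost" summary) are ignored.
    """
    counts = {}
    for line in log_lines:
        match = _COLLECT_RE.search(line)
        if match:
            counts[match.group("contig")] = int(match.group("count"))
    return counts


# Lines copied from the log above.
example_log = [
    "2024-08-15 15:43:40,233 [INFO] Collecting sample001tumor chr18_0_10000000, 0 candidate events found. Time cost 0s",
    "2024-08-15 15:43:40,328 [INFO] Collecting sample001tumor hpv16_0_10000000, 0 candidate events found. Time cost 0s",
    "2024-08-15 15:43:40,762 [INFO] Collecting sample001tumor chr17_0_10000000, 5 candidate events found. Time cost 0s",
    "2024-08-15 15:43:40,854 [INFO] Collecting Time Cost 1s",
]
counts = candidate_counts(example_log)
```

For this fixture the result has chr17 with 5 candidates and chr18 and hpv16 with none, which matches the Step2 lines where 5 events pass the lite-Unet module on chr17.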