
iva_qc output files

martinghunt edited this page Sep 1, 2014 · 6 revisions

The script iva_qc requires a prefix for its output files. The following assumes that you used out as this prefix.


out.stats.*

The main output file is out.stats.txt, a human-readable, tab-delimited list of stats and their values (one stat per line). An explanation of the stats can be found here. If you prefer to use a spreadsheet application, exactly the same information is in the TSV file out.stats.tsv.
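For scripting, the stats file can be loaded with a few lines of Python. This is a minimal sketch that assumes each line of out.stats.txt holds a stat name and its value separated by a tab, as described above. The function name load_stats and the stat names in the example are hypothetical.

```python
import io
from typing import Dict, TextIO


def load_stats(handle: TextIO) -> Dict[str, str]:
    """Read 'name<TAB>value' lines into a dict (assumed out.stats.txt layout)."""
    stats: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        name, _, value = line.partition("\t")
        stats[name] = value  # values kept as strings; convert as needed
    return stats


# Made-up stat names for illustration; in practice use open("out.stats.txt").
example = io.StringIO("stat_a\t42\nstat_b\t0.95\n")
print(load_stats(example))  # {'stat_a': '42', 'stat_b': '0.95'}
```

Keeping the values as strings avoids guessing which stats are integers and which are floats.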


out.assembly_*

  • out.assembly_contigs_hit_ref.fasta: a FASTA file of just the contigs that had a nucmer hit to the reference.
  • out.assembly_vs_ref.coords: nucmer coords file of hits between the assembly and the reference.
  • out.assembly_v_ref.act.sh: a script that will start ACT, showing the assembly compared against the reference. Choose 'concatenate sequences' when ACT starts. The reference is the top sequence, BLAST hits are in the middle, and the assembly is at the bottom.
  • out.assembly_v_ref.blastn: BLASTN results from comparing the assembly with the reference. This file is needed by the ACT script above.
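The BLASTN file can also be summarised programmatically. The sketch below assumes it uses the standard 12-column tabular layout (as produced by BLAST+ with -outfmt 6), with the query ID in column 1 and the alignment length in column 4. Which of the assembly and reference is the query is not stated here, so the code simply groups by the first column. The function name is hypothetical.

```python
import io
from collections import defaultdict
from typing import Dict, TextIO


def aligned_length_per_query(handle: TextIO) -> Dict[str, int]:
    """Sum alignment lengths (column 4) per query ID (column 1).

    Assumes 12-column tabular BLAST output (BLAST+ -outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Overlapping hits are counted twice, so this is a rough total, not coverage.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        totals[fields[0]] += int(fields[3])
    return dict(totals)


# Two made-up hits for one query sequence:
example = io.StringIO(
    "contig1\tref\t99.5\t1000\t5\t0\t1\t1000\t501\t1500\t0.0\t1800\n"
    "contig1\tref\t98.0\t200\t4\t0\t1001\t1200\t2001\t2200\t1e-90\t350\n"
)
print(aligned_length_per_query(example))  # {'contig1': 1200}
```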

out.contig_placement.*

A PDF file out.contig_placement.pdf is made that shows the layout of the contigs on the reference, plus contig and read coverage information. It gives a useful overview, but for a detailed view of what is happening, use ACT via the script out.assembly_v_ref.act.sh.

The file is made using the R script out.contig_placement.R and the data files out.read_coverage_on_ref.fwd and out.read_coverage_on_ref.rev. Here is an explanation of the plots:

  • The main panel at the top shows the nucmer hits of the contigs to the reference, one contig per row. Two (or more) hits on the same row mean that a contig matches two (or more) distinct places in the reference.
  • Roughly, blue means everything looks OK and red means it does not. This is only approximate; for a reliable picture, use ACT.
  • Contigs are shaded dark blue or red if they match in the forward orientation to the reference, and light blue or red if they match in the reverse orientation.
  • The contigs heatmap has three tracks: the top two are black and show presence and absence of contig coverage. The third track is red and shows where there was good read coverage, but not contig coverage, i.e. the assembler should have assembled the region but missed it.
  • The reads heatmap has three tracks. The top track (black) shows where read coverage is good on both strands. The middle and bottom tracks (red) show low read coverage on the forward and reverse strands, respectively.
  • The two blue line plots at the bottom show the read depth on the forward and reverse strands.
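To make the heatmap tracks concrete, here is a toy sketch of how such tracks could be derived from per-position depth. It is not IVA's code: the cutoff min_depth, the function name, and the use of plain Python lists for forward/reverse depth and contig coverage are all assumptions for illustration. It also assumes "good read coverage" means good on both strands.

```python
from typing import Dict, List


def coverage_tracks(fwd: List[int], rev: List[int], contig_cov: List[bool],
                    min_depth: int = 5) -> Dict[str, List[bool]]:
    """Per-position boolean tracks, loosely mirroring the plot description.

    min_depth is a hypothetical cutoff for "good" read coverage on one strand.
    """
    low_fwd = [d < min_depth for d in fwd]
    low_rev = [d < min_depth for d in rev]
    good_both = [not f and not r for f, r in zip(low_fwd, low_rev)]
    # Red contig track: reads cover the position well, but no contig does.
    missed_by_assembly = [g and not c for g, c in zip(good_both, contig_cov)]
    return {
        "reads_good_both_strands": good_both,
        "reads_low_fwd": low_fwd,
        "reads_low_rev": low_rev,
        "contigs_missed_region": missed_by_assembly,
    }


tracks = coverage_tracks(fwd=[10, 10, 2, 10], rev=[10, 10, 10, 0],
                         contig_cov=[True, False, True, True])
print(tracks["reads_good_both_strands"])  # [True, True, False, False]
print(tracks["contigs_missed_region"])    # [False, True, False, False]
```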

out.gage/

This directory contains the results of running the GAGE analysis. Most files are removed after running. Some of the remaining files are:

  • run.sh: the script used to run the GAGE analysis.
  • gage.out: stdout from run.sh. IVA gets the stats from this file.
  • out.report: more detailed file made by the GAGE analysis.

out.ratt/

This directory contains the results of running RATT. Most files are removed after running. The remaining files are:

  • run.sh: the script used to run RATT. You can easily rerun it if you want to make annotation files for your assembly.
  • run.sh.out: the stdout from running run.sh. IVA gets its stats from this file.

out.reads_mapped_*

  • out.reads_mapped_to_assembly.bam[.bai]: sorted, indexed BAM file of reads mapped to the assembly.
  • out.reads_mapped_to_ref.bam[.bai]: sorted, indexed BAM file of reads mapped to the reference.

out.ref*

  • out.ref_cds_seqs.fa: FASTA file of CDS sequences found in the reference genome.
  • out.ref_cds_seqs_mapped_to_assembly.coords: nucmer coords file of CDS sequences mapped to the assembly.
  • out.reference.fa: FASTA file of reference genome.
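One way to use these files together is to list reference CDS sequences that have no match in the assembly. This sketch assumes the coords file is tab-delimited as written by show-coords -T (header lines first, then data lines that begin with a numeric start coordinate and end with the two sequence names), and that each CDS name in out.ref_cds_seqs.fa is the first word of its header line. The function names are hypothetical; check the real file layout before relying on this.

```python
import io
from typing import List, Set, TextIO


def fasta_names(handle: TextIO) -> List[str]:
    """Return the first word of each FASTA header line, in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def names_in_coords(handle: TextIO) -> Set[str]:
    """Collect sequence names from tab-delimited show-coords -T style output.

    Data lines are assumed to start with an integer coordinate and end with
    the reference and query names; header lines are skipped.
    """
    names: Set[str] = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0].strip().isdigit():
            names.update(fields[-2:])
    return names


def unmatched_cds(cds_fasta: TextIO, coords: TextIO) -> List[str]:
    """CDS names from the FASTA file that never appear in the coords file."""
    hit = names_in_coords(coords)
    return [name for name in fasta_names(cds_fasta) if name not in hit]


# Made-up example: only cds1 has a hit in the assembly.
cds = io.StringIO(">cds1 gene A\nATG\n>cds2 gene B\nATG\n")
coords = io.StringIO(
    "ref.fa asm.fa\nNUCMER\n\n"
    "[S1]\t[E1]\t[S2]\t[E2]\t[LEN 1]\t[LEN 2]\t[% IDY]\t[TAGS]\n"
    "1\t300\t501\t800\t300\t300\t99.00\tcds1\tcontig1\n"
)
print(unmatched_cds(cds, coords))  # ['cds2']
```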