Commit

DOC: update output details in docs to fix #33
Vini2 committed Dec 12, 2023
1 parent 77f5a2b commit ec73c41
Showing 1 changed file with 39 additions and 16 deletions.
55 changes: 39 additions & 16 deletions docs/usage.md
@@ -111,7 +111,33 @@ The `phables run` command will run preprocessing steps, perform genome resolutio

## Output

The main output of Phables will contain the following files and folders.
The following is the folder structure of a complete Phables run.

```
phable.out
├── config.yaml # config file
├── logs # all log files
├── phables # final phables results
├── phables.log # phables master log
├── postprocess # postprocessing results
└── preprocess # preprocessing results
```

Phables creates three main folders, `preprocess`, `phables` and `postprocess`, one for each stage of execution.

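The folder tree above can double as a quick sanity check after a run. This is a minimal bash sketch, not part of Phables itself: `check_phables_output` is a hypothetical helper name, and it assumes the output folder is laid out exactly as shown, with `phable.out` as the default name.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Phables: report which top-level
# entries from the folder tree are missing from an output folder.
check_phables_output() {
    local outdir="${1:-phable.out}"   # folder name assumed from the tree
    local missing=0 entry
    for entry in config.yaml logs phables phables.log postprocess preprocess; do
        if [ ! -e "$outdir/$entry" ]; then
            echo "missing: $outdir/$entry"
            missing=1
        fi
    done
    return "$missing"   # 0 only if every entry exists
}

# Example: check_phables_output phable.out && echo "complete run"
```

A non-zero exit status means at least one stage folder or file is absent, for example because only the preprocessing step was run.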
### 1. `preprocess` - preprocessing results

The following preprocessing steps are run, and their output files and folders can be found in the `preprocess` folder.

* Obtain unitig sequences from assembly graph - `edges.fasta`
* Map reads to unitig sequences and get BAM files - `temp/*.bam` and `temp/*.bai`
* Calculate coverage of unitig sequences - `coverage.tsv`
* Scan unitig sequences for single-copy marker genes - `edges.fasta.hmmout`
* Scan unitig sequences for Prokaryotic Virus Remote Homologous Groups ([PHROGs](https://phrogs.lmge.uca.fr/)) - `phrogs_annotations.tsv`

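To get a feel for the size of the assembly graph, the number of unitig sequences in `edges.fasta` can be counted from its FASTA headers. A minimal bash sketch, assuming standard FASTA in which every record starts with a `>` header line; `count_fasta_records` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count records in a FASTA file by counting '>' header lines.
count_fasta_records() {
    # grep -c prints 0 and exits 1 when nothing matches; '|| true' keeps that
    # status from aborting scripts that use 'set -e'.
    grep -c '^>' "$1" || true
}

# Example (path assumed from the folder tree):
# count_fasta_records phable.out/preprocess/edges.fasta
```

The same helper works on any standard FASTA output, such as `resolved_paths.fasta` in the `phables` folder.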
### 2. `phables` - genome resolution results

The following files and folders, which are the main outputs of Phables, can be found inside the `phables` folder.

* `resolved_paths.fasta` containing the resolved genomes
* `resolved_phages` folder containing the resolved genomes in individual FASTA files
@@ -122,8 +148,20 @@ The main output of Phables will contain the following files and folders.
* `resolved_component_info.txt` containing the details of the phage bubbles resolved
* `component_phrogs.txt` containing PHROGs found in each component

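Because `resolved_paths.fasta` and the `resolved_phages` folder hold the same genomes in two forms, their counts can be cross-checked. A minimal bash sketch under stated assumptions: the helper name is hypothetical, and since the file names inside `resolved_phages` are not documented here, it simply counts regular files in that folder and assumes each one holds exactly one genome.

```bash
#!/usr/bin/env bash
# Hypothetical consistency check, not part of Phables. Assumes every file in
# resolved_phages/ holds exactly one genome and nothing else lives there.
compare_resolved_counts() {
    local dir="${1:-phable.out/phables}"   # path assumed from the folder tree
    local combined individual
    # Genomes in the combined FASTA file (one '>' header per genome)
    combined=$(grep -c '^>' "$dir/resolved_paths.fasta" || true)
    # Individual FASTA files; tr strips the padding some wc builds add
    individual=$(find "$dir/resolved_phages" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "resolved_paths.fasta: $combined genomes"
    echo "resolved_phages/: $individual files"
    [ "$combined" = "$individual" ]   # exit status 0 when the counts agree
}
```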
### 3. `postprocess` - postprocessing results

The following postprocessing steps are run, and their output files and folders can be found in the `postprocess` folder.

* Combine resolved genomes and unresolved edges - `genomes_and_unresolved_edges.fasta`
* Obtain read counts for resolved genomes and unresolved edges - `sample_genome_read_counts.tsv`
* Obtain mean coverage of resolved genomes and unresolved edges - `sample_genome_mean_coverage.tsv`
* Obtain RPKM coverage of resolved genomes and unresolved edges - `sample_genome_rpkm.tsv`

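For reference, RPKM (reads per kilobase of sequence per million mapped reads) is conventionally computed as `RPKM = reads * 10^9 / (length_bp * total_mapped_reads)`. The docs do not show how Phables computes `sample_genome_rpkm.tsv`, so the bash/awk sketch below only illustrates that standard formula; the `rpkm_table` name and the three-column input layout (`id`, length in bp, read count) are hypothetical, and Phables' choice of total read count may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the standard RPKM formula; not Phables' code.
# Input (stdin, tab-separated, hypothetical layout): id, length in bp, read count.
# Argument: total number of mapped reads in the sample (must be > 0).
rpkm_table() {
    local total_reads="$1"
    awk -F'\t' -v total="$total_reads" '
        $2 > 0 {
            # reads / (length in kb) / (total in millions) = reads * 1e9 / (length * total)
            printf "%s\t%.4f\n", $1, $3 * 1e9 / ($2 * total)
        }'
}

# Example:
# printf 'genome_1\t5000\t100\n' | rpkm_table 2000000   # prints genome_1<TAB>10.0000
```

With 100 reads on a 5 kbp genome and 2 million mapped reads in total, the RPKM is 100 × 10^9 / (5,000 × 2,000,000) = 10.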

## Step-wise usage

You can run the preprocessing, genome resolution (`phables`) and postprocessing steps individually, as follows.

### Preprocessing only

You can use the following command to **only run the preprocessing steps**.
@@ -133,14 +171,6 @@ You can use the following command to **only run the preprocessing steps**.
phables run --input assembly_graph.gfa --reads fastq --threads 8 preprocess
```

The following preprocessing steps will be run with the corresponding files and folders.

* Obtain unitig sequences from assembly graph - `edges.fasta`
* Map reads to unitig sequences and get BAM files - `temp`
* Calculate coverage of unitig sequences - `coverage.tsv`
* Scan unitig sequences for single-copy marker genes - `edges.fasta.hmmout`
* Scan unitig sequences for Prokaryotic Virus Remote Homologous Groups (PHROGs) - `phrogs/phrogs_annotations.tsv`

### Genome resolution only

You can use the following command to **only run the genome resolution steps**. Make sure the preprocessing results are already in the output folder.
@@ -158,10 +188,3 @@ You can use the following command to **only run the postprocessing steps**.
# Only run postprocessing
phables run --input assembly_graph.gfa --reads fastq --threads 8 postprocess
```

The following postprocessing steps will be run with the corresponding files and folders.

* Combine resolved genomes and unresolved edges - `genomes_and_unresolved_edges.fasta`
* Obtain read counts for resolved genomes and unresolved edges - `sample_genome_read_counts.tsv`
* Obtain mean coverage of resolved genomes and unresolved edges - `sample_genome_mean_coverage.tsv`
* Obtain RPKM coverage of resolved genomes and unresolved edges - `sample_genome_rpkm.tsv`

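Running the genome resolution step on its own requires the preprocessing results to already be in the output folder. A minimal bash pre-flight check, assuming the file names listed for the `preprocess` folder; since this page gives the annotation file both as `phrogs_annotations.tsv` and as `phrogs/phrogs_annotations.tsv`, the sketch searches for it anywhere under `preprocess`. The helper name `preprocess_ready` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Phables: confirm the preprocessing
# outputs listed in the docs exist before running genome resolution alone.
preprocess_ready() {
    local pre="${1:-phable.out}/preprocess"   # output folder name assumed
    local ok=0 f
    for f in edges.fasta coverage.tsv edges.fasta.hmmout; do
        if [ ! -e "$pre/$f" ]; then
            echo "missing: $pre/$f"
            ok=1
        fi
    done
    # The PHROGs annotation path differs between doc versions, so search for it
    if [ -z "$(find "$pre" -name phrogs_annotations.tsv -print -quit 2>/dev/null)" ]; then
        echo "missing: phrogs_annotations.tsv under $pre"
        ok=1
    fi
    return "$ok"   # 0 only if every expected file was found
}

# Example: preprocess_ready phable.out && echo "ready for genome resolution"
```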