anvi'o Phylogenomics Workflow with Custom hmms

I'm blatantly stealing the code from the anvi'o Phylogenomics Workflow and only including the steps that are relevant to my process for my own reference, and so I can write a SnakeMake file of this later. All code will be run in the directory of the input files, until I change this.

Step 0: Set up Peloton

Request an interactive session

srun -p high -t 20:00:00 -u --pty bash -il

Load anvio module

module load anvio/6.2-bio3
source activate anvio-6.2

Get into anvi'o directory for this workflow

mkdir /home/jemizzi/chapter_2/anvio_circadian
cd /home/jemizzi/chapter_2/anvio_circadian

Step 1: Obtain data and fix fasta files (def lines and removing contigs <1000 nts)

Put fasta files of MAGs of interested into a directory. In this case I'm using 4 novel Vanda MAGs, my Phormidium pseudopriestleyi MAG, other polar cyanobacteria MAGs, and a few other cyanos of interest.

$ ls
BulkMat_35.fa
MP8IB2_15.fa
OtherMAGsHere.fa

anvi'o requires clean def lines, see explanation here. Clean up fastas before proceeding to avoid snarky error messages:

for i in `ls *fa | awk 'BEGIN{FS=".fa"}{print $1}'` # all files in data directory must end in .fa
do
    anvi-script-reformat-fasta $i.fa -o $i-fixed.fa -l 1000 --simplify-names
    # overwrite contigs with fixed contigs
    mv $i-fixed.fa $i.fa
done

This command also removes sequences shorter than 1000 nucleotides because anvi'o developers think this is a good idea, it makes sense to me and I trust them.

Step 2: Generate anvi'o contigs database for each fasta file.

anvi'o will make files that end in .db which contain information about the MAGs that will allow it to analyze the MAGs.

for i in `ls *fa | awk 'BEGIN{FS=".fa"}{print $1}'` # all files in data directory must end in .fa
do
    anvi-gen-contigs-database -f $i.fa -o $i.db
    anvi-run-hmms -c $i.db
done

Step 3: Create `external-genomes.txt` file.

I've found it easiest to create the external-genomes.txt file in excel and then convert it to a .txt file. Just be sure to follow the formatting in the example. For BulkMat_35.fa and MP8IB2_15.fa, the file would look like this:

name		contigs_db_path
BulkMat_35	BulkMat_35.db
MP8IB2_15	MP8IB2_15.db

Also, file names can't contain periods. The file I made is here: external-genomes.txt.

Step 3.5: Create Custom HMMs file.

Follow the instructions here. Get HMMs by searching eggNOG. Put these into a directory. In my case, I will choose circadian_HMMs.

Step 4: Custom HMM search.

Refer anvio to custom HMM files.

for i in `ls *fa | awk 'BEGIN{FS=".fa"}{print $1}'` # all files in data directory must end in .fa
do
    anvi-run-hmms -c $i.db -H circadian_HMMs/
done

Get hits.

for i in `ls *fa | awk 'BEGIN{FS=".fa"}{print $1}'` # all files in data directory must end in .fa
do
    anvi-get-sequences-for-hmm-hits --external-genomes external-genomes.txt \
                                -o $i.concatenated-proteins.fa \
                                --hmm-source circadian_HMMs \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate
done

Out of date notes: Step 4: Custom HMM search. Look at available sequences: (maybe get rid of this code chunk)

anvi-get-sequences-for-hmm-hits --external-genomes external-genomes.txt \
                                   --hmm-source Bacteria_71 \
                                   --list-available-gene-names

Run HMM search:

anvi-get-sequences-for-hmm-hits --external-genomes external-genomes.txt \
                                -o concatenated-proteins.fa \
                                --hmm-source circadian \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

anvio_commands.md

anvio_commands.md

anvi'o Phylogenomics Workflow with Custom hmms

Step 0: Set up Peloton

Step 1: Obtain data and fix fasta files (def lines and removing contigs <1000 nts)

Step 2: Generate anvi'o contigs database for each fasta file.

Step 3: Create `external-genomes.txt` file.

Step 3.5: Create Custom HMMs file.

Step 4: Custom HMM search.

Files

anvio_commands.md

Latest commit

History

anvio_commands.md

File metadata and controls

anvi'o Phylogenomics Workflow with Custom hmms

Step 0: Set up Peloton

Step 1: Obtain data and fix fasta files (def lines and removing contigs <1000 nts)

Step 2: Generate anvi'o contigs database for each fasta file.

Step 3: Create external-genomes.txt file.

Step 3.5: Create Custom HMMs file.

Step 4: Custom HMM search.

Step 3: Create `external-genomes.txt` file.