HGT finder tool


Workflow

The input files are processed by OrthoFinder (Emms, D.M., Kelly, S. 2019) and KaKs Calculator (Zhang Z.). Together, these programs provide distance values between proteins of the same orthogroup (as inferred by OrthoFinder). Distances that fall below the 5th percentile of the distribution are flagged, and the topology of the corresponding gene trees is then inspected: gene trees that diverge significantly from the inferred species tree are marked as candidates for horizontal gene transfer (HGT).
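The percentile filter described above can be sketched as follows. This is a minimal illustration, not dualHGT's own code: it assumes all pairwise distances are pooled into a single distribution and stored in a dict keyed by (orthogroup, gene_a, gene_b), and the function name `flag_low_distances` is hypothetical.

```python
import statistics


def flag_low_distances(distances, percent=5):
    """Flag gene pairs whose distance falls below the given percentile.

    Hypothetical helper. `distances` maps an (orthogroup, gene_a, gene_b) key
    to a distance value; at least two values are needed.
    Returns (flagged_keys, cutoff).
    """
    values = list(distances.values())
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles;
    # method="inclusive" interpolates linearly between the sorted values.
    cutoff = statistics.quantiles(values, n=100, method="inclusive")[percent - 1]
    flagged = {key for key, d in distances.items() if d < cutoff}
    return flagged, cutoff


# Toy data: twenty gene pairs in one orthogroup with distances 1.0 .. 20.0
example = {("OG0000001", f"g{i}", f"h{i}"): float(i) for i in range(1, 21)}
flagged, cutoff = flag_low_distances(example)
print(cutoff, flagged)
```

Pooling every orthogroup into one distribution is only one possible reading of the workflow; computing the cutoff per orthogroup would change only how `values` is built.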

Usage:

DockerHub pull:

Run this in your terminal to pull a pre-made image from DockerHub:

docker pull cl3mente/dualhgt:latest

After the pull completes, you have a working image for running dualHGT in a container. Bind your local input folder (given as an absolute path) to the container's 'input' volume at /app/input:

docker run -v /absolute/path/to/your-input-folder:/app/input cl3mente/dualhgt:latest

Alternatively, run this command from the folder itself:

docker run -v "$PWD":/app/input cl3mente/dualhgt:latest

This folder is how the container exchanges files with your local machine: the results are written to its 'output' subfolder.
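If you launch the container from a Python script, the bind mount can be assembled as below. This is a hypothetical helper, not part of dualHGT. It resolves the folder to an absolute path because a bare relative name passed to `-v` is read by Docker as a named volume rather than a host folder. It also derives where the 'output' subfolder should appear.

```python
import os
import shlex

IMAGE = "cl3mente/dualhgt:latest"


def docker_run_command(input_folder, image=IMAGE):
    """Build the `docker run` argument list that mounts input_folder at /app/input.

    Hypothetical helper: mirrors the command shown above.
    """
    # Resolve to an absolute host path so Docker treats it as a bind mount.
    host_path = os.path.abspath(input_folder)
    return ["docker", "run", "-v", f"{host_path}:/app/input", image]


def expected_output_dir(input_folder):
    """Return where results should appear: the 'output' subfolder of the input folder."""
    return os.path.join(os.path.abspath(input_folder), "output")


# Print the shell-quoted command; pass the list to subprocess.run(...) to execute it.
print(shlex.join(docker_run_command("/data/my-genomes")))
```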

GitHub clone:

Although this is not the recommended route, you can also clone this repository and install every dependency yourself.

git clone https://github.com/cl3mente/dualHGT

Run the software:

Once the container is running, or the dualHGT.py script is available locally, run it with the following command, adding any of the options you want to customize:

python dualHGT.py -i input/ [...]

Additional arguments

-i or --input: This argument is used to specify the input directory.

-gff: A flag specifying whether the input directory contains:

  • FLAG ON: for each selected species, a reference genome .fasta and an annotation .gff sharing the same root filename (for this)
  • FLAG OFF (default): multifasta files with protein sequences of the species selected
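With the flag on, each species is identified by the shared root of its .fasta and .gff files. The sketch below shows one way to pair them and report any file without a partner. It is a hypothetical helper that assumes exactly the `.fasta` and `.gff` extensions; dualHGT's actual file discovery may differ.

```python
from pathlib import Path


def pair_genome_annotation(input_dir):
    """Pair each <root>.fasta with <root>.gff in input_dir.

    Hypothetical helper. Returns {root: (fasta_path, gff_path)} sorted by root,
    or raises ValueError if any root lacks one of the two files.
    """
    folder = Path(input_dir)
    fastas = {p.stem: p for p in folder.glob("*.fasta")}
    gffs = {p.stem: p for p in folder.glob("*.gff")}
    # Roots present in only one of the two sets have no partner file.
    unpaired = sorted(set(fastas) ^ set(gffs))
    if unpaired:
        raise ValueError(f"no matching .fasta/.gff partner for: {', '.join(unpaired)}")
    return {root: (fastas[root], gffs[root]) for root in sorted(fastas)}
```

For example, a folder containing `sp1.fasta`, `sp1.gff`, `sp2.fasta` and `sp2.gff` yields two species, `sp1` and `sp2`.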

-OFr or --orthofinderResults: Optional. Specifies the folder containing previous OrthoFinder results.

-v or --verbose: Verbose mode.

-nt or --numberThreads: The number of threads to use for the analysis.
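The options above can be mirrored with Python's `argparse` to show how they combine on the command line. This is an illustrative reconstruction, not dualHGT's actual parser. The types and defaults are assumptions: the thread count is parsed as an integer, and no default is set for it because none is documented.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented dualHGT.py options."""
    parser = argparse.ArgumentParser(prog="dualHGT.py")
    parser.add_argument("-i", "--input", required=True, help="input directory")
    # Flag ON: genome .fasta + annotation .gff pairs; OFF: protein multifasta files.
    parser.add_argument("-gff", action="store_true",
                        help="input holds genome .fasta and annotation .gff pairs")
    parser.add_argument("-OFr", "--orthofinderResults", default=None,
                        help="folder with previous OrthoFinder results")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose mode")
    # Assumed integer type; the real default is not documented, so none is set here.
    parser.add_argument("-nt", "--numberThreads", type=int, default=None,
                        help="number of threads")
    return parser


args = build_parser().parse_args(["-i", "input/", "-gff", "-nt", "4"])
print(args)
```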

The script needs protein sequences and their corresponding coding sequences from the group of organisms under investigation. The folder containing both files is passed to the program with the -i (or --input) argument. You can also pass a folder with previous OrthoFinder results (-OFr or --orthofinderResults) to avoid repeating time-consuming runs.
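Because KaKs Calculator works on coding sequences while OrthoFinder works on proteins, every protein should have a matching coding sequence. The sketch below checks this before a run. It assumes, hypothetically, that a protein and its CDS share the same FASTA record ID (the first word after `>`); dualHGT's real matching rule is not documented here.

```python
def fasta_ids(fasta_text):
    """Return the set of record IDs (first word after '>') in FASTA-formatted text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip headers with no ID
                ids.add(fields[0])
    return ids


def unmatched_records(protein_text, cds_text):
    """Report IDs found in only one of the protein and CDS files.

    Hypothetical check: assumes a protein and its coding sequence share the same ID.
    """
    proteins = fasta_ids(protein_text)
    cds = fasta_ids(cds_text)
    return {"protein_only": proteins - cds, "cds_only": cds - proteins}


proteins = ">geneA some description\nMKV*\n>geneB\nMAL*\n"
cds = ">geneA\nATGAAAGTTTAA\n"
print(unmatched_records(proteins, cds))
```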