-

TreeProfiler Tutorial

+

TreeProfiler Tutorial

-

Introduction

+

Introduction

TreeProfiler is a command-line tool for mapping a metadata table onto a phylogenetic tree, with descriptive analysis and visualization of the output.

-

Table of Contents

+

Table of Contents

-

Installation

+

Installation

-

Dependencies

+

Dependencies

TreeProfiler requires
    -
  • Python version >= 3.9

  • +
  • Python version >= 3.10

  • ETE Toolkit v4

  • -
  • biopython

  • -
  • selenium

  • -
  • scipy

  • -
  • matplotlib

  • -
  • numba

  • +
  • biopython >= 1.8

  • +
  • selenium >= 4.24

  • +
  • scipy >= 1.8.0

  • +
  • matplotlib >= 3.4

  • +
  • pymc >= 5.0.0

  • +
  • aesara

  • pastml (custom)
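The requirements above can be checked in an existing environment before running TreeProfiler. The following is a minimal sketch that relies only on the standard library; the helper names (`version_key`, `meets_minimum`, `check_dependencies`) are hypothetical, the distribution names are assumed to match what pip reports, and the version comparison is deliberately simple (it looks only at the leading numeric parts).

```python
import sys
from importlib import metadata

# Minimum versions copied from the list above; None means no minimum is stated.
# Assumption: these are the distribution names that pip reports.
REQUIRED = {
    "ete4": None,
    "biopython": "1.8",
    "selenium": "4.24",
    "scipy": "1.8.0",
    "matplotlib": "3.4",
    "pymc": "5.0.0",
    "aesara": None,
    "pastml": None,
}


def version_key(version):
    """Turn '1.8.0' into (1, 8, 0), stopping at the first non-numeric part."""
    key = []
    for part in version.split("."):
        if not part.isdigit():
            break
        key.append(int(part))
    return tuple(key)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (or no minimum is given)."""
    if minimum is None:
        return True
    a, b = version_key(installed), version_key(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_dependencies(required=REQUIRED):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 10)
    print("python:", "ok" if python_ok else "too old")
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

Any package reported as `missing` or `too old` can then be installed with the pip commands in the next section.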

-

Quick install via pip

+

Quick install via pip

# Install ETE Toolkit v4
 pip install --force-reinstall https://github.com/etetoolkit/ete/archive/ete4.zip
 
+
 # Install TreeProfiler dependencies
-pip install biopython selenium scipy matplotlib numba
+pip install biopython selenium scipy matplotlib pymc aesara
 
 # Install custom pastml package for ete4
 pip install "git+https://github.com/dengzq1234/pastml.git@pastml2ete4"
@@ -237,7 +253,7 @@ 

Quick install via pip

 # Install TreeProfiler tool via pip
 pip install TreeProfiler
 
-# or installing main repo
+# Or install directly from GitHub
 pip install https://github.com/compgenomicslab/TreeProfiler/archive/main.zip
 
 # or development mode for latest update
 pip install git+https://github.com/compgenomicslab/TreeProfiler@dev-repo
@@ -245,7 +261,7 @@

Quick install via pip

-

Quick Start with examples dataset

+

Quick Start with examples dataset

TreeProfiler provides several example datasets for testing in examples/ or at https://github.com/compgenomicslab/TreeProfiler/tree/main/examples. Each directory contains a demo script, *_demo.sh, that runs the full annotate-plot pipeline on the example data, so users can quickly explore the example trees with different visualizations. Here is a demonstration:

# execute demo script of example1
 cd examples/basic_example1/
@@ -291,9 +307,9 @@ 

Quick Start with exam

-

Manual installation

+

Manual installation

-

Install ETE v4

+

Install ETE v4

Quick way

pip install https://github.com/etetoolkit/ete/archive/ete4.zip
 
@@ -304,7 +320,7 @@

Install ETE v4

Clone this repository (git clone https://github.com/etetoolkit/ete.git)

  • Install dependencies. If you are using conda:

    -

    conda install -c conda-forge cython bottle brotli numpy pyqt

    +

    conda install -c conda-forge cython bottle brotli pyqt "numpy<2.0"

    • Otherwise, you can install them with

    @@ -318,10 +334,10 @@

    Install ETE v4

    (In Linux there may be some cases where the gcc library must be installed, which can be done with conda install -c conda-forge gcc_linux-64)

  • -

    Install TreeProfiler

    +

    Install TreeProfiler

    Install dependencies

    # install BioPython, selenium, scipy via conda
    -conda install -c conda-forge biopython selenium scipy matplotlib
    +conda install -c conda-forge biopython selenium scipy matplotlib pymc
     
     # or pip
     pip install biopython selenium scipy matplotlib
    @@ -336,13 +352,13 @@ 

    Install TreeProfiler

    or install directly from GitHub

    # install directly
    -pip install https://github.com/dengzq1234/TreeProfiler/archive/refs/tags/v1.1.0.tar.gz
    +pip install https://github.com/compgenomicslab/TreeProfiler/archive/main.zip
     
    -

    Input files

    +

    Input files

    TreeProfiler takes the following file types as input

    @@ -365,7 +381,7 @@

    Input files
    -

    Basic Usage

    +

    Basic Usage

    TreeProfiler has two main subcommands:
    -

    Using TreeProfiler

    +

    Using TreeProfiler

    In this tutorial, we use TreeProfiler to demonstrate basic usage with the data in examples/

    -
    tree examples/
    -examples/
    -├── automatic_query
    -│   ├── basic_example1_metadata1.tsv
    -│   ├── basic_example1.nw
    -│   ├── collapse_demo.sh
    -│   ├── highlight_demo.sh
    -│   └── prune_demo.sh
    -├── basic_example1
    -│   ├── basic_example1_metadata1.tsv
    -│   ├── basic_example1_metadata2.tsv
    -│   ├── basic_example1.nw
    -│   └── example1_demo.sh
    -├── basic_example2
    -│   ├── diauxic.array
    -│   ├── diauxic.nw
    -│   ├── example2_demo.sh
    -│   ├── FluA_H3_AA.fas
    -│   ├── MCC_FluA_H3_Genotype.txt
    -│   └── MCC_FluA_H3.nw
    -├── pratical_example
    -│   ├── emapper
    -│   │   ├── 7955.ENSDARP00000116736.aln.faa
    -│   │   ├── 7955.ENSDARP00000116736.nw
    -│   │   ├── 7955.out.emapper.annotations
    -│   │   ├── 7955.out.emapper.pfam
    -│   │   ├── 7955.out.emapper.smart.out
    -│   │   ├── emapper_demo.sh
    -│   │   ├── nifH.faa.aln
    -│   │   ├── nifH.nw
    -│   │   ├── nifH.out.emapper.annotations
    -│   │   └── nifH.out.emapper.pfam
    -│   ├── gtdb_r202
    -│   │   ├── ar122_metadata_r202_lite.tar.gz
    -│   │   ├── bac120_metadata_r202_lite.tar.gz
    -│   │   ├── gtdbv202full_demo.sh
    -│   │   ├── gtdbv202lite_demo.sh
    -│   │   ├── gtdbv202.nw
    -│   │   ├── merge_gtdbtree.py
    -│   │   └── progenome3.tar.gz
    -│   └── progenome3
    -│       ├── progenome3.nw
    -│       ├── progenome3.tsv
    -│       └── progenome_demo.sh
    -└── taxonomy_example
    -    ├── gtdb
    -    │   ├── gtdb_demo.sh
    -    │   ├── gtdb_example1.nw
    -    │   └── gtdb_example1.tsv
    -    └── ncbi
    -        ├── ncbi_demo.sh
    -        ├── ncbi_example.nw
    -        └── ncbi_example.tsv
    -
    -
    -

    Reading input tree

    +

    Reading input tree

    -

    Tree format

    +

    Tree format

    TreeProfiler accepts an input tree in .nw or .ete format; the --input-type {newick,ete} flag identifies the format. By default, TreeProfiler automatically detects the format of the tree. The difference between .nw and .ete:

    -

    Tree parser

    +

    Tree parser

    TreeProfiler provides the --internal {name,support} argument to specify how values on the internal nodes of a newick tree are interpreted. [default: name]
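To illustrate what this choice means, the sketch below reads the label written after each closing parenthesis of a newick string and interprets it either as a node name or as a support value. It is a standalone illustration built around a hypothetical helper, `internal_labels`, and is not TreeProfiler's or ETE's parser; it assumes unquoted labels and no bracketed comments.

```python
import re

# A label written directly after ")" belongs to an internal node.
INTERNAL_LABEL = re.compile(r"\)([^():,;\[\]\s]+)")


def internal_labels(newick, internal="name"):
    """Return internal-node labels as names (str) or as support values (float)."""
    labels = INTERNAL_LABEL.findall(newick)
    if internal == "name":
        return labels
    if internal == "support":
        return [float(label) for label in labels]
    raise ValueError("internal must be 'name' or 'support'")


# The same position in the newick string can hold either kind of value.
tree_with_support = "((A:1,B:1)0.95:0.5,(C:1,D:1)0.80:0.5);"
tree_with_names = "((A:1,B:1)clade1:0.5,(C:1,D:1)clade2:0.5)root;"

print(internal_labels(tree_with_support, internal="support"))
print(internal_labels(tree_with_names, internal="name"))
```

Because both kinds of value occupy the same position in the file, the format alone cannot say which one is meant, which is why the interpretation has to be chosen explicitly.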

    @@ -484,7 +445,7 @@

    Tree parser
    -

    treeprofiler-annotate computing phylogenetic profiles and annotation

    +

    treeprofiler-annotate computing phylogenetic profiles and annotation

    The TreeProfiler annotate subcommand annotates the input metadata onto the target tree. As a result, it will generate the following output files:

    1. <input_tree> + _annotated.nw, newick format with annotated tree

    2. @@ -493,7 +454,7 @@

      <input_tree> + _annotated.tsv, metadata in tab-separated values format with annotated and summarized internal-node information.

    -

    Annotate metadata into tree

    +

    Annotate metadata into tree

    In the following subsections, we describe how to use the following metadata arguments in the annotate step:

    @@ -520,7 +481,7 @@

    Annotate metadata in

    -

    Basic metadata in TSV/CSV format

    +

    Basic metadata in TSV/CSV format

    TreeProfiler allows users to input metadata in a TSV/CSV file by setting --metadata <filename.tsv|.csv> and -s <separator>. By default, the first column of the metadata should contain the names of the target tree's leaves, and the metadata should include a header row with a name for each column.

    To annotate a tree with more than one metadata input, list the files after the flag, such as --metadata table1.tsv table2.tsv.
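Before annotating, it can help to confirm that a metadata table follows the layout described above. The sketch below is a hedged illustration, not part of TreeProfiler: the helpers `leaf_names` and `check_metadata` are hypothetical, and the leaf names are pulled from the newick string with a simple regular expression that assumes unquoted names and no bracketed comments.

```python
import csv
import io
import re

# A name written directly after "(" or "," belongs to a leaf.
LEAF_NAME = re.compile(r"[(,]\s*([^():,;\[\]\s]+)")


def leaf_names(newick):
    """Return the leaf names of a newick string, in order of appearance."""
    return LEAF_NAME.findall(newick)


def check_metadata(newick, metadata_text, separator="\t"):
    """Compare the first metadata column against the leaf names of the tree."""
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter=separator))
    header, body = rows[0], rows[1:]
    table_names = {row[0] for row in body if row}
    tree_names = set(leaf_names(newick))
    return {
        "columns": header[1:],  # property names taken from the header row
        "missing_from_table": sorted(tree_names - table_names),
        "not_in_tree": sorted(table_names - tree_names),
    }


# Toy data invented for this example.
newick = "((A:1,B:1):0.5,(C:1,D:1):0.5);"
metadata_text = (
    "name\tlength\thabitat\n"
    "A\t10\tsoil\n"
    "B\t12\tmarine\n"
    "C\t9\tsoil\n"
    "E\t7\tmarine\n"
)

report = check_metadata(newick, metadata_text)
print(report)
```

Here leaf `D` has no metadata row and row `E` matches no leaf, so both are reported. For real files, the text can be read with `open(path, newline="")` and the separator set to `","` for CSV input.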

    Check metadata

    @@ -571,7 +532,7 @@

    Basic metadata in TS