Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object #46

magibc · 2022-03-10T16:28:08Z

Hi,
It's my first time analysing metagenomics from a shotgun experiment but before to post this issue I tried some strategies to overcome the topic described below.

I would like to construct a phyloseq object from Kraken2 + Bracken output but containing a phylogenetic tree in the phyloseq object in order to be able to calculate Unifrac distances in downstream analysis.

Could you recommend me some software to create a phylogenetic tree from Kraken2 output?

Thanks on advance for your help/hints,

Magí.

damioresegun · 2022-08-09T10:25:27Z

I was trying to do this recently too but as I understand it, you can't make it directly from a Kraken output as that only have taxonomic information. For phylogenetic information, you need to carry out a multiple sequence alignment and then carry out bootstrap calculation. Its somewhat computationally difficult to do and I honestly gave up on doing that for my samples. In terms of software that could make one, you could try MEGA although in my experience, that works for small-ish datasets.

For phyloseq, you can make a taxonomic tree that could help in visualisation however, it doesn't really do much in providing phylogenetic information that you can analyse.

If you do manage to find something that works for metagenomic data, let me know!

CMagnoBR · 2022-12-12T13:13:52Z

Hey guys!
Let me know too! Recently Im looking for a method for infer phylogenetic tree, and produce a newick tree file for others purposes.
Using Kraken2, we can extract the mapped sequences using "--classified-out" option. However, there aren`t any method to use these sequences for phylogenetic inference. The head of the reads has the TaxID classified by the Kraken2. I believe that is the way. Suggestions?

Regards.

CMagnoBR · 2023-12-18T14:18:11Z

Hi!
Past 1 year, there is some advance for this question??
Thanks!

paulzierep · 2024-04-24T19:30:39Z

I need to look into this, but should it not be possible to create a tree from the tax ids using e.g. ete3: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html
In any case the tree is optional for phyloseq so you can use most functions without the tree.

cxkBio · 2024-10-02T05:55:54Z

I need to look into this, but should it not be possible to create a tree from the tax ids using e.g. ete3: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html In any case the tree is optional for phyloseq so you can use most functions without the tree.

Have you implemented getting phylogenetic trees from Kraken?

paulzierep · 2024-10-02T06:50:09Z

Since we cannot use sequences to create the tree, it can only work using the taxi id information from ncbi.
Based on this me and chatGPT ;) made this script:

from ete3 import NCBITaxa

# Initialize NCBITaxa object
ncbi = NCBITaxa()

# Function to parse Kraken2 report and extract TaxIDs
def parse_kraken2_report(file_path):
    taxid_set = set()
    with open(file_path, 'r') as file:
        for line in file:
            fields = line.strip().split("\t")
            if len(fields) > 4:
                taxid = fields[4]  # TaxID is in the 5th column
                taxid_set.add(int(taxid))  # Add to the set to avoid duplicates
    return list(taxid_set)

# Function to build a tree from TaxIDs
def build_taxonomic_tree(taxids):
    # Fetch the taxonomic tree based on the TaxIDs
    tree = ncbi.get_topology(taxids)
    return tree

# Function to visualize the tree
def show_tree(tree):
    tree.show()

# Optionally, you can export the tree to Newick or an image
def save_tree(tree, output_file):
    tree.write(format=1, outfile=output_file)
    tree.render(output_file.replace(".nwk", ".png"), w=183, units="mm")

# Main workflow
kraken2_report_file = "path_to_your_kraken2_report.txt"  # Update this path
taxids = parse_kraken2_report(kraken2_report_file)
print(f"Found {len(taxids)} unique TaxIDs.")

tree = build_taxonomic_tree(taxids)

# Show the tree interactively
show_tree(tree)

# Optionally, save the tree to a Newick file and an image
save_tree(tree, "kraken2_taxonomic_tree.nwk")

Since for phyloseq one needs matching ids in the OTU table and the tree you would need to use ncbi id / abundance for the OTU table input and the newick tree from the script. Let me know if that works.

cxkBio · 2024-10-02T06:58:02Z

Since we cannot use sequences to create the tree, it can only work using the taxi id information from ncbi. Based on this me and chatGPT ;) made this script:

from ete3 import NCBITaxa

# Initialize NCBITaxa object
ncbi = NCBITaxa()

# Function to parse Kraken2 report and extract TaxIDs
def parse_kraken2_report(file_path):
    taxid_set = set()
    with open(file_path, 'r') as file:
        for line in file:
            fields = line.strip().split("\t")
            if len(fields) > 4:
                taxid = fields[4]  # TaxID is in the 5th column
                taxid_set.add(int(taxid))  # Add to the set to avoid duplicates
    return list(taxid_set)

# Function to build a tree from TaxIDs
def build_taxonomic_tree(taxids):
    # Fetch the taxonomic tree based on the TaxIDs
    tree = ncbi.get_topology(taxids)
    return tree

# Function to visualize the tree
def show_tree(tree):
    tree.show()

# Optionally, you can export the tree to Newick or an image
def save_tree(tree, output_file):
    tree.write(format=1, outfile=output_file)
    tree.render(output_file.replace(".nwk", ".png"), w=183, units="mm")

# Main workflow
kraken2_report_file = "path_to_your_kraken2_report.txt"  # Update this path
taxids = parse_kraken2_report(kraken2_report_file)
print(f"Found {len(taxids)} unique TaxIDs.")

tree = build_taxonomic_tree(taxids)

# Show the tree interactively
show_tree(tree)

# Optionally, save the tree to a Newick file and an image
save_tree(tree, "kraken2_taxonomic_tree.nwk")

Since for phyloseq one needs matching ids in the OTU table and the tree you would need to use ncbi id / abundance for the OTU table input and the newick tree from the script. Let me know if that works.

Thank you so much for your help. I'll try it. Best wishes to you!

paulzierep · 2024-10-02T07:01:45Z

You are welcome, let me know if it works, I will probably make a Galaxy tool out of the script at one point and make this a easy to use workflow ... should also work with other read-based classifiers

cxkBio · 2024-10-02T07:43:31Z

You are welcome, let me know if it works, I will probably make a Galaxy tool out of the script at one point and make this a easy to use workflow ... should also work with other read-based classifiers

Perhaps you could give Bracken, Phanta, and metaphlan a try based on reads. Additionally, metaphlan includes way for drawing phylogenetic trees, so you could make comparisons. For now, due to data processing issues, I'm unable to test your code at once.

CMagnoBR · 2024-10-15T18:46:40Z

Wow! Thank you guys!
Let me know if this tool has been implemented as a Galaxy tool. By the way, I thank if this is implemented at Galaxy Europe server.

Cheers.

magibc changed the title ~~kreport2mpa.py and combine_mpa.py to phyloseq object~~ Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object Mar 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object #46

Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object #46

magibc commented Mar 10, 2022 •

edited

Loading

damioresegun commented Aug 9, 2022

CMagnoBR commented Dec 12, 2022 •

edited

Loading

CMagnoBR commented Dec 18, 2023

paulzierep commented Apr 24, 2024

cxkBio commented Oct 2, 2024

paulzierep commented Oct 2, 2024

cxkBio commented Oct 2, 2024

paulzierep commented Oct 2, 2024

cxkBio commented Oct 2, 2024

CMagnoBR commented Oct 15, 2024 •

edited

Loading

Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object #46

Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object #46

Comments

magibc commented Mar 10, 2022 • edited Loading

damioresegun commented Aug 9, 2022

CMagnoBR commented Dec 12, 2022 • edited Loading

CMagnoBR commented Dec 18, 2023

paulzierep commented Apr 24, 2024

cxkBio commented Oct 2, 2024

paulzierep commented Oct 2, 2024

cxkBio commented Oct 2, 2024

paulzierep commented Oct 2, 2024

cxkBio commented Oct 2, 2024

CMagnoBR commented Oct 15, 2024 • edited Loading

magibc commented Mar 10, 2022 •

edited

Loading

CMagnoBR commented Dec 12, 2022 •

edited

Loading

CMagnoBR commented Oct 15, 2024 •

edited

Loading