PloiDB data processing and ploidy inference pipeline

This repository contains the pipeline that produces the PloiDB ploidy inferences. It also includes additional data processing and tree reconstruction scripts.

To execute the inference scheme on a given dataset, use the exec_ploidb_pipeline script with the following arguments:

    --counts_path PATH              chromosome counts file path  [required]
    --tree_path PATH                path to the tree file  [required]
    --output_dir PATH               directory to create the chromevol input in
                                    [required]
    --log_path PATH                 path to log file of the script  [required]
    --taxonomic_classification_path TEXT
                                    path to data file with taxonomic
                                    classification of members in the counts and
                                    tree data
    --ploidy_classification_path TEXT
                                    path to write the ploidy classification to
    --optimize_thresholds BOOLEAN   indicator whether thresholds should be
                                    optimized based on simulations
    --diploidy_threshold FLOAT RANGE
                                    threshold between 0 and 1 for the frequency
                                    of polyploidy support across mappings for
                                    taxa to be deemed as diploids  [0<=x<=1]
    --polyploidy_threshold FLOAT RANGE
                                    threshold between 0 and 1 for the frequency
                                    of polyploidy support across mappings for
                                    taxa to be deemed as polyploids  [0<=x<=1]
    --allow_base_num_parameter BOOLEAN
                                    indicator if we allow the selected model to
                                    include base number parameter or not
    --use_model_selection BOOLEAN   indicator whether model selection should be
                                    used
    --help                          Show this message and exit.
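
As an illustration of how the arguments above fit together, here is a minimal Python sketch that assembles a command line for the script. It assumes the script is run as exec_ploidb_pipeline.py from the repository root with the current Python interpreter. The helper name build_pipeline_command, every file path, and the threshold values are hypothetical placeholders and are not taken from the repository.

```python
import shlex
import sys


def build_pipeline_command(counts_path, tree_path, output_dir, log_path, **optional_flags):
    """Return the argument list for one run of exec_ploidb_pipeline.py.

    This helper is illustrative only; it is not part of the repository.
    """
    command = [
        sys.executable,
        "exec_ploidb_pipeline.py",  # assumption: invoked from the repository root
        "--counts_path", str(counts_path),
        "--tree_path", str(tree_path),
        "--output_dir", str(output_dir),
        "--log_path", str(log_path),
    ]
    # Optional flags are appended as "--name value" pairs, in the order given.
    for name, value in optional_flags.items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # All paths and threshold values below are hypothetical placeholders.
    command = build_pipeline_command(
        counts_path="example/counts.txt",
        tree_path="example/tree.nwk",
        output_dir="example/output",
        log_path="example/pipeline.log",
        diploidy_threshold=0.25,
        polyploidy_threshold=0.75,
    )
    # Print the command so it can be inspected before it is launched.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch side-effect free; passing the returned list to subprocess.run(command, check=True) would launch the run.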

The output of the pipeline is created in <output_dir> and includes the following:

model_selection - a folder containing the results of fitting chromevol models with different sets of parameters to the data
stochastic_mappings.zip - a zip archive of the sampled stochastic mappings, based on all the different chromevol models except the gain_loss model, which does not account for polyploidizations
simulations.zip - a zip archive of the simulated datasets, based on all the different chromevol models except the gain_loss model, which does not account for polyploidizations
ploidy.csv - a file with the ploidy inference data
classified_tree.nwk, classified_tree.phyloxml - trees with the chromosome numbers and classifications of tip taxa, in newick and phyloxml formats, respectively
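
After a run, it can be useful to confirm that the outputs listed above are present. The sketch below checks an output directory against those names. The helper missing_outputs is hypothetical and not part of the repository, and the sketch assumes every listed output is written directly under <output_dir>; whether all of them are produced for every combination of options is not stated here.

```python
from pathlib import Path

# Output names as listed in this README.
EXPECTED_OUTPUTS = (
    "model_selection",
    "stochastic_mappings.zip",
    "simulations.zip",
    "ploidy.csv",
    "classified_tree.nwk",
    "classified_tree.phyloxml",
)


def missing_outputs(output_dir):
    """Return the expected output names that do not exist under output_dir."""
    output_dir = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (output_dir / name).exists()]


if __name__ == "__main__":
    # "example/output" is a hypothetical placeholder for <output_dir>.
    for name in missing_outputs("example/output"):
        print(f"missing: {name}")
```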

For additional details on the algorithm used for producing PloiDB classifications, please see the manuscript:

Halabi, Keren, Anat Shafir, and Itay Mayrose. "PloiDB: The plant ploidy database." New Phytologist (2023). https://doi.org/10.1111/nph.19057
