Analysis performed on a yeast codon dataset
Dataset created by Pascal Durrens (CNRS) in 2020 (see codons_2020 for details about the construction of the dataset).
- taxonomy_cleaning.py :
usage: reglog.py [-h] Inputdir Outputdir
Extract taxonomic subset.
Builds a usable .csv file from the taxonomy file in codons_2020 (codons_2020/CDS/MEASURES/SPECIES/genomes-species.csv).
positional arguments:
  Inputdir    directory containing the whole dataset
  Outputdir   Path to output directory (will be created if it doesn't exist)
optional arguments:
  -h, --help  show this help message and exit
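The README does not describe the layout of genomes-species.csv. The following minimal sketch assumes, purely for illustration, that each input record pairs a species name with a semicolon-separated lineage string. It shows how such records could be flattened into a CSV with one column per rank. The rank list `RANKS` and the helpers `clean_taxonomy` and `write_csv` are hypothetical, not taken from taxonomy_cleaning.py.

```python
import csv
import io

# Hypothetical rank order; the real taxonomy file may use other ranks.
RANKS = ["phylum", "class", "order", "family", "genus"]


def clean_taxonomy(rows):
    """Turn (species, 'rank1;rank2;...') pairs into one dict per species.

    Assumes the lineage lists ranks in the order given by RANKS; missing
    trailing ranks are left empty.
    """
    cleaned = []
    for species, lineage in rows:
        parts = [p.strip() for p in lineage.split(";") if p.strip()]
        record = {"species": species.strip()}
        for i, rank in enumerate(RANKS):
            record[rank] = parts[i] if i < len(parts) else ""
        cleaned.append(record)
    return cleaned


def write_csv(records, handle):
    """Write the cleaned records as CSV, with a header row."""
    writer = csv.DictWriter(handle, fieldnames=["species"] + RANKS)
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    rows = [("Saccharomyces cerevisiae",
             "Ascomycota;Saccharomycetes;Saccharomycetales;"
             "Saccharomycetaceae;Saccharomyces")]
    out = io.StringIO()
    write_csv(clean_taxonomy(rows), out)
    print(out.getvalue())
```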
- metrics_average.py :
usage: extract_subset [-h] Workingdir Outputdir Taxopath
Prepares the dataset (codons_2020/CDS/MEASURES/SEQUENCES/ALL_MEASURES) for PCA/sPCA analysis by computing the median for each species. The taxonomy file is the one generated by taxonomy_cleaning.py.
positional arguments:
  Workingdir  directory containing the whole dataset
  Outputdir   Path to output directory
  Taxopath    Path to taxonomy file
optional arguments:
  -h, --help  show this help message and exit
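The per-species median step can be pictured with the standard library alone. This is a minimal sketch, assuming a comma-separated file with one row per sequence, a `species` column, and numeric measure columns (the names `GC3` and `CAI` in the demo are hypothetical). The real ALL_MEASURES files may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def species_medians(handle, species_col="species"):
    """Return {species: {measure: median}} from a CSV with one row per sequence.

    Assumes every column other than `species_col` holds a numeric measure;
    empty cells are skipped.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle):
        sp = row[species_col]
        for col, val in row.items():
            if col != species_col and val != "":
                values[sp][col].append(float(val))
    return {sp: {col: median(vals) for col, vals in cols.items()}
            for sp, cols in values.items()}


if __name__ == "__main__":
    data = io.StringIO("species,GC3,CAI\nA,0.5,0.2\nA,0.7,0.4\nB,0.1,0.3\n")
    print(species_medians(data))
```

The median is less sensitive than the mean to a few atypical genes per genome, which matters when one summary row per species feeds the PCA.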
- PCA.R :
usage: extract_subset [-h] Workingdir Outputdir Taxopath Subset_level Subset_name groups
Used to perform PCA and sPCA on a file and save the results in PDF format.
positional arguments:
  Workingdir    directory containing the whole dataset
  Outputdir     Path to output directory
  Taxopath      Path to taxonomy file
  Subset_level  Name of the taxon level you want to keep (phylum, class, order, ...)
  Subset_name   Name of the taxon you want to keep at the chosen level
  groups        Name of the taxon level you want to highlight
optional arguments:
  -h, --help    show this help message and exit
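The three taxonomy arguments work together. Subset_level and Subset_name restrict the analysis to a single taxon, and groups names the level whose taxa are highlighted. PCA.R is an R script, so the following is not its code. It is a minimal Python sketch of that selection logic, assuming a taxonomy CSV with a `species` column plus one column per rank. The helper `select_subset` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def select_subset(taxonomy_rows, subset_level, subset_name, groups):
    """Keep species whose `subset_level` equals `subset_name`, grouped by `groups`.

    Returns {group value: [species, ...]}, in input order within each group.
    """
    grouped = defaultdict(list)
    for row in taxonomy_rows:
        if row[subset_level] == subset_name:
            grouped[row[groups]].append(row["species"])
    return dict(grouped)


if __name__ == "__main__":
    text = "species,class,order\nA,Sacch,O1\nB,Sacch,O2\nC,Other,O1\n"
    rows = list(csv.DictReader(io.StringIO(text)))
    # Keep class "Sacch" and highlight its orders.
    print(select_subset(rows, "class", "Sacch", "order"))
```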
- PCA_CANDIDA.R :
usage: extract_subset [-h] Workingdir Outputdir Candida
Used to perform PCA and sPCA analysis on Candida yeast species only. Uses the candida_data.csv file, which provides specific data about Candida species.
positional arguments:
  Workingdir  directory containing the whole dataset
  Outputdir   Path to output directory
  Candida     Path to candida data file
optional arguments:
  -h, --help  show this help message and exit
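Restricting the analysis to Candida species amounts to joining the per-species measures with the rows of candida_data.csv. This is a minimal sketch of that join, assuming both sources are keyed by a `species` name. The helper `annotate_species` and the annotation column `clade` in the test are hypothetical. The README does not list the columns of candida_data.csv.

```python
def annotate_species(medians, candida_rows, key="species"):
    """Attach Candida-specific annotations to per-species measures.

    `medians` maps species -> {measure: value}; `candida_rows` is an iterable
    of dicts (e.g. from csv.DictReader). Species absent from `candida_rows`
    are dropped, which restricts the result to Candida species.
    """
    annotations = {row[key]: row for row in candida_rows}
    merged = {}
    for sp, measures in medians.items():
        if sp in annotations:
            extra = {k: v for k, v in annotations[sp].items() if k != key}
            merged[sp] = {**measures, **extra}
    return merged
```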
- prepare_reg_log.py :
usage: prepare log reg [-h] [-f [FAMILY ...]] [-s [SPECIES ...]] [-gc [GENECODETYPE ...]] Workingdir Outputdir Taxopath
Prepares the data for the logistic regression model by selecting the subset of species to use for classification.
positional arguments:
  Workingdir            directory containing the whole dataset
  Outputdir             Path to output directory (will be created if it doesn't exist)
  Taxopath              Path to taxonomy file
optional arguments:
  -h, --help            show this help message and exit
  -f [FAMILY ...], --family [FAMILY ...]
                        families of interest
  -s [SPECIES ...], --species [SPECIES ...]
                        list of species to keep
  -gc [GENECODETYPE ...], --genecodetype [GENECODETYPE ...]
                        Choose which genecode type you want to include (default = W)
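The `[-f [FAMILY ...]]` form in the usage line is how Python's argparse renders an option declared with `nargs="*"`. The following is a minimal sketch of a parser with the documented interface plus a row filter, not the script's actual code. The taxonomy column names `family`, `species` and `genecode` used by the hypothetical `keep_species` helper are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the documented usage of prepare_reg_log.py (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="prepare log reg",
        description="Prepare the data for the logistic regression model.")
    parser.add_argument("Workingdir", help="directory containing the whole dataset")
    parser.add_argument("Outputdir", help="Path to output directory")
    parser.add_argument("Taxopath", help="Path to taxonomy file")
    parser.add_argument("-f", "--family", nargs="*", help="families of interest")
    parser.add_argument("-s", "--species", nargs="*", help="list of species to keep")
    parser.add_argument("-gc", "--genecodetype", nargs="*", default=["W"],
                        help="genecode types to include (default = W)")
    return parser


def keep_species(row, args):
    """Return True if a taxonomy row passes the family/species/genecode filters.

    An omitted -f or -s (None) means "no restriction" on that field.
    """
    if args.family and row["family"] not in args.family:
        return False
    if args.species and row["species"] not in args.species:
        return False
    return row["genecode"] in args.genecodetype
```

Because `nargs="*"` options are greedy, an option given before the positional arguments would swallow them. Placing the positionals first (`data out taxonomy.csv -f Saccharomycetaceae`), or ending the option values with `--`, avoids this.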
- reglog.py :
usage: reglog.py [-h] Inputdir Outputdir
Performs classification on the dataset under multiple conditions. (The datasets are the files created by prepare_reg_log.py; multiple files are processed one at a time.)
positional arguments:
  Inputdir    directory containing the whole dataset
  Outputdir   Path to output directory (will be created if it doesn't exist)
optional arguments:
  -h, --help  show this help message and exit
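Processing every prepared file in Inputdir one at a time, and creating Outputdir when it is missing, follows a common pattern. This is a minimal sketch with `pathlib`, where the per-file classification is passed in as a callable. The helper `process_all` and the `<stem>_results.txt` naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path


def process_all(inputdir, outputdir, process_file):
    """Apply `process_file` to every file in `inputdir`, one file at a time.

    `outputdir` is created (with parents) if it does not exist. Each result
    string is written to <outputdir>/<file stem>_results.txt. Returns the
    result paths in sorted file-name order.
    """
    out = Path(outputdir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for path in sorted(Path(inputdir).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        result_path = out / f"{path.stem}_results.txt"
        result_path.write_text(process_file(path))
        results.append(result_path)
    return results


if __name__ == "__main__":
    # Demo on a throwaway directory; the "classifier" just counts lines.
    with tempfile.TemporaryDirectory() as tmp:
        inp = Path(tmp) / "in"
        inp.mkdir()
        (inp / "family_A.csv").write_text("a\nb\n")
        (inp / "family_B.csv").write_text("c\n")
        for p in process_all(inp, Path(tmp) / "out",
                             lambda f: str(len(f.read_text().splitlines()))):
            print(p.name, p.read_text())
```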
- model_pathos.py :
usage: model_pathos.py [-h] File Outputdir
Performs classification on the dataset under multiple conditions. (The datasets are the files created by prepare_model_pathos.py; multiple files are processed one at a time.)
positional arguments:
  File        File with dataset
  Outputdir   Path to output directory
optional arguments:
  -h, --help  show this help message and exit
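reglog.py and model_pathos.py both perform classification, and prepare_reg_log.py states that the target is a logistic regression model. The README does not say which implementation the scripts use. A library class such as scikit-learn's `LogisticRegression` would be the usual choice. To show what such a classifier does, this is a minimal standard-library sketch of binary logistic regression fitted by batch gradient descent on the log-loss. The feature vectors (for example per-species codon measures) and 0/1 labels are hypothetical inputs.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=1000):
    """Fit binary logistic regression by batch gradient descent.

    X is a list of feature vectors, y a list of 0/1 labels.
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error p - y drives the log-loss gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step along the gradient averaged over the n samples.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```

Features measured on different scales (for example GC content versus codon counts) are usually standardised first, because plain gradient descent converges slowly on poorly scaled inputs.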