chrislaincoubard/YeastCodonAnalysis

YeastCodonAnalysis

Analysis performed on a yeast codon dataset

Dataset

Dataset created by Pascal Durrens (CNRS) in 2020 (see codons_2020 for details about the construction of the dataset).

Taxonomy

Taxonomy cleaning is used to make a usable .csv file from the taxonomy file in codons_2020 (codons_2020/CDS/MEASURES/SPECIES/genomes-species.csv).

  • taxonomy_cleaning.py :

usage: reglog.py [-h] Inputdir Outputdir

Extract taxonomic subset.

positional arguments:
  Inputdir    directory containing the whole dataset
  Outputdir   Path to output directory (will be created if it doesn't exist)

optional arguments:
  -h, --help  show this help message and exit
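The interface described above can be reproduced with `argparse`. The following is a minimal sketch, not the script's actual source: the function names `build_taxonomy_parser` and `ensure_outdir` are hypothetical, and the `prog` value is an assumption based on the script name.

```python
import argparse
from pathlib import Path


def build_taxonomy_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI from the help text; the real
    # script may define its parser differently.
    parser = argparse.ArgumentParser(
        prog="taxonomy_cleaning.py",  # assumed program name
        description="Extract taxonomic subset.",
    )
    parser.add_argument("Inputdir", help="directory containing the whole dataset")
    parser.add_argument(
        "Outputdir",
        help="Path to output directory (will be created if it doesn't exist)",
    )
    return parser


def ensure_outdir(path: str) -> Path:
    # Create the output directory (and any missing parents) if needed,
    # matching the behaviour promised by the help text.
    outdir = Path(path)
    outdir.mkdir(parents=True, exist_ok=True)
    return outdir
```

With such a parser, `build_taxonomy_parser().parse_args(["codons_2020", "results"])` yields a namespace whose `Inputdir` and `Outputdir` attributes hold the two paths.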

PCA

  • metrics_average.py :

usage: extract_subset [-h] Workingdir Outputdir Taxopath

Prepares the dataset (codons_2020/CDS/MEASURES/SEQUENCES/ALL_MEASURES) for PCA/sPCA analysis by computing the median of each metric for each species. The taxonomy file is the one generated by taxonomy_cleaning.py.

positional arguments:
  Workingdir  directory containing the whole dataset
  Outputdir   Path to output directory
  Taxopath    Path to taxonomy file

optional arguments:
  -h, --help  show this help message and exit
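The per-species median step can be illustrated with the standard library alone. This is a sketch under stated assumptions: the column names (`species`, `metric_a`, `metric_b`) and the toy values are invented for illustration and do not come from the dataset, and the actual script may use a different approach.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_per_species(rows, species_key, metric_keys):
    """Return {species: {metric: median}} for an iterable of dict rows."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for metric in metric_keys:
            # Collect every value of each metric under its species.
            grouped[row[species_key]][metric].append(float(row[metric]))
    return {
        species: {metric: median(values) for metric, values in metrics.items()}
        for species, metrics in grouped.items()
    }


# Toy table with hypothetical column names and made-up values.
SAMPLE = """species,metric_a,metric_b
sp_a,0.25,0.5
sp_a,0.75,0.75
sp_a,0.5,1.0
sp_b,0.25,0.25
sp_b,0.75,0.5
"""

result = median_per_species(
    csv.DictReader(io.StringIO(SAMPLE)), "species", ["metric_a", "metric_b"]
)
```

Each species collapses to a single row of medians, which is the shape a PCA expects when species rather than individual sequences are the observations.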

  • PCA.R :

usage: extract_subset [-h] Workingdir Outputdir Taxopath Subset_level Subset_name groups

Performs PCA and sPCA on a file and saves the result in PDF format.

positional arguments:
  Workingdir    directory containing the whole dataset
  Outputdir     Path to output directory
  Taxopath      Path to taxonomy file
  Subset_level  Name of the taxon level you want to keep (phylum, class, order, ...)
  Subset_name   Name of the taxon you want to keep at the chosen level
  groups        Name of the taxon level you want to highlight

optional arguments:
  -h, --help    show this help message and exit
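How Subset_level, Subset_name and groups interact can be sketched as follows. PCA.R is an R script, so this Python snippet only illustrates the selection logic and is not a translation of it; the function names and the taxonomy column names are assumptions, and the small taxonomy table is illustrative.

```python
def select_subset(taxonomy_rows, subset_level, subset_name):
    """Keep the rows whose taxon at `subset_level` equals `subset_name`."""
    return [row for row in taxonomy_rows if row.get(subset_level) == subset_name]


def group_labels(rows, groups):
    """Map each species to its taxon at the `groups` level (e.g. for colouring)."""
    return {row["species"]: row[groups] for row in rows}


# Illustrative taxonomy table; the column names are assumed, not taken
# from the file produced by taxonomy_cleaning.py.
TAXONOMY = [
    {"species": "Saccharomyces cerevisiae", "phylum": "Ascomycota",
     "class": "Saccharomycetes", "order": "Saccharomycetales"},
    {"species": "Kluyveromyces lactis", "phylum": "Ascomycota",
     "class": "Saccharomycetes", "order": "Saccharomycetales"},
    {"species": "Schizosaccharomyces pombe", "phylum": "Ascomycota",
     "class": "Schizosaccharomycetes", "order": "Schizosaccharomycetales"},
    {"species": "Cryptococcus neoformans", "phylum": "Basidiomycota",
     "class": "Tremellomycetes", "order": "Tremellales"},
]

# Subset_level="phylum", Subset_name="Ascomycota", groups="class"
subset = select_subset(TAXONOMY, "phylum", "Ascomycota")
labels = group_labels(subset, "class")
```

Here the subset keeps the three Ascomycota species, and `labels` assigns each one its class, which is the grouping that would be highlighted on the plot.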

  • PCA_CANDIDA.R :

usage: extract_subset [-h] Workingdir Outputdir Candida

Performs PCA and sPCA analysis on Candida yeast species only. Uses the candida_data.csv file, which provides specific data about Candida species.

positional arguments:
  Workingdir  directory containing the whole dataset
  Outputdir   Path to output directory
  Candida     Path to candida data file

optional arguments:
  -h, --help  show this help message and exit

Classification

  • prepare_reg_log.py :

usage: prepare log reg [-h] [-f [FAMILY ...]] [-s [SPECIES ...]] [-gc [GENECODETYPE ...]] Workingdir Outputdir Taxopath

Prepares the data for the logistic regression model by selecting the subset of species to use for the classification.

positional arguments:
  Workingdir            directory containing the whole dataset
  Outputdir             Path to output directory (will be created if it doesn't exist)
  Taxopath              Path to taxonomy file

optional arguments:
  -h, --help            show this help message and exit
  -f [FAMILY ...], --family [FAMILY ...]
                        families of interest
  -s [SPECIES ...], --species [SPECIES ...]
                        list of species to keep
  -gc [GENECODETYPE ...], --genecodetype [GENECODETYPE ...]
                        Choose which genecode type you want to include (default = W)
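The `[FAMILY ...]` notation in the usage line corresponds to options that accept zero or more values. A minimal `argparse` sketch of such an interface is shown below; the function name `build_prepare_parser` is hypothetical, and storing the default as the list `["W"]` is an assumption based on the "(default = W)" note.

```python
import argparse


def build_prepare_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction from the help text, not the script's source.
    parser = argparse.ArgumentParser(
        prog="prepare log reg",
        description="Prepare the data for logistic regression model.",
    )
    parser.add_argument("Workingdir", help="directory containing the whole dataset")
    parser.add_argument("Outputdir", help="Path to output directory")
    parser.add_argument("Taxopath", help="Path to taxonomy file")
    # nargs="*" lets each option take zero or more values.
    parser.add_argument("-f", "--family", nargs="*", help="families of interest")
    parser.add_argument("-s", "--species", nargs="*", help="list of species to keep")
    parser.add_argument(
        "-gc", "--genecodetype", nargs="*", default=["W"],  # assumed default form
        help="genecode types to include (default = W)",
    )
    return parser


args = build_prepare_parser().parse_args(
    ["codons_2020", "out", "taxonomy.csv", "-f", "Saccharomycetaceae", "Pichiaceae"]
)
```

Because these options take a variable number of values, it is safest to give the three positional arguments first: an option such as `-f` placed before them would consume the paths as additional family names.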

  • reglog.py:

usage: reglog.py [-h] Inputdir Outputdir

Performs classification on the dataset under multiple conditions. The datasets are the files created by prepare_reg_log.py; multiple files are processed one at a time.

positional arguments:
  Inputdir    directory containing the whole dataset
  Outputdir   Path to output directory (will be created if it doesn't exist)

optional arguments:
  -h, --help  show this help message and exit
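Processing several prepared files one at a time can be sketched as a simple loop over the input directory. This is only an illustration: the `*.csv` pattern, the report file naming, and the `classify` callable (a stand-in for the model fitting) are all assumptions, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator


def iter_dataset_files(inputdir: str, pattern: str = "*.csv") -> Iterator[Path]:
    """Yield the dataset files of `inputdir` in a stable, sorted order."""
    yield from sorted(Path(inputdir).glob(pattern))


def run_all(
    inputdir: str, outputdir: str, classify: Callable[[Path], str]
) -> Dict[str, Path]:
    """Apply `classify` to each file in turn and write one report per file."""
    outdir = Path(outputdir)
    outdir.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    written = {}
    for dataset in iter_dataset_files(inputdir):
        report = classify(dataset)  # placeholder for the actual classification
        target = outdir / f"{dataset.stem}_report.txt"
        target.write_text(report)
        written[dataset.name] = target
    return written
```

Keeping the classification step behind a callable makes it possible to check the file handling on its own, for instance with `run_all(indir, outdir, lambda path: path.name)`.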

  • model_pathos.py :

usage: model_pathos.py [-h] File Outputdir

Performs classification on the dataset under multiple conditions. The datasets are the files created by prepare_model_pathos.py; multiple files are processed one at a time.

positional arguments:
  File        File with dataset
  Outputdir   Path to output directory

optional arguments:
  -h, --help  show this help message and exit
