
Dennis-Dekker/Machine_learning


Machine_learning

Machine learning Project - mRNA expression data

Goal

Test different classification methods, discuss the results, and select the best method.

Dataset

Gene expression data (base ML dataset and raw RNA-seq dataset) from five different cancer types.

Methods

Preprocessing (Unsupervised learning):

  • Principal Component Analysis (PCA)

  • t-SNE
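A minimal sketch of the two preprocessing steps, assuming scikit-learn and a samples × genes NumPy matrix. The random matrix is a hypothetical stand-in for the real expression data, and the 90% explained-variance cutoff and t-SNE perplexity are illustrative choices, not values taken from this project. The same cumulative-variance calculation is one way to decide how many PCs to keep.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the expression matrix: 100 samples x 500 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))

# Standardize each gene so highly expressed genes do not dominate the PCs.
X_scaled = StandardScaler().fit_transform(X)

# Fit a full PCA and find the smallest number of PCs that together
# explain at least 90% of the variance (illustrative cutoff).
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
X_pca = PCA(n_components=n_components).fit_transform(X_scaled)

# Embed the PCA-reduced data in 2D with t-SNE (useful for plotting only;
# t-SNE coordinates are not meant as classifier input).
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)

print(n_components, X_pca.shape, X_tsne.shape)
```

Running PCA before t-SNE is a common way to denoise the data and speed up t-SNE on thousands of genes; scikit-learn's `PCA` also accepts a float such as `n_components=0.90` to pick the component count directly.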

Use different classification methods (Supervised learning):

  • K-Nearest Neighbors

  • Linear Models

  • Naive Bayes Classifiers

  • Decision Trees

  • Kernelized Support Vector Machines (?)
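One way to compare these classifiers and select the best one is to score each with the same stratified cross-validation. The sketch below assumes scikit-learn; the synthetic 5-class dataset is a hypothetical stand-in for the five cancer types, logistic regression stands in for "Linear Models", and the fixed 20 PCs and default hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 5 classes (one per cancer type), 200 features.
X, y = make_classification(n_samples=300, n_features=200, n_informative=20,
                           n_classes=5, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "RBF SVM": SVC(kernel="rbf"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # Scaling and PCA sit inside the pipeline, so they are refit on each
    # training fold and never see the held-out fold.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=20), model)
    scores[name] = cross_val_score(pipe, X, y, cv=cv).mean()

# Print mean accuracy per model, best first.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
best = max(scores, key=scores.get)
print("best:", best)
```

Putting PCA inside the pipeline matters: fitting it on the full dataset before splitting would leak information from the test folds into the reported scores.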

TODO

Keep track of what we still have to do. Please update this list with new to-dos.

  • Update README.
  • Investigate preprocessing that is applied to the data.
  • Write about preprocessing steps in report.
  • Keep track of references in the report.
  • Reorganize repository (give logical filenames, restructure folders, etc.).
  • Restructure the PCA scripts.
  • Calculate the number of PCs needed (PCA script).
  • Review PCA script (especially investigate the explained variance values).
  • Download data for the different cancer types from Synapse and merge it with the annotations (also from Synapse).
  • Try different hyperparameters in the ML algorithms (k-NN, SVM, etc.) and use cross-validation.
  • PCA: try applying it within each cancer type.
  • Find important features → DEGs (differentially expressed genes).
  • KEGG pathway analysis.
  • Check for class imbalance (bar plot).
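For the class-imbalance and hyperparameter items, a minimal sketch assuming scikit-learn: count samples per class (the counts a bar plot would show), then tune k-NN with a cross-validated grid search. The imbalanced synthetic dataset, the class weights, and the grid values are hypothetical and only illustrate the approach. Balanced accuracy is used as the score because plain accuracy can look good on imbalanced classes while the smallest class is mostly misclassified.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, deliberately imbalanced stand-in for 5 cancer types.
X, y = make_classification(n_samples=400, n_features=100, n_informative=15,
                           n_classes=5, weights=[0.4, 0.25, 0.15, 0.12, 0.08],
                           random_state=0)

# Class-imbalance check: these per-class counts are what the bar plot shows.
counts = Counter(y)
print(sorted(counts.items()))

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid; step parameters are addressed as "<step>__<param>".
param_grid = {
    "pca__n_components": [10, 20, 50],
    "knn__n_neighbors": [3, 5, 11],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="balanced_accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the SVM (for example a grid over `svc__C` and `svc__gamma` after swapping the last pipeline step); stratified folds keep the class proportions similar in every split.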

About

Machine learning project on mRNA expression
