STRIDES-Codes/Exploring-feature-selection-in-deep-learning-models-for-GDC-Cancer-site-expression-data

2020 CSHL Codeathon

Exploring Feature Selection for Genomics Expression Profile (draft version)

Motivation:

An NCI-DOE collaboration (https://github.com/ravichas/ML-TC1) shows that genomic expression profiles collected from different cancer sites/types can be classified using a deep-learning method (a convolutional neural network). The method works well for a balanced dataset. However, the neural network does not reveal which features (i.e. genes) are important for the classification. A project exploring feature selection for genomics data could be useful to cancer research communities.

Complexity of the problem and open questions:

Genomics data are high-dimensional in the number of genes/probes/features. Models built from high-dimensional omics data are complex and difficult to explain. Identifying important features/genes is as important as building high-accuracy models. Keeping in mind that genes do not work alone, pathway-based analysis could be used to

Overview

  • Data collection
  • Datasets created in the previous step will be used to construct/compare several supervised and unsupervised models (tSNE, PCA,
  • Important features from these models will be compared with experimental findings
  • Summarize the conclusions
  • Provide a list of open questions and propose future directions
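
The model-building and feature-ranking steps above can be illustrated with a minimal sketch. It is not the project's method: it replaces the deep-learning model with a simple nearest-centroid classifier and ranks genes by permutation importance (the drop in held-out accuracy when one gene's values are shuffled). The data are synthetic, and all function names, gene indices, and parameters are hypothetical choices for illustration only.

```python
import random
import statistics


def make_synthetic_expression(n_per_class=60, n_genes=20,
                              informative=(0, 1), shift=3.0, seed=0):
    """Synthetic stand-in for expression data (NOT real GDC data).

    Genes listed in `informative` are shifted upward in class 1.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean expression vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)


def permutation_importance(centroids, X, y, n_repeats=5, seed=1):
    """Mean accuracy drop per gene when that gene's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for g in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[g] for x in X]
            rng.shuffle(column)
            X_perm = [x[:g] + [v] + x[g + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(statistics.fmean(drops))
    return importances


X, y = make_synthetic_expression()
# Alternate samples go to train/test so both classes appear in each split.
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

model = fit_centroids(X_train, y_train)
test_accuracy = accuracy(model, X_test, y_test)
scores = permutation_importance(model, X_test, y_test)
ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)

print("Held-out accuracy:", round(test_accuracy, 3))
print("Top genes by permutation importance:", ranked[:2])
```

Permutation importance is model-agnostic, so the same ranking step could in principle be applied to a trained neural network by swapping in its prediction function. Ranked genes would then be the candidates to compare against experimental findings.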

Links

Presentation slide Google Link:

Summary link:

GitHub:

Team
