CS4220 Group Project: Predicting Gene Essentiality in Cancer Cell-Lines using a Weighted Naive Bayes Model
This repository contains the code for the group project in the module CS4220 Knowledge Discovery Methods in Bioinformatics. All code in this repository was designed and implemented jointly by Amanda Ho Shan Rui, Benjamin Tan Jee Min, Chan Sheng You and Chen Tianying, Tiana. The gene essentiality problem is adapted from the BROAD-DREAM Gene Essentiality Prediction Challenge.
Scripts:
- Pre-processing.R contains the data pre-processing steps, implemented in R.
- InformationGain.py details the steps taken to derive weight vectors from the weight matrix for our testing and training data. The weight vectors are then used as input for our Weighted Naive Bayes Model.
- WeightedNaiveBayes.py details the training and testing of our Weighted Naive Bayes model and benchmarks it against the Gaussian Naive Bayes model.
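
For readers unfamiliar with the approach, the general idea behind the last two scripts can be sketched as follows. This is a minimal illustration, not the code in this repository: the function and class names (`entropy`, `information_gain`, `WeightedGaussianNB`) are hypothetical, and the sketch assumes equal-width binning for the information gain, Gaussian class-conditional likelihoods, and weights applied as multipliers on each feature's log-likelihood. The actual scripts may derive and apply the weights differently.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def information_gain(feature, labels, n_bins=4):
    """Information gain of one continuous feature with respect to the labels.

    Assumption: the feature is discretised into equal-width bins first.
    """
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # bin index in 0..n_bins-1
    conditional = 0.0
    for b in np.unique(binned):
        mask = binned == b
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


class WeightedGaussianNB:
    """Gaussian Naive Bayes in which feature j's log-likelihood is scaled by w_j."""

    def fit(self, X, y, weights):
        self.classes_ = np.unique(y)
        self.weights_ = np.asarray(weights, dtype=float)
        self.log_prior_ = np.log(
            np.array([(y == c).mean() for c in self.classes_])
        )
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant avoids division by zero for constant features.
        self.var_ = (
            np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        )
        return self

    def predict(self, X):
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_pdf = -0.5 * (
            np.log(2 * np.pi * self.var_)[None, :, :] + diff**2 / self.var_[None, :, :]
        )
        weighted = (log_pdf * self.weights_[None, None, :]).sum(axis=2)
        scores = self.log_prior_[None, :] + weighted
        return self.classes_[np.argmax(scores, axis=1)]


# Toy data (synthetic, for illustration only): one informative feature
# and one pure-noise feature.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
informative = np.where(y == 0, -2.0, 2.0) + rng.normal(0, 0.5, 100)
noise = rng.normal(0, 1, 100)
X = np.column_stack([informative, noise])

weights = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
model = WeightedGaussianNB().fit(X, y, weights)
pred = model.predict(X)
```

Setting every weight to 1 reduces this sketch to an ordinary Gaussian Naive Bayes classifier, which is why that model is a natural baseline for comparison.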