Welcome to our Group 5 project repository! 🧬 This project is all about charting new territories in the pancreas using single-cell RNA sequencing (scRNA-seq) data. We're on a mission to conquer two challenges in scRNA-seq analysis, batch effect correction and high dimensionality, with some seriously advanced computational firepower! 💻
We've deployed three main computational wizards:
- Unsupervised Clustering (MNN + KMeans): 🔍 Identifying distinct cell types.
- GLM with Lasso Penalty: 📐 Enhancing feature selection.
- Multi-Task Neural Network: 🧠 A top-notch approach for cell type classification.
Behold the power of our methods, especially the multi-task neural network, which reaches an Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) of 1.0000 in cell type classification! 🌠
| Model | ARI | NMI |
|---|---|---|
| Unsupervised Clustering (MNN + KMeans) | 0.5717 | 0.6139 |
| GLM with Lasso Penalty | 0.9502 | 0.9146 |
| Transfer Learning and Multi-Task Neural Network | 1.0000 | 1.0000 |
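
For readers who want to see how scores like the ARI and NMI values above are computed, here is a minimal sketch using scikit-learn. The helper name `clustering_scores` and the toy labels are made up for illustration; they are not taken from this repository or its data.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_scores(true_labels, predicted_labels):
    """Return (ARI, NMI) for two labelings of the same cells."""
    ari = adjusted_rand_score(true_labels, predicted_labels)
    nmi = normalized_mutual_info_score(true_labels, predicted_labels)
    return ari, nmi


# Toy labels (hypothetical, not project data).
true_labels = ["alpha", "alpha", "beta", "beta", "delta", "delta"]
perfect = [2, 2, 0, 0, 1, 1]  # same partition, different label names
noisy = [2, 2, 0, 1, 1, 1]    # one cell placed in the wrong cluster

perfect_scores = clustering_scores(true_labels, perfect)
noisy_scores = clustering_scores(true_labels, noisy)

print("perfect partition (ARI, NMI):", perfect_scores)
print("noisy partition   (ARI, NMI):", noisy_scores)
```

Both metrics compare partitions rather than label names, so a clustering that recovers every cell type scores 1.0 even if its cluster IDs are numbered differently.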
A big shout-out to our cosmic crew 🚀, who contributed to data processing, model development, and analysis. Every part of this project has been crafted with utmost care and precision.
- Introduction and data processing: Xintong
- GLM with Lasso penalty: Bulun
- MNN + KMeans: Kexin
- Multi-task neural network: Xintong

The final report was collaboratively drafted by all team members.
- `DeepNeuralNetwork/`: 🤖 Jupyter notebooks for transfer learning and multilevel classification, implemented with the PyTorch framework.
- `MNN/`: 🧼 Unsupervised learning via Mutual Nearest Neighbors, accelerated with Rcpp.
- `SoftMaxLasso/`: 🧼 Generalized logistic regression model with L1 regularization, optimized by a C++-accelerated FISTA algorithm.
- `Data/`: 📊 Original H5 data (`preprocessed_adata.h5ad`), PCA dimension-reduced data (`adata_pca.csv`), and gene expression data (`x_matrix`). Large files have been partitioned into several zip files.
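
To give a feel for the idea behind SoftMaxLasso, here is a rough NumPy sketch of FISTA applied to L1-regularized softmax regression. It is not the C++-accelerated code in this repository: the function names, the step-size choice, the penalty value `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1 (shrinks entries toward zero)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fista_softmax_lasso(X, y, n_classes, lam=0.05, n_iter=300):
    """Minimize mean cross-entropy + lam * ||W||_1 with FISTA."""
    n, p = X.shape
    Y = np.eye(n_classes)[y]                 # one-hot targets
    # Conservative upper bound on the gradient's Lipschitz constant.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    W = np.zeros((p, n_classes))             # current iterate
    V = W.copy()                             # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (softmax(X @ V) - Y) / n
        W_next = soft_threshold(V - step * grad, step * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        V = W_next + ((t - 1.0) / t_next) * (W_next - W)
        W, t = W_next, t_next
    return W


# Synthetic data (hypothetical): only the first 3 of 20 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
W_true = np.zeros((20, 3))
W_true[:3, :] = 3.0 * np.eye(3)
y = np.argmax(X @ W_true, axis=1)

W_hat = fista_softmax_lasso(X, y, n_classes=3)
accuracy = float(np.mean(np.argmax(X @ W_hat, axis=1) == y))
n_zero = int(np.sum(W_hat == 0))
print("training accuracy:", accuracy)
print("coefficients set exactly to zero:", n_zero, "of", W_hat.size)
```

The soft-thresholding step is what drives uninformative coefficients to exactly zero, which is why an L1 penalty doubles as feature selection.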
Thank you for visiting our project! We welcome your contributions and feedback to make this project even better!