
Python Implementation of SIMLR for single-cell visualization and analysis


fbao-fudan/SIMLR_PY

 
 


SIMLR

This is a Python implementation of SIMLR, the method described in the Nature Methods paper "Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning".

OVERVIEW

Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than do existing dimension reduction methods. Additionally, SIMLR demonstrates high sensitivity and accuracy on high-throughput peripheral blood mononuclear cells (PBMC) data sets generated by the GemCode single-cell technology from 10x Genomics.

IMPLEMENTATIONS

We provide implementations of SIMLR for large-scale single-cell RNA-seq data. For small datasets (e.g., fewer than 3,000 cells), we recommend the MATLAB or R package from https://github.com/BatzoglouLabSU/SIMLR. For large datasets (more than 3,000 cells), we recommend the Python function "SIMLR_LARGE".
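The size rule above can be written as a small helper. This is only a sketch: `recommend_implementation` and the strings it returns are hypothetical and not part of the package. The README does not say what to do at exactly 3,000 cells, so sending that case to the large-scale path is an assumption.

```python
# Hypothetical helper for illustration; not part of the SIMLR package.
SMALL_DATASET_LIMIT = 3000  # threshold stated in this README


def recommend_implementation(n_cells: int) -> str:
    """Return which SIMLR implementation this README recommends."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    if n_cells < SMALL_DATASET_LIMIT:
        return "MATLAB or R package (BatzoglouLabSU/SIMLR)"
    # Assumption: exactly 3,000 cells is treated as a large dataset.
    return "SIMLR_LARGE (Python)"


if __name__ == "__main__":
    for n in (500, 3005):
        print(n, "->", recommend_implementation(n))
```

With this rule, the 3,005-cell Zeisel dataset used in the demo falls on the large-scale side.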

This large-scale implementation uses an approximate version of SIMLR to keep the computation tractable.
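This README does not describe the approximation itself. One common way to make kernel-based similarity scale, shown here purely as an illustration, is to keep similarities only between each cell and its k nearest neighbours rather than filling the full n-by-n matrix. The function name `knn_gaussian_similarity`, the parameters `k` and `sigma`, and the brute-force neighbour search are all assumptions made for this sketch. A real large-scale run would normally replace the brute-force search with an approximate neighbour index.

```python
import math


def knn_gaussian_similarity(points, k=2, sigma=1.0):
    """Sparse similarity: for each point, keep a Gaussian kernel weight
    only for its k nearest neighbours (brute-force search, toy sizes only)."""
    n = len(points)
    similarity = []
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        # Store only k entries per row instead of n.
        row = {j: math.exp(-d2 / (2.0 * sigma ** 2)) for d2, j in dists[:k]}
        similarity.append(row)
    return similarity


if __name__ == "__main__":
    cells = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for i, row in enumerate(knn_gaussian_similarity(cells, k=1)):
        print(i, row)
```

Each row holds at most k entries, so memory grows with n times k rather than n squared.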

DEMO

We provide two demos of SIMLR at large scale. In test_largescale.py we run SIMLR on the Zeisel dataset (3,005 cells) used in our paper.

DEBUG

Please feel free to email us if you have trouble running SIMLR. The correspondence email is [email protected]

Requirements

  • numpy>=1.8
  • scipy>=0.13.2
  • annoy>=1.8
  • sklearn>=0.17
  • fbpca>=1.0
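Before installing, it can help to check the current environment against the minimum versions listed above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the version parsing is deliberately simple, and it assumes that "sklearn" is installed under the PyPI distribution name "scikit-learn".

```python
from importlib import metadata

# Minimum versions copied from the list above. Assumption: "sklearn" is
# installed under the PyPI distribution name "scikit-learn".
REQUIREMENTS = {
    "numpy": "1.8",
    "scipy": "0.13.2",
    "annoy": "1.8",
    "scikit-learn": "0.17",
    "fbpca": "1.0",
}


def version_tuple(version):
    """Turn '0.13.2' into (0, 13, 2) using the leading digits of each
    dot-separated piece; stop at the first piece with no leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding with zeros to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to 'ok', 'too old (x)', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```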

Installation

python setup.py install

or pip install SIMLR

Tutorial

See tests/test_largescale.py.
