bindSC (Bi-order INtegration of multi-omics Data from Single Cell sequencing technologies) is an R package for single-cell multi-omic integration analysis, developed and maintained by Ken Chen's lab at MDACC. bindSC addresses the challenge of integrating single-cell multi-omic data that consist of unpaired cells measured with unmatched features across modalities. Previous methods such as Seurat, LIGER, and Harmony do not handle this case unless features are matched empirically. For example, integrating scRNA-seq and scATAC-seq data requires computing gene/promoter activity by counting peaks in the gene body, which loses information. This strategy also does not work for integrating scRNA-seq and CyTOF data, because gene expression and protein abundance are not always correlated, owing to the sparsity of scRNA-seq data or to post-translational modification.
The core algorithm implemented in the bindSC package is BiCCA (Bi-order Canonical Correlation Analysis), which uses a transition matrix Z (M features by L cells) to bridge the observed X (M features by K cells) with Y (N features by L cells). Initialized from prior knowledge, Z is solved iteratively by simultaneously maximizing the correlation of the pair (X, Z) and the correlation of the pair (Y, Z). Given the estimated Z, the cell and feature correspondences across modalities are obtained by applying standard CCA to the pairs (X, Z) and (Y, Z), respectively.
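The final step described above can be illustrated with a small base-R sketch. It is not the bindSC implementation: the data are random, Z is a random stand-in for an already-estimated transition matrix, and `cca_svd` is a hypothetical helper that uses an SVD of the cross-product of standardized matrices, a simplified form of CCA that ignores within-set covariance. The iterative update of Z is deliberately left out.

```r
# Toy illustration only: random data, not the bindSC implementation.
set.seed(1)
M <- 30; N <- 20; K <- 40; L <- 50
X <- matrix(rnorm(M * K), nrow = M, ncol = K)  # M features x K cells
Y <- matrix(rnorm(N * L), nrow = N, ncol = L)  # N features x L cells
Z <- matrix(rnorm(M * L), nrow = M, ncol = L)  # M features x L cells (stand-in for an estimated Z)

# Hypothetical helper: simplified CCA via SVD of the cross-product of
# column-standardized matrices that share their rows.
cca_svd <- function(A, B, k = 5) {
  stopifnot(nrow(A) == nrow(B))
  A <- scale(A)  # center and scale each column
  B <- scale(B)
  s <- svd(crossprod(A, B), nu = k, nv = k)  # crossprod(A, B) is t(A) %*% B
  list(u = s$u, v = s$v, d = s$d[seq_len(k)])
}

# X and Z share the M feature rows, so their columns (cells) are embedded together.
cell_cca <- cca_svd(X, Z)
# t(Y) and t(Z) share the L cell rows, so their columns (features) are embedded together.
feature_cca <- cca_svd(t(Y), t(Z))

dim(cell_cca$u)     # K cells of X     x k components
dim(cell_cca$v)     # L cells of Z     x k components
dim(feature_cca$u)  # N features of Y  x k components
dim(feature_cca$v)  # M features of Z  x k components
```

The two calls differ only in which dimension is shared: features for the pair (X, Z), cells for the pair (Y, Z). That is the sense in which the analysis runs in two orders.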
Once multiple datasets are integrated, bindSC provides functionality for further data exploration, analysis, and visualization. Users can:
- Jointly define cell types from multi-omic datasets
- Identify a comprehensive molecular multi-view of biological processes at the cell-type level
Improvements and new features will be added on a regular basis. Please contact [email protected] or [email protected] with any questions.
- Add the modality-specific weighting factor to the objective function
- Add the weighting factor of the initialized gene score matrix to the objective function
- bindSC can take low-dimensional representations (for example, PCs/LSI) of the original matrix as input for integration. This saves substantial computational time on large-scale data.
- Add a demo of integrating scRNA-seq and CyTOF data from CITE-seq technology
- Update the parameter optimization module
- Provide joint profiles of gene expression, chromatin accessibility, and TF activity at the pseudocell level
- Release bindSC
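As an illustration of the low-dimensional input mentioned in the list above, the following base-R sketch reduces a toy count matrix to its top principal components. The data, the log transform, and the choice of 20 components are assumptions made for the example; how bindSC expects such a matrix to be oriented or passed in is not shown here and should be taken from the vignettes.

```r
# Toy illustration only: simulated counts, not a bindSC workflow.
set.seed(42)
n_features <- 200
n_cells <- 500
counts <- matrix(rpois(n_features * n_cells, lambda = 2),
                 nrow = n_features, ncol = n_cells)  # features x cells

# Simple log transform to stabilize the variance of the counts.
log_counts <- log1p(counts)

# prcomp() treats rows as observations, so transpose to cells x features.
# rank. limits the number of components that are computed and returned.
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE, rank. = 20)

pcs <- pca$x  # cells x 20 principal components
dim(pcs)
```

Working with 20 columns per cell instead of every feature is what reduces the cost of the downstream matrix operations.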
The bindSC package requires only a standard computer with enough RAM to support the in-memory operations. At a minimum, make sure that the computer has about 10 GB of RAM. For optimal performance, we recommend a computer with the following specs:
- RAM: 10+ GB
- CPU: 4+ cores, 2.3 GHz/core
Before setting up the bindSC package, users should have R version 3.6.0 or higher and several packages installed from CRAN and other repositories. The dependencies are listed in the DESCRIPTION file.
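A quick pre-install check could look like the sketch below. It uses only base R. The path `bindSC/DESCRIPTION` is an assumption: it presumes the repository has been cloned into a `bindSC` directory under the current working directory, so the script simply reports when the file is not found.

```r
# Check the R version against the stated minimum of 3.6.0.
r_ok <- getRversion() >= "3.6.0"
if (!r_ok) {
  message("R ", getRversion(), " is older than 3.6.0; please upgrade R first.")
}

# Assumed location of the DESCRIPTION file (a local clone of the repository).
desc_path <- file.path("bindSC", "DESCRIPTION")

if (file.exists(desc_path)) {
  # read.dcf() parses DESCRIPTION; fields that are absent come back as NA.
  deps <- read.dcf(desc_path, fields = c("Depends", "Imports", "Suggests"))
  print(deps)
} else {
  message("No DESCRIPTION found at ", desc_path, "; clone the repository first.")
}
```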
bindSC is written in R and can be installed with the following R commands:
$ R
> install.packages('devtools')
> library(devtools)
> install_github('KChen-lab/bindSC')
Users can also install bindSC from source:
$ git clone https://github.com/KChen-lab/bindSC.git
$ R CMD INSTALL bindSC
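After either installation route, a short R session can confirm that the package is visible to R. This is a generic base-R check, not a bindSC-specific command; it only assumes that the installed package is named `bindSC`, as in the commands above.

```r
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
installed <- requireNamespace("bindSC", quietly = TRUE)

if (installed) {
  library(bindSC)
  print(packageVersion("bindSC"))  # version recorded in the installed DESCRIPTION
} else {
  message("bindSC is not installed; re-run one of the installation steps above.")
}
```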
For usage examples and guided walkthroughs, see the vignettes directory of the repo.
- Jointly Defining Cell Types from scRNA-seq and scATAC-seq on A549 dataset
- Jointly Defining Cell Types from snRNA-seq and snATAC-seq on mouse retina dataset
- Integrating scRNA-seq and spatial transcriptomics on mouse brain cortex dataset
We also provide a comparison of bindSC with available tools, including Seurat, LIGER, and Harmony, on the benchmarking datasets above.
This project is covered under the GNU General Public License 3.0.
Preprint: Unbiased integration of single cell multi-omics data