This repository contains an R script for analyzing single-cell hashing data, separating cells by their Hashtag Oligonucleotide (HTO) tags. It uses Seurat, Harmony, and supporting R packages to demultiplex single-cell data, detect doublets, and perform downstream analyses such as dimensionality reduction and visualization.
Single Cell Hashing

Single-cell hashing is a technique used to label individual cells with unique molecular tags (e.g., DNA-barcoded antibodies or chemical tags) before pooling them for single-cell sequencing. This method allows researchers to multiplex multiple samples or conditions within the same sequencing run. By applying unique barcodes to each sample, single-cell hashing enables the identification of the sample of origin for each individual cell after sequencing. This technique helps reduce experimental costs and minimizes batch effects, making it a valuable tool in high-throughput single-cell analysis.
Single Cell Demultiplexing

Single-cell demultiplexing is the computational process used to assign individual cells back to their respective sample or donor of origin after sequencing. By analyzing the barcode sequences (from single-cell hashing) or genotype information, demultiplexing algorithms differentiate and group cells based on their origin, enabling the separation of pooled samples. This process is essential for accurately analyzing and comparing data from different samples or conditions within a single experiment, allowing for more precise biological insights from multiplexed single-cell sequencing.
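
To make the idea concrete, here is a minimal base R sketch of barcode-based demultiplexing on a made-up count matrix. It is only an illustration: the counts, the `classify_cells` helper, and the fixed cutoff of 50 are assumptions introduced here, and Seurat's `HTODemux` estimates a per-tag cutoff from the data rather than using a fixed number.

```r
# Toy HTO count matrix (made-up values): rows are hashtags, columns are cells.
# matrix() fills column by column, so each row of numbers below is one cell.
hto_counts <- matrix(
  c(250,   3, 2,   # cell1: only HTO-A is high
      4, 310, 1,   # cell2: only HTO-B is high
    180, 200, 2,   # cell3: two tags are high
      1,   2, 1),  # cell4: no tag is high
  nrow = 3,
  dimnames = list(c("HTO-A", "HTO-B", "HTO-C"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Hypothetical helper: call a tag "positive" when its count exceeds a fixed
# threshold, then classify each cell by how many tags are positive.
classify_cells <- function(counts, threshold = 50) {
  n_pos <- unname(colSums(counts > threshold))
  classification <- ifelse(n_pos == 0, "Negative",
                           ifelse(n_pos == 1, "Singlet", "Doublet"))
  # Tag with the highest count in each cell; reported only for singlets.
  top_tag <- rownames(counts)[apply(counts, 2, which.max)]
  data.frame(
    cell = colnames(counts),
    n_positive = n_pos,
    classification = classification,
    sample = ifelse(classification == "Singlet", top_tag, NA_character_),
    stringsAsFactors = FALSE
  )
}

calls <- classify_cells(hto_counts)
print(calls)
```

Cells with exactly one positive tag are assigned to that tag's sample, cells with two or more are flagged as doublets, and cells with none are left unassigned.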
The data were taken from the publicly available study "Putative regulators for the continuum of erythroid differentiation revealed by single-cell transcriptome of human BM and UCB cells" (https://doi.org/10.1073/pnas.1915085117).
- Quality Control: Filter out cells with zero HTO tag counts.
- Demultiplexing: Classify cells as singlets, doublets, or negatives using HTO tags.
- Normalization: Apply CLR normalization for HTO data and log normalization for gene expression data.
- Dimensionality Reduction: Perform PCA and UMAP for visualizing clusters.
- Batch Correction: Use Harmony for batch effect correction in merged datasets.
- Sample-Specific Analysis: Extract and analyze cells specific to individual HTO tags.
- Visualization: Generate informative plots for exploratory data analysis.
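
The steps listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the repository's actual script: the function names `demux_hashed_sample` and `reduce_singlets`, the argument names, the default of 20 dimensions, and the optional `batch_var` metadata column are all hypothetical, and the inputs are assumed to be a genes x cells matrix and a hashtags x cells matrix with matching barcodes.

```r
library(Seurat)
library(harmony)

# Hypothetical helper (not from the repository): demultiplex one hashed sample.
# rna_counts: genes x cells count matrix; hto_counts: hashtags x cells matrix.
demux_hashed_sample <- function(rna_counts, hto_counts, positive_quantile = 0.99) {
  # Quality control: keep barcodes found in both matrices with non-zero HTO counts.
  shared <- intersect(colnames(rna_counts), colnames(hto_counts))
  shared <- shared[Matrix::colSums(hto_counts[, shared, drop = FALSE]) > 0]

  obj <- CreateSeuratObject(counts = rna_counts[, shared])
  obj[["HTO"]] <- CreateAssayObject(counts = hto_counts[, shared])

  # Normalization: log-normalize gene expression, CLR-normalize HTO counts.
  obj <- NormalizeData(obj)
  obj <- NormalizeData(obj, assay = "HTO", normalization.method = "CLR")

  # Demultiplexing: adds metadata such as HTO_classification.global
  # (Singlet / Doublet / Negative) and hash.ID.
  HTODemux(obj, assay = "HTO", positive.quantile = positive_quantile)
}

# Hypothetical helper: keep singlets, then run PCA and UMAP.
# batch_var is an assumed metadata column name; when supplied, Harmony
# corrects the PCA embedding before UMAP.
reduce_singlets <- function(obj, dims = 1:20, batch_var = NULL) {
  Idents(obj) <- "HTO_classification.global"
  singlets <- subset(obj, idents = "Singlet")

  singlets <- FindVariableFeatures(singlets)
  singlets <- ScaleData(singlets)
  singlets <- RunPCA(singlets)

  reduction <- "pca"
  if (!is.null(batch_var)) {
    singlets <- RunHarmony(singlets, group.by.vars = batch_var)
    reduction <- "harmony"
  }
  RunUMAP(singlets, reduction = reduction, dims = dims)
}
```

With real count matrices, `obj <- demux_hashed_sample(rna_counts, hto_counts)` followed by `singlets <- reduce_singlets(obj)` would give an object that can be plotted with `DimPlot(singlets, group.by = "hash.ID")`. Cells from a single tag can then be extracted with `subset()` on the `hash.ID` column, using whatever tag names the HTO matrix contains.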
- Seurat (Version 5.1.0)
- ggplot2 (Version 3.5.1)
- tidyverse (Version 2.0.0)
- stringr (Version 1.5.1)
- harmony (Version 1.2.1)
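
A quick way to compare a local installation against the versions listed above is sketched below. The `required`, `status`, and `report` names are introduced here for illustration; the check only reports what is installed and does not install or change anything.

```r
# Versions listed in this README.
required <- c(
  Seurat    = "5.1.0",
  ggplot2   = "3.5.1",
  tidyverse = "2.0.0",
  stringr   = "1.5.1",
  harmony   = "1.2.1"
)

# Look up the installed version of each package, if any.
status <- vapply(names(required), function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  as.character(utils::packageVersion(pkg))
}, character(1))

report <- data.frame(
  package   = names(required),
  listed    = unname(required),
  installed = unname(status),
  stringsAsFactors = FALSE
)
print(report)
```

Other versions of these packages may also work; the table simply makes any differences visible before running the script.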