Defines an S4 class that is based on SingleCellExperiment. In addition to the usual gene layer, SingleCellAlleleExperiment can also store data for immune genes such as HLAs, immunoglobulins and KIRs at the allele level and at the level of functionally similar groups of immune genes.

SingleCellAlleleExperiment and its data package scaeData are available in Bioconductor and can be installed as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scaeData")
BiocManager::install("SingleCellAlleleExperiment")
Alternatively, they can be installed from GitHub using the devtools package:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("AGImkeller/scaeData", build_vignettes = TRUE)
devtools::install_github("AGImkeller/SingleCellAlleleExperiment", build_vignettes = TRUE)
Immune molecules such as B and T cell receptors, human leukocyte antigens (HLAs) or killer Ig-like receptors (KIRs) are encoded in the genetically most diverse loci of the human genome. Many of these immune genes are hyperpolymorphic, showing high allelic diversity across human populations. In addition, typical immune molecules are polygenic, which means that multiple functionally similar genes encode the same protein subunit.
However, interactive single-cell methods commonly used to analyze immune cells in large patient cohorts do not account for this allelic and polygenic diversity. This leads to erroneous quantification of important immune mediators and impaired inter-donor comparability.
We have developed a workflow that allows quantification of expression and interactive exploration of donor-specific alleles of different immune genes. The workflow is divided into two software packages and one additional data package:
- The scIGD software package consists of a Snakemake workflow designed to automate and streamline the genotyping process for immune genes, focusing on key targets such as HLAs and KIRs, and enabling allele-specific quantification from single-cell RNA-sequencing (scRNA-seq) data using donor-specific references. For detailed information on the performed steps and how to use this workflow, please refer to its documentation.
- To harness the full analytical potential of the results, we've developed a dedicated R package, SingleCellAlleleExperiment, presented in this repository. This package provides a comprehensive multi-layer data structure that represents immune genes at specific levels, including alleles, genes, and groups of functionally similar genes, and thus allows data analysis across these immunologically relevant layers of annotation.
- scaeData is an R/ExperimentHub data package providing datasets generated and processed by the scIGD software package. These datasets can be used to explore the data and potential downstream analysis workflows using the novel SingleCellAlleleExperiment data structure presented here. Refer to scaeData for more information regarding the available datasets and the source of the raw data.
This workflow is designed to support both 10x and BD Rhapsody data, encompassing amplicon/targeted sequencing as well as whole-transcriptome-based data, providing flexibility to users working with different experimental setups.
Figure 1: Overview of the scIGD workflow for unraveling immunogenomic diversity in single-cell data, highlighting the integration of the SingleCellAlleleExperiment package for comprehensive data analysis.
The SingleCellAlleleExperiment (SCAE) class serves as a comprehensive multi-layer data structure that represents immune genes at specific levels, including alleles, genes, and groups of functionally similar genes, and thus allows data analysis across these immunologically relevant layers of annotation. The implemented data object is derived from the SingleCellExperiment (SCE) class and follows similar conventions: rows represent features (genes, transcripts) and columns represent cells.
Figure 2: Scheme of SingleCellAlleleExperiment object structure with lookup table.
For the integration of the relevant additional data layers (see Figure 2), the quantification data for alleles, generated by the novel scIGD software package, is aggregated into two additional data layers via an ontology-based design principle using a lookup table during object generation.
For example, the counts of the alleles A*01:01:01:01 and A*02:01:01:01 that are present in the raw input data will be combined into the HLA-A immune gene layer (see Table 1 below). Next, all counts of immune genes corresponding to HLA class I are combined into the HLA class I functional class layer. The structure of the lookup table is shown below.
Table 1: Scheme of the lookup table used to aggregate allele information into multiple data layers.
Allele | Gene | Function |
---|---|---|
A*01:01:01 | HLA-A | HLA class I |
A*02:01:01 | HLA-A | HLA class I |
... | ... | ... |
DRB1*01:01:01 | HLA-DRB1 | HLA class II |
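
The aggregation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the count values are made up, the object names (`allele_counts`, `lookup`, `gene_counts`, `function_counts`, `layers`, `layer_type`) are hypothetical, and only the three lookup columns are taken from Table 1.

```r
# Toy allele-level counts (made-up values): rows are alleles, columns are cells.
allele_counts <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 4, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("A*01:01:01", "A*02:01:01", "DRB1*01:01:01"),
    c("cell1", "cell2", "cell3")
  )
)

# Lookup table with the same three columns as Table 1.
lookup <- data.frame(
  Allele   = c("A*01:01:01", "A*02:01:01", "DRB1*01:01:01"),
  Gene     = c("HLA-A", "HLA-A", "HLA-DRB1"),
  Function = c("HLA class I", "HLA class I", "HLA class II"),
  stringsAsFactors = FALSE
)

# Align the lookup rows with the rows of the count matrix.
idx <- match(rownames(allele_counts), lookup$Allele)

# rowsum() adds up all rows that share the same group label.
gene_counts     <- rowsum(allele_counts, group = lookup$Gene[idx])
function_counts <- rowsum(allele_counts, group = lookup$Function[idx])

# Stack the three layers into one matrix and remember which row belongs where.
layers <- rbind(allele_counts, gene_counts, function_counts)
layer_type <- rep(
  c("allele", "gene", "function"),
  times = c(nrow(allele_counts), nrow(gene_counts), nrow(function_counts))
)

print(layers)
```

Keeping a per-row label such as `layer_type` next to the stacked matrix makes it possible to pull out a single layer later without relying on row names.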
The resulting SCAE data object can be used in combination with established single-cell analysis packages like scater and scran to perform downstream analysis on immune gene expression, allowing data exploration at the functional and allele level. See the vignette for further information and insights on how to perform downstream analysis using exemplary data from the accompanying R/ExperimentHub package scaeData.
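
One practical consequence of the multi-layer design is that the gene and functional class rows are sums of the allele rows, so per-cell totals taken over all rows count immune reads more than once. The base R sketch below illustrates this with made-up values. The `layer` labels are hypothetical and are not the package's actual row annotation; see the vignette for the accessors the package provides.

```r
# Toy multi-layer matrix (made-up values): one non-immune gene, two alleles,
# and the gene and functional class rows aggregated from those two alleles.
counts <- matrix(
  c(10, 8,
     5, 0,
     1, 3,
     6, 3,
     6, 3),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("GAPDH", "A*01:01:01", "A*02:01:01", "HLA-A", "HLA class I"),
    c("cell1", "cell2")
  )
)

# Hypothetical per-row layer labels (an assumption made for this sketch).
layer <- c("non_immune", "allele", "allele", "immune_gene", "functional_class")

# Summing over all rows counts every immune read three times.
naive_totals <- colSums(counts)

# Summing only non-immune genes and alleles counts every read once.
base_rows    <- layer %in% c("non_immune", "allele")
library_size <- colSums(counts[base_rows, , drop = FALSE])

# Selecting a single layer for exploration, here the functional class level.
functional <- counts[layer == "functional_class", , drop = FALSE]

print(rbind(naive_totals, library_size))
```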
You can explore your SingleCellAlleleExperiment object with iSEE:
library(iSEE)
# `scae` is an existing SingleCellAlleleExperiment object
app <- iSEE(scae)
app
Figure 3: Exploring the data saved in a SingleCellAlleleExperiment object with iSEE.
To be added.