Predicting cell-cell interactions between source cell types and target cell types.
Repository: openproblems-bio/task_cell_cell_communication
The growing availability of single-cell data has sparked an increased interest in the inference of cell-cell communication (CCC), with an ever-growing number of computational tools developed for this purpose.
Different tools propose distinct preprocessing steps with diverse scoring functions that are challenging to compare and evaluate. Furthermore, each tool typically comes with its own set of prior knowledge. To harmonize these methods and resources, Dimitrov et al. recently developed the LIANA+ framework, which was used as a foundation for this task.
The challenges in evaluating the tools are further exacerbated by the lack of a gold standard to benchmark the performance of CCC methods. In an attempt to address this, Dimitrov et al. use alternative data modalities, including the spatial proximity of cell types and downstream cytokine activities, to generate an inferred ground truth. However, these modalities are only approximations of biological reality and come with their own assumptions and limitations. Over time, more datasets with known ground-truth interactions will become available, and these will clarify the limitations and advantages of the different CCC methods. This subtask evaluates methods on their ability to predict interactions between spatially adjacent source and target cell types, focusing on predictions from steady-state (single-context) single-cell data.
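To make the idea of a spatial-proximity ground truth concrete, the sketch below shows one hypothetical way to derive a binary colocalisation label for every pair of cell types from spatial coordinates. It is not the procedure used by Dimitrov et al. or by this task's dataset processor: the k-nearest-neighbour rule, the `k` and `threshold` parameters and the function name `colocalised_pairs` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def colocalised_pairs(coords, labels, k=5, threshold=0.5):
    """Hypothetical sketch: flag (source, target) cell-type pairs as colocalised.

    coords:    list of (x, y) tuples, one per cell
    labels:    list of cell-type labels, aligned with coords
    k:         neighbours inspected per cell (assumed parameter)
    threshold: minimum fraction of source-type cells that must have at least
               one target-type cell among their k nearest neighbours (assumed)
    """
    hits = defaultdict(int)    # (source, target) -> source cells with a target-type neighbour
    totals = defaultdict(int)  # cell type -> number of cells of that type
    n = len(coords)
    for i in range(n):
        totals[labels[i]] += 1
        # Brute-force neighbour search: distance from cell i to every other cell.
        nearest = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )[:k]
        # Count each neighbouring cell type once per source cell.
        for neighbour_type in {labels[j] for _, j in nearest}:
            hits[(labels[i], neighbour_type)] += 1
    types = sorted(totals)
    return {
        (s, t): hits[(s, t)] / totals[s] >= threshold for s in types for t in types
    }
```

The neighbour search is quadratic in the number of cells; a realistic implementation would use a spatial index, but the naive loop keeps the rule itself visible.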
name | roles |
---|---|
Daniel Dimitrov | maintainer, author |
Scott Gigante | contributor |
Robrecht Cannoodt | contributor |
Vishnuvasan Raghuraman | contributor |
```mermaid
flowchart LR
file_common_spatial("Raw spatial dataset")
comp_dataset_processor[/"Dataset Processor"/]
file_dataset("Dataset")
file_solution("Solution")
comp_control_method[/"Control Method"/]
comp_method[/"Method"/]
comp_metric[/"Metric"/]
file_prediction("Prediction")
file_score("Score")
file_common_spatial---comp_dataset_processor
comp_dataset_processor-->file_dataset
comp_dataset_processor-->file_solution
file_dataset---comp_control_method
file_dataset---comp_method
file_solution---comp_control_method
file_solution---comp_metric
comp_control_method-->file_prediction
comp_method-->file_prediction
comp_metric-->file_score
file_prediction---comp_metric
```
An unprocessed dataset as output by a dataset loader.
Example file:
resources_test/common/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad
Description:
This dataset contains raw counts and metadata as output by a dataset loader.
The format of this file is derived from the CELLxGENE schema v4.0.0.
Format:
AnnData object
obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch'
var: 'feature_id', 'feature_symbol'
obsm: 'spatial'
layers: 'counts'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Data structure:
Slot | Type | Description |
---|---|---|
obs["dataset_id"] | string | (Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. |
obs["assay"] | string | (Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed. |
obs["assay_ontology_term_id"] | string | (Optional) Experimental Factor Ontology (EFO) term identifier for the assay, providing a standardized reference to the assay type. |
obs["cell_type"] | string | (Optional) Classification of the cell type based on its characteristics and function within the tissue or organism. |
obs["cell_type_ontology_term_id"] | string | (Optional) Cell Ontology (CL) term identifier for the cell type, offering a standardized reference to the specific cell classification. |
obs["development_stage"] | string | (Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. |
obs["development_stage_ontology_term_id"] | string | (Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), the Human Developmental Stages (HsapDv) ontology is used. If the organism is mouse (organism_ontology_term_id == 'NCBITaxon:10090'), the Mouse Developmental Stages (MmusDv) ontology is used. Otherwise, the Uberon (UBERON) ontology is used. |
obs["disease"] | string | (Optional) Information on any disease or pathological condition associated with the cell or donor. |
obs["disease_ontology_term_id"] | string | (Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (MONDO), or PATO:0000461 from the Phenotype And Trait Ontology (PATO). |
obs["donor_id"] | string | (Optional) Identifier for the donor from whom the cell sample is obtained. |
obs["is_primary_data"] | boolean | (Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
obs["organism"] | string | (Optional) Organism from which the cell sample is obtained. |
obs["organism_ontology_term_id"] | string | (Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (NCBITaxon) that is a child of NCBITaxon:33208. |
obs["self_reported_ethnicity"] | string | (Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
obs["self_reported_ethnicity_ontology_term_id"] | string | (Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), the Human Ancestry Ontology (HANCESTRO) is used. |
obs["sex"] | string | (Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
obs["sex_ontology_term_id"] | string | (Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only PATO:0000383, PATO:0000384 and PATO:0001340 are allowed. |
obs["suspension_type"] | string | (Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. |
obs["tissue"] | string | (Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies. |
obs["tissue_ontology_term_id"] | string | (Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (UBERON) is used, and the term IDs must be children of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL) is used, and the term IDs cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["tissue_general"] | string | (Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data. |
obs["tissue_general_ontology_term_id"] | string | (Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (UBERON) is used, and the term IDs must be children of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL) is used, and the term IDs cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["batch"] | string | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
var["feature_id"] | string | (Optional) Ensembl gene ID. |
var["feature_symbol"] | string | Gene symbol. |
obsm["spatial"] | double | Spatial coordinates. |
layers["counts"] | integer | Raw counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["dataset_name"] | string | Nicely formatted name. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
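A quick way to check a loader's output against the table above is to compare the keys present in each AnnData slot with the entries not marked (Optional). The sketch below is a hypothetical helper (`missing_slots` and `REQUIRED_RAW_SLOTS` are illustrative names, not part of the task's code); it works on a plain mapping of slot name to keys so it runs without anndata installed, and a comment shows how that mapping could be built from an `AnnData` object.

```python
# Non-optional entries of the raw spatial dataset format, grouped by AnnData slot.
REQUIRED_RAW_SLOTS = {
    "var": ["feature_symbol"],
    "obsm": ["spatial"],
    "layers": ["counts"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
}

# With anndata installed, the mapping could be built like this (not executed here):
#   adata = anndata.read_h5ad("dataset.h5ad")
#   slots = {"obs": adata.obs.columns, "var": adata.var.columns,
#            "obsm": adata.obsm.keys(), "layers": adata.layers.keys(),
#            "uns": adata.uns.keys()}


def missing_slots(slots, required=REQUIRED_RAW_SLOTS):
    """Return 'slot["key"]' strings for every required key absent from `slots`."""
    missing = []
    for slot, keys in required.items():
        present = set(slots.get(slot, []))
        for key in keys:
            if key not in present:
                missing.append(f'{slot}["{key}"]')
    return missing
```

Returning a list rather than raising on the first problem makes it easy to report every missing field of a dataset in one pass.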
A dataset processor for the sc-CCC task.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | An unprocessed dataset as output by a dataset loader. |
--output_dataset | file | (Output) A dataset for the sc-CCC task. |
--output_solution | file | (Output) A dataset with ground-truth annotations for the sc-CCC task. |
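The dataset processor turns one raw file into two outputs whose `uns` contents differ: according to the Dataset and Solution formats, the dataset keeps only `dataset_id`, while the solution keeps the descriptive metadata plus `assumed_truth`. The hypothetical helper below sketches that split on plain dictionaries; how `assumed_truth` itself is computed is outside its scope, so it is passed in as an argument.

```python
# uns keys carried by the raw dataset; several of them are optional.
DATASET_META_KEYS = (
    "dataset_id", "dataset_name", "dataset_url", "dataset_reference",
    "dataset_summary", "dataset_description", "dataset_organism",
)


def split_uns(raw_uns, assumed_truth):
    """Hypothetical sketch: divide raw `uns` metadata between the two processor outputs."""
    # The "Dataset" output only carries the identifier.
    dataset_uns = {"dataset_id": raw_uns["dataset_id"]}
    # The "Solution" output keeps whichever metadata keys are present.
    solution_uns = {key: raw_uns[key] for key in DATASET_META_KEYS if key in raw_uns}
    # It also stores the assumed ground truth, which is computed elsewhere.
    solution_uns["assumed_truth"] = assumed_truth
    return dataset_uns, solution_uns
```

Keys outside the documented metadata are dropped from both outputs, which keeps the method-facing dataset free of anything that could leak the answer.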
A dataset for the sc-CCC task.
Example file:
resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad
Format:
AnnData object
obs: 'cell_type'
var: 'feature_id', 'feature_symbol'
layers: 'counts'
uns: 'dataset_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["cell_type"] | string | Cell type annotation. |
var["feature_id"] | string | (Optional) Ensembl gene ID. |
var["feature_symbol"] | string | Gene symbol. |
layers["counts"] | integer | Raw counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
A dataset with ground-truth annotations for the sc-CCC task.
Example file:
resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/solution.h5ad
Format:
AnnData object
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'assumed_truth'
Data structure:
Slot | Type | Description |
---|---|---|
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["dataset_name"] | string | Nicely formatted name. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
uns["assumed_truth"] | object | A dataframe with the assumed ground truth. Must have columns ‘source_cell_type’, ‘target_cell_type’, ‘ligand’, ‘receptor’, ‘colocalised’. |
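Because interactions are identified by the same four columns throughout the task, it can help to index `assumed_truth` by `(source_cell_type, target_cell_type, ligand, receptor)`. The sketch below assumes the dataframe has been converted to a list of row dictionaries (for example with pandas' `DataFrame.to_dict("records")`), that `colocalised` is boolean or 0/1, and that each key occurs only once; `truth_lookup` is a hypothetical name.

```python
# Columns the assumed_truth dataframe must contain, per the table above.
REQUIRED_TRUTH_COLUMNS = ("source_cell_type", "target_cell_type", "ligand", "receptor", "colocalised")


def truth_lookup(rows):
    """Hypothetical sketch: map each interaction key to its colocalised label."""
    lookup = {}
    for row in rows:
        missing = [c for c in REQUIRED_TRUTH_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"assumed_truth row is missing columns: {missing}")
        key = (row["source_cell_type"], row["target_cell_type"], row["ligand"], row["receptor"])
        # Assumes one row per key; a duplicate key would silently overwrite the earlier row.
        lookup[key] = bool(row["colocalised"])
    return lookup
```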
A control method for the sc-CCC task.
Arguments:
Name | Type | Description |
---|---|---|
--dataset | file | A dataset for the sc-CCC task. |
--solution | file | (Optional) A dataset with ground-truth annotations for the sc-CCC task. |
--prediction | file | (Output) The prediction file. |
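Control methods may read the optional `--solution` file, which lets them bracket the range of achievable scores. The sketch below shows a hypothetical positive control that copies the assumed truth and a hypothetical negative control that assigns random scores. The layout of the prediction file is not documented in this section, so representing a prediction as rows keyed by the four interaction columns plus a `score` column is an assumption.

```python
import random

# Columns identifying one interaction; the same columns key assumed_truth.
KEY_COLUMNS = ("source_cell_type", "target_cell_type", "ligand", "receptor")


def positive_control(truth_rows):
    """Hypothetical positive control: score each interaction with its assumed-truth label."""
    return [
        {**{c: row[c] for c in KEY_COLUMNS}, "score": float(row["colocalised"])}
        for row in truth_rows
    ]


def negative_control(truth_rows, seed=0):
    """Hypothetical negative control: uniform random scores, reproducible via `seed`."""
    rng = random.Random(seed)
    return [
        {**{c: row[c] for c in KEY_COLUMNS}, "score": rng.random()}
        for row in truth_rows
    ]
```

Using a dedicated `random.Random` instance keeps the negative control reproducible without touching global random state.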
A method for the sc-CCC task.
Arguments:
Name | Type | Description |
---|---|---|
--dataset | file | A dataset for the sc-CCC task. |
--prediction | file | (Output) The prediction file. |
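As an illustration of what a method might compute from the dataset's `counts` layer and `cell_type` annotation, the sketch below scores each ligand-receptor pair for every source/target cell-type combination as the product of the ligand's mean expression in the source type and the receptor's mean expression in the target type. This is a simplified, hypothetical scoring function (it skips normalisation and uses no LIANA+ code), and the `lr_pairs` argument stands in for the prior-knowledge resource a real method would load.

```python
from collections import defaultdict


def mean_expression(counts, cell_types, genes):
    """Per-cell-type mean of each gene; counts is a cells x genes nested list."""
    sums = defaultdict(lambda: [0.0] * len(genes))
    sizes = defaultdict(int)
    for row, cell_type in zip(counts, cell_types):
        sizes[cell_type] += 1
        acc = sums[cell_type]
        for j, value in enumerate(row):
            acc[j] += value
    return {ct: {g: sums[ct][j] / sizes[ct] for j, g in enumerate(genes)} for ct in sizes}


def product_of_means(counts, cell_types, genes, lr_pairs):
    """Hypothetical method: score = mean ligand (source) x mean receptor (target)."""
    means = mean_expression(counts, cell_types, genes)
    rows = []
    for source in sorted(means):
        for target in sorted(means):
            for ligand, receptor in lr_pairs:
                # Skip pairs whose genes are not measured in this dataset.
                if ligand in means[source] and receptor in means[target]:
                    rows.append({
                        "source_cell_type": source,
                        "target_cell_type": target,
                        "ligand": ligand,
                        "receptor": receptor,
                        "score": means[source][ligand] * means[target][receptor],
                    })
    return rows
```

Real CCC methods differ mainly in this scoring step and in preprocessing, which is exactly the variation the task is designed to compare.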
A metric for the sc-CCC task.
Arguments:
Name | Type | Description |
---|---|---|
--solution | file | A dataset with ground-truth annotations for the sc-CCC task. |
--prediction | file | The prediction file. |
--score | file | (Output) Metric score file. |
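The scoring rule of the metric component is not specified in this section. As a hypothetical example, the sketch below computes the area under the ROC curve (AUROC), treating the `colocalised` column of `assumed_truth` as binary labels and a method's interaction scores as the ranking. It assumes scores and labels have already been aligned by interaction key, and uses the rank-sum (Mann-Whitney U) formulation with average ranks for ties.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum formulation; tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of positions i..j that share the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    # U statistic of the positives, normalised by the number of positive/negative pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A value of 0.5 corresponds to random ranking, so the negative control sketched for the control method should land near it.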
The prediction file.
Example file:
resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/prediction.h5ad
Metric score file.
Example file:
resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/score.h5ad
Format:
AnnData object
uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
Data structure:
Slot | Type | Description |
---|---|---|
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["method_id"] | string | A unique identifier for the method. |
uns["metric_ids"] | string | One or more unique metric identifiers. |
uns["metric_values"] | double | The metric values obtained for the given prediction. Must be the same length as ‘metric_ids’. |
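Since `metric_values` must line up one-to-one with `metric_ids`, the score file's `uns` is easiest to build from a single mapping of metric identifier to value. The helpers below are a hypothetical sketch (`make_score_uns` and `validate_score_uns` are illustrative names); the commented anndata call is shown for orientation only and is not executed.

```python
REQUIRED_SCORE_KEYS = ("dataset_id", "method_id", "metric_ids", "metric_values")


def make_score_uns(dataset_id, method_id, metrics):
    """Build score-file uns fields from a {metric_id: value} mapping."""
    metric_ids = list(metrics)
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": metric_ids,
        # Values follow the order of metric_ids, so the two lists stay aligned.
        "metric_values": [float(metrics[m]) for m in metric_ids],
    }


def validate_score_uns(uns):
    """Raise ValueError if required keys are missing or the two lists differ in length."""
    missing = [k for k in REQUIRED_SCORE_KEYS if k not in uns]
    if missing:
        raise ValueError(f"score uns is missing keys: {missing}")
    if len(uns["metric_ids"]) != len(uns["metric_values"]):
        raise ValueError("metric_values must be the same length as metric_ids")


# Writing the file with anndata might look like this (not executed here):
#   anndata.AnnData(uns=uns).write_h5ad("score.h5ad")
```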