Cell-Cell Communication

Predicting cell-cell interactions between source cell types and target cell types.

Repository: openproblems-bio/task_cell_cell_communication

Description

The growing availability of single-cell data has sparked an increased interest in the inference of cell-cell communication (CCC), with an ever-growing number of computational tools developed for this purpose.

Different tools propose distinct preprocessing steps and diverse scoring functions, which makes them challenging to compare and evaluate. Furthermore, each tool typically comes with its own set of prior knowledge. To harmonize these, Dimitrov et al. recently developed the LIANA+ framework, which was used as a foundation for this task.

The challenges in evaluating the tools are further exacerbated by the lack of a gold standard for benchmarking the performance of CCC methods. In an attempt to address this, Dimitrov et al. use alternative data modalities, including the spatial proximity of cell types and downstream cytokine activities, to generate an inferred ground truth. However, these modalities are only approximations of biological reality and come with their own assumptions and limitations. As more datasets with known ground-truth interactions become available, the limitations and advantages of the different CCC methods will be better understood.

This subtask evaluates how well methods predict interactions between spatially adjacent source cell types and target cell types. It focuses on the prediction of interactions from steady-state, or single-context, single-cell data.

Authors & contributors

name	roles
Daniel Dimitrov	maintainer, author
Scott Gigante	contributor
Robrecht Cannoodt	contributor
Vishnuvasan Raghuraman	contributor

API

flowchart LR
  file_common_spatial("Raw spatial dataset")
  comp_dataset_processor[/"Dataset Processor"/]
  file_dataset("Dataset")
  file_solution("Solution")
  comp_control_method[/"Control Method"/]
  comp_method[/"Method"/]
  comp_metric[/"Metric"/]
  file_prediction("Prediction")
  file_score("Score")
  file_common_spatial---comp_dataset_processor
  comp_dataset_processor-->file_dataset
  comp_dataset_processor-->file_solution
  file_dataset---comp_control_method
  file_dataset---comp_method
  file_solution---comp_control_method
  file_solution---comp_metric
  comp_control_method-->file_prediction
  comp_method-->file_prediction
  comp_metric-->file_score
  file_prediction---comp_metric
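
The flowchart above can be read as three function calls chained together. The following is a toy walk-through in plain Python under stated assumptions, not the repository's code: the function names, the dictionary layout, and the scoring rule are all hypothetical stand-ins chosen only to show which component sees which file.

```python
# Toy walk-through of the flowchart. Every name and value here is hypothetical.

def dataset_processor(raw):
    """Split a raw spatial dataset into a method input and a held-out solution."""
    dataset = {"cell_type": raw["cell_type"], "counts": raw["counts"]}
    solution = {"assumed_truth": raw["assumed_truth"]}
    return dataset, solution

def method(dataset):
    """Stand-in method: predict an interaction for every ordered pair of cell types."""
    cell_types = sorted(set(dataset["cell_type"]))
    return [(s, t) for s in cell_types for t in cell_types if s != t]

def metric(solution, prediction):
    """Stand-in metric: fraction of predicted pairs that the solution marks as colocalised."""
    truth = solution["assumed_truth"]
    hits = sum(1 for pair in prediction if truth.get(pair, False))
    return hits / len(prediction) if prediction else 0.0

raw = {
    "cell_type": ["A", "B", "C", "A"],
    "counts": [[1, 0], [0, 2], [3, 1], [0, 0]],
    "assumed_truth": {("A", "B"): True, ("B", "A"): True, ("A", "C"): False},
}

dataset, solution = dataset_processor(raw)
prediction = method(dataset)          # a method only receives the dataset
score = metric(solution, prediction)  # a metric compares prediction and solution
print(len(prediction), round(score, 3))
```

Note that the method never receives the solution. Per the argument tables below, only control methods may optionally read it.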

File format: Raw spatial dataset

An unprocessed dataset as output by a dataset loader.

Example file: resources_test/common/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad

Description:

This dataset contains raw counts and metadata as output by a dataset loader.

The format of this file is derived from the CELLxGENE schema v4.0.0.

Format:

AnnData object
 obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch'
 var: 'feature_id', 'feature_symbol'
 obsm: 'spatial'
 layers: 'counts'
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

Slot	Type	Description
`obs["dataset_id"]`	`string`	(Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes.
`obs["assay"]`	`string`	(Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed.
`obs["assay_ontology_term_id"]`	`string`	(Optional) Experimental Factor Ontology (`EFO:`) term identifier for the assay, providing a standardized reference to the assay type.
`obs["cell_type"]`	`string`	(Optional) Classification of the cell type based on its characteristics and function within the tissue or organism.
`obs["cell_type_ontology_term_id"]`	`string`	(Optional) Cell Ontology (`CL:`) term identifier for the cell type, offering a standardized reference to the specific cell classification.
`obs["development_stage"]`	`string`	(Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase.
`obs["development_stage_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Developmental Stages (`HsapDv:`) ontology is used. If the organism is mouse (`organism_ontology_term_id == 'NCBITaxon:10090'`), then the Mouse Developmental Stages (`MmusDv:`) ontology is used. Otherwise, the Uberon (`UBERON:`) ontology is used.
`obs["disease"]`	`string`	(Optional) Information on any disease or pathological condition associated with the cell or donor.
`obs["disease_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`), or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`).
`obs["donor_id"]`	`string`	(Optional) Identifier for the donor from whom the cell sample is obtained.
`obs["is_primary_data"]`	`boolean`	(Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data.
`obs["organism"]`	`string`	(Optional) Organism from which the cell sample is obtained.
`obs["organism_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`.
`obs["self_reported_ethnicity"]`	`string`	(Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits.
`obs["self_reported_ethnicity_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used.
`obs["sex"]`	`string`	(Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions.
`obs["sex_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only `PATO:0000383`, `PATO:0000384` and `PATO:0001340` are allowed.
`obs["suspension_type"]`	`string`	(Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions.
`obs["tissue"]`	`string`	(Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies.
`obs["tissue_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term IDs must be child terms of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term IDs cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`.
`obs["tissue_general"]`	`string`	(Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data.
`obs["tissue_general_ontology_term_id"]`	`string`	(Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term IDs must be child terms of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term IDs cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`.
`obs["batch"]`	`string`	(Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.
`var["feature_id"]`	`string`	(Optional) Ensembl gene ID.
`var["feature_symbol"]`	`string`	Gene symbol.
`obsm["spatial"]`	`double`	Spatial coordinates.
`layers["counts"]`	`integer`	Raw counts.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_url"]`	`string`	(Optional) Link to the original source of the dataset.
`uns["dataset_reference"]`	`string`	(Optional) Bibtex reference of the paper in which the dataset was published.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_description"]`	`string`	Long description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
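
To make the required versus optional distinction in the table above concrete, here is a minimal sketch that lists the slots not marked "(Optional)" and reports which of them are absent. It assumes a plain nested dictionary as a stand-in for the AnnData object so that it runs without extra packages. `REQUIRED_SLOTS`, `missing_required_slots`, and the example values are hypothetical names introduced for illustration.

```python
# Slots from the table above that are not marked "(Optional)".
REQUIRED_SLOTS = {
    "var": ["feature_symbol"],
    "obsm": ["spatial"],
    "layers": ["counts"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
}

def missing_required_slots(adata_like):
    """Return a 'group["key"]' string for each required slot absent from a nested dict."""
    missing = []
    for group, keys in REQUIRED_SLOTS.items():
        present = adata_like.get(group, {})
        for key in keys:
            if key not in present:
                missing.append(f'{group}["{key}"]')
    return missing

# Hypothetical toy object: two cells, two genes, two required uns fields left out.
example = {
    "obs": {"cell_type": ["A", "B"]},
    "var": {"feature_symbol": ["GENE1", "GENE2"]},
    "obsm": {"spatial": [[0.0, 1.0], [2.0, 3.0]]},
    "layers": {"counts": [[1, 0], [0, 2]]},
    "uns": {"dataset_id": "toy", "dataset_name": "Toy dataset"},
}

print(missing_required_slots(example))
# → ['uns["dataset_summary"]', 'uns["dataset_description"]']
```

With a real AnnData object the same idea would apply to the corresponding attributes (for example the columns of `var` and the keys of `obsm`, `layers`, and `uns`) rather than to dictionary keys.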

Component type: Dataset Processor

A dataset processor for the sc-CCC task.

Arguments:

Name	Type	Description
`--input`	`file`	An unprocessed dataset as output by a dataset loader.
`--output_dataset`	`file`	(Output) A dataset for the sc-CCC task.
`--output_solution`	`file`	(Output) A dataset with ground-truth annotations for the sc-CCC task.

File format: Dataset

A dataset for the sc-CCC task.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad

Format:

AnnData object
 obs: 'cell_type'
 var: 'feature_id', 'feature_symbol'
 layers: 'counts'
 uns: 'dataset_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	Cell type annotation.
`var["feature_id"]`	`string`	(Optional) Ensembl gene ID.
`var["feature_symbol"]`	`string`	Gene symbol.
`layers["counts"]`	`integer`	Raw counts.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.

File format: Solution

A dataset with ground-truth annotations for the sc-CCC task.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/solution.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'assumed_truth'

Data structure:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_url"]`	`string`	(Optional) Link to the original source of the dataset.
`uns["dataset_reference"]`	`string`	(Optional) Bibtex reference of the paper in which the dataset was published.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_description"]`	`string`	Long description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["assumed_truth"]`	`object`	A dataframe with the assumed ground truth. Must have columns ‘source_cell_type’, ‘target_cell_type’, ‘ligand’, ‘receptor’, ‘colocalised’.
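
As an illustration of the `assumed_truth` column requirement, the sketch below checks a toy table for the five required columns and then collects the colocalised source/target pairs. It is a sketch under assumptions: the table is represented as a list of dictionaries rather than a dataframe, `colocalised` is assumed to hold booleans (the table above does not specify its type), and all cell type, ligand, and receptor names are placeholders.

```python
# Column names taken from the uns["assumed_truth"] description above.
REQUIRED_COLUMNS = (
    "source_cell_type",
    "target_cell_type",
    "ligand",
    "receptor",
    "colocalised",
)

def check_assumed_truth(rows):
    """Raise ValueError if any row lacks one of the required columns."""
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")

def colocalised_pairs(rows):
    """Return the set of (source, target) pairs flagged as colocalised."""
    return {
        (row["source_cell_type"], row["target_cell_type"])
        for row in rows
        if row["colocalised"]
    }

# Placeholder rows; none of these names come from a real dataset.
rows = [
    {"source_cell_type": "type_a", "target_cell_type": "type_b",
     "ligand": "LIG1", "receptor": "REC1", "colocalised": True},
    {"source_cell_type": "type_a", "target_cell_type": "type_c",
     "ligand": "LIG1", "receptor": "REC1", "colocalised": False},
    {"source_cell_type": "type_b", "target_cell_type": "type_a",
     "ligand": "LIG2", "receptor": "REC2", "colocalised": True},
]

check_assumed_truth(rows)
pairs = colocalised_pairs(rows)
print(sorted(pairs))
```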

Component type: Control Method

A control method for the sc-CCC task.

Arguments:

Name	Type	Description
`--dataset`	`file`	A dataset for the sc-CCC task.
`--solution`	`file`	(Optional) A dataset with ground-truth annotations for the sc-CCC task.
`--prediction`	`file`	(Output) The prediction file.

Component type: Method

A method for the sc-CCC task.

Arguments:

Name	Type	Description
`--dataset`	`file`	A dataset for the sc-CCC task.
`--prediction`	`file`	(Output) The prediction file.

Component type: Metric

A metric.

Arguments:

Name	Type	Description
`--solution`	`file`	A dataset with ground-truth annotations for the sc-CCC task.
`--prediction`	`file`	The prediction file.
`--score`	`file`	(Output) Metric score file.

File format: Prediction

The prediction file.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/prediction.h5ad

File format: Score

Metric score file.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the method.
`uns["metric_ids"]`	`string`	One or more unique metric identifiers.
`uns["metric_values"]`	`double`	The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’.
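
The length constraint between `metric_ids` and `metric_values` can be enforced when the score is assembled. The following is a minimal sketch, assuming a plain dictionary stands in for the `uns` slot of the score file. `make_score_uns` and the identifiers passed to it are hypothetical.

```python
def make_score_uns(dataset_id, method_id, metric_ids, metric_values):
    """Build the four uns fields of a score file, checking the length constraint."""
    metric_ids = list(metric_ids)
    metric_values = [float(value) for value in metric_values]
    if len(metric_ids) != len(metric_values):
        raise ValueError(
            f"metric_ids has {len(metric_ids)} entries but "
            f"metric_values has {len(metric_values)}"
        )
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": metric_ids,
        "metric_values": metric_values,
    }

# Hypothetical identifiers and values, for illustration only.
score_uns = make_score_uns(
    dataset_id="toy_dataset",
    method_id="toy_method",
    metric_ids=["metric_a", "metric_b"],
    metric_values=[0.5, 0.25],
)
print(score_uns["metric_ids"], score_uns["metric_values"])
```

Failing early on a length mismatch keeps each metric value unambiguously paired with its identifier.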
