Cell-Cell Communication

Predicting cell-cell interactions between source cell types and target cell types.

Repository: openproblems-bio/task_cell_cell_communication

Description

The growing availability of single-cell data has sparked an increased interest in the inference of cell-cell communication (CCC), with an ever-growing number of computational tools developed for this purpose.

Different tools propose distinct preprocessing steps and diverse scoring functions, which makes them challenging to compare and evaluate. Furthermore, each tool typically comes with its own set of prior knowledge. To harmonize these, Dimitrov et al. recently developed the LIANA+ framework, which was used as a foundation for this task.

The challenges in evaluating the tools are further exacerbated by the lack of a gold standard against which to benchmark CCC methods. In an attempt to address this, Dimitrov et al. use alternative data modalities, including the spatial proximity of cell types and downstream cytokine activities, to generate an inferred ground truth. However, these modalities are only approximations of biological reality and come with their own assumptions and limitations. In time, more datasets with known ground-truth interactions will become available, from which the limitations and advantages of the different CCC methods will be better understood.

This subtask evaluates methods on their ability to predict interactions between spatially adjacent source cell types and target cell types. It focuses on the prediction of interactions from steady-state, or single-context, single-cell data.

Authors & contributors

| name | roles |
| --- | --- |
| Daniel Dimitrov | maintainer, author |
| Scott Gigante | contributor |
| Robrecht Cannoodt | contributor |
| Vishnuvasan Raghuraman | contributor |

API

flowchart LR
  file_common_spatial("Raw spatial dataset")
  comp_dataset_processor[/"Dataset Processor"/]
  file_dataset("Dataset")
  file_solution("Solution")
  comp_control_method[/"Control Method"/]
  comp_method[/"Method"/]
  comp_metric[/"Metric"/]
  file_prediction("Prediction")
  file_score("Score")
  file_common_spatial---comp_dataset_processor
  comp_dataset_processor-->file_dataset
  comp_dataset_processor-->file_solution
  file_dataset---comp_control_method
  file_dataset---comp_method
  file_solution---comp_control_method
  file_solution---comp_metric
  comp_control_method-->file_prediction
  comp_method-->file_prediction
  comp_metric-->file_score
  file_prediction---comp_metric
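
The data flow in the flowchart can be restated as plain data, which makes it easy to see what each component reads and writes. The sketch below simply copies the edges from the diagram; the helper name `component_io` is hypothetical and not part of the repository.

```python
from collections import defaultdict

# Edges copied from the flowchart above.
# "---" links feed a file into a component (inputs).
INPUT_EDGES = [
    ("file_common_spatial", "comp_dataset_processor"),
    ("file_dataset", "comp_control_method"),
    ("file_dataset", "comp_method"),
    ("file_solution", "comp_control_method"),
    ("file_solution", "comp_metric"),
    ("file_prediction", "comp_metric"),
]
# "-->" links write a file out of a component (outputs).
OUTPUT_EDGES = [
    ("comp_dataset_processor", "file_dataset"),
    ("comp_dataset_processor", "file_solution"),
    ("comp_control_method", "file_prediction"),
    ("comp_method", "file_prediction"),
    ("comp_metric", "file_score"),
]


def component_io(input_edges, output_edges):
    """Group the flowchart edges by component (hypothetical helper)."""
    io = defaultdict(lambda: {"inputs": [], "outputs": []})
    for file_node, component in input_edges:
        io[component]["inputs"].append(file_node)
    for component, file_node in output_edges:
        io[component]["outputs"].append(file_node)
    return dict(io)


IO = component_io(INPUT_EDGES, OUTPUT_EDGES)
for name, spec in IO.items():
    print(name, spec)
```

Note that a regular method only receives the dataset, whereas control methods and metrics also have access to the solution.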

File format: Raw spatial dataset

An unprocessed dataset as output by a dataset loader.

Example file: resources_test/common/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad

Description:

This dataset contains raw counts and metadata as output by a dataset loader.

The format of this file is derived from the CELLxGENE schema v4.0.0.

Format:

AnnData object
 obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch'
 var: 'feature_id', 'feature_symbol'
 obsm: 'spatial'
 layers: 'counts'
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'

Data structure:

| Slot | Type | Description |
| --- | --- | --- |
| obs["dataset_id"] | string | (Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. |
| obs["assay"] | string | (Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed. |
| obs["assay_ontology_term_id"] | string | (Optional) Experimental Factor Ontology (EFO:) term identifier for the assay, providing a standardized reference to the assay type. |
| obs["cell_type"] | string | (Optional) Classification of the cell type based on its characteristics and function within the tissue or organism. |
| obs["cell_type_ontology_term_id"] | string | (Optional) Cell Ontology (CL:) term identifier for the cell type, offering a standardized reference to the specific cell classification. |
| obs["development_stage"] | string | (Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. |
| obs["development_stage_ontology_term_id"] | string | (Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Developmental Stages (HsapDv:) ontology is used. If the organism is mouse (organism_ontology_term_id == 'NCBITaxon:10090'), then the Mouse Developmental Stages (MmusDv:) ontology is used. Otherwise, the Uberon (UBERON:) ontology is used. |
| obs["disease"] | string | (Optional) Information on any disease or pathological condition associated with the cell or donor. |
| obs["disease_ontology_term_id"] | string | (Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (MONDO:), or PATO:0000461 from the Phenotype And Trait Ontology (PATO:). |
| obs["donor_id"] | string | (Optional) Identifier for the donor from whom the cell sample is obtained. |
| obs["is_primary_data"] | boolean | (Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
| obs["organism"] | string | (Optional) Organism from which the cell sample is obtained. |
| obs["organism_ontology_term_id"] | string | (Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (NCBITaxon:) which is a child of NCBITaxon:33208. |
| obs["self_reported_ethnicity"] | string | (Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
| obs["self_reported_ethnicity_ontology_term_id"] | string | (Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Ancestry Ontology (HANCESTRO:) is used. |
| obs["sex"] | string | (Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
| obs["sex_ontology_term_id"] | string | (Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only PATO:0000383, PATO:0000384 and PATO:0001340 are allowed. |
| obs["suspension_type"] | string | (Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. |
| obs["tissue"] | string | (Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies. |
| obs["tissue_ontology_term_id"] | string | (Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. The term ids must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ids cannot be CL:0000255, CL:0000257 or CL:0000548. |
| obs["tissue_general"] | string | (Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data. |
| obs["tissue_general_ontology_term_id"] | string | (Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. The term ids must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ids cannot be CL:0000255, CL:0000257 or CL:0000548. |
| obs["batch"] | string | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
| var["feature_id"] | string | (Optional) Ensembl gene ID. |
| var["feature_symbol"] | string | Gene symbol. |
| obsm["spatial"] | double | Spatial coordinates. |
| layers["counts"] | integer | Raw counts. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
| uns["dataset_reference"] | string | (Optional) Bibtex reference of the paper in which the dataset was published. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_description"] | string | Long description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
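
Several of the ontology rules above are conditional on the organism, so a small check can make them concrete. The following is a minimal sketch that only encodes the prefix rule for `development_stage_ontology_term_id` and the allowed values for `sex_ontology_term_id` as stated in the table. The function names are hypothetical, the term IDs in the usage lines are purely illustrative, and the check does not verify that a term actually exists in its ontology.

```python
HUMAN = "NCBITaxon:9606"
MOUSE = "NCBITaxon:10090"
ALLOWED_SEX_TERMS = {"PATO:0000383", "PATO:0000384", "PATO:0001340"}


def expected_dev_stage_prefix(organism_term_id):
    """Return the ontology prefix required for the developmental stage."""
    if organism_term_id == HUMAN:
        return "HsapDv:"
    if organism_term_id == MOUSE:
        return "MmusDv:"
    return "UBERON:"


def check_cell(organism_term_id, dev_stage_term_id, sex_term_id):
    """Collect human-readable problems for one cell's ontology terms."""
    problems = []
    prefix = expected_dev_stage_prefix(organism_term_id)
    if not dev_stage_term_id.startswith(prefix):
        problems.append(
            f"development stage {dev_stage_term_id!r} should start with {prefix!r}"
        )
    if sex_term_id not in ALLOWED_SEX_TERMS:
        problems.append(f"sex term {sex_term_id!r} is not allowed")
    return problems


# Illustrative term IDs only.
print(check_cell(HUMAN, "HsapDv:0000087", "PATO:0000384"))
print(check_cell(MOUSE, "HsapDv:0000087", "PATO:0000000"))
```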

Component type: Dataset Processor

A dataset processor for the sc-CCC task.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| --input | file | An unprocessed dataset as output by a dataset loader. |
| --output_dataset | file | (Output) A dataset for the sc-CCC task. |
| --output_solution | file | (Output) A dataset with ground-truth annotations for the sc-CCC task. |

File format: Dataset

A dataset for the sc-CCC task.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/dataset.h5ad

Format:

AnnData object
 obs: 'cell_type'
 var: 'feature_id', 'feature_symbol'
 layers: 'counts'
 uns: 'dataset_id'

Data structure:

| Slot | Type | Description |
| --- | --- | --- |
| obs["cell_type"] | string | Cell type annotation. |
| var["feature_id"] | string | (Optional) Ensembl gene ID. |
| var["feature_symbol"] | string | Gene symbol. |
| layers["counts"] | integer | Raw counts. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
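
As a rough illustration of the Dataset format, the sketch below lists the required slots from the table above and reports any that are missing. It is duck-typed: it only relies on `key in container`, which should also hold for the `obs`, `var`, `layers` and `uns` attributes of an AnnData object, but here a `SimpleNamespace` stand-in with made-up values is used so the example runs without extra dependencies. `missing_slots` is a hypothetical helper, not part of the repository.

```python
from types import SimpleNamespace

# Required slots of the Dataset format; var["feature_id"] is optional.
REQUIRED_SLOTS = {
    "obs": ["cell_type"],
    "var": ["feature_symbol"],
    "layers": ["counts"],
    "uns": ["dataset_id"],
}


def missing_slots(adata, required=REQUIRED_SLOTS):
    """Return the required slots that are absent from an AnnData-like object."""
    missing = []
    for attr, keys in required.items():
        container = getattr(adata, attr)
        for key in keys:
            if key not in container:
                missing.append(f'{attr}["{key}"]')
    return missing


# Stand-in object with placeholder values; uns["dataset_id"] is left out on purpose.
toy = SimpleNamespace(
    obs={"cell_type": ["cell_type_A", "cell_type_B"]},
    var={"feature_symbol": ["GENE_A", "GENE_B"]},
    layers={"counts": [[1, 0], [0, 3]]},
    uns={},
)
print(missing_slots(toy))  # ['uns["dataset_id"]']
```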

File format: Solution

A dataset with ground-truth annotations for the sc-CCC task.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/solution.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'assumed_truth'

Data structure:

| Slot | Type | Description |
| --- | --- | --- |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
| uns["dataset_reference"] | string | (Optional) Bibtex reference of the paper in which the dataset was published. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_description"] | string | Long description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["assumed_truth"] | object | A dataframe with the assumed ground truth. Must have columns ‘source_cell_type’, ‘target_cell_type’, ‘ligand’, ‘receptor’, ‘colocalised’. |
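
The column requirement on `uns["assumed_truth"]` can be checked with a few lines. This is a minimal sketch: `missing_columns` is a hypothetical helper, the rows are placeholders rather than real interactions, and treating `colocalised` as a boolean is an assumption, since the table above does not state its type. With pandas, the same function could be called as `missing_columns(df.columns)`.

```python
REQUIRED_COLUMNS = (
    "source_cell_type",
    "target_cell_type",
    "ligand",
    "receptor",
    "colocalised",
)


def missing_columns(columns):
    """Return required assumed_truth columns that are not present."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Placeholder rows; names and values are illustrative only.
rows = [
    {
        "source_cell_type": "cell_type_A",
        "target_cell_type": "cell_type_B",
        "ligand": "LIGAND_1",
        "receptor": "RECEPTOR_1",
        "colocalised": True,  # assumed boolean flag
    },
    {
        "source_cell_type": "cell_type_B",
        "target_cell_type": "cell_type_A",
        "ligand": "LIGAND_2",
        "receptor": "RECEPTOR_2",
        "colocalised": False,
    },
]

print(missing_columns(rows[0].keys()))
print(missing_columns(["ligand", "receptor"]))
```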

Component type: Control Method

A control method for the sc-CCC task.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| --dataset | file | A dataset for the sc-CCC task. |
| --solution | file | (Optional) A dataset with ground-truth annotations for the sc-CCC task. |
| --prediction | file | (Output) The prediction file. |

Component type: Method

A method for the sc-CCC task.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| --dataset | file | A dataset for the sc-CCC task. |
| --prediction | file | (Output) The prediction file. |
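
To make the argument interface concrete, here is a sketch of how the Method arguments (and the extra optional `--solution` of a Control Method) might be parsed in a standalone Python script. Using `argparse` is an assumption made for illustration; the repository may wire arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser(control=False):
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="sc-CCC method (sketch)")
    parser.add_argument(
        "--dataset", required=True, help="A dataset for the sc-CCC task."
    )
    if control:
        # Only control methods may read the ground-truth solution.
        parser.add_argument(
            "--solution",
            default=None,
            help="(Optional) A dataset with ground-truth annotations.",
        )
    parser.add_argument(
        "--prediction", required=True, help="(Output) The prediction file."
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--dataset", "dataset.h5ad", "--prediction", "prediction.h5ad"]
)
print(args.dataset, "->", args.prediction)
```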

Component type: Metric

A metric.

Arguments:

| Name | Type | Description |
| --- | --- | --- |
| --solution | file | A dataset with ground-truth annotations for the sc-CCC task. |
| --prediction | file | The prediction file. |
| --score | file | (Output) Metric score file. |

File format: Prediction

The prediction file.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/prediction.h5ad

File format: Score

Metric score file.

Example file: resources_test/task_cell_cell_communication/singlecell_broadinstitute_scp2167_human_brain/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

| Slot | Type | Description |
| --- | --- | --- |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["method_id"] | string | A unique identifier for the method. |
| uns["metric_ids"] | string | One or more unique metric identifiers. |
| uns["metric_values"] | double | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
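
The length constraint between `metric_ids` and `metric_values` is easy to enforce when assembling the score. The sketch below builds the `uns` dictionary described above using plain Python; `make_score_uns` is a hypothetical helper, and the identifiers and the value in the usage lines are placeholders, not real results. The resulting dictionary would then need to be stored in the `uns` slot of an AnnData object before being written to the score file.

```python
def make_score_uns(dataset_id, method_id, metric_ids, metric_values):
    """Assemble the score `uns` dict, enforcing matching lengths."""
    metric_ids = list(metric_ids)
    metric_values = [float(value) for value in metric_values]
    if len(metric_ids) != len(metric_values):
        raise ValueError(
            f"got {len(metric_ids)} metric ids but {len(metric_values)} values"
        )
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": metric_ids,
        "metric_values": metric_values,
    }


# Placeholder identifiers and value, for illustration only.
score_uns = make_score_uns(
    "example_dataset", "example_method", ["example_metric"], [0.5]
)
print(score_uns)
```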