openproblems-bio/test

Template

A one-sentence summary of purpose and methodology. Used for creating overview tables.

Repository: rcannood/test

Description

Provide a clear and concise description of your task, detailing the specific problem it aims to solve. Outline the input data types, the expected output, and any assumptions or constraints. Be sure to explain any terminology or concepts that are essential for understanding the task.

Explain the motivation behind your proposed task. Describe the biological or computational problem you aim to address and why it’s important. Discuss the current state of research in this area and any gaps or challenges that your task could help address. This section should convince readers of the significance and relevance of your task.

Authors & contributors

| Name | Roles |
|:---|:---|
| John Doe | author, maintainer |

API

flowchart LR
  file_common_dataset("Common Dataset")
  comp_data_processor[/"Data processor"/]
  file_solution("Solution")
  file_test_h5ad("Test data")
  file_train_h5ad("Training data")
  comp_control_method[/"Control Method"/]
  comp_metric[/"Metric"/]
  comp_method[/"Method"/]
  file_prediction("Predicted data")
  file_score("Score")
  file_common_dataset---comp_data_processor
  comp_data_processor-->file_solution
  comp_data_processor-->file_test_h5ad
  comp_data_processor-->file_train_h5ad
  file_solution---comp_control_method
  file_solution---comp_metric
  file_test_h5ad---comp_control_method
  file_test_h5ad---comp_method
  file_train_h5ad---comp_control_method
  file_train_h5ad---comp_method
  comp_control_method-->file_prediction
  comp_metric-->file_score
  comp_method-->file_prediction
  file_prediction---comp_metric
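
The flowchart implies an execution order: a component can only run once the files it reads have been produced. The following is a minimal sketch of that ordering, assuming the edges are transcribed by hand into a dictionary. The names `COMPONENTS` and `component_order` are hypothetical and not part of this repository.

```python
from graphlib import TopologicalSorter

# Each component maps to (input files, output files), transcribed from the flowchart.
COMPONENTS = {
    "data_processor": (["common_dataset"], ["solution", "test", "train"]),
    "method": (["train", "test"], ["prediction"]),
    "control_method": (["train", "test", "solution"], ["prediction"]),
    "metric": (["solution", "prediction"], ["score"]),
}

def component_order(components):
    """Return components in an order where every input file is produced first."""
    # Map each file to the set of components that write it.
    producers = {}
    for name, (_inputs, outputs) in components.items():
        for out in outputs:
            producers.setdefault(out, set()).add(name)
    # A component depends on every producer of each file it reads.
    graph = {
        name: {p for inp in inputs for p in producers.get(inp, ())}
        for name, (inputs, _outputs) in components.items()
    }
    return list(TopologicalSorter(graph).static_order())

order = component_order(COMPONENTS)
print(order[0], order[-1])  # data_processor metric
```

The relative order of `method` and `control_method` is not fixed by the graph, since neither reads the other's output.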

File format: Common Dataset

A subset of the common dataset.

Example file: resources_test/common/pancreas/dataset.h5ad

Format:

AnnData object
 obs: 'cell_type', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts', 'normalized'
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Cell type information. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A score used to rank the features by how highly variable they are. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (Optional) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (Optional) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (Optional) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
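
A file can be checked against this structure before it enters the pipeline. The sketch below is one possible approach, assuming the slot names of a loaded file have already been collected into a plain dictionary. `REQUIRED_COMMON` and `missing_slots` are hypothetical names, and fields marked (Optional) above are treated as not required.

```python
# Required keys per AnnData slot, transcribed from the Common Dataset format.
# Fields marked "(Optional)" are deliberately left out.
REQUIRED_COMMON = {
    "obs": ["cell_type", "batch"],
    "var": ["hvg", "hvg_score"],
    "obsm": ["X_pca"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary",
            "dataset_description", "normalization_id"],
}

def missing_slots(present, required=REQUIRED_COMMON):
    """List 'slot["key"]' entries that are required but absent from `present`."""
    return [
        f'{slot}["{key}"]'
        for slot, keys in required.items()
        for key in keys
        if key not in present.get(slot, ())
    ]

# `present` would normally be built from a loaded file, for example
# {"obs": list(adata.obs.columns), "uns": list(adata.uns.keys()), ...}.
example = {
    "obs": ["cell_type"],  # 'batch' is missing on purpose
    "var": ["hvg", "hvg_score"],
    "obsm": ["X_pca"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary",
            "dataset_description", "normalization_id"],
}
print(missing_slots(example))  # ['obs["batch"]']
```

The same function can validate the other file formats by passing a different `required` dictionary.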

Component type: Data processor

A data processor.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input` | `file` | A subset of the common dataset. |
| `--output_train` | `file` | (Output) The training data in h5ad format. |
| `--output_test` | `file` | (Output) The subset of cells used for the test dataset. |
| `--output_solution` | `file` | (Output) The solution for the test data. |
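
For illustration, these arguments could be exposed through a plain `argparse` interface. This is only a sketch: the repository may wire arguments through a workflow framework instead, marking every argument as required is an assumption, and the output file names are placeholders.

```python
import argparse

def build_parser():
    """Command-line interface mirroring the Data processor arguments."""
    parser = argparse.ArgumentParser(description="Data processor (illustrative sketch)")
    parser.add_argument("--input", required=True,
                        help="A subset of the common dataset.")
    parser.add_argument("--output_train", required=True,
                        help="(Output) The training data in h5ad format.")
    parser.add_argument("--output_test", required=True,
                        help="(Output) The test data.")
    parser.add_argument("--output_solution", required=True,
                        help="(Output) The solution for the test data.")
    return parser

# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--input", "resources_test/common/pancreas/dataset.h5ad",
    "--output_train", "train.h5ad",        # placeholder output path
    "--output_test", "test.h5ad",          # placeholder output path
    "--output_solution", "solution.h5ad",  # placeholder output path
])
print(args.output_train)  # train.h5ad
```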

File format: Solution

The solution for the test data.

Example file: resources_test/task_template/pancreas/solution.h5ad

Format:

AnnData object
 obs: 'label', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts', 'normalized'
 uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["label"]` | `string` | Ground truth cell type labels. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A score used to rank the features by how highly variable they are. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (Optional) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (Optional) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (Optional) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

File format: Test data

The subset of cells used for the test dataset.

Example file: resources_test/task_template/pancreas/test.h5ad

Format:

AnnData object
 obs: 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts', 'normalized'
 uns: 'dataset_id', 'normalization_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A score used to rank the features by how highly variable they are. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

File format: Training data

The training data in h5ad format.

Example file: resources_test/task_template/pancreas/train.h5ad

Format:

AnnData object
 obs: 'label', 'batch'
 var: 'hvg', 'hvg_score'
 obsm: 'X_pca'
 layers: 'counts', 'normalized'
 uns: 'dataset_id', 'normalization_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["label"]` | `string` | Ground truth cell type labels. |
| `obs["batch"]` | `string` | Batch information. |
| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
| `var["hvg_score"]` | `double` | A score used to rank the features by how highly variable they are. |
| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
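
Comparing the three formats shows what the data processor does to the per-cell annotations: `cell_type` becomes `label`, the label is withheld from the test data, and the solution keeps the ground truth for the held-out cells. The sketch below illustrates this on plain dictionaries. How cells are assigned to the test set is not specified in this README, so the `is_test` mask is an assumption supplied by the caller, and `split_obs` is a hypothetical name.

```python
def split_obs(obs_rows, is_test):
    """Split per-cell annotations into train, test and solution records.

    obs_rows: list of dicts with 'cell_type' and 'batch' (Common Dataset obs).
    is_test:  list of booleans, True where the cell belongs to the test set.
    """
    if len(obs_rows) != len(is_test):
        raise ValueError("obs_rows and is_test must have the same length")
    train, test, solution = [], [], []
    for row, held_out in zip(obs_rows, is_test):
        labelled = {"label": row["cell_type"], "batch": row["batch"]}
        if held_out:
            test.append({"batch": row["batch"]})  # label withheld from methods
            solution.append(labelled)             # ground truth kept for the metric
        else:
            train.append(labelled)
    return train, test, solution

cells = [
    {"cell_type": "alpha", "batch": "b1"},
    {"cell_type": "beta", "batch": "b1"},
    {"cell_type": "alpha", "batch": "b2"},
]
train, test, solution = split_obs(cells, [False, False, True])
print(test)      # [{'batch': 'b2'}]
print(solution)  # [{'label': 'alpha', 'batch': 'b2'}]
```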

Component type: Control Method

Quality control methods for verifying the pipeline.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The training data in h5ad format. |
| `--input_test` | `file` | The subset of cells used for the test dataset. |
| `--input_solution` | `file` | The solution for the test data. |
| `--output` | `file` | (Output) A predicted dataset as output by a method. |
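
Unlike a regular method, a control method also receives the solution, which makes it possible to produce predictions with a known expected score. This README does not say which controls are used, so the two functions below are assumptions chosen for illustration: a positive control that returns the ground truth, and a negative control that draws labels at random.

```python
import random

def true_labels_control(solution_labels):
    """Positive control: return the ground truth unchanged."""
    return list(solution_labels)

def random_labels_control(train_labels, n_test, seed=0):
    """Negative control: draw each label uniformly from those seen in training."""
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    candidates = sorted(set(train_labels))
    return [rng.choice(candidates) for _ in range(n_test)]

positive = true_labels_control(["alpha", "beta"])
negative = random_labels_control(["alpha", "beta", "alpha"], n_test=3)
print(positive)  # ['alpha', 'beta']
```

A metric should give the positive control its best possible score, which is a quick way to confirm that the pipeline is wired correctly.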

Component type: Metric

A task template metric.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_solution` | `file` | The solution for the test data. |
| `--input_prediction` | `file` | A predicted dataset as output by a method. |
| `--output` | `file` | (Output) File indicating the score of a metric. |

Component type: Method

A method.

Arguments:

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The training data in h5ad format. |
| `--input_test` | `file` | The subset of cells used for the test dataset. |
| `--output` | `file` | (Output) A predicted dataset as output by a method. |

File format: Predicted data

A predicted dataset as output by a method.

Example file: resources_test/task_template/pancreas/prediction.h5ad

Format:

AnnData object
 obs: 'label_pred'
 uns: 'dataset_id', 'normalization_id', 'method_id'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `obs["label_pred"]` | `string` | Predicted labels for the test cells. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
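
As an illustration of what a method has to produce, the sketch below builds these slots with a majority-vote baseline: every test cell receives the most frequent training label. The baseline, the function name `majority_prediction`, the `method_id` value and the metadata values are assumptions for this example. Plain dictionaries stand in for the AnnData slots.

```python
from collections import Counter

def majority_prediction(train_labels, n_test, train_uns, method_id="majority_vote"):
    """Predict the most frequent training label for every test cell."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    label, _count = Counter(train_labels).most_common(1)[0]
    return {
        "obs": {"label_pred": [label] * n_test},
        "uns": {
            # Metadata is carried over so the score can be traced back later.
            "dataset_id": train_uns["dataset_id"],
            "normalization_id": train_uns["normalization_id"],
            "method_id": method_id,
        },
    }

prediction = majority_prediction(
    ["alpha", "beta", "alpha"],
    n_test=2,
    # Placeholder metadata values, not taken from a real file.
    train_uns={"dataset_id": "pancreas", "normalization_id": "log_cp10k"},
)
print(prediction["obs"]["label_pred"])  # ['alpha', 'alpha']
```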

File format: Score

File indicating the score of a metric.

Example file: resources/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

| Slot | Type | Description |
|:---|:---|:---|
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be the same length as ‘metric_ids’. |
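
A metric compares the solution with a prediction and fills these fields. The sketch below uses accuracy as the metric, which is an illustrative choice and not one named in this README. `accuracy_score_record` is a hypothetical helper that returns the `uns` content as a plain dictionary.

```python
def accuracy_score_record(solution_labels, predicted_labels, prediction_uns):
    """Compute accuracy and return the 'uns' content of a Score file."""
    if len(solution_labels) != len(predicted_labels):
        raise ValueError("solution and prediction must cover the same cells")
    if not solution_labels:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(solution_labels, predicted_labels))
    return {
        "dataset_id": prediction_uns["dataset_id"],
        "normalization_id": prediction_uns["normalization_id"],
        "method_id": prediction_uns["method_id"],
        # The two lists must stay aligned: one value per metric identifier.
        "metric_ids": ["accuracy"],
        "metric_values": [correct / len(solution_labels)],
    }

score = accuracy_score_record(
    ["alpha", "beta", "alpha", "beta"],
    ["alpha", "alpha", "alpha", "beta"],
    # Placeholder metadata values, not taken from a real file.
    {"dataset_id": "pancreas", "normalization_id": "log_cp10k",
     "method_id": "majority_vote"},
)
print(score["metric_values"])  # [0.75]
```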
