A one-sentence summary of purpose and methodology. Used for creating overview tables.
Repository: rcannood/test
Provide a clear and concise description of your task, detailing the specific problem it aims to solve. Outline the input data types, the expected output, and any assumptions or constraints. Be sure to explain any terminology or concepts that are essential for understanding the task.
Explain the motivation behind your proposed task. Describe the biological or computational problem you aim to address and why it’s important. Discuss the current state of research in this area and any gaps or challenges that your task could help address. This section should convince readers of the significance and relevance of your task.
name | roles |
---|---|
John Doe | author, maintainer |
flowchart LR
file_common_dataset("Common Dataset")
comp_data_processor[/"Data processor"/]
file_solution("Solution")
file_test_h5ad("Test data")
file_train_h5ad("Training data")
comp_control_method[/"Control Method"/]
comp_metric[/"Metric"/]
comp_method[/"Method"/]
file_prediction("Predicted data")
file_score("Score")
file_common_dataset---comp_data_processor
comp_data_processor-->file_solution
comp_data_processor-->file_test_h5ad
comp_data_processor-->file_train_h5ad
file_solution---comp_control_method
file_solution---comp_metric
file_test_h5ad---comp_control_method
file_test_h5ad---comp_method
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
comp_control_method-->file_prediction
comp_metric-->file_score
comp_method-->file_prediction
file_prediction---comp_metric
A subset of the common dataset.
Example file: resources_test/common/pancreas/dataset.h5ad
Format:
AnnData object
obs: 'cell_type', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["cell_type"] | string | Cell type information. |
obs["batch"] | string | Batch information. |
var["hvg"] | boolean | Whether or not the feature is considered to be a ‘highly variable gene’. |
var["hvg_score"] | double | A ranking of the features by hvg. |
obsm["X_pca"] | double | The resulting PCA embedding. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalized expression values. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["dataset_name"] | string | Nicely formatted name. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
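The slot listing above can be checked programmatically. The following is a minimal sketch, not part of the template itself: `REQUIRED_SLOTS` and `missing_slots` are hypothetical names, and the fields marked "(Optional)" are left out on the assumption that they may be absent. The check relies only on `in` membership tests, so it is demonstrated here with plain dictionaries rather than a real file.

```python
from types import SimpleNamespace

# Hypothetical spec: required slots of the common dataset, taken from the
# table above. Fields marked "(Optional)" are deliberately left out.
REQUIRED_SLOTS = {
    "obs": ["cell_type", "batch"],
    "var": ["hvg", "hvg_score"],
    "obsm": ["X_pca"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary",
            "dataset_description", "normalization_id"],
}


def missing_slots(adata, spec=REQUIRED_SLOTS):
    """Return the required slots that are absent, e.g. 'obs["batch"]'."""
    missing = []
    for attr, keys in spec.items():
        container = getattr(adata, attr, {})
        for key in keys:
            if key not in container:
                missing.append(f'{attr}["{key}"]')
    return missing


# Stand-in object with dict-like slots and made-up identifier values.
# Assumption: an AnnData object loaded from an h5ad file could be passed
# instead, since its slots also support `in` checks.
example = SimpleNamespace(
    obs={"cell_type": [], "batch": []},
    var={"hvg": []},
    obsm={"X_pca": []},
    layers={"counts": [], "normalized": []},
    uns={"dataset_id": "example_dataset",
         "normalization_id": "example_normalization"},
)
print(missing_slots(example))
```

The same function can be reused for the other file formats in this document by passing a different `spec`.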
A data processor.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | A subset of the common dataset. |
--output_train | file | (Output) The training data in h5ad format. |
--output_test | file | (Output) The subset of molecules used for the test dataset. |
--output_solution | file | (Output) The solution for the test data. |
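How the processor derives its three outputs is not specified here, so the following is only a sketch under stated assumptions: cells are split at random into training and test sets, `obs["cell_type"]` becomes `obs["label"]`, and labels are withheld from the test file and kept in the solution. `split_cells` and `test_fraction` are hypothetical names, and plain lists stand in for AnnData objects.

```python
import random


def split_cells(cell_types, batches, test_fraction=0.2, seed=0):
    """Split cells into train, test, and solution records (illustrative only)."""
    n_cells = len(cell_types)
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)  # reproducible random split
    n_test = max(1, round(n_cells * test_fraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])

    # Training data keeps the labels so a method can learn from them.
    train = {"label": [cell_types[i] for i in train_idx],
             "batch": [batches[i] for i in train_idx]}
    # Test data carries no labels; methods must predict them.
    test = {"batch": [batches[i] for i in test_idx]}
    # The solution holds the withheld labels for the same test cells.
    solution = {"label": [cell_types[i] for i in test_idx],
                "batch": [batches[i] for i in test_idx]}
    return train, test, solution


cell_types = ["alpha", "beta", "alpha", "delta", "beta"] * 2
batches = ["b1", "b2"] * 5
train, test, solution = split_cells(cell_types, batches)
print(len(train["label"]), len(test["batch"]), len(solution["label"]))
```

Keeping the test and solution records in the same cell order is what later allows a metric to compare predictions and ground truth position by position.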
The solution for the test data
Example file: resources_test/task_template/pancreas/solution.h5ad
Format:
AnnData object
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["label"] | string | Ground truth cell type labels. |
obs["batch"] | string | Batch information. |
var["hvg"] | boolean | Whether or not the feature is considered to be a ‘highly variable gene’. |
var["hvg_score"] | double | A ranking of the features by hvg. |
obsm["X_pca"] | double | The resulting PCA embedding. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalized counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["dataset_name"] | string | Nicely formatted name. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
The subset of molecules used for the test dataset
Example file: resources_test/task_template/pancreas/test.h5ad
Format:
AnnData object
obs: 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'normalization_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["batch"] | string | Batch information. |
var["hvg"] | boolean | Whether or not the feature is considered to be a ‘highly variable gene’. |
var["hvg_score"] | double | A ranking of the features by hvg. |
obsm["X_pca"] | double | The resulting PCA embedding. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalized counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
The training data in h5ad format
Example file: resources_test/task_template/pancreas/train.h5ad
Format:
AnnData object
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
obsm: 'X_pca'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'normalization_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["label"] | string | Ground truth cell type labels. |
obs["batch"] | string | Batch information. |
var["hvg"] | boolean | Whether or not the feature is considered to be a ‘highly variable gene’. |
var["hvg_score"] | double | A ranking of the features by hvg. |
obsm["X_pca"] | double | The resulting PCA embedding. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalized counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
Quality control methods for verifying the pipeline.
Arguments:
Name | Type | Description |
---|---|---|
--input_train | file | The training data in h5ad format. |
--input_test | file | The subset of molecules used for the test dataset. |
--input_solution | file | The solution for the test data. |
--output | file | (Output) A predicted dataset as output by a method. |
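Because control methods also receive the solution, they can be used to bracket the range of scores a metric should produce. A minimal sketch, assuming one positive control that returns the true labels and one negative control that shuffles them; both function names are hypothetical, and the lists stand in for `obs["label"]` of the solution.

```python
import random


def true_labels_control(solution_labels):
    """Positive control: return the ground truth, giving an upper bound."""
    return list(solution_labels)


def random_labels_control(solution_labels, seed=0):
    """Negative control: shuffle the true labels, giving a chance-level baseline."""
    shuffled = list(solution_labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


solution_labels = ["alpha", "beta", "beta", "delta"]
# Either result would be written to obs["label_pred"] of the prediction file.
label_pred = true_labels_control(solution_labels)
random_pred = random_labels_control(solution_labels)
```

If a metric does not score the positive control at or near its maximum, the problem most likely lies in the pipeline rather than in any method.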
A task template metric.
Arguments:
Name | Type | Description |
---|---|---|
--input_solution | file | The solution for the test data. |
--input_prediction | file | A predicted dataset as output by a method. |
--output | file | (Output) File indicating the score of a metric. |
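The template does not prescribe a particular metric. As one possible instance, the sketch below computes plain accuracy between the solution labels and the predicted labels and packs the result into the `uns` fields of the score file. `accuracy_score_uns` and the metric id `accuracy` are illustrative assumptions, as is the requirement that both label lists describe the same cells in the same order.

```python
def accuracy_score_uns(solution_labels, predicted_labels, prediction_uns):
    """Compute accuracy and return the uns fields expected in the score file."""
    if len(solution_labels) != len(predicted_labels):
        raise ValueError("solution and prediction must cover the same cells")
    if not solution_labels:
        raise ValueError("no cells to score")
    n_correct = sum(t == p for t, p in zip(solution_labels, predicted_labels))
    accuracy = n_correct / len(solution_labels)
    return {
        # Identifiers are copied over from the prediction.
        "dataset_id": prediction_uns["dataset_id"],
        "normalization_id": prediction_uns["normalization_id"],
        "method_id": prediction_uns["method_id"],
        "metric_ids": ["accuracy"],
        "metric_values": [accuracy],
    }


score = accuracy_score_uns(
    ["alpha", "beta", "beta", "delta"],
    ["alpha", "beta", "delta", "delta"],
    {"dataset_id": "example_dataset",
     "normalization_id": "example_normalization",
     "method_id": "example_method"},
)
print(score["metric_values"])  # [0.75]
```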
A method.
Arguments:
Name | Type | Description |
---|---|---|
--input_train | file | The training data in h5ad format. |
--input_test | file | The subset of molecules used for the test dataset. |
--output | file | (Output) A predicted dataset as output by a method. |
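To make the method interface concrete, here is a deliberately simple sketch: a majority-vote baseline that learns the most frequent label from the training data and assigns it to every test cell. `majority_vote` is a hypothetical name and this is not a method shipped with the template; a real method would read the h5ad inputs and write `obs["label_pred"]` to the output file.

```python
from collections import Counter


def majority_vote(train_labels, n_test_cells):
    """Predict the most frequent training label for every test cell."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test_cells


label_pred = majority_vote(["alpha", "beta", "beta", "delta"], n_test_cells=3)
print(label_pred)  # ['beta', 'beta', 'beta']
```

Unlike a control method, this function never sees the solution; it only uses what `--input_train` and `--input_test` provide.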
A predicted dataset as output by a method.
Example file: resources_test/task_template/pancreas/prediction.h5ad
Format:
AnnData object
obs: 'label_pred'
uns: 'dataset_id', 'normalization_id', 'method_id'
Data structure:
Slot | Type | Description |
---|---|---|
obs["label_pred"] | string | Predicted labels for the test cells. |
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
uns["method_id"] | string | A unique identifier for the method. |
File indicating the score of a metric.
Example file: resources/score.h5ad
Format:
AnnData object
uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'
Data structure:
Slot | Type | Description |
---|---|---|
uns["dataset_id"] | string | A unique identifier for the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
uns["method_id"] | string | A unique identifier for the method. |
uns["metric_ids"] | string | One or more unique metric identifiers. |
uns["metric_values"] | double | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
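The length constraint between `metric_ids` and `metric_values` is easy to violate when a component reports several metrics at once. The sketch below shows one way to enforce it before writing the file. `check_score_uns` is a hypothetical helper, and treating a bare string or number as a single-element list is an assumption about how a lone metric might be stored.

```python
def check_score_uns(uns):
    """Return {metric_id: value}, raising ValueError if the lengths differ."""
    ids = uns["metric_ids"]
    values = uns["metric_values"]
    # A single metric may be stored as a bare scalar; normalise to lists.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        raise ValueError(
            f"{len(ids)} metric ids but {len(values)} metric values"
        )
    return dict(zip(ids, values))


scores = check_score_uns(
    {"metric_ids": ["metric_a", "metric_b"], "metric_values": [0.75, 0.5]}
)
print(scores)
```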