Define ontology #249

Open
rcannood opened this issue May 8, 2023 · 0 comments

rcannood commented May 8, 2023

something like this?
@slobentanzer @scottgigante-immunai Regardless of how we decide to resolve this issue, I'm sure there are already many items we can define.

Originally posted by @rcannood in #247 (comment)

For instance:

Common dataset workflow

graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  normalization:::group
  dataset_processors:::group
  raw_dataset["Raw dataset"]:::anndata
  common_dataset[Common<br/>dataset]:::anndata
  dataset_loader[/Dataset<br/>loader/]:::component
  subgraph normalization [Normalization methods]
    log_cpm[/"Log CPM"/]:::component
    l1_sqrt[/"L1 sqrt"/]:::component
    log_scran_pooling[/"Log scran<br/>pooling"/]:::component
    sqrt_cpm[/Sqrt CPM/]:::component
  end
  subgraph dataset_processors[Dataset processors]
    pca[/PCA/]:::component
    hvg[/HVG/]:::component
    knn[/KNN/]:::component
  end
  dataset_loader --> raw_dataset --> log_cpm & l1_sqrt & log_scran_pooling & sqrt_cpm --> pca --> hvg --> knn --> common_dataset
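
To make the diagram above concrete, here is a minimal sketch of the fan-out from the raw dataset into normalization layers, followed by the dataset processors in order. Everything in it is an assumption for illustration: the dict-based dataset, the `build_common_dataset` helper, and the CPM formulas are not taken from the OpenProblems code base, only two of the four normalization methods are shown, and the processors are placeholders that merely record the order in which they would run.

```python
import math


def cpm(row):
    """Scale one cell's counts to counts per million (assumed definition)."""
    total = sum(row)
    return [x / total * 1e6 if total else 0.0 for x in row]


def log_cpm(counts):
    """CPM followed by log(1 + x), applied per cell (row)."""
    return [[math.log1p(v) for v in cpm(row)] for row in counts]


def sqrt_cpm(counts):
    """CPM followed by a square root, applied per cell (row)."""
    return [[math.sqrt(v) for v in cpm(row)] for row in counts]


# Hypothetical registry; "L1 sqrt" and "Log scran pooling" are omitted here.
NORMALIZATIONS = {"log_cpm": log_cpm, "sqrt_cpm": sqrt_cpm}

# Placeholder processors, in the order drawn in the diagram.
PROCESSORS = ["pca", "hvg", "knn"]


def build_common_dataset(raw_counts):
    """Raw dataset -> every normalization -> PCA -> HVG -> KNN -> common dataset."""
    dataset = {"counts": raw_counts, "layers": {}, "processed_by": []}
    for name, normalize in NORMALIZATIONS.items():
        dataset["layers"][name] = normalize(raw_counts)
    for step in PROCESSORS:
        # A real processor would compute and store its result here.
        dataset["processed_by"].append(step)
    return dataset


common = build_common_dataset([[1, 3], [0, 0], [5, 5]])
```

Storing each normalization as a separate layer is one way to read the `&` fan-out in the graph; the downstream processors would then need to know which layer to use.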

Task-specific benchmarking workflow

graph LR
  classDef component fill:#decbe4,stroke:#333,color:#000
  classDef anndata fill:#d9d9d9,stroke:#333,color:#000
  common_dataset[Common<br/>dataset]:::anndata
  dataset_processor[/Dataset<br/>processor/]:::component
  solution[Ground-truth]:::anndata
  masked_data[Input data]:::anndata
  method[/Method/]:::component
  control_method[/Control<br/>method/]:::component
  output[Prediction]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  common_dataset --> dataset_processor --> masked_data
  dataset_processor --> solution
  masked_data --> method --> output
  masked_data & solution --> control_method --> output
  solution & output --> metric --> score
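
The data flow in the diagram above could look roughly like this for a toy label-prediction task. This is a sketch under stated assumptions, not the project's API: the function names simply mirror the node names in the graph, the dataset is a plain list of `(features, label)` pairs, the method is a majority-class baseline, and the control method is a "perfect" control that returns the ground truth.

```python
from collections import Counter


def dataset_processor(common_dataset, n_test):
    """Split the common dataset into input data (test labels hidden) and ground truth."""
    train, test = common_dataset[:-n_test], common_dataset[-n_test:]
    masked_data = {"train": train, "test": [features for features, _ in test]}
    solution = [label for _, label in test]
    return masked_data, solution


def method(masked_data):
    """Hypothetical method: predict the majority training label for every test cell."""
    labels = Counter(label for _, label in masked_data["train"])
    majority = labels.most_common(1)[0][0]
    return [majority] * len(masked_data["test"])


def control_method(masked_data, solution):
    """Control method: unlike a regular method, it is allowed to see the ground truth."""
    return list(solution)


def metric(solution, output):
    """Accuracy: fraction of predictions that match the ground truth."""
    return sum(s == o for s, o in zip(solution, output)) / len(solution)


common_dataset = [([0.1], "A"), ([0.2], "A"), ([0.9], "B"), ([0.15], "A"), ([0.8], "B")]
masked_data, solution = dataset_processor(common_dataset, n_test=2)
method_score = metric(solution, method(masked_data))
control_score = metric(solution, control_method(masked_data, solution))
```

The only structural difference between `method` and `control_method` is the extra `solution` argument, which matches the two incoming edges of the control method in the graph.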

Discussion

However, this workflow might not be applicable to all tasks.

  • Multimodal datasets will have to be processed differently from regular unimodal datasets.
  • Some tasks don't really have a ground truth and instead rely on internal scores. IMO these "benchmarks" should not be part of OpenProblems, since they don't really count as benchmarks.