Terminology change: from datasets > dataset > pipeline; pipelines > pipeline_group > pipeline #30

adlersantos · 2021-05-18T16:40:27Z

Description

The term "dataset" is becoming overloaded. It could mean any of the following:

A BigQuery dataset, which is a collection of tables. This is the original definition that the datasets folder was based on.
A collection of datasets, e.g. the Vizgen dataset, which includes the Mouse Brain Map dataset.
The reverse applies just as well: a subset of one or more larger datasets, e.g. the Mouse Brain Map dataset, which is part of the Vizgen dataset.

Plus, in the future, we can expect pipelines that need to onboard multiple datasets in one go. Such a concept is difficult to fit into the current hierarchy.

Proposed

The proposal here is to switch from the datasets/DATASET/PIPELINE hierarchy to the pipelines/PIPELINE_GROUP/PIPELINE hierarchy.

# CURRENT 
datasets/
    vizgen/                      (dataset)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (dataset)
        national_cases           (pipeline)
        racial_stats             (pipeline)        


# PROPOSED
pipelines/
    vizgen/                      (pipeline group)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (pipeline group)
        national_cases           (pipeline)
        racial_stats             (pipeline)
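
Since only the top-level folder name changes and the nesting stays the same, the path mapping implied by the two trees above could be sketched as follows. This is a minimal illustration, not part of the repository: the function name `to_proposed_path` and the constants are hypothetical, and it assumes POSIX-style relative paths.

```python
from pathlib import PurePosixPath

# Hypothetical constants for illustration; only the two folder names
# come from the proposal itself.
OLD_ROOT = "datasets"
NEW_ROOT = "pipelines"


def to_proposed_path(old_path: str) -> str:
    """Map datasets/DATASET/PIPELINE to pipelines/PIPELINE_GROUP/PIPELINE.

    The second path component keeps its name; it is simply treated as a
    pipeline group instead of a dataset.
    """
    parts = PurePosixPath(old_path).parts
    if not parts or parts[0] != OLD_ROOT:
        raise ValueError(f"expected a path under {OLD_ROOT}/, got {old_path!r}")
    # Swap the root folder and keep every remaining component unchanged.
    return str(PurePosixPath(NEW_ROOT, *parts[1:]))


if __name__ == "__main__":
    for path in ("datasets/vizgen/mouse_brain_map", "datasets/covid19/national_cases"):
        print(path, "->", to_proposed_path(path))
```

Under these assumptions, existing pipeline folders keep their names and relative positions, so the change amounts to renaming the root folder and updating the terminology.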

Checklist

I created this issue in accordance with the Code of Conduct.
This issue is appropriately labeled.

adlersantos · 2021-06-01T21:40:48Z

@shanecglass Hope you can review whether this makes sense.

adlersantos added the cleanup (Cleanup or refactor code) and revision: readme (Improvements or additions to the README) labels May 18, 2021

adlersantos changed the title ~~Terminology change: from datasets => pipelines; dataset => pipeline_group~~ Terminology change: from datasets > dataset > pipeline; pipelines > pipeline_group > pipeline May 18, 2021

adlersantos self-assigned this May 26, 2021
