Grouping artifacts in the data catalog #4260

namedgraph · 2024-10-28T13:22:28Z

Description

I tried grouping the artifacts by introducing "namespaces" as the first level of the catalog YAML config and moving the actual artifacts to the second level:

a_group_of_artifacts:
  outputs:
    type: ...

  errors:
    type: ...

I was planning to address the artifacts as a_group_of_artifacts:outputs, a_group_of_artifacts:errors, etc.

But it seems that Kedro does not support this:

DatasetError: An exception occurred when parsing config for dataset 'a_group_of_artifacts':
'type' is missing from dataset catalog configuration
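A minimal sketch of why this error appears, assuming the catalog loader treats every top-level key as a dataset definition that must contain a "type" entry. This is a simplification for illustration, not Kedro's actual parser: DatasetError and parse_flat_catalog below are hypothetical stand-ins, and "pandas.CSVDataset" is only a placeholder for the elided types.

```python
from typing import Any


class DatasetError(Exception):
    """Hypothetical stand-in for Kedro's DatasetError (not the real class)."""


def parse_flat_catalog(config: dict[str, Any]) -> dict[str, dict[str, Any]]:
    # Assumed behaviour: each top-level key is a dataset and must have "type".
    datasets = {}
    for name, entry in config.items():
        if not isinstance(entry, dict) or "type" not in entry:
            raise DatasetError(
                f"An exception occurred when parsing config for dataset '{name}':\n"
                "'type' is missing from dataset catalog configuration"
            )
        datasets[name] = entry
    return datasets


# The nested layout from the issue: the group key has no "type" of its own.
nested = {
    "a_group_of_artifacts": {
        "outputs": {"type": "pandas.CSVDataset"},
        "errors": {"type": "pandas.CSVDataset"},
    }
}

try:
    parse_flat_catalog(nested)
except DatasetError as exc:
    print(exc)
```

Because the group key is checked as if it were a dataset, the loader fails before it ever looks at outputs or errors.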

Context

Our pipelines mostly augment the initial inputs, which means we end up with a lot of similarly named artifacts (e.g. final_outputs, processed_outputs and other kinds of _outputs), which becomes confusing. It feels like there should be a better way to group/namespace the artifacts.

Possible Implementation

Instead of treating the first-level YAML blocks as artifacts, why not traverse the levels recursively until a block with a type key is encountered, treat that block as an artifact, and treat the enclosing blocks purely as namespaces?
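The proposed traversal can be sketched as a function that flattens the nested config into parent:child names before datasets are built. This is a hypothetical illustration of the proposal, not an existing Kedro API: flatten_catalog, iter_datasets and the ":" separator are assumptions, and "pandas.CSVDataset" is only a placeholder type.

```python
from typing import Any, Iterator


def iter_datasets(
    config: dict[str, Any], prefix: str = "", sep: str = ":"
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (qualified_name, dataset_config) pairs.

    A mapping containing a "type" key is treated as a dataset; any other
    mapping is treated as a namespace and descended into recursively.
    """
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if not isinstance(value, dict):
            raise ValueError(f"'{name}' is neither a dataset nor a namespace")
        if "type" in value:
            yield name, value
        else:
            yield from iter_datasets(value, name, sep)


def flatten_catalog(
    config: dict[str, Any], sep: str = ":"
) -> dict[str, dict[str, Any]]:
    # Collect the leaves into a flat mapping, rejecting name collisions
    # (e.g. a literal top-level "a:b" key alongside a nested a -> b entry).
    flat: dict[str, dict[str, Any]] = {}
    for name, entry in iter_datasets(config, sep=sep):
        if name in flat:
            raise ValueError(f"duplicate dataset name '{name}'")
        flat[name] = entry
    return flat


catalog = {
    "a_group_of_artifacts": {
        "outputs": {"type": "pandas.CSVDataset"},
        "errors": {"type": "pandas.CSVDataset"},
    },
    "raw_inputs": {"type": "pandas.CSVDataset"},
}
print(sorted(flatten_catalog(catalog)))
# ['a_group_of_artifacts:errors', 'a_group_of_artifacts:outputs', 'raw_inputs']
```

One consequence of using "has a type key" as the stopping rule is that a namespace could not contain a dataset literally named "type", since the namespace itself would then be mistaken for a dataset.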

Possible Alternatives

Maybe there is some other solution I don't know about? I'm not a Kedro expert...


lrcouto · 2024-10-28T18:42:47Z

Hey @namedgraph, thank you for your feature proposal. Your idea makes sense, but as of now, Kedro does not support grouping artifacts in the manner you describe; it interprets each entry in the catalog as a separate data source with its own type definition.

For now, you can use Kedro dataset factories to reduce the number of similar catalog entries in your project.
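Dataset factories let one catalog entry whose name contains a placeholder in braces (for example "{stage}_outputs") stand in for many datasets such as final_outputs and processed_outputs. The sketch below emulates that matching with the standard library's re module purely to show the idea; it is not Kedro's implementation. The names resolve and factories are hypothetical, and unlike Kedro, which ranks competing patterns, this sketch simply returns the first pattern that matches.

```python
import re
from typing import Any, Optional


def _pattern_to_regex(pattern: str) -> re.Pattern:
    # "{stage}_outputs" -> ^(?P<stage>.+?)_outputs$ ; literal text is escaped.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


def _fill(value: Any, fields: dict[str, str]) -> Any:
    # Substitute captured placeholder values into the template's strings.
    if isinstance(value, str):
        return value.format(**fields)
    if isinstance(value, dict):
        return {k: _fill(v, fields) for k, v in value.items()}
    return value


def resolve(
    name: str, factories: dict[str, dict[str, Any]]
) -> Optional[dict[str, Any]]:
    # Return the config of the first matching pattern, or None if none match.
    for pattern, template in factories.items():
        match = _pattern_to_regex(pattern).match(name)
        if match:
            return _fill(template, match.groupdict())
    return None


# Hypothetical factory entry, mirroring what a single catalog entry could express.
factories = {
    "{stage}_outputs": {
        "type": "pandas.CSVDataset",
        "filepath": "data/{stage}_outputs.csv",
    },
}
print(resolve("final_outputs", factories))
# {'type': 'pandas.CSVDataset', 'filepath': 'data/final_outputs.csv'}
print(resolve("raw_inputs", factories))
# None
```

This reduces the number of entries but keeps the names flat, which is why it only partly addresses the grouping request.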

namedgraph · 2024-10-29T07:54:52Z

@lrcouto it feels inconsistent that one can nest YAML in parameters and use the parent:child syntax, but not in the catalog 🤷‍♂️

namedgraph added the Issue: Feature Request New feature or improvement to existing feature label Oct 28, 2024

github-actions bot mentioned this issue Nov 1, 2024

Monthly issue metrics report #4280
