Allow new attributes to be added to DataSets #400

Closed
WaylonWalker opened this issue Jun 2, 2020 · 11 comments
Labels
Community Issue/PR opened by the open-source community Component: IO Issue/PR addresses data loading/saving/versioning and validation, the DataCatalog and DataSets Issue: Feature Request New feature or improvement to existing feature pinned Issue shouldn't be closed by stale bot

Comments

@WaylonWalker
Contributor

WaylonWalker commented Jun 2, 2020

Description

I have certain attributes that I want to track on my datasets, and I have created custom DataSets to work around this. Now that hooks are out, most of my reasons for custom DataSets are gone and I can achieve the same thing with an after_node_run hook, but I still cannot attach custom attributes to datasets.

Use Case 1 (can I share this dataset)

I would like to attach things like confidentiality to the dataset so that team members can easily know who they can share a dataset with by looking at an attribute on the dataset. Ideally, I would like to add these to the catalog.

Use Case 2 (can I delete this sub_pipeline)

I would also like to be able to check pipeline health in CI. One thing I would like to look for is dangling edges: outputs that nothing consumes. Sometimes during refactoring we switch to a new section of the pipeline, the old one gets disconnected but is never removed, and later we wonder whether anyone is still using that output. It would be nice to have CI tell us that we need to either mark that dataset as a final output or remove that section of the pipeline.
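
A rough sketch of what such a CI check could look like, assuming each node can be reduced to the names of its inputs and outputs and that the proposed `is_output` flag is available as a per-dataset dictionary. The `Node` tuple, `find_dangling_outputs`, and the sample dataset names are all hypothetical and not part of Kedro:

```python
from typing import Any, Dict, List, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for a pipeline node: only input/output names."""
    name: str
    inputs: List[str]
    outputs: List[str]


def find_dangling_outputs(
    nodes: List[Node], attributes: Dict[str, Dict[str, Any]]
) -> List[str]:
    """Return outputs that no node consumes and that are not marked is_output."""
    consumed = {name for node in nodes for name in node.inputs}
    produced = {name for node in nodes for name in node.outputs}
    dangling = {
        name
        for name in produced - consumed
        # Assumption: a missing attribute means "not a declared final output".
        if not attributes.get(name, {}).get("is_output", False)
    }
    return sorted(dangling)


nodes = [
    Node("clean", ["cars"], ["clean_cars"]),
    Node("report", ["clean_cars"], ["cars_report"]),
    Node("old_features", ["clean_cars"], ["legacy_features"]),
]
attributes = {"cars_report": {"is_output": True}}

dangling = find_dangling_outputs(nodes, attributes)
print(dangling)  # ['legacy_features']
```

In CI, a non-empty result could fail the build, which forces a decision: flag the dataset as a final output or delete the disconnected nodes.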

Possible Implementation

cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/company/cars.csv
  attributes: # 👈 this is the proposed feature, not currently in the framework
    is_output: true
    confidentiality: public

Each AbstractDataset subclass would need to accept the attributes keyword and then attach the attributes to the instance.
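
A minimal sketch of that idea, using a stand-in class rather than Kedro's real base class so that it runs on its own. `AttributedDataSet`, `build_catalog`, and the `trucks` entry are hypothetical names made up for illustration; only the `cars` entry mirrors the YAML above:

```python
from typing import Any, Dict, Optional


class AttributedDataSet:
    """Hypothetical minimal dataset; not Kedro's actual AbstractDataSet."""

    def __init__(
        self, filepath: str, attributes: Optional[Dict[str, Any]] = None
    ) -> None:
        self.filepath = filepath
        # Copy so later changes to the caller's dict do not leak into the instance.
        self.attributes: Dict[str, Any] = dict(attributes or {})


def build_catalog(config: Dict[str, Dict[str, Any]]) -> Dict[str, AttributedDataSet]:
    """Build datasets from catalog-style config, ignoring the `type` key here."""
    catalog = {}
    for name, entry in config.items():
        params = {key: value for key, value in entry.items() if key != "type"}
        catalog[name] = AttributedDataSet(**params)
    return catalog


config = {
    "cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/company/cars.csv",
        "attributes": {"is_output": True, "confidentiality": "public"},
    },
    # Hypothetical second entry with no attributes at all.
    "trucks": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/company/trucks.csv",
    },
}

catalog = build_catalog(config)
shareable = [
    name
    for name, dataset in catalog.items()
    if dataset.attributes.get("confidentiality") == "public"
]
print(shareable)  # ['cars']
```

Defaulting to an empty dictionary keeps existing catalog entries working unchanged, and a hook or plugin can read `dataset.attributes` without checking whether the key was ever set.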

@WaylonWalker WaylonWalker added the Issue: Feature Request New feature or improvement to existing feature label Jun 2, 2020
@mzjp2
Contributor

mzjp2 commented Jun 2, 2020

I've logged this for us to discuss. Thanks!

I vaguely remember seeing a similar feature request around. If anybody remembers it, can you please link it? :)

@WaylonWalker
Contributor Author

Thanks @mzjp2, glad to hear it's in the discussion. I am definitely open to any better suggestions. Right now my solution is custom datasets for everything, and it's a bit overkill for this one application.

@lorenabalan
Contributor

Potentially related to #163 ?

@WaylonWalker
Contributor Author

WaylonWalker commented Jun 3, 2020

I think the use cases are different. In my use case, I want to add attributes to datasets so that they become part of the dataset and plugins can interact with them. The attribute is specific to the dataset, not generic to the whole project.

I can definitely see use cases combining the TemplatedConfigLoader along with custom attributes in really interesting ways. But the TemplatedConfigLoader alone does not achieve what I am looking for.

If I were to describe the most general use case: I want to be able to create a plugin that does something, and I want to have attributes on the dataset that tell the plugin what to do with it, even something as simple as skipping that dataset.

After thinking about the generic use case, a similar argument could be useful on the node object, where we could also pass extra attributes to be attached to the node for plugins to see.
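
To make the plugin idea concrete, here is a small sketch under the assumption that per-dataset attributes reach the plugin as plain dictionaries. `SkipAwarePlugin`, its `handle` method, and the `skip` key are hypothetical; a real plugin would be wired in through Kedro hooks:

```python
from typing import Any, Dict, List


class SkipAwarePlugin:
    """Hypothetical plugin that honours a `skip` attribute on datasets."""

    def __init__(self) -> None:
        self.processed: List[str] = []

    def handle(self, name: str, attributes: Dict[str, Any]) -> bool:
        """Process a dataset unless its attributes ask to skip it."""
        if attributes.get("skip", False):
            return False
        # Placeholder for the plugin's real work; here we only record the name.
        self.processed.append(name)
        return True


dataset_attributes = {
    "cars": {"confidentiality": "public"},
    "scratch": {"skip": True},
    "trucks": {},
}

plugin = SkipAwarePlugin()
for name, attrs in dataset_attributes.items():
    plugin.handle(name, attrs)

print(plugin.processed)  # ['cars', 'trucks']
```

The same pattern would apply to nodes if they carried an equivalent attributes dictionary.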

@mzjp2 mzjp2 changed the title Allow new attributes to be added to DataSets [KED-1739] Allow new attributes to be added to DataSets Jun 4, 2020
@tdrobbin

tdrobbin commented Jul 1, 2020

@mzjp2 Similar to #324

@stale

stale bot commented Apr 12, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 12, 2021
@WaylonWalker
Contributor Author

Still dreaming of being able to add additional attributes to datasets so that I can access them in hooks. Is this something the kedro team is interested in allowing?

@stale stale bot removed the stale label Apr 14, 2021
@merelcht
Member

Hi @WaylonWalker! This is something we'd like to solve and we have an internal issue to address this workflow. However, it's not a top priority at the moment so I can't give any estimate about when it would be finished.

@stale

stale bot commented Jun 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 25, 2021
@stale stale bot closed this as completed Jul 2, 2021
@yetudada yetudada added pinned Issue shouldn't be closed by stale bot and removed stale labels Jul 2, 2021
@yetudada yetudada reopened this Jul 2, 2021
@merelcht merelcht changed the title [KED-1739] Allow new attributes to be added to DataSets Allow new attributes to be added to DataSets Mar 7, 2022
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 7, 2022
@merelcht merelcht added the Component: IO Issue/PR addresses data loading/saving/versioning and validation, the DataCatalog and DataSets label Mar 15, 2022
@merelcht
Member

Closing this in favour of: #1076

@merelcht merelcht moved this to Done in Kedro Framework Mar 24, 2022
@yetudada
Contributor

This issue was opened forever ago and we've made it possible with #2537. Check out the thread on #1076. Thank you so much for the feedback!

6 participants