Decide on syntax to allow adding new attributes #2439
Comments
The top-level key is a much simpler solution to handle on the Kedro side, because it would only require the parsing of the metadata key and not the unlimited possibilities of keys that could go under it. I suggest we go for the name
On the point of the |
An alternative option to the It's still a vague idea, but something along the lines of extending the @.... syntax:
output:
  type: ....
  ....
output@plotly:
  type: ....
  ....
output@preview:
  type: ....
  ....
output@summary:
  type: ....
  .... |
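To make the `@` idea concrete, here is a minimal sketch in plain Python of what a loader would have to do if attributes were attached through suffixed entry names. The helper `group_by_base_name` and the example entries are hypothetical and are not part of any Kedro API; the sketch only shows that the entry names themselves would need to be parsed.

```python
from collections import defaultdict


def group_by_base_name(catalog: dict) -> dict:
    """Group catalog entries such as ``output@plotly`` under ``output``.

    Hypothetical helper, not part of Kedro: an entry without a suffix is
    stored under the key ``None``.
    """
    grouped = defaultdict(dict)
    for entry_name, config in catalog.items():
        # "output@plotly" -> ("output", "@", "plotly"); "output" -> ("output", "", "")
        base, _, suffix = entry_name.partition("@")
        grouped[base][suffix or None] = config
    return dict(grouped)


# Illustrative entries only; the "..." values stand in for real settings.
catalog = {
    "output": {"type": "..."},
    "output@plotly": {"type": "..."},
    "output@preview": {"type": "..."},
}
grouped = group_by_base_name(catalog)
```

With this approach the meaning of an entry depends on how its name is split, which is logic that would live in the catalog loading code rather than in the dataset definition.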
I found an old discussion about this on the So you could have a I think we've established now that there's a legitimate need for a way to add new attributes to datasets, with the dataset preview being one of the use cases. @AntonyMilneQB and @idanov could you comment with what your preferred solution would be? |
So if I summarise correctly, there are four approaches:
I want to ground this conversation in a look at an example. If we apply this approach to the
Approach A
Approach B
Approach C
I was confused about how the
Approach D
General thoughts
|
tl;dr: I like the
Probably option 1 is best, but we can worry about this later anyway. Basically for the purpose of kedro framework, we don't use anything that's in
|
Looks like I posted at the same time as @yetudada but I basically agree with her here. Both my option 1 and 2 are sub-types of Approach A, which is my preferred one too. I think the code snippets for Approach C and D should look more like this though:
Approach C
I'm pretty sure you wouldn't repeat
Approach D
Here you would still need to provide
|
Thanks @yetudada and @AntonyMilneQB for the comments! Looking at all the examples, my preference still goes to Approach A. I like the idea of A2, described by Antony, but I worry about the clarity of use of it. As a new user coming into a project, you’d have to go and find out which metadata key does what, because some might be |
I also favor the A1 approach since that's the most intuitive one to me. I share @AntonyMilneQB's concern about namespace conflicts, and wonder whether we need to go one step further to protect the namespace in case there are multiple plugins. But I am not overly concerned about this, since we do not handle this for plugins right now either (plugins can potentially override each other and cause weird behavior) and no one is complaining. |
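The namespace concern can be illustrated with a small sketch. It assumes, hypothetically, that each plugin declares which keys under `metadata` it reads; the plugin names and the `find_key_conflicts` helper are made up for illustration and nothing like this exists in Kedro.

```python
from collections import defaultdict


def find_key_conflicts(claimed_keys: dict) -> dict:
    """Return metadata keys claimed by more than one plugin.

    ``claimed_keys`` maps a (hypothetical) plugin name to the set of
    top-level keys it reads from a dataset's ``metadata`` section.
    """
    owners = defaultdict(list)
    for plugin, keys in claimed_keys.items():
        for key in keys:
            owners[key].append(plugin)
    # Keep only keys with more than one owner; sort names for stable output.
    return {
        key: sorted(plugins)
        for key, plugins in owners.items()
        if len(plugins) > 1
    }


# Two imaginary plugins that both want to read "preview".
claimed = {
    "plugin_a": {"layer", "preview"},
    "plugin_b": {"preview", "schema"},
}
conflicts = find_key_conflicts(claimed)
```

Nesting each plugin's attributes under its own sub-key would avoid such clashes, at the cost of one more level of nesting.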
Just an additional argument against the @ syntax: it would tightly couple the framework (catalog and ConfigLoader) and the datasets, because some extra magic would happen while loading the catalog with the ConfigLoader. This is not necessarily bad, but it creates some inconsistencies between the YAML and the Python API. While the Python API is not the recommended way to create datasets, I find it useful for a couple of use cases:
Regarding the extra file, this may become useful if plugins were to introduce very long and complex attributes (e.g. the entire schema of a big table), but it will be very easy to add later if needed. For now, as @AntonyMilneQB mentions, I feel as a Kedro user that it would be much more readable to have these attributes next to the dataset they belong to, so I can see them at a glance. Just for clarity: I'd vote for approach A1 too ;) |
Thanks everyone for sharing your thoughts and preferences! It's very clear that everyone that commented here prefers the option with the top-level |
Description
#1076
Context
or
A top-level key would make it clearer that any entries under it aren't core Kedro, but custom attributes that will be handled by a plugin. However, the top-level key does add extra nesting, and especially for the layer attribute this could be annoying for users.
metadata, custom_data, .... ? The key will be called metadata
Possible Implementation
@Galileo-Galilei mentioned:
I'd go for a top level key rather than storing everything low level. This will make explicit to users that the information stored here is not provided by kedro (it will save you a lot of time due to questions not related to kedro, or possible "inconsistencies" between your documentation and their catalog due to third party plugins).
For this key, I prefer the name metadata which is more explicit than other propositions and kind of a standard but honestly I'll be fine with other propositions.
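As a rough sketch of why a single top-level key is easy to handle on the framework side, the snippet below separates a `metadata` section from the rest of a catalog entry without looking inside it. The helper `split_metadata`, the `filepath` setting, and the `some_plugin` sub-key are assumptions made for illustration; this is not how Kedro implements it.

```python
def split_metadata(dataset_config: dict) -> tuple:
    """Separate the ``metadata`` section from the rest of a catalog entry.

    Hypothetical helper: the framework would pass ``core`` on unchanged and
    never inspect what is inside ``metadata``.
    """
    core = {k: v for k, v in dataset_config.items() if k != "metadata"}
    metadata = dataset_config.get("metadata", {})
    return core, metadata


# Illustrative entry; "..." stands in for real values.
entry = {
    "type": "...",
    "filepath": "...",
    "metadata": {"some_plugin": {"preview": True}},
}
core, metadata = split_metadata(entry)
```

Only the one reserved key is special-cased here, so plugins can put arbitrary content under it without the core parsing needing to change.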