Move AbstractDataSet to Kedro-Plugins #2409
Comments
Thanks for the feature request! I personally think it's an interesting enabler, because a lot of other tools/workflows lack this nice data abstraction (something I've heard from other MLEs, as well as personally experienced in deploying to and looking at the examples for Azure ML Pipelines, Kubeflow Pipelines, Argo Workflows, etc.). I do recall that there was some discussion around whether
Would it not make sense to just move it all to the existing (relatively new) |
I initially thought of that as well, but |
I think it only requires stuff from |
I believe you're correct, and if the team decides that is the better path then I'm all for it. |
This is the relevant discussion we had. |
Thanks for the link. It seems there was a lot of discussion on the topic, which I understand completely, but it does seem like a lot of users were advocating for I've submitted a PR with the changes made; please let me know if there's anything else I can do to help. I suppose another option could be to create an entirely different repository, something like miniconda vs anaconda. |
I'm very interested in your statement: "It does seem like a lot of users were advocating for AbstractDataSet". In the discussions we had previously, only one or two users commented, which was not a significant enough number to convince us we should move. Have you noticed other people, outside your company, talking about the need for |
Sorry, that was probably inaccurate phrasing on my part, based on some assumptions. But to answer your question, since I've joined the conversation, I've only seen a couple of people advocate for the move. That's also not completely surprising; most Kedro users likely won't care where |
@bpmeek in case you didn't already come across it, in #1776 I originally proposed something very similar, which was decided against (for good reasons I think). In my mind, many of the advantages posted there still hold true and it would make more sense for In practice, what is the actual disadvantage of the current system that you find? As far as I understand, the only real difference for you is that you are required to |
Just to give my two cents (and also because I was involved in the initial conversation):
So my original request still holds :) To be honest I don't really understand the original argument:
I understand you want to separate "team made" versus "third party" datasets, but it seems that separating the AbstractDataSet from the library where datasets live has the opposite effect. Whatever happens, if |
Correct me if I'm wrong, but if the primary concern is ensuring backwards compatibility with third-party developed tools, could we not replace the current contents of
That way no current integrations are broken, and a deprecation warning could even be thrown if the decision is made to completely remove it at a later date. Also, with a great amount of respect to @idanov, while the
So while these classes are core to the Runner, I don't view the Runner as being core to Kedro, and there may come a day that users can decide between multiple runners.
This is the only disadvantage I personally have encountered, but I could easily envision the issues that @Galileo-Galilei mentioned in his earlier comments. |
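A minimal sketch of the backwards-compatibility idea raised above: keep the old import location as a thin shim that forwards to the new home and emits a DeprecationWarning. The module names `legacy_core` and `datasets_core` are hypothetical stand-ins built in memory so the example runs on its own; they are not Kedro's actual modules, and the placeholder class is not Kedro's implementation. This is only one way such a shim could be written, here using a module-level `__getattr__` (PEP 562).

```python
import sys
import types
import warnings


class AbstractDataSet:
    """Placeholder for the real base class (hypothetical, not Kedro's code)."""


# Hypothetical new home of the class, built in memory for the example.
datasets_core = types.ModuleType("datasets_core")
datasets_core.AbstractDataSet = AbstractDataSet
sys.modules["datasets_core"] = datasets_core

# Hypothetical old import location, kept only as a forwarding shim.
legacy_core = types.ModuleType("legacy_core")


def _forward(name):
    """Module-level __getattr__: forward lookups to the new home with a warning."""
    target = sys.modules["datasets_core"]
    if not hasattr(target, name):
        # Unknown names behave like any other missing attribute.
        raise AttributeError(f"module 'legacy_core' has no attribute {name!r}")
    warnings.warn(
        f"legacy_core.{name} is deprecated; import it from datasets_core instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(target, name)


legacy_core.__getattr__ = _forward
sys.modules["legacy_core"] = legacy_core


def import_from_old_location():
    """Import via the old path and return the class plus any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        from legacy_core import AbstractDataSet as legacy_cls
    return legacy_cls, caught
```

Existing code that imports from the old location keeps working and receives the same class object, while the warning gives integrators time to migrate before any removal.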
I am really confused by this conversation, not sure I understand the pain point here. Please could you provide some specific reasoning why this is necessary at all @bpmeek? Introducing dependency on
To me this is the real problem here, because for unknown reasons, the Just as a reminder for the goal of the split:
These are stated and implicit goals. But the one true north star goal that sparked this split is to decouple Actions:
|
@idanov I think I may be approaching this from a different angle, in what I'm suggesting an install of
I think the proposed change wouldn't impact any of these goals, only reverse the direction, i.e. But this is also in part why the original request recommends moving it to its own plugin.
Can you help me understand this a little more? Currently it seems that I do definitely understand wanting to keep the interface part of the framework, and if it's too much of a risk to separate then I understand. |
+1 that Kedro is not that light. https://github.com/kedro-org/kedro/blob/main/dependency/requirements.txt includes a lot of requirements, and they're not even that loosely defined. To provide a concrete (but simple) example, I wanted to abstract data loading from execution logic in something similar to https://github.com/deepyaman/exedra-examples/blob/main/nyc_taxi_data_regression/deploy/kfp/pipeline.py (the automated translation process is less straightforward, but not the focus). As a result, I wanted a |
Crazy idea: stop requiring inheritance from |
Interesting idea. I don't know how it would work in practice, but I'm happy to explore. |
Unfortunately this isn't really a possibility without drastic changes because |
There have been some very reasonable points raised in this thread:
On the other hand, I feel that some arguments are not core to the discussion:
The question of which direction the dependency link should have is not settled, but I sympathise with @idanov's arguments to keep the interface as part of the framework. I don't have articulate arguments for this, but there are definitely precedents of plugin ecosystems depending on the central package: for example, I picked 4 pytest plugins at random and they all depend on Given the lack of consensus, I'm inclined to say that we should close this as "Won't fix", and instead focus our efforts on (1) improving the current design of Kedro datasets (#1936 (comment), #1778) and (2) continuing to work on making Kedro more modular (#2388). |
@astrojuanlu you bring up some really good points, and it seems like the core issues I wanted to address have been addressed. Thank you team for the work and conversation. |
Thank you everyone! I'll also drop in the conversation that prompted this too: https://linen-slack.kedro.org/t/9724969/hey-all-if-i-wanted-to-request-that-a-part-of-kedro-be-moved |
Thank you @bpmeek for raising this, because without the discussion here we wouldn't have found out about kedro-org/kedro-plugins#140. Meanwhile, after we drop Python 3.7, we should maybe look into using Protocols in as many places as possible, as @astrojuanlu suggested, because I suspect this might make things a lot more decoupled and give users more freedom in similar situations. |
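To illustrate the Protocol idea mentioned in the thread, here is a rough sketch of structural typing with `typing.Protocol` (available from Python 3.8). The names `DataSetLike`, `InMemoryStore` and `round_trip` are hypothetical and chosen for this example only; the sketch does not reflect how Kedro defines or checks datasets.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataSetLike(Protocol):
    """Hypothetical structural interface: anything with load() and save()."""

    def load(self) -> Any:
        ...

    def save(self, data: Any) -> None:
        ...


class InMemoryStore:
    """Satisfies DataSetLike without inheriting from any base class."""

    def __init__(self) -> None:
        self._data: Any = None

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


def round_trip(dataset: DataSetLike, data: Any) -> Any:
    """Save then load through the structural interface only."""
    dataset.save(data)
    return dataset.load()
```

With this approach a class in a third-party library could be accepted wherever a `DataSetLike` is expected without importing the framework that declares the interface. Note that the `isinstance` check enabled by `runtime_checkable` only verifies that the methods exist, not their signatures.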
Description
I would like to be able to share implementations of AbstractDataSet with non-Kedro users.
Context
I have a library with several implementations of AbstractDataSet that I use to access proprietary data connectors at my employer. I would like to share this library with coworkers who are not using Kedro, for use through the code API without the overhead of a full Kedro install.
Possible Implementation
Move the contents of kedro.io.core.py to a standalone plugin within the Kedro-Plugins repo and migrate imports away from core.py to the new plugin.
Possible Alternatives
An alternative solution would be to use setuptools to build a second module with only the AbstractDataSet
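For readers unfamiliar with the use case, the following is a small sketch of what using a dataset "through the code API" might look like when the base class lives in a lightweight package. `BaseDataSet` is a hypothetical stand-in defined here for illustration (an abstract class whose public `load`/`save` delegate to `_load`/`_save`); it is not Kedro's implementation, and `JSONFileDataSet` is likewise invented for the example.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class BaseDataSet(ABC):
    """Hypothetical stand-in for a shared dataset base class."""

    def load(self) -> Any:
        # Public entry point delegates to the subclass hook.
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)

    @abstractmethod
    def _load(self) -> Any:
        ...

    @abstractmethod
    def _save(self, data: Any) -> None:
        ...


class JSONFileDataSet(BaseDataSet):
    """Example connector: reads and writes a JSON file."""

    def __init__(self, filepath: Path) -> None:
        self._filepath = Path(filepath)

    def _load(self) -> Any:
        return json.loads(self._filepath.read_text(encoding="utf-8"))

    def _save(self, data: Any) -> None:
        self._filepath.write_text(json.dumps(data), encoding="utf-8")


def demo() -> Any:
    """Use the dataset directly, with no pipeline or catalog involved."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset = JSONFileDataSet(Path(tmp) / "records.json")
        dataset.save({"rows": [1, 2, 3]})
        return dataset.load()
```

A coworker would only need the package that provides the base class and the connector, then call `save` and `load` on the instance.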