Description
I have a custom dataset implementation that is supposed to fetch some data into a cache folder that you specify in the dataset configuration. If the data isn't in the folder, it fetches it, then loads from disk. Simple enough.
I want to be able to specify a path relative to the project root, e.g. data/01_raw, but I can't figure out how to access the project_root directory at runtime. Things work fine if I run from the CLI, since it starts in the project root. But if I manipulate the catalog in, say, kedro jupyter lab and then create a notebook at notebooks/my_notebook.ipynb, the working directory in that notebook will be notebooks. Hence, if I load my dataset from the catalog, it resolves the cache folder to notebooks/data/01_raw and redownloads all of my datasets.
As best I can figure, there is no good way to get the project_root in a dataset implementation, since you need to know the project_root folder to instantiate the config/context.
Context
This could be worked around by using an absolute path, but I want to be able to redistribute my project to share with other users. Being able to specify a path that is always interpreted as relative to project_root would be helpful.
Possible Implementation
Expose project_root in a way that can be accessed from a dataset implementation. Maybe there is already a way, but I can't seem to work it out.
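Until something like that is exposed, one possible stopgap is to discover the root from inside the dataset by walking up from the current working directory until a marker file is found. The sketch below is only an illustration and uses no Kedro API: the helper names find_project_root and resolve_cache_dir are hypothetical, and it assumes the project root is the nearest ancestor directory containing pyproject.toml, which may not hold for every project layout.

```python
from pathlib import Path


def find_project_root(start=None, marker="pyproject.toml"):
    """Return the nearest ancestor of `start` (or the cwd) that contains `marker`.

    Assumption: the project root is identified by a marker file. This is a
    convention chosen for this sketch, not something Kedro guarantees.
    """
    current = Path(start if start is not None else Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found in {current} or its parents")


def resolve_cache_dir(cache_dir, start=None):
    """Interpret a relative `cache_dir` as relative to the project root.

    Absolute paths are returned unchanged, so existing configs keep working.
    """
    path = Path(cache_dir).expanduser()
    if path.is_absolute():
        return path
    return find_project_root(start) / path
```

A custom dataset could call resolve_cache_dir("data/01_raw") in its constructor, so that the same catalog entry points at the same folder whether the code runs from the project root or from notebooks/.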
Possible Alternatives
The workaround is to specify an absolute path, but then users need to remember to fix the path when they clone my project.
I suppose in my notebooks I could do something like %cd context.project_root, since I would have access to it there, but that doesn't seem like a great solution.
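A slightly less intrusive variant of the notebook workaround is to change directory only for the duration of the load and restore it afterwards. This is a minimal sketch using only the standard library; the name working_directory is hypothetical, and the project root still has to be supplied by the caller (for example from the context object available in the notebook).

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)
```

Wrapping a catalog load in `with working_directory(project_root): ...` would make relative cache paths resolve against the root without permanently moving the notebook's working directory. On Python 3.11 and later, contextlib.chdir offers the same behaviour.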
Thanks for your help as always.
Hi @jasonmhite, thanks for flagging this. We're having a related discussion in #2965, but the solution is still not clear. Could you have a look and tell us how it relates to your feature request?