Add variable for caching pipeline steps #138
base: main
Signed-off-by: Michael Clifford <[email protected]>
I am not sure this will work as intended. The problem is in how the caching mechanism works in KFP.
KFP's caching mechanism is based on "hashing" a step's input parameters, which determines whether the output of the current step can change or not. For steps like `kubectl_wait_task` or `kubectl_apply_task`, the input parameters are the same no matter the data content (for `kubectl_apply_task` only the PyTorchJob manifest is considered; for `kubectl_wait_task` it's the PyTorchJob resource name).
Therefore, to these steps, their inputs "stay the same" unless there's a PyTorchJob parameter change.
Setting this to `true` would make these steps get skipped even if the pipeline is pointed at different taxonomy data. The cache would simply mark the step as complete, and no PyTorchJob would ever be scheduled.
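To illustrate the concern above, here is a simplified model of input-based caching. It is only a sketch, not KFP's actual fingerprinting code: the `cache_key` helper, the `resource_name` parameter, and the resource names are hypothetical, and the real cache key covers more than the inputs shown here.

```python
import hashlib
import json


def cache_key(component_name: str, inputs: dict) -> str:
    """Hypothetical cache key: a hash over the step name and its input parameters."""
    payload = json.dumps(
        {"component": component_name, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two runs pointed at different taxonomy data. The wait step never sees the
# taxonomy; its only input is the PyTorchJob resource name.
run_a = cache_key("kubectl_wait_task", {"resource_name": "pytorchjob/train"})
run_b = cache_key("kubectl_wait_task", {"resource_name": "pytorchjob/train"})

# Only a change to the PyTorchJob parameter itself produces a new key.
run_c = cache_key("kubectl_wait_task", {"resource_name": "pytorchjob/train-v2"})

print(run_a == run_b)  # same key, so a cached result would be reused
print(run_a == run_c)  # different key, so the step would run again
```

In this model the second run is a cache hit even though the underlying data changed, which is the behaviour described above.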
While debugging steps after training, that is exactly what we want: to simply skip over scheduling the PyTorchJobs.
OK, then I may not have understood the purpose. So you're not using the "caching" feature as a "do not reevaluate what can't change" but as a "skip everything until I say stop", correct? 🙂
Yes. Since we enforce some "no-caching" steps during pipeline runs (which is usually the correct thing to do), it can be annoying and error-prone to change them all individually while debugging. This is more of a quality-of-life change to turn all those "no-caching" steps into "caching" steps in one spot. Maybe my reasoning is explained a bit better here: #136
Right... I'm just confused (and I think others may be as well) by naming it "cache" in such cases. Can we change the
Addresses #136 until KFP adds a feature to allow setting `set_caching_options()` as a pipeline parameter.

This PR adds a variable, `KFP_PIPELINE_CACHE`, that is used to populate the `set_caching_options()` calls throughout the pipeline. This will make switching between caching and not caching during debugging much simpler.
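A minimal sketch of the pattern, assuming the variable is a module-level boolean passed to each task's `set_caching_options(enable_caching)` call as in the KFP SDK. `FakeTask` and `build_tasks` are hypothetical stand-ins so the example runs without a KFP installation; the real pipeline would call the method on actual pipeline tasks.

```python
# Single switch for the whole pipeline (the variable this PR introduces).
# False keeps the usual "no-caching" behaviour; flip to True while debugging.
KFP_PIPELINE_CACHE = False


class FakeTask:
    """Hypothetical stand-in for a KFP pipeline task, for illustration only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.caching_enabled = True  # assumption: caching is on unless disabled

    def set_caching_options(self, enable_caching: bool) -> "FakeTask":
        # Mirrors the shape of the KFP task method; returns self for chaining.
        self.caching_enabled = enable_caching
        return self


def build_tasks(cache: bool = KFP_PIPELINE_CACHE) -> list:
    """Create the steps, applying the same caching flag to each of them."""
    apply_task = FakeTask("kubectl_apply_task").set_caching_options(cache)
    wait_task = FakeTask("kubectl_wait_task").set_caching_options(cache)
    return [apply_task, wait_task]


for task in build_tasks():
    print(task.name, task.caching_enabled)
```

Changing the one variable then flips every step that previously hard-coded its caching choice, instead of editing each call individually.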