
Add variable for caching pipeline steps #138

Status: Open — wants to merge 1 commit into base branch `main`
Conversation

MichaelClifford
Collaborator

Addresses #136 until KFP adds a feature allowing set_caching_option() to be set as a pipeline parameter.

This PR adds a variable, KFP_PIPELINE_CACHE, used to populate set_caching_option() throughout the pipeline. This makes switching between caching and not caching during debugging much simpler.

@MichaelClifford requested review from tumido and sallyom, and removed the request for tumido, on October 30, 2024 16:02

@tumido (Member) left a comment:


I am not sure this will work as intended. The problem is in how the caching mechanism works in KFP.

KFP's caching mechanism is based on "hashing" input params, which determines whether the output of the current step changes. In cases like kubectl_wait_task or kubectl_apply_task, the input params are the same no matter the data content (for kubectl_apply_task only the PyTorchJob manifest is considered; for kubectl_wait_task it's the PyTorchJob resource name).

Therefore, to these steps their inputs "stay the same" unless a PyTorchJob parameter changes.

Setting this to true would make these steps be skipped even if the pipeline is pointed at different taxonomy data. Caching would simply complete the step and no PyTorchJob would ever be scheduled.
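The cache-key behavior described above can be illustrated with a small sketch. The hashing logic here is a hypothetical simplification, not KFP's internals: the point is that the key is derived only from the step's declared input parameters, so a step like kubectl_wait_task hits the cache whenever the PyTorchJob name is unchanged, regardless of the underlying data.

```python
import hashlib
import json

# Hypothetical sketch of hash-based step caching: the cache key depends only
# on the step's declared input parameters, not on external data content.

def cache_key(step_name: str, params: dict) -> str:
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# kubectl_wait_task only takes the PyTorchJob resource name as input...
key_run_1 = cache_key("kubectl_wait_task", {"pytorchjob_name": "train-job"})
key_run_2 = cache_key("kubectl_wait_task", {"pytorchjob_name": "train-job"})

# ...so pointing the pipeline at different taxonomy data does not change the
# key: the step is treated as a cache hit and the PyTorchJob is never
# scheduled again.
assert key_run_1 == key_run_2
```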

@MichaelClifford (Collaborator, Author)

> Setting this to true would make these steps be skipped even if the pipeline is pointed at different taxonomy data. Caching would simply complete the step and no PyTorchJob would ever be scheduled.

While debugging steps after training, that is exactly what we want: to simply skip over scheduling the PyTorchJobs.

@tumido (Member) commented Oct 31, 2024:

Ok, then I may not have understood the purpose. So you're not using the "caching" feature as "do not re-evaluate what can't change" but as "skip everything until I say stop", correct? 🙂

@MichaelClifford (Collaborator, Author)

Yes. Since we enforce some "no-caching" steps during pipeline runs (which is usually the correct thing to do), while debugging it can be annoying and error-prone to change them all individually. This is more of a quality-of-life change: it turns all those "no-caching" steps into "caching" steps in one spot. My reasoning may be explained a bit better in #136.

@tumido (Member) commented Oct 31, 2024:

Right… I'm just confused (and I think others may be as well) by naming it "cache" in these cases. Can we change the KFP_PIPELINE_CACHE param to, I don't know, KFP_SKIP_STEPS_IF_POSSIBLE or something? Because as you've described, we're not interested in caching here. 😄 🤷
