
Add variable for caching pipeline steps #138

Status: Open — wants to merge 1 commit into base branch `main`
Conversation

MichaelClifford
Collaborator

Addresses #136 until KFP adds a feature allowing set_caching_option() to be set as a pipeline parameter.

This PR adds a variable, KFP_PIPELINE_CACHE, used to populate set_caching_option() throughout the pipeline. This makes switching between caching and not caching during debugging much simpler.

@MichaelClifford requested review from tumido and sallyom, and removed the request for tumido, on October 30, 2024 16:02

@tumido (Member) left a comment:


I am not sure this will work as intended. The problem is in how the caching mechanism works in KFP.

KFP's caching mechanism is based on "hashing" input params, which determines whether the output of the current step changes. In cases like kubectl_wait_task or kubectl_apply_task, the input params are the same no matter the data content (for kubectl_apply_task only the PyTorchJob manifest is considered; for kubectl_wait_task it's the PyTorchJob resource name).

Therefore, to these steps their inputs "stay the same" unless a PyTorchJob parameter changes.

Setting this to true would make these steps be skipped even if the pipeline is pointed at different taxonomy data. Caching would simply complete the step and no PyTorchJob would ever be scheduled.
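The cache-key behavior described above can be illustrated with a small sketch. The hashing logic here is a hypothetical simplification, not KFP's internals: the point is that the key is derived only from the step's declared input parameters, so a step like kubectl_wait_task hits the cache whenever the PyTorchJob name is unchanged, regardless of the underlying data.

```python
import hashlib
import json

# Hypothetical sketch of hash-based step caching: the cache key depends only
# on the step's declared input parameters, not on external data content.

def cache_key(step_name: str, params: dict) -> str:
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# kubectl_wait_task only takes the PyTorchJob resource name as input...
key_run_1 = cache_key("kubectl_wait_task", {"pytorchjob_name": "train-job"})
key_run_2 = cache_key("kubectl_wait_task", {"pytorchjob_name": "train-job"})

# ...so pointing the pipeline at different taxonomy data does not change the
# key: the step is treated as a cache hit and the PyTorchJob is never
# scheduled again.
assert key_run_1 == key_run_2
```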

@MichaelClifford (Collaborator, Author)

> Setting this to true would make these steps be skipped even if the pipeline is pointed at different taxonomy data. Caching would simply complete the step and no PyTorchJob would ever be scheduled.

While debugging steps after training, that is exactly what we want: to simply skip over scheduling the PyTorchJobs.

@tumido (Member) commented Oct 31, 2024:

Ok, then I may not have understood the purpose. So you're not using the "caching" feature as "do not re-evaluate what can't change" but as "skip everything until I say stop", correct? 🙂

@MichaelClifford (Collaborator, Author)

Yes. Since we enforce some "no-caching" steps during pipeline runs (which is usually the correct thing to do), while debugging it can be annoying and error-prone to change them all individually. This is more of a quality-of-life change: it turns all those "no-caching" steps into "caching" steps in one spot. My reasoning may be explained a bit better in #136.

@tumido (Member) commented Oct 31, 2024:

Right… I'm just confused (and I think others may be as well) by naming it "cache" in these cases. Can we change the KFP_PIPELINE_CACHE param to, I don't know, KFP_SKIP_STEPS_IF_POSSIBLE or something? Because as you've described, we're not interested in caching here. 😄 🤷
