Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Bug: with_overrides for cache doesn't work #2975

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

hebiao064
Copy link

@hebiao064 hebiao064 commented Dec 4, 2024

Tracking issue

Closes flyteorg/flyte#6079

Why are the changes needed?

I tried the with_overrides to override cache configuration for a local flyte @task, but it didn't work as expected.

Code:

@task(
    cache=False,
    cache_version="v1",
    cache_serialize=False,
)
def to_be_overridden_task(a: str) -> str:
    return f"*~*~*~{a}*~*~*~"

def to_be_overridden_task_wrapper(a: str, cache: bool = False, cache_version: str = "", cache_serialize: bool = False) -> str:
    return to_be_overridden_task(a=a).with_overrides(
        cache=cache,
        cache_version=cache_version,
        cache_serialize=cache_serialize
    )

@workflow
def my_overridden_wf(a: str):
    return to_be_overridden_task_wrapper(a=a, cache=True, cache_version="foo", cache_serialize=True)

After pyflyte run, I found the cache is still False and cache_version still ""

What changes were proposed in this pull request?

  1. When cache overrides specified, we also override the flyteentity's metadata
  2. Reorder the serialization order to serialize workflow first hence with_overrides will be called first, otherwise the task will be always registered as original task configuration without overrides.

How was this patch tested?

make test with a new test case

Also tested in our company's fork E2E with many workflows submission.

Setup process

Screenshots

Before:
Screenshot 2024-12-03 at 5 57 29 PM
After:
Screenshot 2024-12-03 at 5 57 22 PM

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#2154

Docs link

Copy link

welcome bot commented Dec 4, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

Copy link

codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 42.28%. Comparing base (55d4d2d) to head (698d62b).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/core/node.py 0.00% 8 Missing ⚠️
flytekit/tools/serialize_helpers.py 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (55d4d2d) and HEAD (698d62b). Click for more details.

HEAD has 11 uploads less than BASE
Flag BASE (55d4d2d) HEAD (698d62b)
18 7
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2975       +/-   ##
===========================================
- Coverage   79.62%   42.28%   -37.34%     
===========================================
  Files         200      226       +26     
  Lines       20997    22685     +1688     
  Branches     2705     2710        +5     
===========================================
- Hits        16719     9593     -7126     
- Misses       3511    12876     +9365     
+ Partials      767      216      -551     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -194,6 +194,8 @@ def with_overrides(

if name is not None:
self._metadata._name = name
if self.run_entity and hasattr(self.run_entity, "metadata"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not override the run_entity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose you have two tasks in a workflow, like below. The cache version of the second task will be overridden as well.

@workflow
def wf():
  t1().with_override(cache_version="v2")
  t1()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass the node info to local_execute, and use the cache version in the node info to retrieve the data from the cache.

if self.metadata.cache and local_config.cache_enabled:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pingsutw Could I know how to do this?

"You can pass the node info to local_execute`

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is:

In node.py, self._metadata._cacheable won't be affecting the actual Task entity.

That's why we want to modify the Task's entity's metadata.

But I acknowledge the issue that the two tasks example woulld be problematic. I am not sure how local_config can help with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] with_overrides for cache doesn't work
2 participants