[BUG] Cache doesn't work when running entities by FlyteRemote #5823

Open
2 tasks done
peterkun23 opened this issue Oct 8, 2024 · 2 comments
Labels: bug (Something isn't working), flytekit (FlyteKit Python related issue), flyteremote

Comments

@peterkun23

Describe the bug

A cache hit never occurs when the following toy example is run several times in a row:

from flytekit import task, HashMethod
from typing_extensions import Annotated
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config, PlatformConfig


class VideoRecord:
    def __init__(self, video_path: str):
        self.video_path = video_path


def hash_video_record(record: VideoRecord) -> str:
    return record.video_path


@task(cache=True, cache_version="1.0")
def bar_1(video_record: Annotated[VideoRecord, HashMethod(hash_video_record)]) -> str:
    print("Running bar_1")
    return video_record.video_path


if __name__ == "__main__":
    video_record = VideoRecord("path/to/video")
    # endpoint, default_project, and default_domain are deployment-specific and are not defined in this report.

    remote = FlyteRemote(
        config=Config(
            platform=PlatformConfig(
                endpoint=endpoint,
                insecure=True,
                insecure_skip_verify=True,
            )
        ),
        default_project=default_project,
        default_domain=default_domain,
    )
    entity = remote.fetch_task(name="toy_example.bar_1", version="1.1")
    remote.execute(
        entity=entity,
        inputs={"video_record": video_record},
        wait=True,
        tags=[],
        overwrite_cache=False,
    )

Expected behavior

I would expect a way to make caching work for this use case, either by implementing something in the type transformers (FlytePickleTransformer in this case) or by some other means. I believe the problem is that by the time guess_python_type has been called, the HashMethod information is already gone.
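
The suspected loss of the HashMethod annotation can be illustrated with the standard library alone. This is only an analogy and not flytekit's actual code path: StandInHashMethod and find_hash_method are hypothetical names introduced here, and the sketch assumes that a type recovered without its Annotated metadata behaves like the bare hint below.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class VideoRecord:
    def __init__(self, video_path: str):
        self.video_path = video_path


class StandInHashMethod:
    """Hypothetical stand-in for flytekit's HashMethod; not the real class."""

    def __init__(self, function):
        self._function = function

    def calculate(self, obj) -> str:
        return self._function(obj)


def hash_video_record(record: VideoRecord) -> str:
    return record.video_path


def bar_1(video_record: Annotated[VideoRecord, StandInHashMethod(hash_video_record)]) -> str:
    return video_record.video_path


def find_hash_method(hint):
    """Return the hash method attached to an Annotated hint, or None if there is none."""
    if get_origin(hint) is Annotated:
        # get_args yields (base_type, metadata_1, metadata_2, ...)
        for metadata in get_args(hint)[1:]:
            if isinstance(metadata, StandInHashMethod):
                return metadata
    return None


# The full hint keeps the metadata; the bare hint is what remains once it is stripped.
full_hint = get_type_hints(bar_1, include_extras=True)["video_record"]
bare_hint = get_type_hints(bar_1)["video_record"]

print(find_hash_method(full_hint) is not None)
print(find_hash_method(bare_hint) is None)
```

Once only the bare VideoRecord type is available, nothing in the hint says how to hash the value, which matches the behavior described above.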

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
peterkun23 added the bug and untriaged labels on Oct 8, 2024

welcome bot commented Oct 8, 2024

Thank you for opening your first issue here! 🛠

eapolinario self-assigned this on Oct 17, 2024
eapolinario added the flytekit and flyteremote labels and removed the untriaged label on Oct 17, 2024
@eapolinario
Contributor

@peterkun23, given the Flyte execution model, we need a way to compute a hash of the object before the execution starts. In your example, you can supply the hash method in the call to remote.execute through type_hints:

    remote.execute(
        entity=entity,
        inputs={"video_record": video_record},
        wait=True,
        tags=[],
        overwrite_cache=False,
        type_hints={"video_record": Annotated[VideoRecord, HashMethod(hash_video_record)]},  # Note the type hint
    )

With the type hint passed to remote.execute, you can remove the Annotated bit from the definition of bar_1:

@task(cache=True, cache_version="1.0")
def bar_1(video_record: VideoRecord) -> str:
    print("Running bar_1")
    return video_record.video_path

This should be enough to enable caching for these remote executions of bar_1.
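
Why the hash has to be known before the execution starts can be sketched with a toy cache. This is a simplified model under stated assumptions, not Flyte's real key derivation: ToyCache and toy_cache_key are hypothetical names, and the key is assumed to combine the task name, the cache version, and the input hash.

```python
import hashlib


class VideoRecord:
    def __init__(self, video_path: str):
        self.video_path = video_path


def hash_video_record(record: VideoRecord) -> str:
    return record.video_path


def bar_1(video_record: VideoRecord) -> str:
    return video_record.video_path


def toy_cache_key(task_name: str, cache_version: str, input_hash: str) -> str:
    """Hypothetical key: a digest over the task name, cache version, and input hash."""
    payload = "|".join([task_name, cache_version, input_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCache:
    """Toy model of a cache lookup performed before the task body runs."""

    def __init__(self):
        self._store = {}

    def run(self, task_name, cache_version, fn, record, hash_fn):
        # The input hash is computed up front, so a hit can skip fn entirely.
        key = toy_cache_key(task_name, cache_version, hash_fn(record))
        if key in self._store:
            return self._store[key], True
        result = fn(record)
        self._store[key] = result
        return result, False


cache = ToyCache()
for _ in range(2):
    # A fresh object each time, as in two consecutive script runs.
    record = VideoRecord("path/to/video")
    result, cache_hit = cache.run("toy_example.bar_1", "1.0", bar_1, record, hash_video_record)
    print(result, cache_hit)
```

In this model the second call is served from the cache because both objects hash to the same video_path, even though they are distinct Python objects.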
