[BUG] Can't pickle local object 'get_flyte_fs.<locals>._FlyteFS' (FlyteRemote) #5052

Closed
2 tasks done
alexbeach-bc opened this issue Mar 13, 2024 · 5 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue

Comments

@alexbeach-bc

Describe the bug

I am attempting to write a Pulumi dynamic provider in Python. Pulumi serializes the objects used in the provider, and FlyteRemote cannot be serialized with pickle. The following code reproduces the issue:

import pickle
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config

remote = FlyteRemote(
     config=Config.auto(),
     default_project="flytesnacks", 
     default_domain="development"
)
pickle.dumps(remote)

AttributeError: Can't pickle local object 'get_flyte_fs.<locals>._FlyteFS'

The issue stems from the nested class defined here:
https://github.com/flyteorg/flytekit/blob/d61e79e722875348b1ccd354e1076fcf12600053/flytekit/remote/remote_fs.py#L91
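
For readers unfamiliar with this pickle limitation, here is a minimal standard-library sketch of the same failure mode. It does not use flytekit; `make_fs`, `_LocalFS`, `ModuleLevelFS`, and `try_pickle` are hypothetical names chosen for illustration. It assumes only that pickle locates classes by their qualified name, which is why a class defined inside a function cannot be found.

```python
import pickle


def make_fs():
    # A class defined inside a function gets "<locals>" in its qualified
    # name, so pickle cannot look it up by importing the module.
    class _LocalFS:
        pass

    return _LocalFS()


class ModuleLevelFS:
    # Defined at module scope, so pickle can locate it by name.
    pass


def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_ok = try_pickle(make_fs())
module_ok = try_pickle(ModuleLevelFS())
```

Moving the class to module scope is one way to avoid the error; whether that is practical for `_FlyteFS` depends on what it captures from the enclosing function.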

I am using Python 3.12.1 and flytekit==1.11.0.

Expected behavior

flytekit remote objects are serializable with pickle

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@alexbeach-bc alexbeach-bc added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Mar 13, 2024

dosubot bot commented Mar 13, 2024

Hey there, @alexbeach-bc! Great to see you back 😄 Hope you're doing well!

Thank you for providing the detailed information. It seems like you're encountering a serialization issue with FlyteRemote objects when using pickle. This is likely because pickle cannot serialize classes defined inside a function, which is the case with the _FlyteFS class in the get_flyte_fs function.

One possible workaround could be to implement custom serialization and deserialization methods for your FlyteRemote objects. Here's a generic example of how you might do this:

import pickle

class FlyteRemoteCustom:
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        # Custom serialization logic
        state = self.__dict__.copy()
        # Convert non-serializable objects to a serializable form
        return state

    def __setstate__(self, state):
        # Custom deserialization logic
        self.__dict__.update(state)
        # Convert data back to original form if needed

# Example usage
obj = FlyteRemoteCustom(data="example")
serialized_obj = pickle.dumps(obj)
deserialized_obj = pickle.loads(serialized_obj)

This example demonstrates how to define custom serialization and deserialization methods within a class by implementing the __getstate__ and __setstate__ methods. Please note that this is a generic example and will need to be adapted to your specific use case.
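
A more concrete sketch of that idea, under stated assumptions: `RemoteWrapper` and `make_fs` below are hypothetical and are not part of flytekit. The wrapper drops the one member pickle cannot handle in `__getstate__` and rebuilds it in `__setstate__`, which only works when the member can be recreated from the remaining state.

```python
import pickle


def make_fs():
    # Stand-in for a factory that returns an instance of a function-local
    # class, which pickle cannot serialize.
    class _LocalFS:
        pass

    return _LocalFS()


class RemoteWrapper:
    """Hypothetical wrapper; not part of flytekit."""

    def __init__(self, project):
        self.project = project
        self._fs = make_fs()  # unpicklable member

    def __getstate__(self):
        state = self.__dict__.copy()
        # Drop the member pickle cannot handle; keep the plain settings.
        del state["_fs"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the dropped member instead of restoring it from bytes.
        self._fs = make_fs()


if __name__ == "__main__":
    original = RemoteWrapper(project="flytesnacks")
    restored = pickle.loads(pickle.dumps(original))
    print(restored.project)
```

This approach does not carry over live connections or credentials; anything of that kind has to be re-established after loading.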

Another approach could be to use JSON for serialization instead of pickle, if the objects you are trying to serialize are compatible. JSON is generally more interoperable and might not face the same serialization issues as pickle.

Also, ensure that you are using the latest version of flytekit. Although you are currently using version 1.11.0, newer versions might contain fixes or improvements related to serialization issues.

If you believe this is a bug or a limitation within flytekit itself, consider reporting it to the Flyte development team through their GitHub issues page. Providing them with detailed information about your use case, the error messages, and the versions of Python and flytekit you are using can help in diagnosing and potentially resolving the issue in future releases.

I hope this helps! If you have any further questions or need more clarification, feel free to ask.


@pingsutw pingsutw added flytekit FlyteKit Python related issue and removed untriaged This issues has not yet been looked at by the Maintainers labels Mar 14, 2024
@pingsutw
Member

@alexbeach-bc just curious, why do you want to pickle the flytekit remote?

@alexbeach-bc
Author

alexbeach-bc commented Mar 14, 2024

@alexbeach-bc just curious, why do you want to pickle the flytekit remote?

I am writing a dynamic provider in Pulumi to manage deployments of workflows/projects. Pulumi uses pickle to serialize the providers. This results in an error when trying to deploy with Pulumi:

    _pickle.PicklingError: Can't pickle <class 'flytekit.remote.remote_fs.get_flyte_fs.<locals>._FlyteFS'>: it's not found as flytekit.remote.remote_fs.get_flyte_fs.<locals>._FlyteFS

"""A Flyte Pulumi program"""
import importlib

import pulumi
from typing import Optional, Callable, Any
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config, ImageConfig, PlatformConfig, SerializationSettings
from flytekit.core.workflow import WorkflowBase
from pulumi.dynamic import Resource, ResourceProvider, CreateResult, UpdateResult
from pulumi import ComponentResource, export, Input, Output, ResourceOptions

from workflows import hello_world

DEFAULT_PROJECT="flytesnacks"


remote = FlyteRemote(
    config=Config.auto(),
    default_project=DEFAULT_PROJECT, 
    default_domain="development"
)

class FlyteWorkflowArgs(object):
    module: Input[str]
    workflow: Input[str]
    domain: Input[str]
    project: Input[str]
    image: Input[str]
    repo: Input[str]
    version: Input[str]
    description: Optional[Input[str]]
    def __init__(self, module, workflow, domain, project, image, repo, version, description=None):
        self.module = module
        self.workflow = workflow
        self.domain = domain
        self.project = project
        self.repo = repo
        self.image = image
        self.version = version
        self.description = description


class FlyteWorkflowProvider(ResourceProvider):
    def create(self, props):
        # Pulumi passes the resolved inputs as a dict, so read the module and
        # workflow names from props rather than from the provider instance.
        mod = importlib.import_module(props["module"])
        entity = getattr(mod, props["workflow"])
        img = ImageConfig.from_images(
            "{repo}/{image}".format(repo=props["repo"], image=props["image"])
        )
        wf2 = remote.register_workflow(
            entity,
            serialization_settings=SerializationSettings(image_config=img),
            version=props["version"],
        )

        return CreateResult(id_=wf2.id, outs=props)

    def update(self, id, _olds, props):
        img = ImageConfig.from_images(
            "{repo}/{image}".format(repo=props["repo"], image=props["image"])
        )
        wf2 = remote.register_workflow(
            hello_world.hello_world_wf,
            serialization_settings=SerializationSettings(image_config=img),
            version=props["version"],
        )
        return UpdateResult(id_=wf2.id, outs=props)

    def delete(self, id, props):
        # Cannot be implemented. Flyte only supports archiving workflows (not
        # deleting). If the remote client can support this, we can implement it.
        pass


class FlyteWorkflow(Resource):
    def __init__(self, name: str, props: FlyteWorkflowArgs, opts: Optional[ResourceOptions] = None):
         super().__init__(FlyteWorkflowProvider(), name, {**vars(props)}, opts)


hello_world_workflow = FlyteWorkflow(
    "hello", 
    FlyteWorkflowArgs(
        module="workflows.hello_world",
        workflow="hello_world_wf",
        repo="docker.io/myrepo",
        image="flyte_workflows:458adfb631aebdce22d663240bf6b722998d567b",
        domain="development",
        project=DEFAULT_PROJECT,
        version="v0.1.0"
    ),
)

@eapolinario
Contributor

@alexbeach-bc, first of all, thanks for working on this change. It seems very interesting and it will increase the utility of Flyte!

Speaking about your problem specifically, I don't have any experience with Pulumi providers, so can you help me understand where the remote object is being serialized? I'm asking because relying on a serialized version of a FlyteRemote object is difficult: it contains a reference to the underlying client used to talk to the backend, as well as credentials and similar state. So I want to understand why the object itself has to be serialized instead of being re-hydrated when needed.
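
To illustrate what "re-hydrated when needed" could look like, here is a rough sketch under stated assumptions. Every name in it (`RemoteSettings`, `FakeRemote`, `SketchProvider`) is hypothetical; `FakeRemote` merely stands in for a client such as FlyteRemote, and nothing here is Pulumi or flytekit API. The provider keeps only plain settings and builds the client inside the method that needs it, so no live client ends up in the serialized state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteSettings:
    # Plain strings only, so this object is trivially serializable.
    project: str
    domain: str


class FakeRemote:
    """Hypothetical stand-in for a client such as FlyteRemote."""

    def __init__(self, settings):
        self.settings = settings

    def register(self, workflow, version):
        s = self.settings
        return f"{s.project}/{s.domain}/{workflow}@{version}"


class SketchProvider:
    """Hypothetical provider; holds settings, never a live client."""

    def __init__(self, settings):
        self.settings = settings

    def create(self, props):
        # Build the client on demand so it is never part of the
        # provider's serialized state.
        remote = FakeRemote(self.settings)
        return remote.register(props["workflow"], props["version"])


provider = SketchProvider(RemoteSettings("flytesnacks", "development"))
result = provider.create({"workflow": "hello_world_wf", "version": "v0.1.0"})
```

The trade-off is that the client is constructed on every call; whether that cost is acceptable depends on how expensive authentication and connection setup are.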

@alexbeach-bc
Author

@eapolinario I am actually no longer sure my approach makes sense as a long-term solution. There is a difference between Pulumi dynamic providers and regular providers, and I am also not very familiar with Pulumi internals; we simply use Pulumi for the majority of our infra. Pulumi providers can be implemented in a multi-language way, whereas the dynamic provider above would limit use to Python only. I am going to close this for now.
