[BUG] Can't pickle local object 'get_flyte_fs.<locals>._FlyteFS' (FlyteRemote) #5052
Comments
Hey there, @alexbeach-bc! Great to see you back 😄 Hope you're doing well!
Thank you for providing the detailed information. It seems like you're encountering a pickle serialization issue. One possible workaround could be to implement custom serialization and deserialization methods for your class:
import pickle

class FlyteRemoteCustom:
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        # Custom serialization logic
        state = self.__dict__.copy()
        # Convert non-serializable objects to a serializable form
        return state

    def __setstate__(self, state):
        # Custom deserialization logic
        self.__dict__.update(state)
        # Convert data back to original form if needed

# Example usage
obj = FlyteRemoteCustom(data="example")
serialized_obj = pickle.dumps(obj)
deserialized_obj = pickle.loads(serialized_obj)
This example demonstrates how to define custom serialization and deserialization methods within a class by implementing the __getstate__ and __setstate__ methods.
Another approach could be to use JSON for serialization instead of pickle, if the objects you are trying to serialize are compatible. JSON is generally more interoperable and might not face the same serialization issues as pickle.
Also, ensure that you are using the latest version of flytekit. Although you are currently using version 1.11.0, newer versions might contain fixes or improvements related to serialization issues.
If you believe this is a bug or a limitation within flytekit itself, consider reporting it to the Flyte development team through their GitHub issues page. Providing them with detailed information about your use case, the error messages, and the versions of Python and flytekit you are using can help in diagnosing and potentially resolving the issue in future releases.
I hope this helps! If you have any further questions or need more clarification, feel free to ask.
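To make the __getstate__/__setstate__ idea above more concrete, here is a minimal sketch that actually drops a non-picklable attribute and rebuilds it after loading. The `Client` class, its `endpoint` value, and the lock attribute are hypothetical stand-ins and are not part of flytekit; whether the same trick is practical for a real FlyteRemote is not established in this thread.

```python
import pickle
import threading


class Client:
    """Hypothetical stand-in for an object holding an unpicklable resource."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self._lock = threading.Lock()  # thread locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and leave out the attribute pickle rejects.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable fields, then recreate the dropped resource.
        self.__dict__.update(state)
        self._lock = threading.Lock()


# Round trip: only `endpoint` travels through pickle; the lock is rebuilt.
restored = pickle.loads(pickle.dumps(Client("localhost:30080")))
```

The point of the sketch is that only plain configuration goes through pickle, and anything tied to a live resource is reconstructed on the other side.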
@alexbeach-bc just curious, why do you want to pickle the flytekit remote?
I am writing a dynamic provider in Pulumi to manage deployments of workflows/projects. Pulumi uses pickle to serialize the providers. This results in an error when trying to deploy with Pulumi:
_pickle.PicklingError: Can't pickle <class 'flytekit.remote.remote_fs.get_flyte_fs.<locals>._FlyteFS'>: it's not found as flytekit.remote.remote_fs.get_flyte_fs.<locals>._FlyteFS
"""A Flyte Pulumi program"""
import importlib
from typing import Optional

from flytekit.remote import FlyteRemote
from flytekit.configuration import Config, ImageConfig, SerializationSettings
from pulumi.dynamic import Resource, ResourceProvider, CreateResult, UpdateResult
from pulumi import Input, ResourceOptions
from workflows import hello_world

DEFAULT_PROJECT = "flytesnacks"

# Module-level remote: the provider methods below reference it, so it is
# captured when Pulumi serializes the provider.
remote = FlyteRemote(
    config=Config.auto(),
    default_project=DEFAULT_PROJECT,
    default_domain="development"
)


class FlyteWorkflowArgs(object):
    module: Input[str]
    workflow: Input[str]
    domain: Input[str]
    project: Input[str]
    image: Input[str]
    repo: Input[str]
    version: Input[str]
    description: Optional[Input[str]]

    def __init__(self, module, workflow, domain, project, image, repo, version, description=None):
        self.module = module
        self.workflow = workflow
        self.domain = domain
        self.project = project
        self.repo = repo
        self.image = image
        self.version = version
        self.description = description


class FlyteWorkflowProvider(ResourceProvider):
    def create(self, props):
        # Dynamic provider inputs arrive as a plain dict, so use key access.
        mod = importlib.import_module(props["module"])
        entity = getattr(mod, props["workflow"])
        img = ImageConfig.from_images(
            "{repo}/{image}".format(repo=props["repo"], image=props["image"])
        )
        wf2 = remote.register_workflow(
            entity,
            serialization_settings=SerializationSettings(image_config=img),
            version=props["version"],
        )
        # Pulumi resource IDs are strings.
        return CreateResult(id_=str(wf2.id), outs=props)

    def update(self, id, _olds, props):
        img = ImageConfig.from_images(
            "{repo}/{image}".format(repo=props["repo"], image=props["image"])
        )
        remote.register_workflow(
            hello_world.hello_world_wf,
            serialization_settings=SerializationSettings(image_config=img),
            version=props["version"],
        )
        # UpdateResult only carries outputs; the resource ID is unchanged.
        return UpdateResult(outs=props)

    def delete(self, id, props):
        # Cannot be implemented. Flyte only supports archiving workflows (not deleting).
        # If the remote client can support this then we can implement it.
        pass


class FlyteWorkflow(Resource):
    def __init__(self, name: str, props: FlyteWorkflowArgs, opts: Optional[ResourceOptions] = None):
        super().__init__(FlyteWorkflowProvider(), name, {**vars(props)}, opts)


hello_world_workflow = FlyteWorkflow(
    "hello",
    FlyteWorkflowArgs(
        module="workflows.hello_world",
        workflow="hello_world_wf",
        repo="docker.io/myrepo",
        image="flyte_workflows:458adfb631aebdce22d663240bf6b722998d567b",
        domain="development",
        project=DEFAULT_PROJECT,
        version="v0.1.0"
    ),
)
@alexbeach-bc, first of all, thanks for working on this change. It seems very interesting and it will increase the utility of Flyte! Speaking about your problem specifically, I don't have any experience with Pulumi providers, so can you help me understand where the remote object is being serialized? I'm asking because relying on a serialized version of a FlyteRemote object is difficult, as it contains a reference to the underlying client used to talk to the backend and also credentials, etc. So I want to understand why the object itself has to be serialized instead of being re-hydrated when needed.
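As a rough illustration of what "re-hydrated when needed" could look like, the sketch below keeps only plain configuration on the provider and builds the client inside the method that uses it. `FakeRemote`, `Provider`, and the return value of `create` are hypothetical names for illustration; they are neither flytekit nor Pulumi APIs, and this is not a tested fix for the dynamic provider in this issue.

```python
import pickle


class FakeRemote:
    """Hypothetical stand-in for a client that refuses to be pickled."""

    def __init__(self, project, domain):
        self.project = project
        self.domain = domain

    def __reduce__(self):
        raise pickle.PicklingError("FakeRemote holds live connections")


class Provider:
    """Stores only strings, so pickling the provider never touches a client."""

    def __init__(self, project, domain):
        self.project = project
        self.domain = domain

    def _remote(self):
        # Built on demand and never stored on self.
        return FakeRemote(self.project, self.domain)

    def create(self, props):
        remote = self._remote()
        return f"{remote.project}/{remote.domain}/{props['workflow']}"


# The provider survives a pickle round trip and still works afterwards.
restored = pickle.loads(pickle.dumps(Provider("flytesnacks", "development")))
result = restored.create({"workflow": "hello_world_wf"})
```

The design choice here is that credentials and connections are resolved wherever the method actually runs, rather than being carried inside the serialized provider.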
@eapolinario I am actually not sure my approach makes sense as a long-term solution anymore. There is a difference between Pulumi dynamic providers and providers, and I am also not super familiar with Pulumi internals; it's just that we use Pulumi for the majority of our infra. Pulumi providers can be implemented in a multi-language way, whereas the above dynamic provider would limit use to Python only. I am going to close this for now.
Describe the bug
I am attempting to write a Pulumi dynamic provider in Python, which serializes objects in the provider. The issue is that FlyteRemote cannot be serialized with pickle. The following code reproduces the issue:
The issue stems from the nested class here:
https://github.com/flyteorg/flytekit/blob/d61e79e722875348b1ccd354e1076fcf12600053/flytekit/remote/remote_fs.py#L91
I am using python 3.12.1, flytekit==1.11.0.
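The nested-class behaviour can be reproduced without flytekit. The sketch below uses a hypothetical `make_local_instance` function (not flytekit code) to show that an instance of a class defined inside a function is rejected by pickle, because its qualified name contains `<locals>` and cannot be looked up by import path. The exact exception type may differ between Python versions, so both likely candidates are caught.

```python
import pickle


def make_local_instance():
    # A class defined inside a function gets a qualified name containing
    # "<locals>", so pickle cannot find it again by module and name.
    class _Local:
        pass

    return _Local()


def try_pickle(obj):
    """Return None on success, or the exception raised by pickle.dumps."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError) as exc:
        return exc


local_obj = make_local_instance()
local_qualname = type(local_obj).__qualname__
local_error = try_pickle(local_obj)
```

Moving such a class to module level is the usual way to make its instances picklable; whether that is feasible for `_FlyteFS` depends on how flytekit constructs it.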
Expected behavior
flytekit remote objects are serializable with pickle