Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] UX improvement: pyflyte run includes imported local modules in the target workflow file #5492

Open
2 tasks done
cosmicBboy opened this issue Jun 20, 2024 · 2 comments
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request flytekit FlyteKit Python related issue

Comments

@cosmicBboy
Copy link
Contributor

Motivation: Why do you think this is important?

Say I have the following flytekit project folder structure:

/my_project
    /src
        __init__.py
        module_a.py
        ... # etc
    tasks.py  # imports modules from src
    workflows.py  # imports modules from task
    setup.py

Currently, for this workflow to run correctly with pyflyte run --remote, I have to pass in the --copy-all flag to make sure that all the source files are present in the container running the workflow.

This isn't a great beginner UX: I would expect a local pyflyte run and remote pyflyte run --remote to "just work".

Goal: What should the final outcome look like, ideally?

Today, pyflyte run --remote workflow.py will fast-register the workflow, meaning that the workflow.py file itself will be shipped as a tar file and injected into the container running in the remote pod.

Ideally, pyflyte run should also automatically detect all of the locally defined modules that are being used in workflow.py and include it in the tar file being fast registered.

Describe alternatives you've considered

Keep the status quo, which is to require users to provide the --copy-all flag. There's also another issue proposing a --copy flag (see here) which proposes a more targeted version of the --copy-all flag.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@cosmicBboy cosmicBboy added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Jun 20, 2024
Copy link

runllm bot commented Jun 20, 2024

Based on the available data sources, here is the issue title and body for the UX improvement:


Issue Title: [Core feature] UX improvement: pyflyte run includes imported local modules in the target workflow file

Issue Body:

Motivation: Why do you think this is important?

Say I have the following Flytekit project folder structure:

/my_project
    /src
        __init__.py
        module_a.py
        ... # etc
    tasks.py  # imports modules from src
    workflows.py  # imports modules from task
    setup.py

Currently, for this workflow to run correctly with pyflyte run --remote, I have to pass in the --copy-all flag to make sure that all the source files are present in the container running the workflow.

This isn't a great beginner UX: I would expect a local pyflyte run and remote pyflyte run --remote to "just work".

Goal: What should the final outcome look like, ideally?

Today, pyflyte run --remote workflow.py will fast-register the workflow, meaning that the workflow.py file itself will be shipped as a tar file and injected into the container running in the remote pod.

Ideally, pyflyte run should also automatically detect all of the locally defined modules that are being used in workflow.py and include them in the tar file being fast registered.

Describe alternatives you've considered

Keep the status quo, which is to require users to provide the --copy-all flag. There's also another issue proposing a --copy flag (see here) which proposes a more targeted version of the --copy-all flag.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Sources:


This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @RunLLM to ask follow-up questions. See runllm.com for more!

@thomasjpfan
Copy link
Member

If there is a setup.py + src directory, then I assume there is a package to be installed with the src layout. There are two ways to make the pyflyte run --remote UX work:

  1. Use ImageSpec and actually install the library
  2. If there is a setup.py assume we want to --copy-all and install the package during execution.
  3. Check the workflow to see what it imports, then automatically copy things that get imported with the correct folder structure. This only works for python only modules.

The solution depends on how task imports module_a. If its import module_a, then we need to install the package or manipulate the folder structure.. If it's import src.module_a, then src is kept as is. (Personally, I do not really like using import src.module_a, but I've see user code like this)

@eapolinario eapolinario added flytekit FlyteKit Python related issue backlogged For internal use. Reserved for contributor team workflow. and removed untriaged This issues has not yet been looked at by the Maintainers labels Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request flytekit FlyteKit Python related issue
Projects
None yet
Development

No branches or pull requests

3 participants