Agent/skypilot #2407

Draft
wants to merge 20 commits into
base: master

Conversation

novahow
Contributor

@novahow novahow commented May 10, 2024

Tracking issue

flyteorg/flyte#3936

Why are the changes needed?

Skypilot agent

What changes were proposed in this pull request?

Please refer to the diagram

image

How was this patch tested?

Setup process

sky_test.py

import textwrap

from flytekit import task, workflow
from flytekitplugins.skypilot import SkyPilot

INTERNAL_IMAGE = "flytesky/plugins:skypilot"  # "cr.flyte.org/flyteorg/flytekit:py3.10-latest"

@task(
    task_config=SkyPilot(
        cluster_name="t2",
        # prompt_cloud=True,
        resource_config={
            "instance_type": "e2-small",
            "use_spot": True,
        },
        container_run_type=0,
        job_launch_type=0,
        # Setup commands, dedented so each one starts at column 0.
        setup=textwrap.dedent(
            """\
            python -m pip install -e /flytekit
            pip install numpy==1.26.4
            """
        ),
    ),
    container_image=INTERNAL_IMAGE,
)
def t3(a: int) -> int:
    return a + 3

# The workflow returns a value, so it declares its output type.
@workflow
def wf(a: int = 3) -> int:
    r = t3(a=a)
    r2 = t3(a=r)
    return r2


if __name__ == "__main__":
    wf()

To test it locally or in a sandbox environment, first start the server with `python sky_server.py`, then submit the workflow:
pyflyte --verbose run --remote sky_test.py wf --a "3"

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@novahow
Contributor Author

novahow commented May 10, 2024

Encountered a StopIteration bug when using remote S3; still investigating.

image

@novahow
Contributor Author

novahow commented May 25, 2024

from flytekit import task, workflow
from flytekitplugins.skypilot import SkyPilot

INTERNAL_IMAGE = "flytesky/plugins:skypilot"

@task(
    task_config=SkyPilot(
        cluster_name="t2",
        # prompt_cloud=True,
        resource_config={
            "instance_type": "e2-small",
            "use_spot": True,
        },
        container_run_type=1,
        job_launch_type=1,
        # stop_after=3
    ),
    container_image=INTERNAL_IMAGE,
)
def t3(a: int) -> str:
    return str(a + 3)


@task(
    task_config=SkyPilot(
        cluster_name="t4",
        resource_config={
            # Candidate resources, listed in order of preference.
            "ordered": [
                {
                    "cloud": "gcp",
                    "accelerators": "T4:1",
                    "instance_type": "n1-standard-2",
                },
                {
                    "cloud": "gcp",
                    "accelerators": "P4:1",
                },
            ]
        },
        container_run_type=0,
        setup="python -m pip install torch",
    ),
    container_image=INTERNAL_IMAGE,
)
def cuda_task() -> str:
    # Imported inside the task because torch is installed by `setup` on the cluster.
    import torch
    return f"cuda on: {torch.cuda.is_available()}"

@workflow
def ml_wf():
    res = cuda_task()
    print(res)
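The `ordered` list in the `cuda_task` config above reads as a preference order: try the T4 candidate first, then fall back to the P4 one. The sketch below only illustrates that fallback idea in plain Python. It does not call SkyPilot or the plugin; `pick_first_available` and the availability check `no_t4` are hypothetical helpers invented for this illustration.

```python
from typing import Callable, Optional

# Candidate list copied from the `ordered` entry in the task config above.
ORDERED_CANDIDATES = [
    {"cloud": "gcp", "accelerators": "T4:1", "instance_type": "n1-standard-2"},
    {"cloud": "gcp", "accelerators": "P4:1"},
]


def pick_first_available(
    candidates: list, is_available: Callable[[dict], bool]
) -> Optional[dict]:
    """Return the first candidate that passes the availability check, else None."""
    for candidate in candidates:
        if is_available(candidate):
            return candidate
    return None


# Hypothetical availability check: pretend T4 capacity is exhausted.
def no_t4(candidate: dict) -> bool:
    return not candidate["accelerators"].startswith("T4")


chosen = pick_first_available(ORDERED_CANDIDATES, no_t4)
print(chosen)
```

With the T4 candidate rejected, the helper falls through to the second entry; with every candidate accepted, it returns the first one.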
container_run_type:
- 0: use the image as the VM image, or pull it inside the VM
- 1: use docker run
Currently, `1` is the recommended option, as SkyPilot alters your original image.
job_launch_type:
- 0: launch a cluster
- 1: launch a controller cluster to manage your job
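The integer codes above are easy to mix up at a call site. One way to make them self-describing is a pair of `IntEnum` classes. This is a minimal sketch, not part of the plugin: the class names, member names, and the `describe` helper are assumptions chosen for illustration, and only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class ContainerRunType(IntEnum):  # hypothetical name, not the plugin's API
    VM_IMAGE = 0    # use the image as the VM image, or pull it inside the VM
    DOCKER_RUN = 1  # use docker run


class JobLaunchType(IntEnum):  # hypothetical name, not the plugin's API
    CLUSTER = 0             # launch a cluster
    CONTROLLER_CLUSTER = 1  # launch a controller cluster to manage the job


def describe(container_run_type: int, job_launch_type: int) -> str:
    """Translate the raw integer codes into readable names.

    Raises ValueError if either code is not one of the documented values.
    """
    run = ContainerRunType(container_run_type)
    launch = JobLaunchType(job_launch_type)
    return f"container_run_type={run.name}, job_launch_type={launch.name}"


print(describe(1, 1))
```

Because `IntEnum` members compare equal to plain integers, a value such as `ContainerRunType.DOCKER_RUN` could be passed wherever the config expects `1`, assuming the plugin only compares or serializes the value as an int.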
FROM localhost:30000/flytekit:latest as base
ARG PYTHON_VERSION

MAINTAINER Flyte Team <[email protected]>
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytekit

WORKDIR /root
ENV FLYTE_SDK_RICH_TRACEBACKS 0

# Flytekit version of flytekit to be installed in the image
ARG PSEUDO_VERSION
RUN SETUPTOOLS_SCM_PRETEND_VERSION_FOR_FLYTEKIT=$PSEUDO_VERSION pip install --no-cache-dir -U \
        skypilot

USER root
RUN apt-get update && apt-get install sudo socat locales -y
RUN sudo locale-gen en_US.UTF-8
RUN deluser --remove-home flytekit
RUN useradd -u 1000 -m -d /home/flytekit flytekit
USER flytekit
# Note: Pod tasks should be exposed in the default image
# Note: Some packages will create config files under /home by default, so we need to make sure it's writable
# Note: There are use cases that require reading and writing files under /tmp, so we need to change its permissions.

# Run a series of commands to set up the environment:
# 1. Update and install dependencies.
# 2. Install Flytekit and its plugins.
# 3. Clean up the apt cache to reduce image size. Reference: https://gist.github.com/marvell/7c812736565928e602c4
# 4. Create a non-root user 'flytekit' and set appropriate permissions for directories.

FROM base as dev

COPY . /flytekit

RUN SETUPTOOLS_SCM_PRETEND_VERSION_FOR_FLYTEKIT=$PSEUDO_VERSION pip install --no-cache-dir -U \
        -e /flytekit \
        -e /flytekit/plugins/flytekit-skypilot


USER root
ENV PYTHONPATH "/flytekit:/flytekit/plugins/flytekit-k8s-pod:/flytekit/plugins/flytekit-deck-standard:"
# ENV FLYTE_AWS_ENDPOINT "http://localhost:30080/"
# ENV FLYTE_AWS_ACCESS_KEY_ID "minio"                           
# ENV FLYTE_AWS_SECRET_ACCESS_KEY "miniostorage"
RUN echo "flytekit ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
SHELL ["/bin/bash", "-c"]
# RUN echo "SHELL=/bin/bash" >> /etc/profile
# RUN rm /bin/sh && ln -s /bin/bash /bin/sh
# Switch to the 'flytekit' user for better security
USER flytekit
# ENTRYPOINT ["/bin/bash"]
# CMD ["/bin/bash"]

@novahow novahow closed this May 25, 2024
@novahow novahow reopened this May 25, 2024
Signed-off-by: novahow <[email protected]>
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
Member

Does Sky support Python 3.11 and 3.12?

Contributor Author

They said the nightly version of SkyPilot supports 3.11. A stable version that supports 3.11 will be released soon.


codecov bot commented May 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.61%. Comparing base (bf38b8e) to head (303219d).
Report is 142 commits behind head on master.

Current head 303219d differs from pull request most recent head f6f0fae

Please upload reports for the commit f6f0fae to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2407      +/-   ##
==========================================
+ Coverage   83.04%   86.61%   +3.57%     
==========================================
  Files         324        3     -321     
  Lines       24861      142   -24719     
  Branches     3547        0    -3547     
==========================================
- Hits        20645      123   -20522     
+ Misses       3591       19    -3572     
+ Partials      625        0     -625     
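The two coverage percentages in the diff above can be reproduced from the Hits and Lines rows. The sketch below assumes coverage is hits divided by lines, truncated (not rounded) to two decimal places. The truncation is an assumption that happens to match the reported figures, not a documented Codecov rule, and `coverage_hundredths` and `format_percent` are hypothetical helpers.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated with integer division."""
    return hits * 10000 // lines


def format_percent(hundredths: int) -> str:
    """Render hundredths of a percent as a two-decimal string, e.g. 8304 -> '83.04'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# Hits and Lines taken from the 'master' and '#2407' columns of the diff.
base = coverage_hundredths(20645, 24861)
head = coverage_hundredths(123, 142)

print(format_percent(base))         # base coverage
print(format_percent(head))         # head coverage
print(format_percent(head - base))  # difference in percentage points
```

Integer arithmetic is used so the truncation is exact and not affected by floating-point rounding.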


@kumare3
Contributor

kumare3 commented Jul 20, 2024

Does this work?

novahow added 2 commits July 21, 2024 06:38
Signed-off-by: novahow <[email protected]>
@novahow
Contributor Author

novahow commented Jul 20, 2024

Does this work?

Hi, it works when run in the sandbox or locally. Otherwise, we might need to add a new Deployment and PVC to flyteagent to deploy the FastAPI server.
