sagemaker agent backend setup documentation (#5064)
* sagemaker agent backend setup doc

Signed-off-by: Samhita Alla <[email protected]>

* update requirement and remove debugging code

Signed-off-by: Samhita Alla <[email protected]>

* add python-kubernetes

Signed-off-by: Samhita Alla <[email protected]>

* incorporate suggestions by Nikki

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
samhita-alla authored Mar 22, 2024
1 parent 5fc57fa commit c13d601
Showing 11 changed files with 2,955 additions and 2,114 deletions.
24 changes: 18 additions & 6 deletions docs/_ext/import_projects.py
@@ -84,10 +84,11 @@ def parse(self):

def update_sys_path_for_flytekit(import_project_config: ImportProjectsConfig):
# create flytekit/_version.py file manually
with open(f"{import_project_config.flytekit_api_dir}/flytekit/_version.py", "w") as f:
with open(
f"{import_project_config.flytekit_api_dir}/flytekit/_version.py", "w"
) as f:
f.write(f'__version__ = "dev"')


# add flytekit to python path
flytekit_dir = os.path.abspath(import_project_config.flytekit_api_dir)
flytekit_src_dir = os.path.abspath(os.path.join(flytekit_dir, "flytekit"))
@@ -151,7 +152,7 @@ def import_projects(app: Sphinx, config: Config):
if repo:
tags = sorted(
[t for t in repo.tags if re.match(VERSION_PATTERN, t.name)],
key=lambda t: t.commit.committed_datetime
key=lambda t: t.commit.committed_datetime,
)
if not tags or import_projects_config.dev_build:
# If dev_build is specified or the tags don't exist just use the
@@ -187,7 +188,9 @@ def import_projects(app: Sphinx, config: Config):
update_sys_path_for_flytekit(import_projects_config)

# add functions to clean up source and docstring refs
for i, (patt, repl) in enumerate(import_projects_config.source_regex_mapping.items()):
for i, (patt, repl) in enumerate(
import_projects_config.source_regex_mapping.items()
):
app.connect(
"source-read",
partial(replace_refs_in_files, patt, repl),
@@ -200,7 +203,9 @@ def import_projects(app: Sphinx, config: Config):
)


def replace_refs_in_files(patt: str, repl: str, app: Sphinx, docname: str, source: List[str]):
def replace_refs_in_files(
patt: str, repl: str, app: Sphinx, docname: str, source: List[str]
):
text = source[0]

if re.search(patt, text):
@@ -211,7 +216,14 @@ def replace_refs_in_files(patt: str, repl: str, app: Sphinx, docname: str, sourc


def replace_refs_in_docstrings(
patt: str, repl: str, app: Sphinx, what: str, name: str, obj: str, options: dict, lines: List[str],
patt: str,
repl: str,
app: Sphinx,
what: str,
name: str,
obj: str,
options: dict,
lines: List[str],
):
replace = {}
for i, text in enumerate(lines):
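The `replace_refs_in_files` and `replace_refs_in_docstrings` hooks above boil down to applying a mapping of regex patterns to replacements over a page's text. A minimal standalone sketch, with no Sphinx machinery and a made-up mapping (the pattern and replacement below are illustrative, not Flyte's real configuration):

```python
import re
from typing import Dict

# Hypothetical stand-in for import_projects_config.source_regex_mapping.
source_regex_mapping: Dict[str, str] = {
    r"<flytekit_api_dir>": "api/flytekit",
}

def replace_refs(text: str, mapping: Dict[str, str]) -> str:
    """Apply each (pattern -> replacement) pair to the text, as the
    "source-read" handler does for every document it processes."""
    for patt, repl in mapping.items():
        if re.search(patt, text):
            text = re.sub(patt, repl, text)
    return text

print(replace_refs("see <flytekit_api_dir>/docs", source_regex_mapping))
# see api/flytekit/docs
```

In the real extension, `partial(replace_refs_in_files, patt, repl)` binds one pattern/replacement pair per connected handler rather than looping inside a single function.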
5 changes: 2 additions & 3 deletions docs/core_use_cases/machine_learning.md
@@ -108,9 +108,8 @@ There are many ways to extend your workloads:
[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) for hyperparameter
optimization, all orchestrated by Flyte as ephemerally-provisioned Ray clusters.
* - **📦 Ephemeral Cluster Resources**
- Use the {ref}`MPI Operator <kf-mpi-op>`, {ref}`Sagemaker <aws-sagemaker>`,
{ref}`Kubeflow Tensorflow <kftensorflow-plugin>`, {ref}`Kubeflow Pytorch<kf-pytorch-op>`
and {doc}`more <_tags/DistributedComputing>` to do distributed training.
- Use the {ref}`MPI Operator <kf-mpi-op>`, {ref}`Kubeflow Tensorflow <kftensorflow-plugin>`,
{ref}`Kubeflow Pytorch<kf-pytorch-op>` and {doc}`more <_tags/DistributedComputing>` to do distributed training.
* - **🔎 Experiment Tracking**
- Auto-capture training logs with the {py:func}`~flytekitplugins.mlflow.mlflow_autolog`
decorator, which can be viewed as Flyte Decks with `@task(disable_decks=False)`.
3 changes: 3 additions & 0 deletions docs/deployment/agents/index.md
@@ -27,6 +27,8 @@ If you are using a managed deployment of Flyte, you will need to contact your de
- Configuring your Flyte deployment for the MMCloud agent.
* - {ref}`Sensor Agent <deployment-agent-setup-sensor>`
- Configuring your Flyte deployment for the sensor agent.
* - {ref}`SageMaker Inference <deployment-agent-setup-sagemaker-inference>`
- Deploying models and creating, as well as triggering, inference endpoints on SageMaker.
```

```{toctree}
@@ -39,6 +41,7 @@ chatgpt
databricks
bigquery
mmcloud
sagemaker_inference
sensor
snowflake
```
126 changes: 126 additions & 0 deletions docs/deployment/agents/sagemaker_inference.rst
@@ -0,0 +1,126 @@
.. _deployment-agent-setup-sagemaker-inference:

SageMaker Inference Agent
=========================

This guide provides an overview of how to set up the SageMaker inference agent in your Flyte deployment.

Specify agent configuration
---------------------------

.. tabs::

  .. group-tab:: Flyte binary

    Edit the relevant YAML file to specify the agent.

    .. code-block:: bash

      kubectl edit configmap flyte-sandbox-config -n flyte

    .. code-block:: yaml
      :emphasize-lines: 7,11-12,16-17

      tasks:
        task-plugins:
          enabled-plugins:
            - container
            - sidecar
            - k8s-array
            - agent-service
          default-for-task-types:
            - container: container
            - container_array: k8s-array
            - boto: agent-service
            - sagemaker-endpoint: agent-service
      plugins:
        agent-service:
          supportedTaskTypes:
            - boto
            - sagemaker-endpoint

  .. group-tab:: Flyte core

    Create a file named ``values-override.yaml`` and add the following configuration to it:

    .. code-block:: yaml
      :emphasize-lines: 9,14-15,19-20

      configmap:
        enabled_plugins:
          tasks:
            task-plugins:
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - agent-service
              default-for-task-types:
                container: container
                sidecar: sidecar
                container_array: k8s-array
                boto: agent-service
                sagemaker-endpoint: agent-service
          plugins:
            agent-service:
              supportedTaskTypes:
                - boto
                - sagemaker-endpoint

Add the AWS credentials
-----------------------

1. Install the flyteagent pod using helm:

   .. code-block::

     helm repo add flyteorg https://flyteorg.github.io/flyte
     helm install flyteagent flyteorg/flyteagent --namespace flyte

2. Get the base64 value of your AWS credentials:

   .. code-block::

     echo -n "<AWS_CREDENTIAL>" | base64

3. Edit the flyteagent secret:

   .. code-block:: bash

     kubectl edit secret flyteagent -n flyte

   .. code-block:: yaml
     :emphasize-lines: 3-5

     apiVersion: v1
     data:
       aws-access-key: <BASE64_ENCODED_AWS_ACCESS_KEY>
       aws-secret-access-key: <BASE64_ENCODED_AWS_SECRET_ACCESS_KEY>
       aws-session-token: <BASE64_ENCODED_AWS_SESSION_TOKEN>
     kind: Secret

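Step 2 above is plain base64 encoding of the raw credential string; the sketch below (using a placeholder key, not a real credential) shows what ends up in the secret's ``data`` fields, and why the ``-n`` flag matters: a trailing newline from ``echo`` without ``-n`` would be encoded too, producing a different, broken value.

```python
import base64

# Placeholder credential for illustration only -- not a real AWS key.
access_key = "AKIAEXAMPLE"

# What `echo -n "<AWS_CREDENTIAL>" | base64` produces:
encoded = base64.b64encode(access_key.encode()).decode()
print(encoded)

# Without -n, echo appends "\n" and the encoded value changes:
with_newline = base64.b64encode((access_key + "\n").encode()).decode()
print(encoded != with_newline)  # True
```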
Upgrade the Flyte Helm release
------------------------------

.. tabs::

  .. group-tab:: Flyte binary

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
    ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
    and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

  .. group-tab:: Flyte core

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
    and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).

For more information, refer to the `SageMaker inference agent example documentation <https://docs.flyte.org/en/latest/flytesnacks/examples/sagemaker_inference_agent/index.html>`__.
2 changes: 1 addition & 1 deletion docs/flyte_fundamentals/extending_flyte.md
@@ -151,7 +151,7 @@ many more ways to customize Flyte tasks:
in other languages outside of the `flytekit` SDK language.
* - {ref}`Backend Plugins <extend-plugin-flyte-backend>`
- These tasks plugins require implementing a backend plugin to leverage
external services like Sagemaker, Snowflake, BigQuery, etc.
external services like SageMaker, Snowflake, BigQuery, etc.
```

## What's next?
3 changes: 1 addition & 2 deletions docs/flyte_fundamentals/optimizing_tasks.md
@@ -242,8 +242,7 @@ at the most granular level of your workflow!
When this task is executed on a Flyte cluster, it automatically provisions all of
the resources that you need. In this case, that need is distributed
training, but Flyte also provides integrations for {ref}`Spark <plugins-spark-k8s>`,
{ref}`Ray <kube-ray-op>`, {ref}`MPI <kf-mpi-op>`, {ref}`Sagemaker <aws-sagemaker>`,
{ref}`Snowflake <snowflake_agent>`, and more.
{ref}`Ray <kube-ray-op>`, {ref}`MPI <kf-mpi-op>`, {ref}`Snowflake <snowflake_agent>`, and more.

Even though Flyte itself is a powerful compute engine and orchestrator for
data engineering, machine learning, and analytics, perhaps you have existing
5 changes: 4 additions & 1 deletion docs/user_guide/basics/workflows.md
@@ -29,7 +29,7 @@ Workflows link multiple tasks together. They can be written as Python functions,
but it's important to distinguish tasks and workflows.

A task's body executes at run-time on a Kubernetes cluster, in a Query Engine like BigQuery,
or on hosted services like AWS Batch or Sagemaker.
or on hosted services like AWS Batch or SageMaker.

In contrast, a workflow's body doesn't perform computations; it's used to structure tasks.
A workflow's body executes at registration time, during the workflow's registration process.
@@ -116,6 +116,7 @@ if __name__ == "__main__":
+++ {"lines_to_next_cell": 0}

To run the workflow locally, you can use the following `pyflyte run` command:

```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \
@@ -124,6 +125,7 @@ pyflyte run \

If you want to run it remotely on the Flyte cluster,
simply add the `--remote flag` to the `pyflyte run` command:

```
pyflyte run --remote \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \
@@ -138,6 +140,7 @@ However, {ref}`executing an individual task <single_task_execution>` independent
without the confines of a workflow, offers a convenient approach for iterating on task logic effortlessly.

## Use `partial` to provide default arguments to tasks

You can use the {py:func}`functools.partial` function to assign default or constant values to the parameters of your tasks.
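
As a plain-Python illustration (no flytekit involved), `functools.partial` pre-binds some of a function's arguments so that callers only supply the rest, which is the same pattern used to preset task parameters:

```python
from functools import partial

def scale(factor: float, value: float) -> float:
    return factor * value

# Fix `factor` so downstream callers only supply `value`.
double = partial(scale, 2.0)
print(double(3.0))  # 6.0

# Keyword arguments can be pre-bound too:
halve = partial(scale, factor=0.5)
print(halve(value=8.0))  # 4.0
```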

```{code-cell}
7 changes: 4 additions & 3 deletions docs/user_guide/extending/backend_plugins.md
@@ -30,15 +30,15 @@ Flyte.
To recap, here are a few examples of why you would want to implement a backend plugin:

1. We want to add a new capability to the Flyte Platform, for example we might want to:
- Talk to a new service like AWS Sagemaker, Snowflake, Redshift, Athena, BigQuery, etc.
- Talk to a new service like AWS SageMaker, Snowflake, Redshift, Athena, BigQuery, etc.
- Orchestrate a set of containers in a new way like Spark, Flink, Distributed
training on Kubernetes (usually using a Kubernetes operator).
- Use a new container orchestration engine like AWS Batch/ECS, Hashicorp's Nomad
- Use a completely new runtime like AWS Lambda, Knative, etc.
3. You want to retain the capability to update the plugin implementation and roll
2. You want to retain the capability to update the plugin implementation and roll
out new changes and fixes without affecting the users code or requiring them to update
versions of their plugins.
4. You want the same plugin to be accessible across multiple language SDK's.
3. You want the same plugin to be accessible across multiple language SDK's.

```{note}
Talking to a new service can be done using flytekit extensions and usually is the better way to get started. But, once matured, most of these extensions are better to be migrated to the backend. For the rest of the cases, it is possible to extend flytekit to achieve these scenarios, but this is less desirable, because of the associated overhead of first launching a container that launches these jobs downstream.
@@ -85,6 +85,7 @@ The backend plugin is where the actual logic of the execution is implemented. Th
1. [Kubernetes operator Plugin](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/k8s#Plugin): The demo in the video below shows two examples of K8s backend plugins: flytekit `Athena` & `Spark`, and Flyte K8s `Pod` & `Spark`.

```{youtube} oK2RGQuP94k
```

2. **A Web API plugin:** [Async](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/webapi#AsyncPlugin) or [Sync](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/webapi#SyncPlugin).
@@ -105,7 +105,7 @@ Flytepropeller pod would be created as:
:::

This code snippet will output two logs per task that use the log plugin.
However, not all task types use the log plugin; for example, the SageMaker plugin uses the log output provided by Sagemaker, and the Snowflake plugin will use a link to the snowflake console.
However, not all task types use the log plugin; for example, the Snowflake plugin will use a link to the Snowflake console.

## Datadog integration

@@ -128,7 +128,7 @@ If you're using environment variables, use the following config:
DD_LOGS_ENABLED: "false"
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE: "true"
DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself
DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself
```

:::{warning}