sagemaker agent backend setup documentation (#5064)
* sagemaker agent backend setup doc

Signed-off-by: Samhita Alla <[email protected]>

* update requirement and remove debugging code

Signed-off-by: Samhita Alla <[email protected]>

* add python-kubernetes

Signed-off-by: Samhita Alla <[email protected]>

* incorporate suggestions by Nikki

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
samhita-alla authored Mar 22, 2024
1 parent 5fc57fa commit c13d601
Showing 11 changed files with 2,955 additions and 2,114 deletions.
24 changes: 18 additions & 6 deletions docs/_ext/import_projects.py
@@ -84,10 +84,11 @@ def parse(self):

def update_sys_path_for_flytekit(import_project_config: ImportProjectsConfig):
# create flytekit/_version.py file manually
with open(f"{import_project_config.flytekit_api_dir}/flytekit/_version.py", "w") as f:
with open(
f"{import_project_config.flytekit_api_dir}/flytekit/_version.py", "w"
) as f:
f.write(f'__version__ = "dev"')


# add flytekit to python path
flytekit_dir = os.path.abspath(import_project_config.flytekit_api_dir)
flytekit_src_dir = os.path.abspath(os.path.join(flytekit_dir, "flytekit"))
@@ -151,7 +152,7 @@ def import_projects(app: Sphinx, config: Config):
if repo:
tags = sorted(
[t for t in repo.tags if re.match(VERSION_PATTERN, t.name)],
key=lambda t: t.commit.committed_datetime
key=lambda t: t.commit.committed_datetime,
)
if not tags or import_projects_config.dev_build:
# If dev_build is specified or the tags don't exist just use the
@@ -187,7 +188,9 @@ def import_projects(app: Sphinx, config: Config):
update_sys_path_for_flytekit(import_projects_config)

# add functions to clean up source and docstring refs
for i, (patt, repl) in enumerate(import_projects_config.source_regex_mapping.items()):
for i, (patt, repl) in enumerate(
import_projects_config.source_regex_mapping.items()
):
app.connect(
"source-read",
partial(replace_refs_in_files, patt, repl),
@@ -200,7 +203,9 @@ def import_projects(app: Sphinx, config: Config):
)


def replace_refs_in_files(patt: str, repl: str, app: Sphinx, docname: str, source: List[str]):
def replace_refs_in_files(
patt: str, repl: str, app: Sphinx, docname: str, source: List[str]
):
text = source[0]

if re.search(patt, text):
@@ -211,7 +216,14 @@ def replace_refs_in_files(patt: str, repl: str, app: Sphinx, docname: str, sourc


def replace_refs_in_docstrings(
patt: str, repl: str, app: Sphinx, what: str, name: str, obj: str, options: dict, lines: List[str],
patt: str,
repl: str,
app: Sphinx,
what: str,
name: str,
obj: str,
options: dict,
lines: List[str],
):
replace = {}
for i, text in enumerate(lines):
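The `replace_refs_in_files` and `replace_refs_in_docstrings` hooks above boil down to applying a mapping of regex patterns to replacements over a page's text. A minimal standalone sketch, with no Sphinx machinery and a made-up mapping (the pattern and replacement below are illustrative, not Flyte's real configuration):

```python
import re
from typing import Dict

# Hypothetical stand-in for import_projects_config.source_regex_mapping.
source_regex_mapping: Dict[str, str] = {
    r"<flytekit_api_dir>": "api/flytekit",
}

def replace_refs(text: str, mapping: Dict[str, str]) -> str:
    """Apply each (pattern -> replacement) pair to the text, as the
    "source-read" handler does for every document it processes."""
    for patt, repl in mapping.items():
        if re.search(patt, text):
            text = re.sub(patt, repl, text)
    return text

print(replace_refs("see <flytekit_api_dir>/docs", source_regex_mapping))
# see api/flytekit/docs
```

In the real extension, `partial(replace_refs_in_files, patt, repl)` binds one pattern/replacement pair per connected handler rather than looping inside a single function.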
5 changes: 2 additions & 3 deletions docs/core_use_cases/machine_learning.md
@@ -108,9 +108,8 @@ There are many ways to extend your workloads:
[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) for hyperparameter
optimization, all orchestrated by Flyte as ephemerally-provisioned Ray clusters.
* - **📦 Ephemeral Cluster Resources**
- Use the {ref}`MPI Operator <kf-mpi-op>`, {ref}`Sagemaker <aws-sagemaker>`,
{ref}`Kubeflow Tensorflow <kftensorflow-plugin>`, {ref}`Kubeflow Pytorch<kf-pytorch-op>`
and {doc}`more <_tags/DistributedComputing>` to do distributed training.
- Use the {ref}`MPI Operator <kf-mpi-op>`, {ref}`Kubeflow Tensorflow <kftensorflow-plugin>`,
{ref}`Kubeflow Pytorch<kf-pytorch-op>` and {doc}`more <_tags/DistributedComputing>` to do distributed training.
* - **🔎 Experiment Tracking**
- Auto-capture training logs with the {py:func}`~flytekitplugins.mlflow.mlflow_autolog`
decorator, which can be viewed as Flyte Decks with `@task(disable_decks=False)`.
3 changes: 3 additions & 0 deletions docs/deployment/agents/index.md
@@ -27,6 +27,8 @@ If you are using a managed deployment of Flyte, you will need to contact your de
- Configuring your Flyte deployment for the MMCloud agent.
* - {ref}`Sensor Agent <deployment-agent-setup-sensor>`
- Configuring your Flyte deployment for the sensor agent.
* - {ref}`SageMaker Inference <deployment-agent-setup-sagemaker-inference>`
- Deploying models and creating, as well as triggering, inference endpoints on SageMaker.
```

```{toctree}
@@ -39,6 +41,7 @@ chatgpt
databricks
bigquery
mmcloud
sagemaker_inference
sensor
snowflake
```
126 changes: 126 additions & 0 deletions docs/deployment/agents/sagemaker_inference.rst
@@ -0,0 +1,126 @@
.. _deployment-agent-setup-sagemaker-inference:

SageMaker Inference Agent
=========================

This guide provides an overview of how to set up the SageMaker inference agent in your Flyte deployment.

Specify agent configuration
---------------------------

.. tabs::

  .. group-tab:: Flyte binary

    Edit the relevant YAML file to specify the agent.

    .. code-block:: bash

      kubectl edit configmap flyte-sandbox-config -n flyte

    .. code-block:: yaml
      :emphasize-lines: 7,11-12,16-17

      tasks:
        task-plugins:
          enabled-plugins:
            - container
            - sidecar
            - k8s-array
            - agent-service
          default-for-task-types:
            - container: container
            - container_array: k8s-array
            - boto: agent-service
            - sagemaker-endpoint: agent-service
      plugins:
        agent-service:
          supportedTaskTypes:
            - boto
            - sagemaker-endpoint

  .. group-tab:: Flyte core

    Create a file named ``values-override.yaml`` and add the following configuration to it:

    .. code-block:: yaml
      :emphasize-lines: 9,14-15,19-20

      configmap:
        enabled_plugins:
          tasks:
            task-plugins:
              enabled-plugins:
                - container
                - sidecar
                - k8s-array
                - agent-service
              default-for-task-types:
                container: container
                sidecar: sidecar
                container_array: k8s-array
                boto: agent-service
                sagemaker-endpoint: agent-service
          plugins:
            agent-service:
              supportedTaskTypes:
                - boto
                - sagemaker-endpoint

Add the AWS credentials
-----------------------

1. Install the flyteagent pod using helm:

   .. code-block::

     helm repo add flyteorg https://flyteorg.github.io/flyte
     helm install flyteagent flyteorg/flyteagent --namespace flyte

2. Get the base64 value of your AWS credentials:

   .. code-block::

     echo -n "<AWS_CREDENTIAL>" | base64

3. Edit the flyteagent secret:

   .. code-block:: bash

     kubectl edit secret flyteagent -n flyte

   .. code-block:: yaml
     :emphasize-lines: 3-5

     apiVersion: v1
     data:
       aws-access-key: <BASE64_ENCODED_AWS_ACCESS_KEY>
       aws-secret-access-key: <BASE64_ENCODED_AWS_SECRET_ACCESS_KEY>
       aws-session-token: <BASE64_ENCODED_AWS_SESSION_TOKEN>
     kind: Secret

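Step 2 above is plain base64 encoding of the raw credential string; the sketch below (using a placeholder key, not a real credential) shows what ends up in the secret's ``data`` fields, and why the ``-n`` flag matters: a trailing newline from ``echo`` without ``-n`` would be encoded too, producing a different, broken value.

```python
import base64

# Placeholder credential for illustration only -- not a real AWS key.
access_key = "AKIAEXAMPLE"

# What `echo -n "<AWS_CREDENTIAL>" | base64` produces:
encoded = base64.b64encode(access_key.encode()).decode()
print(encoded)

# Without -n, echo appends "\n" and the encoded value changes:
with_newline = base64.b64encode((access_key + "\n").encode()).decode()
print(encoded != with_newline)  # True
```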
Upgrade the Flyte Helm release
------------------------------

.. tabs::

  .. group-tab:: Flyte binary

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
    ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
    and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

  .. group-tab:: Flyte core

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
    and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).

For more information, refer to the `SageMaker inference agent example documentation <https://docs.flyte.org/en/latest/flytesnacks/examples/sagemaker_inference_agent/index.html>`__.
2 changes: 1 addition & 1 deletion docs/flyte_fundamentals/extending_flyte.md
@@ -151,7 +151,7 @@ many more ways to customize Flyte tasks:
in other languages outside of the `flytekit` SDK language.
* - {ref}`Backend Plugins <extend-plugin-flyte-backend>`
- These tasks plugins require implementing a backend plugin to leverage
external services like Sagemaker, Snowflake, BigQuery, etc.
external services like SageMaker, Snowflake, BigQuery, etc.
```

## What's next?
3 changes: 1 addition & 2 deletions docs/flyte_fundamentals/optimizing_tasks.md
@@ -242,8 +242,7 @@ at the most granular level of your workflow!
When this task is executed on a Flyte cluster, it automatically provisions all of
the resources that you need. In this case, that need is distributed
training, but Flyte also provides integrations for {ref}`Spark <plugins-spark-k8s>`,
{ref}`Ray <kube-ray-op>`, {ref}`MPI <kf-mpi-op>`, {ref}`Sagemaker <aws-sagemaker>`,
{ref}`Snowflake <snowflake_agent>`, and more.
{ref}`Ray <kube-ray-op>`, {ref}`MPI <kf-mpi-op>`, {ref}`Snowflake <snowflake_agent>`, and more.

Even though Flyte itself is a powerful compute engine and orchestrator for
data engineering, machine learning, and analytics, perhaps you have existing
5 changes: 4 additions & 1 deletion docs/user_guide/basics/workflows.md
@@ -29,7 +29,7 @@ Workflows link multiple tasks together. They can be written as Python functions,
but it's important to distinguish tasks and workflows.

A task's body executes at run-time on a Kubernetes cluster, in a Query Engine like BigQuery,
or on hosted services like AWS Batch or Sagemaker.
or on hosted services like AWS Batch or SageMaker.

In contrast, a workflow's body doesn't perform computations; it's used to structure tasks.
A workflow's body executes at registration time, during the workflow's registration process.
@@ -116,6 +116,7 @@ if __name__ == "__main__":
+++ {"lines_to_next_cell": 0}

To run the workflow locally, you can use the following `pyflyte run` command:

```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \
@@ -124,6 +125,7 @@ pyflyte run \

If you want to run it remotely on the Flyte cluster,
simply add the `--remote flag` to the `pyflyte run` command:

```
pyflyte run --remote \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \
@@ -138,6 +140,7 @@ However, {ref}`executing an individual task <single_task_execution>` independent
without the confines of a workflow, offers a convenient approach for iterating on task logic effortlessly.

## Use `partial` to provide default arguments to tasks

You can use the {py:func}`functools.partial` function to assign default or constant values to the parameters of your tasks.
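
As a plain-Python illustration (no flytekit involved), `functools.partial` pre-binds some of a function's arguments so that callers only supply the rest, which is the same pattern used to preset task parameters:

```python
from functools import partial

def scale(factor: float, value: float) -> float:
    return factor * value

# Fix `factor` so downstream callers only supply `value`.
double = partial(scale, 2.0)
print(double(3.0))  # 6.0

# Keyword arguments can be pre-bound too:
halve = partial(scale, factor=0.5)
print(halve(value=8.0))  # 4.0
```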

```{code-cell}
7 changes: 4 additions & 3 deletions docs/user_guide/extending/backend_plugins.md
@@ -30,15 +30,15 @@ Flyte.
To recap, here are a few examples of why you would want to implement a backend plugin:

1. We want to add a new capability to the Flyte Platform, for example we might want to:
- Talk to a new service like AWS Sagemaker, Snowflake, Redshift, Athena, BigQuery, etc.
- Talk to a new service like AWS SageMaker, Snowflake, Redshift, Athena, BigQuery, etc.
- Orchestrate a set of containers in a new way like Spark, Flink, Distributed
training on Kubernetes (usually using a Kubernetes operator).
- Use a new container orchestration engine like AWS Batch/ECS, Hashicorp's Nomad
- Use a completely new runtime like AWS Lambda, Knative, etc.
3. You want to retain the capability to update the plugin implementation and roll
2. You want to retain the capability to update the plugin implementation and roll
out new changes and fixes without affecting the users code or requiring them to update
versions of their plugins.
4. You want the same plugin to be accessible across multiple language SDK's.
3. You want the same plugin to be accessible across multiple language SDK's.

```{note}
Talking to a new service can be done using flytekit extensions and usually is the better way to get started. But, once matured, most of these extensions are better to be migrated to the backend. For the rest of the cases, it is possible to extend flytekit to achieve these scenarios, but this is less desirable, because of the associated overhead of first launching a container that launches these jobs downstream.
@@ -85,6 +85,7 @@ The backend plugin is where the actual logic of the execution is implemented. Th
1. [Kubernetes operator Plugin](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/k8s#Plugin): The demo in the video below shows two examples of K8s backend plugins: flytekit `Athena` & `Spark`, and Flyte K8s `Pod` & `Spark`.

```{youtube} oK2RGQuP94k
```

2. **A Web API plugin:** [Async](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/webapi#AsyncPlugin) or [Sync](https://pkg.go.dev/github.com/lyft/[email protected]/go/tasks/pluginmachinery/webapi#SyncPlugin).
@@ -105,7 +105,7 @@ Flytepropeller pod would be created as:
:::

This code snippet will output two logs per task that use the log plugin.
However, not all task types use the log plugin; for example, the SageMaker plugin uses the log output provided by Sagemaker, and the Snowflake plugin will use a link to the snowflake console.
However, not all task types use the log plugin; for example, the Snowflake plugin will use a link to the Snowflake console.

## Datadog integration

@@ -128,7 +128,7 @@ If you're using environment variables, use the following config:
DD_LOGS_ENABLED: "false"
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE: "true"
DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself
DD_CONTAINER_EXCLUDE_LOGS: "name:datadog-agent" # This is to avoid tracking logs produced by the datadog agent itself
```

:::{warning}