From 815f85d0ce90a3ace61cce17c0bfb441ac2dbcc3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabio=20Gr=C3=A4tz?= Date: Wed, 31 Jul 2024 21:53:57 +0200 Subject: [PATCH] Fix typos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Fabio Grätz --- rfc/system/5598-deterministic-errors-distributed-training.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfc/system/5598-deterministic-errors-distributed-training.md b/rfc/system/5598-deterministic-errors-distributed-training.md index daee233757..23063e2f57 100644 --- a/rfc/system/5598-deterministic-errors-distributed-training.md +++ b/rfc/system/5598-deterministic-errors-distributed-training.md @@ -33,7 +33,7 @@ For distributed training tasks, the [pod entrypoint `pyflyte-execute`](https://g For this purpose, we propose that `flyteplugins` injects the environment variable `FLYTE_INTERNAL_POD_NAME` using the Kubernetes [downward api](https://kubernetes.io/docs/concepts/workloads/pods/downward-api/#downwardapi-fieldRef). -Furthermore, we propose that distributed task plugins in `flyteplugins` inject the environment variable `FLYTE_INTERNAL_ERROR_PROPAGATION=earlist` (where `earliest` is the first of potentially multiple strategies to determine the root cause error, see below). +Furthermore, we propose that distributed task plugins in `flyteplugins` inject the environment variable `FLYTE_INTERNAL_ERROR_PROPAGATION=earliest` (where `earliest` is the first of potentially multiple strategies to determine the root cause error, see below). If the `FLYTE_INTERNAL_ERROR_PROPAGATION` environment variable is set, `pyflyte-execute` includes the pod name in the error file. @@ -65,7 +65,7 @@ Open questions: Currently, [here](https://github.com/flyteorg/flyte/blob/4514860cf56ba62717f6c207f269410a8c1a5461/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L290) in the plugin manager, where we call `NewRemoteFileOutputReader`, we do have access to `e.plugin`, and thus to `PluginProperties` and could make use of that information to instantiate another output reader. * Could we alternatively add an `OutputReader` to the [`PluginContext`](https://github.com/flyteorg/flyte/blob/4514860cf56ba62717f6c207f269410a8c1a5461/flyteplugins/go/tasks/pluginmachinery/k8s/plugin.go#L51)? Where would we customize this plugin context for e.g. the kubeflow plugins? -#### Backwards compatability +#### Backwards compatibility We propose that the new `MultiErrorFileRemoteFileOutputReader` falls back to reading the `error.pb` if no `error-.pb` files are found in order to solve the problem of backwards compatibility: * If flytekit uses a version that supports multiple error files but the backend does not yet, `pyflyte-execute` will not upload multiple error files for distributed tasks since the `FLYTE_INTERNAL_ERROR_PROPAGATION` environment variable will not be set.