From 1483a30e23057299e8f5ad1cc43d4c5939326327 Mon Sep 17 00:00:00 2001
From: "Fabio M. Graetz, Ph.D"
Date: Sun, 10 Sep 2023 11:32:04 +0200
Subject: [PATCH 1/2] Document simplified retry behaviour introduced in #3902
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Fabio M. Graetz, Ph.D.
Signed-off-by: Fabio Grätz
---
 rsts/concepts/tasks.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/rsts/concepts/tasks.rst b/rsts/concepts/tasks.rst
index 1ca43d5ea8..e38e16839d 100644
--- a/rsts/concepts/tasks.rst
+++ b/rsts/concepts/tasks.rst
@@ -106,6 +106,10 @@ System retry can be of two types:

 Recoverable vs. Non-Recoverable failures: Recoverable failures will be retried and counted against the task's retry count. Non-recoverable failures will just fail, i.e., the task isn’t retried irrespective of user/system retry configurations. All user exceptions are considered non-recoverable unless the exception is a subclass of FlyteRecoverableException.

+.. note::
+
+   `RFC 3902 `_ implements an alternative, simplified retry behaviour with which both system and user retries are counted towards a single retry budget defined in the task decorator (thus, without a second retry budget defined in the platform configuration). The last retries are always performed on non-spot instances to guarantee completion. To activate this behaviour, set ``TODO`` to ``TODO`` in the helm values.
+
 **Timeouts**

 To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the `TaskMetadata `__.

From af16c202f56ccfe5f05d660ab5906caab79c9ac3 Mon Sep 17 00:00:00 2001
From: "Fabio M. Graetz, Ph.D"
Date: Mon, 30 Oct 2023 09:16:50 +0100
Subject: [PATCH 2/2] Update rsts/concepts/tasks.rst
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Fabio M. Graetz, Ph.D.
Signed-off-by: Fabio Grätz
---
 rsts/concepts/tasks.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rsts/concepts/tasks.rst b/rsts/concepts/tasks.rst
index e38e16839d..46d7b1963c 100644
--- a/rsts/concepts/tasks.rst
+++ b/rsts/concepts/tasks.rst
@@ -108,7 +108,7 @@ System retry can be of two types:

 .. note::

-   `RFC 3902 `_ implements an alternative, simplified retry behaviour with which both system and user retries are counted towards a single retry budget defined in the task decorator (thus, without a second retry budget defined in the platform configuration). The last retries are always performed on non-spot instances to guarantee completion. To activate this behaviour, set ``TODO`` to ``TODO`` in the helm values.
+   `RFC 3902 `_ implements an alternative, simplified retry behaviour with which both system and user retries are counted towards a single retry budget defined in the task decorator (thus, without a second retry budget defined in the platform configuration). The last retries are always performed on non-spot instances to guarantee completion. To activate this behaviour, set ``configmap.core.propeller.node-config.ignore-retry-cause`` to ``true`` in the helm values.

 **Timeouts**
