From b12dd4dca732af915a184f18a6f9535609add47a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lovro=20Ma=C5=BEgon?=
Date: Wed, 14 Aug 2024 15:56:41 +0200
Subject: [PATCH] Update 20240812-recover-from-pipeline-errors.md

---
 .../20240812-recover-from-pipeline-errors.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/docs/design-documents/20240812-recover-from-pipeline-errors.md b/docs/design-documents/20240812-recover-from-pipeline-errors.md
index 0b463d32d..439a79a09 100644
--- a/docs/design-documents/20240812-recover-from-pipeline-errors.md
+++ b/docs/design-documents/20240812-recover-from-pipeline-errors.md
@@ -79,6 +79,18 @@ pipeline successfully processes at least one record or runs without
 encountering an error for a predefined time frame, the count of consecutive
 failures will reset.
 
+### Dead-letter queue errors
+
+If a DLQ ([Dead-letter queue](https://conduit.io/docs/features/dead-letter-queue))
+is configured with a nack threshold greater than 0, the user has configured a
+DLQ and we should respect that. This means that if the nack threshold gets
+exceeded and the DLQ returns an error, that error should degrade the pipeline
+_without_ triggering the recovery mechanism.
+
+In case the nack threshold is set to 0 (default), then any record sent to the
+DLQ will cause the pipeline to stop. In that case, the recovery mechanism should
+try to restart the pipeline, since the DLQ is essentially disabled.
+
 ## Pipeline state management
 
 A new pipeline state, `recovering`, will be introduced to indicate that a
@@ -201,6 +213,3 @@ design document.
   re-delivered, the same records could land in the DLQ multiple times. I propose
   documenting this edge case for now and tackling the solution as part of
   [record deduplication](#record-deduplication).
-  As for the nack threshold - if that threshold is reached it should emit a
-  fatal error and put the pipeline into a `degraded` state without triggering
-  the recovery mechanism, otherwise the nack threshold would lose its purpose.
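
The rule added in "Dead-letter queue errors" comes down to one branch on the nack threshold. Below is a minimal Go sketch of that decision, assuming the error being handled came from the DLQ. The names `Action`, `ActionRecover`, `ActionDegrade`, and `onDLQError` are hypothetical and used only for illustration; they are not taken from the Conduit codebase.

```go
package main

import "fmt"

// Action is a hypothetical type naming what the pipeline does after a DLQ error.
type Action string

const (
	// ActionRecover hands the failure to the recovery mechanism (restart attempt).
	ActionRecover Action = "recover"
	// ActionDegrade moves the pipeline to `degraded` without attempting recovery.
	ActionDegrade Action = "degrade"
)

// onDLQError sketches the rule from the design document:
//   - threshold > 0: the user configured a DLQ on purpose, so a DLQ error
//     degrades the pipeline and recovery is skipped;
//   - threshold == 0 (default): the DLQ is essentially disabled, so the
//     recovery mechanism should try to restart the pipeline.
func onDLQError(nackThreshold int) Action {
	if nackThreshold > 0 {
		return ActionDegrade
	}
	return ActionRecover
}

func main() {
	fmt.Println(onDLQError(0)) // default threshold
	fmt.Println(onDLQError(5)) // user-configured threshold
}
```

Keeping the decision in one small pure function would make the two cases easy to cover with a table-driven test, though where such a check would live in the pipeline code is left open here.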