Accommodate when a 404 is not a client error #35780
Merged
Technical Summary
Traefik Proxy returns a 404 (client error) instead of a 503 (server error) when it has not been configured with a catch-all router and cannot match a request to a router. See https://doc.traefik.io/traefik/getting-started/faq/#404-not-found
Traefik ought to be configured so that it responds with a client error only when the client has made an error. This change accommodates deployments where Traefik has not been configured that way.
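The retry decision this change affects can be sketched as below. This is a minimal, hypothetical illustration, not the CommCare HQ repeater code: the function `next_action`, its string return values, and the assumption that 5XX responses are always retryable are made up for the example. `HTTP_STATUS_4XX_RETRY` is shown containing only 404, although the real set may hold other codes, and `MAX_ATTEMPTS` is assumed to cap the total number of attempts.

```python
# Hypothetical sketch: not the CommCare HQ implementation.

MAX_ATTEMPTS = 3  # value stated in this PR; assumed to cap total attempts

# Illustrative only: the real HTTP_STATUS_4XX_RETRY may contain other codes.
# The effect of this PR is that 404 is treated as retryable.
HTTP_STATUS_4XX_RETRY = frozenset({404})


def next_action(status_code: int, attempt: int,
                retry_4xx: frozenset = HTTP_STATUS_4XX_RETRY) -> str:
    """Decide what to do after a repeater response.

    `attempt` is the 1-based number of the attempt that got `status_code`.
    Returns "success", "retry", or "cancel" ("cancel" here simply means
    "stop trying", whatever the real code records in that case).
    """
    if 200 <= status_code < 300:
        return "success"
    # Assumption: server errors are always retryable; 4XX codes only if listed.
    retryable = status_code >= 500 or status_code in retry_4xx
    if retryable and attempt < MAX_ATTEMPTS:
        return "retry"
    return "cancel"


if __name__ == "__main__":
    # A 404 from a Traefik instance with no matching router is now retried...
    print(next_action(404, attempt=1))  # retry
    print(next_action(404, attempt=3))  # cancel: attempts exhausted
    # ...whereas without 404 in the set it would be cancelled at once.
    print(next_action(404, attempt=1, retry_4xx=frozenset()))  # cancel
    print(next_action(400, attempt=1))  # cancel
```

Passing the set in as a parameter (`retry_4xx` here) also shows, loosely, what making it configurable per connection might look like.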
Almost 100% of the time, a 404 response means that the requested resource was not found because the request itself is permanently incorrect, so retrying it is pointless. In a vanishingly small number of situations (really just this one) that is not true. The implication of this change is that every payload that causes a 404 response will be retried `MAX_ATTEMPTS` (3) times instead of being cancelled immediately. That feels wrong, but it is a pragmatic solution.
Another option would be to make `HTTP_STATUS_4XX_RETRY` configurable on a per-`ConnectionSettings` basis. That is significantly more effort, and I'm not sure it's worth it. I'm curious to hear your thoughts.
Safety Assurance
Safety story
Our large number of Celery workers will easily handle the small increase in load from retrying payloads that receive 404 responses.
Automated test coverage
`HTTP_STATUS_4XX_RETRY` is covered by `corehq/motech/repeaters/tests/test_models.py::TestRepeaterHandleResponse::test_handle_4XX_retry_codes`.
QA Plan
No QA planned
Rollback instructions
Labels & Review