Replies: 2 comments
-
Devil's advocate: If I have 2 backends:

    - name: handle-heavy-load
      weight: 999999
    - name: super-fragile
      weight: 1

and suddenly I could maybe accept that this is fine if
-
@kflynn, the way that you've read this is correct - this is designed to make implementations fail fast if there's a mistake. We made a conscious decision, knowing that this will not protect end users in this case, which sucks, but we considered the alternative worse.

At the end of the day, it came down to "respecting user intent". The only way we have to understand what the user is intending to do is what's in the API objects. I agree that it's likely that at some point, end users are going to make errors with adding backends. But we have no way of telling if it's an error, or if it's what the user wants.

For example, take doing blue/green rollouts. If you're adding two backends, like this:

    - name: app-v1
      weight: 99
    - name: app-v2
      weight: 1

If the

This is what I mean by "respecting user intent" - we have no way of knowing if the user intended to set things up the way they did. Maybe they want the nonexistent service there for some reason, but at the end of the day, we decided that we had to trust that the end user knows what they're doing, and give them tools to check to know if there's a mistake (i.e. the status).

I think that when @kflynn and I discussed this before, he referred to this as an "SRE-centric" view, which I will cop to - I think that a lot of the design for Gateway API is centered around Gateway, which is effectively an SRE-owned resource (usually).

To me, it comes down to the fact that the use cases we are handling are complex enough that it's difficult to impossible to guess if a particular config is a mistake or not. So we have to err on the side of letting people shoot themselves in the foot. Sadly.
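To make the fail-fast behavior discussed here concrete, here is a minimal Python sketch of how I read it: weights are honored exactly as written, and the share of traffic that would have gone to a nonexistent backend gets a 500 instead of being rerouted. The function name `fail_fast_shares`, the dict layout, and the `existing_services` set are all hypothetical and only for illustration; this is not how any particular implementation does it.

```python
from fractions import Fraction


def fail_fast_shares(backend_refs, existing_services):
    """Expected fraction of requests per outcome under fail-fast routing.

    Assumption: a backend is "invalid" only when its name is missing from
    existing_services. Its weighted share is reported under the "500"
    outcome rather than being redistributed to the valid backends.
    """
    total = sum(ref["weight"] for ref in backend_refs)
    if total == 0:
        # All-zero weights are outside the scope of this sketch.
        raise ValueError("total weight must be positive")

    shares = {}
    for ref in backend_refs:
        outcome = ref["name"] if ref["name"] in existing_services else "500"
        share = Fraction(ref["weight"], total)
        shares[outcome] = shares.get(outcome, Fraction(0)) + share
    return shares


# The blue/green example from the comment, assuming app-v2 does not exist.
refs = [
    {"name": "app-v1", "weight": 99},
    {"name": "app-v2", "weight": 1},
]
shares = fail_fast_shares(refs, existing_services={"app-v1"})
for outcome, share in shares.items():
    print(f"{outcome}: {float(share):.2%}")
```

With these inputs, 99/100 of requests still reach app-v1 and 1/100 fail, so the configured split is preserved and the mistake shows up as errors in proportion to the weight the user chose.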
-
First, let's talk about invalid backendRefs. The HTTPRoute specification states: and the HTTPBackendRef adds:

It appears to me (and that word "appears" is important!) that the intent here is to try to make a misconfigured backendRef very obvious, very quickly. That makes all kinds of sense to me: we clearly should surface errors quickly and meaningfully. What I'm worried about is that the specified behavior won't be meaningful to end users.

Consider the end user's experience when you define an HTTPRoute with a single valid backendRef: they see a working application. Great.

Now suppose you, as the app developer or as cluster ops, add a second backendRef -- maybe you're doing a canary release of a new version of the workload, maybe you just want two for redundancy, who knows. In any case, you botch your typing and your new backendRef points to a nonexistent resource... and boom, half your end users are facing a dead application.

This feels like the opposite of how an HTTPRoute should treat the end user. 🙂 I'd like to see this changed: as long as you have valid backendRefs, I think that any invalid backendRefs should be ignored for routing while continuing to be surfaced in the Route, as currently defined.

From there, we can move on to considering backends that become unresponsive, fail health checks, etc.: in these situations, the backendRef is "valid", but the user experience will be much improved by the implementation choosing not to consider these backends in routing decisions. It's not clear to me that the spec as written allows for that; thoughts? I definitely believe that not forcing the implementation to route to a backend it knows to be dead will be a better user experience.
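For illustration, the change proposed above (ignore invalid backendRefs for routing as long as at least one valid one exists) could be sketched roughly as below. This is only a sketch under assumptions: the function name `ignore_invalid_shares`, the dict layout, the explicit weights of 1, and the `existing_services` set are hypothetical, and "invalid" is reduced to "the named Service does not exist". Surfacing the problem in the Route status is not modeled.

```python
from fractions import Fraction


def ignore_invalid_shares(backend_refs, existing_services):
    """Expected fraction of requests per outcome if invalid refs are skipped.

    Valid backends keep their relative weights and absorb the traffic that
    would have gone to invalid ones. If no valid backend remains, every
    request is counted under the "500" outcome.
    """
    valid = [ref for ref in backend_refs if ref["name"] in existing_services]
    if not valid:
        return {"500": Fraction(1)}

    total = sum(ref["weight"] for ref in valid)
    if total == 0:
        # All-zero weights are outside the scope of this sketch.
        raise ValueError("total weight of valid backends must be positive")

    # Assumes backend names are unique within the rule.
    return {ref["name"]: Fraction(ref["weight"], total) for ref in valid}


# The scenario from the post: a second backendRef is added with a typo,
# so it points at a Service that does not exist.
refs = [
    {"name": "app", "weight": 1},
    {"name": "app-canray", "weight": 1},  # hypothetical mistyped name
]
shares = ignore_invalid_shares(refs, existing_services={"app"})
for outcome, share in shares.items():
    print(f"{outcome}: {float(share):.0%}")
```

Under this policy the one valid backend serves all requests, instead of half of them failing. The trade-off is that a typo no longer produces user-visible errors, so the Route status becomes the only signal that something is wrong.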