Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Conformance]: Adds GatewayClass SupportedVersion Test #3368

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

danehans
Copy link
Contributor

What type of PR is this?
/kind test
/area conformance

What this PR does / why we need it:
Adds a test to ensure implementations conform to the GatewayClass SupportedVersion status condition.

Which issue(s) this PR fixes:
Fixes #3367

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/test do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. area/conformance cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 27, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 27, 2024
@youngnick
Copy link
Contributor

The change basically lgtm, but you'll need to remove the "fixes ..." from the commit message @danehans, Prow doesn't let us do that in commit messages.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Sep 30, 2024
@danehans
Copy link
Contributor Author

Commit b40de38 removes "Fixes..." from the commit message. Thanks for the review @youngnick.

@arkodg
Copy link
Contributor

arkodg commented Sep 30, 2024

thanks @danehans, does this test pass on any implementations ?

@danehans
Copy link
Contributor Author

danehans commented Oct 1, 2024

@arkodg thanks for the review. Yes, this PR was tested with solo-io/gloo#10140 for Gloo Gateway:

go test ./conformance -run TestConformance -args \
    --gateway-class=gloo-gateway \
    --supported-features=Gateway \
    --run-test=GatewayClassSupportedVersionCondition
ok  	sigs.k8s.io/gateway-api/conformance	18.789s

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @danehans!

Adds a test to ensure implementations conform to the GatewayClass
SupportedVersion status condition.

Signed-off-by: Daneyon Hansen <[email protected]>
@danehans
Copy link
Contributor Author

@arkodg @robscott I have addressed your comments. PTAL when you have a moment.

Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks !
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arkodg, danehans

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2024
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @danehans!

}

// Ensure the SupportedVersion is false
kubernetes.GWCMustHaveSupportedVersionConditionFalse(t, s.Client, s.TimeoutConfig, gwc.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some implementations like GKE will ~immediately overwrite this CRD with the original version, can you add that as an acceptable outcome of this test?

Features: []features.FeatureName{
features.SupportGateway,
},
Description: "A GatewayClass should set the SupportedVersion condition based on the presence and version of Gateway API CRDs in the cluster",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this makes sense.

Consider I build an implementation that supports v1.1.

v1.2 comes out. Per the spec, even if I am willing to support it, I must set the condition to "false"

This means I cannot pass this conformance test for 1.2 until I upgrade to 1.2 which doesn't actually seem desired?

}

// Ensure the SupportedVersion status condition is false
kubernetes.GWCMustHaveSupportedVersionConditionFalse(t, s.Client, s.TimeoutConfig, gwc.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really expect implementations to have to watch CRDs and constantly reconcile the Gatewayclass? I get how we got to this point, but it feels like we are building APIs in isolation from considerations about user needs and implementation complexity.

Users are unlikely to ever have an incompatibility and somehow realize to check a random object (that they don't even create, so why would they know to look at it?).

The cost to implement this, however, is extremely high...

Copy link
Contributor

@howardjohn howardjohn Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the api-machinery folks have, historically, advised against an implementation reading CRDs at all or even assuming an API is defined by a CRD (vs another mechanism).

I realize this has not much to do with the test... but I missed the PR adding the condition

Copy link
Member

@mlavacca mlavacca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @howardjohn about this feature: it looks like we are tightly coupling the implementation's internal dependency with the installed Gateway API version. In case Implementation X uses Gateway API 1.3 as a dependency, and Gateway API 1.4 gets released, if a user performs an upgrade of the API before the implementation adoption of that specific version, the GatewayClass gets marked as not accepted anymore.

@arkodg
Copy link
Contributor

arkodg commented Nov 19, 2024

this test is adding conformance for this section https://gateway-api.sigs.k8s.io/guides/api-design/?h=supportedversi#supported-api-versions, if this is undesirable, then the section should be modified

@mlavacca
Copy link
Member

this test is adding conformance for this section https://gateway-api.sigs.k8s.io/guides/api-design/?h=supportedversi#supported-api-versions, if this is undesirable, then the section should be modified

Yes, my concern is mostly related to the feature itself, not to this specific conformance test

@youngnick
Copy link
Contributor

The docs don't say that the GatewayClass needs to not be accepted, just that the unsupportedVersion Condition needs to be added.

The intent here is to give implementations a way to say "hey, you're running with a version I don't support, maybe check that before logging a support issue". That is, it's intended to be a feature that serves both users and maintainers by giving users a way to check that they are doing supported things.

@howardjohn I'm certainly ready to hear feedback on how you think that could be achieved without doing the things documented in the test. Or if you think that the goal is not worthwhile, I'd be interested to hear why.

@howardjohn
Copy link
Contributor

The intent here is to give implementations a way to say "hey, you're running with a version I don't support, maybe check that before logging a support issue". That is, it's intended to be a feature that serves both users and maintainers by giving users a way to check that they are doing supported things.

The intent makes sense at a high level, I am just not sure it makes sense in practice. Its very easy to come up with ideas to put an ever-increasing list of bits of information into an objects status. But we should do so with extreme caution -- each one adds real risk, complexity, and scale limitations. A slippery-slope strawman is we start forcing implementations to start putting their entire documentation in the status (which is not far off from the SupportFeatures field!). If you look at Kubernetes core, you can see they are very conservative in ensure each status is only critical information.

The user journeys for this field don't make sense. Which user is going to know to look at a GatewayClass to find this field?

The cost to implement this is high for an implementation. API Machinery folks have consistently strongly recommended against controllers assuming an API is created by a CustomResourceDefinition, and instead relying on API server discovery and the standardized REST APIs to consume APIs; this conformance test strictly enforces reading CRDs.

@kflynn
Copy link
Contributor

kflynn commented Nov 21, 2024

Apologies in advance: I'm intending this as an actual question, but I expect that it may sound overly snarky, which isn't my intent.

That said -- what's the intended user experience here, and which role are we designing for? I can think of a couple of possibilities:

  1. Chihiro sets up a cluster with Gateway API v1.x and the AwesomeGatewayController, which supports v1.x. Later, they upgrade the version of Gateway API to v1.y, which AwesomeGatewayController does not support.

    I think that expecting Chihiro to look at the Gateway's status when weird things start happening is pretty reasonable. OTOH I'm a little unsure why Chihiro wouldn't just hold off on the upgrade until AwesomeGatewayController supports v1.y, since they're the one who installed AwesomeGatewayController as well?

  2. Ana wants to use the incredibleNewFeature stanza in her HTTPRoutes, but that's not supported in Gateway API v1.x, so she installs v1.y and then things break.

    I don't think we'd expect Ana to know to check the Gateway, but then... we also don't expect Ana to get to upgrade Gateway API herself, right?

So... overall, I feel like neither of these hold together all that well, which makes me wonder what I'm missing. 🙂

@howardjohn
Copy link
Contributor

So... overall, I feel like neither of these hold together all that well, which makes me wonder what I'm missing. 🙂

Its slightly worse than your examples actually -- the status is on GatewayClass not Gateway, which I feel is less likely to be looked at

@kflynn
Copy link
Contributor

kflynn commented Nov 21, 2024

Its slightly worse than your examples actually -- the status is on GatewayClass

😂 🤦‍♂️ and I even read the chunk of the doc before writing those out... clearly time for me to knock off work for the day!

But you're right: that makes things weirder still.

@youngnick
Copy link
Contributor

The idea behind all of this is to teach users to check the GatewayClass to find out what their controller supports (this is the point of the status.supportedFeatures field after all). As an extension of that, it seemed like it would be nice for the controller to have a way to signal that it didn't fully support the installed version of the CRD (which we are, again, using in a way that's not really recommended by upstream because upstream does not support what we need - a way to say "this is the released version of the schema including all compatible additions". apiVersion is only useful for information about breaking changes, not about feature additions.)

The original purpose of all of this is to at least include some information about relevant things that didn't require going out to the implementation's docs site to find the details. @howardjohn, it seems like you don't agree with that goal? Or are you worried about the asymptotic case of having too much information in the GatewayClass status?

It seems to me that our options here are as follows:

  • We stop doing anything here and remove any mention of this Condition. This means that implementations now don't need to care about looking at the installed version of CRDs, and users can enjoy the experience of heading off to the implementation's documentation page (which is not listed anywhere in the system) to try and determine the version that's running and what versions of Gateway API it supports. Hope everyone is maintaining a feature matrix!
  • We loosen this requirement to a may, and make this an extended feature, allowing implementations that feel comfortable with this to implement it as they wish. This would mean that implementations can opt-in to this Condition, but that we have conformance for it if so.
  • We keep the spec as it is and include this test also as-is. This does not seem viable, given the level of discussion on this issue already.

@howardjohn
Copy link
Contributor

I have a few issues:

  • I don't think resource status should replace documentation. Resource status should be used, essentially, as asynchronous validation webhook type workflows. Its hard for me to explicitly quantify what makes one or the other, since you could, of course, argue supportedX is a validation as well...
  • I don't think users will ever look at GatewayClass, even if we try to make them, and I am not sure they should.
  • I think SupportVersion is fundamentally wrong, regardless of how its exposed to users, since support of a version is nowhere near black and white. For example, when GW API v1.3 comes out, our controller will support it without any changes, because the CRDs are backwards compatible. But presumably, with this condition, we would have to report SupportedVersion=false since we don't know about it (one could make the argument "given backwards compatibility we support all versions", but they would fail this conformance test!). Now we may not support some new field or feature, but that is already handled by the conformance reports/supportedFeatures.
  • Reading CRDs is considered "bad practice" from api-machinery, so enforcing an implementation does it is suspect. To be fair, in the implementation I work on we do it anyways for different reasons, so this is more theoretical for me(/arguing on behalf of others)

If I was the BDFL of the project I would personally just remove the condition entirely. I can see the merit in saying "its extended", though I could also argue its a bit questionable. There are an infinite set of APIs we could add as "extended" because "some implementation may want it"; we should probably be just as cautious in adding extended APIs as core ones, with the main distinction that something is extended IFF it cannot be implemented by everyone. In this case, I would imagine every can implement it. Additionally, if its only implemented across some implementations, it makes it more likely that users will not read GatewayClass status which makes the feature less useful

@youngnick
Copy link
Contributor

@howardjohn, I know you don't like GatewayClass because it's cluster scoped, but the intent behind GatewayClass was always:

  • provide a way for implementations to own some subset of Gateways in the cluster, enabling multiple implementations per cluster. There is no "bigger than one namespace but smaller than cluster-wide" scope in Kubernetes currently, so cluster-wide is what we've got here. The controllerName field is included to allow implementations to support multiple GatewayClasses (presumably with different properties somehow).
  • provide a central place for Gateway API users (in our current persona model, Chihiro or whoever else has rights to create Gateways) a way to see what implementations are available in the cluster, and what features they have available.

This Condition is an extension of this idea, it's a way to signal to users (probably Ian and Chihiro rather than Ana) that an implementation may not fully support a particular version. It's not intended to be a be-all-and-end-all, but a breadcrumb for users to follow. I think that having some way to signal to Ian or Chihiro that the CRD update you just installed may not work for every implementation would be a very useful feature, personally, that would save having to go and find the correct version matrix page on every implementation you have in your cluster.

Regarding watching CRD objects, I'd love to have a way not to do that, but currently, the versioning problem means that we don't. There's no way, aside from checking the CRD, for an implementation to know what version of the CRD is installed.

This is fine for single-implementation CRDs, because generally you install the CRDs and the implementation in the same operation, and there's a tight coupling between versions. But this breaks down for a community CRD like this one. I've asked in #api-machinery on Kube Slack for some more guidance here.

@robscott
Copy link
Member

robscott commented Dec 6, 2024

Trying to catch up on this thread, I think there are two conflicting ideas at play here:

  1. We should find a way to warn users if the implementation they're using doesn't support the CRD version they're using
  2. Requiring implementations to watch CRDs and populate a condition based on the version of those requires implementations to do something that k8s API Machinery historically recommends against

I think it may be helpful to consider an alternative here where implementations have a way to communicate via GWC status the range of versions they support, and then tools like gwctl or a theoretical future CRD installation tool can warn when there are incompatible versions.

Spitballing here, but this could involve a supportedVersions struct in GWC status, conceptually similar to supportedFeatures, and that would allow implementations to signal the combination of Gateway API versions and release channels that they support. This would have the added advantage of signaling to a CRD upgrade tool that a potential CRD upgrade was indeed safe.

I think I agree with @howardjohn that many/most users are unlikely to look at GatewayClass status, but I think that we can build useful tooling that does, and that could potentially dramatically improve the overall UX of working with Gateway API.

@youngnick
Copy link
Contributor

I had a chat with some folks in #sig-apimachinery about this (see https://kubernetes.slack.com/archives/C0EG7JC6T/p1733285894738669), and the consensus was that, while it's important to be careful about watching CRDs, it's okay in Gateway API's case because:

  • we're a Kubernetes API anyway that has multiple consumers (rather than an implementation and api bundled together)
  • we're only talking about watching metadata (labels and annotations).

So we probably can do
some CRD watching if we really need to.

That said, I really like @robscott's idea of a supportedVersions struct, that addresses my concern here in a better, more standard, and more consumable way anyway.

@howardjohn
Copy link
Contributor

Spitballing here, but this could involve a supportedVersions struct in GWC status, conceptually similar to supportedFeatures, and that would allow implementations to signal the combination of Gateway API versions and release channels that they support. This would have the added advantage of signaling to a CRD upgrade tool that a potential CRD upgrade was indeed safe.

This still doesn't make sense. Gateway API is a backwards compatible stable API. As an implementation today, I support v1.2, and v1.3 when it comes out, and v1.999 when it comes out.

I may not support a new feature in a new version, but that is already a part of the API.

Users of an API really probably shouldn't actually worry about the version, they care about features. For example, as a user of Service I don't care if the API is from k8s 1.28 or 1.30, but I may care if it has the trafficDistribution field (which, 'version' is only one factor into determining if it does!)

@mlavacca
Copy link
Member

We should find a way to warn users if the implementation they're using doesn't support the CRD version they're using

How can this happen? As @howardjohn already said, Gateway API is a backward-compatible API, and this should never happen. An implementation can implement feature X or feature Y, and this is already captured in the SupportedFeatures field, but I really don't see how version 1.3 can be released and the project foo already compatible with version 1.2 is not compatible anymore. The only possibility I see here is some bug on our side, but for all the other things, we are a stable API and aim at not introducing breaking changes.

@robscott
Copy link
Member

but I really don't see how version 1.3 can be released and the project foo already compatible with version 1.2 is not compatible anymore

This should not be possible within standard channel, but is entirely possible within experimental channel. I would request that we move this discussion over to #3494 since this PR is just about a conformance test and not the overall usefulness of the feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/test release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a GatewayClass SupportedVersion Status Condition Test
8 participants