diff --git a/geps/gep-2648/index.md b/geps/gep-2648/index.md index 96366db784..804f65a361 100644 --- a/geps/gep-2648/index.md +++ b/geps/gep-2648/index.md @@ -1,12 +1,17 @@ # GEP-2648: Direct Policy Attachment * Issue: [#2648](https://github.com/kubernetes-sigs/gateway-api/issues/2648) -* Status: Provisional +* Status: Declined (See [status definitions](../overview.md#gep-states).) ## TLDR +!!! warning + This GEP has been merged back into [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/) + and is now obsolete. Please refer to the original specification of Metaresources + and Policy Attachment for the current state of the pattern. + Describe and specify a design pattern for a class of metaresource that can affect specific settings across a single target object. @@ -21,13 +26,6 @@ Policy. This is a design for a _pattern_, not an API field or new object. -!!! danger - This GEP is in the process of being updated. - Please see the discussion at https://github.com/kubernetes-sigs/gateway-api/discussions/2927 - and expect further changes. - Some options under discussion there may make the distinction between Direct - and Inherited Policies moot, which would require a rework. - ## Goals * Specify what common properties all Direct Attached Policies MUST have diff --git a/geps/gep-2648/metadata.yaml b/geps/gep-2648/metadata.yaml index 7e9b54a867..c7de608f81 100644 --- a/geps/gep-2648/metadata.yaml +++ b/geps/gep-2648/metadata.yaml @@ -2,17 +2,17 @@ apiVersion: internal.gateway.networking.k8s.io/v1alpha1 kind: GEPDetails number: 2648 name: Direct Policy Attachment -status: Provisional +status: Declined # Any authors who contribute to the GEP in any way should be listed here using # their GitHub handle. 
authors: - youngnick - robscott relationships: - extends: + obsoletedBy: - name: Metaresources and Policy Attachment number: 713 - description: Split out Direct Policy Attachment into its own GEP + description: Merged back into the original spec for Metaresources and Policy Attachment where it's presented as a well-defined class of metaresource # references is a list of hyperlinks to relevant external references. # It's intended to be used for storing GitHub discussions, Google docs, etc. references: diff --git a/geps/gep-2649/index.md b/geps/gep-2649/index.md index dbcc73f78a..7cc17bca68 100644 --- a/geps/gep-2649/index.md +++ b/geps/gep-2649/index.md @@ -1,24 +1,22 @@ # GEP-2649: Inherited Policy Attachment * Issue: [#2649](https://github.com/kubernetes-sigs/gateway-api/issues/2649) -* Status: Experimental +* Status: Declined (See [status definitions](../overview.md#gep-states).) ## TLDR +!!! warning + This GEP has been merged back into [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/) + and is now obsolete. Please refer to the original specification of Metaresources + and Policy Attachment for the current state of the pattern. + Describe and specify a design pattern for a class of metaresource that can affect specific settings across a multiple target objects. This is a design for a _pattern_, not an API field or new object. -!!! danger - This GEP is in the process of being updated. - Please see the discussion at https://github.com/kubernetes-sigs/gateway-api/discussions/2927 - and expect further changes. - Some options under discussion there may make the distinction between Direct - and Inherited Policies moot, which would require a rework. - ## Goals * Specify what common properties all Inherited Policies MUST have @@ -224,7 +222,7 @@ proposal](https://github.com/kubernetes-sigs/gateway-api/issues/611). 
### Policy Attachment for Ingress When talking about Direct Attached Policy attaching to Gateway resources for -ingress use cases (as discussed in GEP-2648), the flow is relatively +ingress use cases (as discussed in GEP-2648), the flow is relatively straightforward. A policy can reference the resource it wants to apply to, and only affects that resource. @@ -245,7 +243,7 @@ namespaces. ![Complex Ingress Example](images/2649-ingress-complex.png) In this example, the Gateway has a TimeoutPolicy attached, which affects the -HTTPRoute in the App namespace. That HTTPRoute also has the Direct Attached +HTTPRoute in the App namespace. That HTTPRoute also has the Direct Attached RetryPolicy attached, which affects the HTTPRoute itself, and one of the backends has a HealthCheckPolicy attached to the Service, which is also a Direct Attached Policy. diff --git a/geps/gep-2649/metadata.yaml b/geps/gep-2649/metadata.yaml index 403a0c72ec..23e99cdc54 100644 --- a/geps/gep-2649/metadata.yaml +++ b/geps/gep-2649/metadata.yaml @@ -2,15 +2,15 @@ apiVersion: internal.gateway.networking.k8s.io/v1alpha1 kind: GEPDetails number: 2649 name: Inherited Policy Attachment -status: Provisional +status: Declined authors: - youngnick - robscott relationships: - extends: + obsoletedBy: - name: Metaresources and Policy Attachment number: 713 - description: Split out Inherited Policy Attachment + description: Merged back into the original spec for Metaresources and Policy Attachment where it's presented as a well-defined class of metaresource # references is a list of hyperlinks to relevant external references. # It's intended to be used for storing GitHub discussions, Google docs, etc. 
references: diff --git a/geps/gep-713/index.md b/geps/gep-713/index.md index 1038eb06eb..cdf9754ab1 100644 --- a/geps/gep-713/index.md +++ b/geps/gep-713/index.md @@ -3,1155 +3,1192 @@ * Issue: [#713](https://github.com/kubernetes-sigs/gateway-api/issues/713) * Status: Memorandum -## TLDR +(See status definitions [here](/geps/overview/#gep-states)) + +## TL;DR + +This GEP aims to standardize terminology and processes around "metaresources", i.e., using one Kubernetes object to modify the functions of one or more other objects. + +It lays out guidelines for Gateway API implementations and other stakeholders for the design and/or handling of custom resources in compliance with a pattern known as Policy Attachment. + +!!! warning + This GEP specifies a _pattern_, not an API field or new object. It defines some terms, including _Metaresource_, _Policies_ and _Policy Attachment_, and their related concepts. !!! danger - This GEP is in the process of being updated. - Please see the discussion at https://github.com/kubernetes-sigs/gateway-api/discussions/2927 - and expect further changes, although they will not be as extensive as the - more focussed GEP-2648 and GEP-2649. - Some options under discussion there may make the distinction between Direct - and Inherited Policies moot, which would require a rework. - -This GEP aims to standardize terminology and processes around using one Kubernetes -object to modify the functions of one or more other objects. - -This GEP defines some terms, firstly: _Metaresource_. - -A Kubernetes object that _augments_ the behavior of an object -in a standard way is called a _Metaresource_. - -This document proposes controlling the creation of configuration in the underlying -Gateway data plane using two types of Policy Attachment. 
-A "Policy Attachment" is a specific type of _metaresource_ that can affect specific -settings across either one object (this is "Direct Policy Attachment"), or objects -in a hierarchy (this is "Inherited Policy Attachment"). - -Individual policy APIs: - -- MUST be their own CRDs (e.g. `TimeoutPolicy`, `RetryPolicy` etc), -- MUST include both `spec` and `status` stanzas -- MUST have the `status` stanza include a `conditions` section using the standard - upstream Condition type -- MAY be included in the Gateway API group and installation or be defined by - implementations -- MUST include a common `TargetRef` struct in their specification to identify - how and where to apply that policy. -- MAY affect more objects than the object specified in the `targetRef`. In this - case, the Policy is an Inherited Policy. A common way to do this is to include - either a `defaults` section, an `overrides` section, or both. -- Policy objects that affect _only_ the object specified in the `targetRef` are - Direct Attached Policies (or more simply, Direct Policies.) - -The biggest difference between the two types of Policy is that Direct Attached -Policies are a strict subset of Policy objects with criteria designed to make -it _much_ easier to understand the state of the system, and so are simpler to -use and can use a more simple `status` design. - -However, Inherited Policies, because of the nature of the useful feature of having -settings cascade across multiple objects in a hierarchy, require knowledge of -more resources, and are consequently harder to understand and require a more -complex status design. - -Splitting these two design patterns apart into separate GEPs is intended to -allow proceeding with stabilizing the simpler (Direct) case while we work on -solving the status problem for the more complex (Inherited) case. - -Direct Attached Policies are further specified in the addendum GEP GEP-2648, -Direct Policy Attachment. 
+ This pattern is so far agreed upon only by Gateway API implementers who needed an immediate solution and did not want their solutions to be completely different and disparate. It does not have wide agreement or review from the rest of Kubernetes (particularly API Machinery). + It is therefore conceivable that this problem domain gets a different solution in core in the future, at which time this pattern might be considered obsoleted by that one. + When implementations need something that is not in the spec and is not covered by the [user stories](#user-stories) this pattern was primarily designed for, they are encouraged to explore other means (e.g. trying to work their feature into the upstream spec) before considering introducing their own custom metaresources. + Examples of challenges associated with this pattern include the [Discoverability problem](#the-discoverability-problem) and the [Fanout status update problem](#fanout-status-update-problems). + +## Overview and Concepts + +### Background + +When designing Gateway API, a recurring challenge became apparent. There was often a need to change ("augment") the behavior of objects without modifying their specs. 
+ +There are several cases where this happens, such as: +- when changing the spec of the object to hold the new piece of information is not possible (e.g., `ReferenceGrant`, from [GEP-709](../gep-709/index.md), when affecting Secrets and Services); +- when the new specification applies at different scopes (different object kinds), making it more maintainable if the declaration is extracted to a separate object, rather than adding new fields representing the same functionality across multiple objects; +- when the augmented behavior is intended to [span across relationships of an object](#spanning-behavior-across-relationships-of-a-target) other than the object that is directly referred in the declaration; +- when the augmented behavior is subject to different RBAC rules than the object it refers to; +- to circumvent having to enforce hard changes to established implementations. + +To put this another way, sometimes we need ways to be able to affect how an object is interpreted in the API, without representing the description of those effects inside the spec of the object. This document describes the ways to design objects to meet use cases like these. + +This document introduces the concept of a "metaresource", a term used to describe the class of objects that _only_ augment the behavior of another Kubernetes object, regardless of what they are targeting. + +"Meta" here is used in its Greek sense of "more comprehensive" or "transcending", and "resource" rather than "object" because "metaresource" is more pronounceable than "meta object". + +Moreover, this document defines a particular class of metaresource, called "policies". Policy kinds have a well-defined structure and behavior, both specified in this GEP. + +From policies emerges the concept of Policy Attachment, which consists of augmenting the behavior of other Kubernetes resources by attaching policies to them. 
+ +After multiple iterations of Gateway API experimenting with policies—whether through common kinds of policies like `BackendTLSPolicy` and `XBackendTrafficPolicy`, or various implementation-specific ones (see [Current use of policies](#current-use-of-policies))—and after rounds of discussion (such as [kubernetes-sigs/gateway-api/discussions#2927](https://github.com/kubernetes-sigs/gateway-api/discussions/2927)), the pattern has evolved into its current form. + +### User stories + +- [Ana](../../concepts/roles-and-personas.md#ana) or [Chihiro](../../concepts/roles-and-personas.md#Chihiro) would like to specify some new behavior for a standard Kubernetes resource, but that resource doesn't have a way to specify the behavior and neither Ana nor Chihiro can modify it. + - For example, Ana may want to add a rate limit to a Kubernetes Service. The Service object itself doesn't have a field for rate limiting, and Ana can't modify the Service object's definition. +- A Gateway API implementer would like to define some implementation-specific behaviors for Gateway API objects that are already standard. + - For example, an implementer might want to provide a way for Chihiro to plug in a WebAssembly module to a particular Gateway listener, including all the configuration required by the module. Support for WebAssembly modules is a feature of this implementation specifically and the Gateway listener spec does not contain fields to declare WebAssembly configuration. +- Chihiro would like a way to allow Ana to specify certain behaviors, but not others, in a very fine-grained way. + - For example, Chihiro might want to allow Ana to specify rate limits for a Service, but not to specify the Service's ports. +- A Gateway API implementer would like to define a way to specify a behavior that applies to a whole hierarchy of objects. + - For example, an implementer might want to define a way to specify a behavior that applies to all HTTPRoutes that are attached to a Gateway. 
+- A Gateway API implementer would like to define a way to specify a behavior that applies to multiple kinds of objects with a single declaration. + - For example, an implementer might want to define a way to specify a behavior that applies to selected HTTPRoutes and selected TCPRoutes. Even though the HTTPRoute object could otherwise be extended via an implementation-specific filter, the TCPRoute object cannot. +- A third-party provider would like to offer a way to independently extend the behavior of Gateways controlled by one or more Gateway API implementers. + - For example, a provider that knows how to configure Gateways controlled by one or more Gateway API implementers might want to define a way for Gateway API users to activate this feature in a standard way across the supported implementations, without direct involvement of the implementers. + +All [risks and caveats](#tldr) considered, these are in general a few reasons for using metaresources and policies over another (possibly more direct) way to modify the spec ("augment the behavior") of an object: + +* Extending otherwise stable APIs-e.g. to specify additional network settings for the Kubernetes Service object. +* Defining implementation-specific functionalities for otherwise common APIs-e.g. to specify implementation-specific behavior for Gateway API HTTPRoute objects. +* Decoupling concerns for targeting personas with specific functionality and configuration-delegation of responsibilities, fine-grained RBAC, etc. +* Decoupling responsibility over the management and implementation of the metaresources themselves. +* Avoid alternatives based on annotations which are often non-standardized, poorly documented, and generally hard to maintain, in favor of proper, expressive APIs (self-documenting intents) instead. + +### Definitions + +- _**Metaresource**_: a resource that augments the behavior of another resource without modifying the definition of the augmented resource. 
Metaresources typically specify a _target_ and an _intent_: + - The _target_ of a metaresource is the resource or resources whose behavior the metaresource intends to augment. + - The _intent_ of a metaresource is what augmentation the metaresource will apply. + +- _**Policy**_: an instance of a subclass of metaresources ("policies") whose intent is to specify _rules that control the behavior_ of the target resources. + + Policies are Custom Resource Definitions (CRDs) that MUST comply with a particular [structure](#policy-structure). This structure includes standardized fields for specifying the target(s), policy-specific fields to describe the intended augmentation, and standardized status fields to communicate whether the augmentation is happening or not. + + Policy kinds are typically named _xPolicy_, such as `BackendTLSPolicy` (a policy kind implemented by Gateway API to augment Backends with TLS configuration.) + +- _**Policy Attachment**_: the application of policies, implemented by a controller, to augment the behavior of other Kubernetes objects. -Inherited Policies are further specified in the addendum GEP-2649, Inherited -Policy Attachment. GEP-2649 also describes a set of expected behaviors -for how settings can flow across a defined hierarchy. +### Goals +* Establish a pattern which will be used for any Policy resources included in the Gateway API spec. +* Establish a pattern that must be adopted for any implementation-specific Policy resources used with Gateway API resources. +* Discuss the problems with communicating status for metaresource and policy objects, and suggest mechanisms that APIs can use to mitigate some of them. +* Provide a way to distinguish between required and default values for all policy API implementations. +* Enable Policy Attachment at all relevant scopes in Gateway API, including Gateways, Routes, Backends, along with how values should flow across a hierarchy if necessary. 
+* Ensure the Policy Attachment specification is generic and forward thinking enough that it could be easily adapted to other grouping mechanisms like Namespaces in the future. +* Provide a means of attachment that works for both ingress and mesh implementations of Gateway API. +* Provide a consistent specification that will ensure familiarity between both API-defined and implementation-specific Policy resources so they can both be interpreted the same way. +* Provide a reference pattern to other implementations of metaresource and policy APIs outside of Gateway API, that are based on similar concepts (i.e., augmenting the behavior of other Kubernetes objects, attachment points, nested contexts and inheritance, Defaults & Overrides, etc.) -## Goals +### Out of scope -* Establish a pattern for Policy resources which will be used for any policies - included in the Gateway API spec -* Establish a pattern for Policy attachment, whether Direct or Inherited, - which must be used for any implementation specific policies used with - Gateway API resources -* Discuss the problems with communicating status for Policy objects, and suggest - mechanisms that Policy APIs can use to mitigate some of them. -* Provide a way to distinguish between required and default values for all - policy API implementations -* Enable policy attachment at all relevant scopes, including Gateways, Routes, - Backends, along with how values should flow across a hierarchy if necessary -* Ensure the policy attachment specification is generic and forward thinking - enough that it could be easily adapted to other grouping mechanisms like - Namespaces in the future -* Provide a means of attachment that works for both ingress and mesh - implementations of this API -* Provide a consistent specification that will ensure familiarity between both - included and implementation-specific policies so they can both be interpreted - the same way. 
+* Define all potential metaresource and/or policy kinds that may be attached to resources. -## Deferred Goals and Discussions +## Guide-level explanation -* Should Policy objects be able to target more than one object? At the time of - writing, the answer to this is _no_, in the interests of managing complexity - in one change. But this rule can and should be discussed and reexamined in - light of community feedback that users _really_ want this. Any discussion will - need to consider the complexity tradeoffs here. +This section describes concepts and aspects for designing and using metaresource and policy objects. + +It reinforces previously defined concepts and defines other important ones such as the concepts of [Hierarchy of target kinds](#hierarchy-of-target-kinds), [Merge strategy](#merge-strategies), and [Effective policies](#effective-policies). It also describes an [Abstract process for calculating effective specs](#abstract-process-for-calculating-effective-policies) out of a set of Policy objects. + +Designers of new policy kinds are encouraged to read this section top-to-bottom while users of policies may refer to it more specifically, to further understand about the design decisions and thus make inferences about the behavior and alternatives for a given Policy kind. + +### Metaresources + +As defined above, a metaresource is a resource whose purpose is to augment the behavior of some other resource. At its most basic level, the metaresource pattern consists of: +- A user defines a metaresource describing both the target resource(s) they want to augment, and the intent of the augmentation. +- The controller(s) implementing the metaresource notices the metaresource and applies the intent to the target resource(s). +- The controller(s) implementing the metaresource reports the status of the metaresource, indicating whether the intent is being applied or not. + +In the real world, of course, things can be much more complex. 
 There may be multiple conflicting metaresources, or the user might attempt to apply a metaresource that they aren't allowed to, or there may be errors in the metaresources. The controller(s) implementing the metaresources MUST be able to handle all of these cases, and MUST communicate status correctly in all situations. + +Additionally, since this GEP defines a pattern rather than an API field or resource, it is not possible to enumerate all possible metaresource and/or policy kinds in this GEP. This means that policies MUST follow a well-known structure so that Gateway API users and implementations can work with them in a consistent way, and this GEP focuses on that well-known structure. + +#### Policy structure + +A typical Policy resource might look like the following: + +```yaml +apiVersion: policies.controller.io/v1 +kind: ColorPolicy +metadata: + name: my-color-policy +spec: + targetRefs: ## target objects whose behaviour to augment + - group: gateway.networking.k8s.io + kind: Gateway + name: my-gateway + color: blue ## the "spec proper", i.e., one or more fields that specify the intent – e.g. to color the traffic flowing through the my-gateway Gateway blue +``` + +_(This is a hypothetical example: no ColorPolicy resource is defined in Gateway API.)_ + +- Every policy MUST include a `targetRefs` stanza specifying which resource(s) the policy intends to augment. +- Every policy MUST include one or more implementation-specific fields specifying how the policy will augment the behavior of the target resource(s). This is informally referred to as the "spec proper." +- A policy MAY include additional fields specifying a so-called [_merge strategy_](#merge-strategies), i.e., how the policy should be combined with other policies that affect the same target resource(s). This typically includes directives for dealing with conflicting and/or missing specs. 
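
The three-step metaresource pattern and the policy structure above can be sketched in plain Go. This is only an illustration under stated assumptions: `TargetRef`, `ColorPolicy`, `Condition` and `Reconcile` are hypothetical names local to this sketch (they are not Gateway API types), the target lookup is an in-memory map rather than a Kubernetes client, and the condition reasons are illustrative strings.

```go
package main

import "fmt"

// TargetRef is a simplified, hypothetical stand-in for a policy target reference.
type TargetRef struct {
	Group, Kind, Name string
}

// ColorPolicy mirrors the hypothetical ColorPolicy: targets plus a "spec proper".
type ColorPolicy struct {
	Name       string
	TargetRefs []TargetRef
	Color      string
}

// Condition is a pared-down status condition (not metav1.Condition).
type Condition struct {
	Type, Status, Reason string
}

// Reconcile applies the policy intent to its targets and returns the condition
// a controller might report. known maps existing targets to their current color.
func Reconcile(p ColorPolicy, known map[TargetRef]string) Condition {
	// A policy without targets or without an intent is structurally invalid.
	if len(p.TargetRefs) == 0 || p.Color == "" {
		return Condition{Type: "Accepted", Status: "False", Reason: "Invalid"}
	}
	// Check every target first so that a failing policy is never partially applied.
	for _, ref := range p.TargetRefs {
		if _, ok := known[ref]; !ok {
			return Condition{Type: "Accepted", Status: "False", Reason: "TargetNotFound"}
		}
	}
	for _, ref := range p.TargetRefs {
		known[ref] = p.Color
	}
	return Condition{Type: "Accepted", Status: "True", Reason: "Accepted"}
}

func main() {
	gw := TargetRef{Group: "gateway.networking.k8s.io", Kind: "Gateway", Name: "my-gateway"}
	known := map[TargetRef]string{gw: ""}
	policy := ColorPolicy{Name: "my-color-policy", TargetRefs: []TargetRef{gw}, Color: "blue"}
	fmt.Printf("%+v\n", Reconcile(policy, known))
}
```

Conflicts between several policies targeting the same object are deliberately left out of this sketch; they are the subject of merge strategies.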
+ +#### The `targetRefs` stanza + +The targets of a Policy object are other Kubernetes objects (or parts of objects), including virtual kinds. They are referenced in the policies by name or using other referencing mechanisms. + +In order to fit within the framework described in this document, the targets MUST be declared within a `targetRefs` field within the spec of the Policy object. + +All kinds of references SHOULD also specify Group, Version and Kind (GVK) information as part of the target (unless the API ensures no more than one kind of object can be targeted). + +##### Reference by name + +The target reference includes the exact name of an object whose behavior to augment. E.g.: + +```yaml +apiVersion: policies.controller.io/v1 +kind: ColorPolicy +metadata: + name: my-color-policy +spec: + targetRefs: + - group: gateway.networking.k8s.io + kind: Gateway + name: my-gateway ## name of the target object of Gateway kind + color: blue +``` -## Out of scope +
+ Implementation tip -* Define all potential policies that may be attached to resources -* Design the full structure and configuration of policies + This targeting method can be implemented in Golang by using a type such as Gateway API's [`LocalPolicyTargetReference`](https://pkg.go.dev/sigs.k8s.io/gateway-api/apis/v1alpha2#LocalPolicyTargetReference) type. E.g.: -## Background and concepts + ```go + package color -When designing Gateway API, one of the things we’ve found is that we often need to be -able change the behavior of objects without being able to make changes to the spec -of those objects. Sometimes, this is because we can’t change the spec of the object -to hold the information we need ( ReferenceGrant, from -[GEP-709](../gep-709/index.md), affecting Secrets -and Services is an example, as is Direct Policy Attachment), and sometimes it’s -because we want the behavior change to flow across multiple objects -(this is what Inherited Policy Attachment is for). + import ( + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + gatewayapiv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2" + ) -To put this another way, sometimes we need ways to be able to affect how an object -is interpreted in the API, without representing the description of those effects -inside the spec of the object. + type ColorPolicy struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` -This document describes the ways we design objects to meet these two use cases, -and why you might choose one or the other. + // Spec defines the desired state of the policy. + Spec ColorPolicySpec `json:"spec"` -We use the term “metaresource” to describe the class of objects that _only_ augment -the behavior of another Kubernetes object, regardless of what they are targeting. + // Status defines the current state of the policy. 
+ Status ColorPolicyStatus `json:"status,omitempty"` + } -“Meta” here is used in its Greek sense of “more comprehensive” -or “transcending”, and “resource” rather than “object” because “metaresource” -is more pronounceable than “metaobject”. Additionally, a single word is better -than a phrase like “wrapper object” or “wrapper resource” overall, although both -of those terms are effectively synonymous with “metaresource”. + type ColorPolicySpec struct { + // TargetRefs specify the targets of the policy by name. + // The following kinds are supported: … + // +listType=map + // +listMapKey=group + // +listMapKey=kind + // +listMapKey=name + // +kubebuilder:validation:MinItems=1 + // +kubebuilder:validation:MaxItems=16 + TargetRefs []gatewayapiv1alpha2.LocalPolicyTargetReference `json:"targetRefs"` -A "Policy Attachment" is a metaresource that affects the fields in existing objects -(like Gateway or Routes), or influences the configuration that's generated in an -underlying data plane. + // rest of the spec ("spec proper")… + } + ``` +
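
The `+listType=map`, `+listMapKey` and `MinItems`/`MaxItems` markers in the tip above ask the API server to enforce list size and key uniqueness on `targetRefs`. The sketch below only illustrates what those constraints mean, assuming a hypothetical local `LocalRef` struct with the same three key fields; it does not use the real Gateway API type, and a controller would not normally need to repeat this validation.

```go
package main

import (
	"errors"
	"fmt"
)

// LocalRef is a hypothetical stand-in carrying the three list-map keys.
type LocalRef struct {
	Group, Kind, Name string
}

const (
	minTargetRefs = 1  // mirrors +kubebuilder:validation:MinItems=1
	maxTargetRefs = 16 // mirrors +kubebuilder:validation:MaxItems=16
)

// ValidateTargetRefs checks list size and uniqueness by (group, kind, name).
func ValidateTargetRefs(refs []LocalRef) error {
	if len(refs) < minTargetRefs {
		return errors.New("targetRefs must contain at least one entry")
	}
	if len(refs) > maxTargetRefs {
		return fmt.Errorf("targetRefs must contain at most %d entries", maxTargetRefs)
	}
	seen := make(map[LocalRef]struct{}, len(refs))
	for _, ref := range refs {
		if _, dup := seen[ref]; dup {
			return fmt.Errorf("duplicate targetRef %s/%s %q", ref.Group, ref.Kind, ref.Name)
		}
		seen[ref] = struct{}{}
	}
	return nil
}

func main() {
	refs := []LocalRef{{Group: "gateway.networking.k8s.io", Kind: "Gateway", Name: "my-gateway"}}
	if err := ValidateTargetRefs(refs); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("valid")
}
```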
-"Direct Policy Attachment" is when a Policy object references a single object _only_, -and only modifies the fields of or the configuration associated with that object. +##### Cross namespace references -"Inherited Policy Attachment" is when a Policy object references a single object -_and any child objects of that object_ (according to some defined hierarchy), and -modifies fields of the child objects, or configuration associated with the child -objects. +Policies can opt for allowing instances to target objects across Kubernetes namespaces, in which case an optional `namespace` field MUST be defined with the target reference. -In either case, a Policy may either affect an object by controlling the value -of one of the existing _fields_ in the `spec` of an object, or it may add -additional fields that are _not_ in the `spec` of the object. +!!! warning + Although not strictly forbidden, this is in general discouraged due to [discoverability](#the-discoverability-problem) issues and security implications. Cross namespace references can often lead to escalation of privileges associated with the [Confused deputy problem](https://en.wikipedia.org/wiki/Confused_deputy_problem). -### Why use Policy Attachment at all? +Implementations that opt for designing policies that allow for cross namespace references MUST support one of the following approaches to address the security concern: +- The policy is paired with [ReferenceGrants](https://gateway-api.sigs.k8s.io/api-types/referencegrant/?h=referencegrant) or some other form of equivalent handshake that ensures that the target is accepting the policy. +- The policy is applied client-side and does not grant the client any access or permissions beyond what it would otherwise have. +
+ Implementation tip -Consistent UX across GW implementations + This targeting method can be implemented in Golang by using a type such as Gateway API's [`NamespacedPolicyTargetReference`](https://pkg.go.dev/sigs.k8s.io/gateway-api/apis/v1alpha2#NamespacedPolicyTargetReference) type. E.g.: -Support for common tooling such as gwctl that can compute and display effective policies at each layer + ```go + package color -Avoid annotation hell + import ( + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + gatewayapiv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2" + ) + type ColorPolicy struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` -### Direct Policy Attachment + // Spec defines the desired state of the policy. + Spec ColorPolicySpec `json:"spec"` -For more description of the details of Direct Policy Attachment, -see [GEP-2648](../gep-2648/index.md). + // Status defines the current state of the policy. + Status ColorPolicyStatus `json:"status,omitempty"` + } -### Inherited Policy Attachment + type ColorPolicySpec struct { + // TargetRefs specify the targets of the policy by name. + // The following kinds are supported: … + // +listType=map + // +listMapKey=group + // +listMapKey=kind + // +listMapKey=namespace + // +listMapKey=name + // +kubebuilder:validation:MinItems=1 + // +kubebuilder:validation:MaxItems=16 + TargetRefs []gatewayapiv1alpha2.NamespacedPolicyTargetReference `json:"targetRefs"` -For more description of the details of Inherited Policy Attachment, -see [GEP-2649](../gep-2649/index.md). + // rest of the spec ("spec proper")… + } + ``` +
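
The handshake requirement for cross namespace references can be illustrated with a small check. This is a sketch under assumptions, not the ReferenceGrant API: `NamespacedRef`, `Grant`, `EffectiveNamespace` and `Permitted` are hypothetical names, the `Grant` struct is only loosely modeled on ReferenceGrant, and API groups are ignored for brevity.

```go
package main

import "fmt"

// NamespacedRef is a hypothetical target reference with an optional namespace.
type NamespacedRef struct {
	Group, Kind, Name string
	Namespace         string // empty means "same namespace as the policy"
}

// Grant is a simplified, hypothetical handshake object living in the target's namespace.
type Grant struct {
	Namespace     string // namespace the grant lives in
	FromNamespace string // namespace the policy is allowed to come from
	FromKind      string // policy kind allowed to attach
	ToKind        string // target kind that accepts the policy
	ToName        string // empty accepts any target name of ToKind
}

// EffectiveNamespace resolves the namespace a reference points at.
func EffectiveNamespace(policyNamespace string, ref NamespacedRef) string {
	if ref.Namespace == "" {
		return policyNamespace
	}
	return ref.Namespace
}

// Permitted reports whether the policy may attach to ref: same-namespace
// references are always allowed, cross namespace ones need a matching grant.
func Permitted(policyNamespace, policyKind string, ref NamespacedRef, grants []Grant) bool {
	targetNS := EffectiveNamespace(policyNamespace, ref)
	if targetNS == policyNamespace {
		return true
	}
	for _, g := range grants {
		if g.Namespace == targetNS &&
			g.FromNamespace == policyNamespace &&
			g.FromKind == policyKind &&
			g.ToKind == ref.Kind &&
			(g.ToName == "" || g.ToName == ref.Name) {
			return true
		}
	}
	return false
}

func main() {
	ref := NamespacedRef{Group: "gateway.networking.k8s.io", Kind: "Gateway", Name: "my-gateway", Namespace: "infra"}
	grants := []Grant{{Namespace: "infra", FromNamespace: "apps", FromKind: "ColorPolicy", ToKind: "Gateway"}}
	fmt.Println(Permitted("apps", "ColorPolicy", ref, grants))
}
```

A controller following this approach would report a rejected reference in the policy status rather than silently ignoring it.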
-### How to determine if a Policy is a Direct or Inherited one +##### Targeting sections of an object -The basic rule here is "Does the Policy affect _any_ other object aside from -the one it targets?" If not, it's Direct. If so, it's Inherited. +Policy CRDs can offer the option to target a section of an object whose spec defines sections uniquely identifiable by name. These policies typically include a field `spec.targetRefs.sectionName` that can be used along with compatible kinds. -The reason for this is that Direct Attached Policies make it _much_ easier to -understand the state of the system, and so can use a more simple `status` design. -However, Inherited Policies require knowledge of more resources, and consequently -a more complex status design. +For example, a policy that specifies additional behaviour for a given listener of a Gateway API Gateway object, though not for all listeners of the Gateway, MUST (i) require the Gateway listener to be uniquely named and (ii) provide the `sectionName` field of the target reference with the name of the targeted listener. -#### Policy type examples +```yaml +apiVersion: policies.controller.io/v1 +kind: ColorPolicy +metadata: + name: my-color-policy +spec: + targetRefs: + - group: gateway.networking.k8s.io + kind: Gateway + name: my-gateway + sectionName: https ## unique name of a listener specified in the object of Gateway kind + color: blue +``` -The separate GEPs have more examples of policies of each type, but here are two -small examples. Please see the separated GEPs for more examples. +
+ Implementation tip -**BackendTLSPolicy** is the canonical example of a Direct Attached Policy because -it _only_ affects the Service that the Policy attaches to, and affects how that -Service is consumed. But you can know everything you need to about the Service -and BackendTLSPolicy just by looking at those two objects. + This targeting method can be implemented in Golang by using a type such as Gateway API's [`LocalPolicyTargetReferenceWithSectionName`](https://pkg.go.dev/sigs.k8s.io/gateway-api/apis/v1alpha2#LocalPolicyTargetReferenceWithSectionName) type. E.g.: -**Hypothetical max body size Policy**: Kate Osborn -[raised this on Slack](https://kubernetes.slack.com/archives/CR0H13KGA/p1708723178714389), -asking if a policy applied to a Gateway configures a data plane setting that -affects routes counts as an Inherited Policy, giving the example of a max body -size Policy. + ```go + package color -In this sort of case, the object does count as an Inherited Policy because -it's affecting not just the properties of the Gateway, but properties of the -Routes attached to it (and you thus need to know about the Policy, the Gateway, -_and_ the Routes to be able to understand the system). + import ( + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + gatewayapiv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2" + ) + type ColorPolicy struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` -## Naming Policy objects + // Spec defines the desired state of the policy. + Spec ColorPolicySpec `json:"spec"` -Although Direct and Inherited Policies behave differently in many respects, in -general they should be named using similar rules. + // Status defines the current state of the policy. + Status ColorPolicyStatus `json:"status,omitempty"` + } -Policy objects MUST be clearly named so as to indicate that they are Policy -metaresources. + type ColorPolicySpec struct { + // TargetRefs specify the targets of the policy by name. 
+ // The following kinds are supported: … + // +listType=map + // +listMapKey=group + // +listMapKey=kind + // +listMapKey=name + // +listMapKey=sectionName + // +kubebuilder:validation:MinItems=1 + // +kubebuilder:validation:MaxItems=16 + TargetRefs []gatewayapiv1alpha2.LocalPolicyTargetReferenceWithSectionName `json:"targetRefs"` -The simplest way to do that is to ensure that the type's name contains the `Policy` -string. + // rest of the spec ("spec proper")… + } + ``` +
-Implementations SHOULD use `Policy` as the last part of the names of object types -that use this pattern. +##### Targeting virtual types -If an implementation does not, then they MUST clearly document what objects -are Policy metaresources in their documentation. Again, this is _not recommended_ -without a _very_ good reason. +_Virtual types_ are defined as those with a group unknown by the Kubernetes API server. They can be used to apply policies to objects that are not actual Kubernetes resources nor Kubernetes custom resources. Rather, virtual types have a meaning for the controller(s) responsible for implementing the policy. -### Targeting Virtual Types -In some cases (likely limited to mesh) we may want to apply policies to requests -to external services. To accomplish this, implementations MAY choose to support -a reference to a virtual resource type. For example: +An example of such, from the Gateway API mesh case, would be a hypothetical need for defining a policy to "color requests" to external services. To accomplish this, implementations MAY choose to support a reference to a virtual resource type `ExternalService`, unknown by the Kubernetes API server but known by the controller. E.g.: ```yaml -apiVersion: networking.acme.io/v1alpha1 -kind: RetryPolicy +apiVersion: policies.controller.io/v1 +kind: ColorPolicy metadata: - name: foo + name: my-color-policy spec: - default: - maxRetries: 5 - targetRef: - group: networking.acme.io + targetRefs: + - group: networking.acme.io kind: ExternalService name: foo.com + color: blue ``` -### Conflict Resolution -It is possible for multiple policies to target the same object _and_ the same -fields inside that object. If multiple policy resources target -the same resource _and_ have an identical field specified with different values, -precedence MUST be determined in order of the following criteria, continuing on -ties: - -* Direct Policies override Inherited Policies. 
If preventing settings from - being overwritten is important, implementations should only use Inherited - Policies, and the `override` stanza that implies. Note also that it's not - intended that Direct and Inherited Policies should overlap, so this should - only come up in exceptional circumstances. -* Inside Inherited Policies, the same setting in `overrides` beats the one in - `defaults`. -* The older Policy based on creation timestamp beats a newer one. For example, -  a Policy with a creation timestamp of "2021-07-15 01:02:03" MUST be given -  precedence over a Policy with a creation timestamp of "2021-07-15 01:02:04". - The goal is to ensure that introducing new, unused policies doesn’t disrupt - existing ones, since changing active rules can cause outages while altering - unused policies poses no risk. -* The Policy appearing first in alphabetical order by `{namespace}/{name}`. For - example, foo/bar is given precedence over foo/baz. - -For a better user experience, a validating webhook can be implemented to prevent -these kinds of conflicts all together. - -## Status and the Discoverability Problem - -So far, this document has talked about what Policy Attachment is, different types -of attachment, and how those attachments work. - -Probably the biggest impediment to this GEP moving forward is the discoverability -problem; that is, it’s critical that an object owner be able to know what policy -is affecting their object, and ideally its contents. - -To understand this a bit better, let’s consider this parable, with thanks to Flynn: - -### The Parable - -It's a sunny Wednesday afternoon, and the lead microservices developer for -Evil Genius Cupcakes is windsurfing. Work has been eating Ana alive for the -past two and a half weeks, but after successfully deploying version 3.6.0 of -the `baker` service this morning, she's escaped early to try to unwind a bit. 
- -Her shoulders are just starting to unknot when her phone pings with a text -from Chihiro, down in the NOC. Waterproof phones are a blessing, but also a -curse. - -**Chihiro**: _Hey Ana. Things are still running, more or less, but latencies -on everything in the `baker` namespace are crazy high after your last rollout, -and `baker` itself has a weirdly high load. Sorry to interrupt you on the lake -but can you take a look? Thanks!!_ - -Ana stares at the phone for a long moment, heart sinking, then sighs and -turns back to shore. - -What she finds when dries off and grabs her laptop is strange. `baker` does -seem to be taking much more load than its clients are sending, and its clients -report much higher latencies than they’d expect. She doublechecks the -Deployment, the Service, and all the HTTPRoutes around `baker`; everything -looks good. `baker`’s logs show her mostly failed requests... with a lot of -duplicates? Ana checks her HTTPRoute again, though she's pretty sure you -can't configure retries there, and finds nothing. But it definitely looks like -clients are retrying when they shouldn’t be. +As a pattern, targeting virtual types has prior art in Kubernetes with the Role Based Access Control (RBAC), where Roles and ClusterRoles can be used to specify permissions regarding any kind of resource including non-Kubernetes resources. -She pings Chihiro. +### Scoping the intent -**Ana**: _Hey Chihiro. Something weird is up, looks like requests to `baker` -are failing but getting retried??_ +The targets of a policy must be interpreted within a given semantics that is proper to the policy kind. Sometimes the declared targets define the direct scope of application of the policy. Inversely, depending on the policy kind, the targets can also represent indirections to the actual scope of application of the policy. -A minute later they answer. +Two different policy kinds that support targeting the same kind X may have very different semantics. 
This is not only because the policy kinds' purposes differ, but also because the scopes induced by specifying instances of X as targets differ, with consequences to the entire mechanics of calculating and applying the augmented behavior in each case. -**Chihiro**: 🤷 _Did you configure retries?_ +#### Spanning behavior across relationships of a target -**Ana**: _Dude. I don’t even know how to._ 😂 +Often, the semantics of scoping a policy is tightly related to the connections the target kind has with other kinds of objects. In this scenario, targeting a given resource kind may have the semantics of spanning effect across these other objects to which the target is related. -**Chihiro**: _You just attach a RetryPolicy to your HTTPRoute._ +Typically, the relationships between direct and indirect target kinds are organized in a _hierarchy of nested contexts_. -**Ana**: _Nope. Definitely didn’t do that._ +An example of such is a policy that targets a Namespace. Depending on the design of the policy kind, the policy object may declare intent to affect the behavior of the namespace itself (for what concerns the implementation of Namespaces in Kubernetes) or alternatively it can act as a means to affect the behavior of other objects that exist in the referred namespace (e.g. ConfigMaps). While in the former case, the (direct) target object is the Namespace itself, in the latter the (indirect) target is a set of objects of a different kind (e.g. ConfigMaps.) -She types `kubectl get retrypolicy -n baker` and gets a permission error. +Another example of this semantic difference in the context of Gateway API objects is a policy that targets the `Gateway` kind, which can be: +* a way to augment the behavior of the `Gateway` object itself (e.g. 
reconcile cloud infrastructure provider settings from the spec declared by the `Gateway` according to the rules specified by the policy attached to the `Gateway`), or +* a means to augment the behavior of all `HTTPRoute` objects attached to the `Gateway` (in a way that every new `HTTPRoute` that gets created or modified so it enters the context of the `Gateway` is automatically put in the scope of the policy.) -**Ana**: _Huh, I actually don’t have permissions for RetryPolicy._ 🤔 +#### Declared targets versus Effective targets -**Chihiro**: 🤷 _Feels like you should but OK, guess that can’t be it._ +The target kinds specified in the `targetRefs` stanza of a policy are referred to as *Declared target* kinds. -Minutes pass while both look at logs. +These are distinct from *Effective target* kinds, which are the kinds of target objects whose behaviors are actually augmented by the policy. That occurs when declared targets are not equal to the actual targets augmented by the policy, but rather serve as a means for reaching other levels (typically lower levels) of a hierarchy of related object kinds ("hierarchy of nested contexts"). -**Chihiro**: _I’m an idiot. There’s a RetryPolicy for the whole namespace – -sorry, too many policies in the dashboard and I missed it. Deleting that since -you don’t want retries._ +To avoid ambiguity in the interpretation of the targets, policy designs MUST clearly define the extent of the effects of the policy respectively to the object kinds they can target (semantics of scoping a policy). This can be done via documentation and it typically refers to a known hierarchy of resource kinds. -**Ana**: _Are you sure that’s a good–_ +### Conflicting specs, Inheritance, Merge strategies, and Effective policies -Ana’s phone shrills while she’s typing, and she drops it. When she picks it -up again she sees a stack of alerts. She goes pale as she quickly flips -through them: there’s one for every single service in the `baker` namespace. 
+With policies (and metaresources in general), declaring additional specifications to objects from the outside will often yield conflicts that need to be addressed. -**Ana**: _PUT IT BACK!!_ +Multiple policy resources may (directly or indirectly) affect the same object (same effective target), thus posing a conflict to be resolved regarding which of the declared intents the controller shall honor, i.e. which spec to use to augment the behavior of the object. -**Chihiro**: _Just did. Be glad you couldn't hear all the alarms here._ 😕 +Another way that conflicts may arise is by allowing policies to target different levels of the same hierarchy. This includes hierarchies between different kinds of objects, as well as hierarchies between objects and sections of these objects. -**Ana**: _What the hell just happened??_ +There are multiple ways to resolve these conflicts. -**Chihiro**: _At a guess, all the workloads in the `baker` namespace actually -fail a lot, but they seem OK because there are retries across the whole -namespace?_ 🤔 +In some cases, for example, the most recent spec between two conflicting policies may be desired to win, whereas in other cases it might be the oldest. In a different scenario, the winning spec may not be based on creation timestamp but rather determined by the hierarchical level at which the policy applies (e.g. specs defined higher in the hierarchy win over specs defined lower, or the other way around). And sometimes other criteria must be adopted to resolve conflicts between policies that are ultimately affecting the same target. -Ana's blood runs cold. +This section describes the concepts and rules for dealing with conflicting specs, including the concept of [hierarchy and the semantics of inheritance](#hierarchy-of-target-kinds), and how to calculate so-called [_Effective policies_](#effective-policies) by applying [_Merge strategies_](#merge-strategies), two other concepts defined in this section. -**Chihiro**: _Yeah. 
Looking a little closer, I think your `baker` rollout this -morning would have failed without those retries._ 😕 +#### Hierarchy of target kinds -There is a pause while Ana's mind races through increasingly unpleasant -possibilities. +Policy CRDs MUST clearly define the hierarchy of target resources they have effects upon, as well as the [semantics](#scoping-the-intent) of targeting each kind in this hierarchy. -**Ana**: _I don't even know where to start here. How long did that -RetryPolicy go in? Is it the only thing like it?_ +The best way to visualize this hierarchy-and therefore the instances of objects organized by the hierarchy-is in the form of a Directed Acyclic Graph (DAG) whose roots are the least specific objects and the leaves are the most specific ones (and ultimately the effective targets of the policies). Using a DAG to represent the hierarchy of effective targets ensures that all the relevant objects are represented, and makes the calculation of corresponding combinatorial specs much easier. -**Chihiro**: _Didn’t look closely before deleting it, but I think it said a few -months ago. And there are lots of different kinds of policy and lots of -individual policies, hang on a minute..._ +Example of a DAG for Gateway API resources: -**Chihiro**: _Looks like about 47 for your chunk of the world, a couple hundred -system-wide._ +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + gc@{ shape: rect, label: "GatewayClass 1" } + g@{ shape: rect, label: "Gateway 1" } + r1@{ shape: rect, label: "Route 1" } + r2@{ shape: rect, label: "Route 2" } + b@{ shape: rect, label: "Backend 1" } -**Ana**: 😱 _Can you tell me what they’re doing for each of our services? I -can’t even_ look _at these things._ 😕 + gc --> g + g --> r1 + g --> r2 + r1 --> b + r2 --> b +``` -**Chihiro**: _That's gonna take awhile. 
Our tooling to show us which policies -bind to a given workload doesn't go the other direction._ +For any given path within the DAG, nodes closer to a root are considered "higher" in the hierarchy, while nodes closer to a leaf are "lower." Higher nodes define broader, less specific configurations, whereas lower nodes define more specific ones. -**Ana**: _...wait. You have to_ build tools _to know if retries are turned on??_ +Lower levels in a hierarchy (e.g., more specific kinds) *inherit* the definitions applied at the higher levels (e.g. less specific kinds), in such a way that higher level rules may be understood as having an "umbrella effect" over everything beneath. -Pause. +E.g., given the Gateway API’s hierarchy of network resources for the ingress use case `GatewayClass` > `Gateway` > `HTTPRoute` > `Backend`, a policy that attaches to a `GatewayClass` object, if defined as a policy kind ultimately to augment the behavior of `HTTPRoute` objects, affects all `Gateways` under the `GatewayClass`, as well as all `HTTPRoutes` under those `Gateways`. Any other instance of this policy kind targeting a lower level than the `GatewayClass` (e.g. `Gateway` or `HTTPRoute`, assuming it's supported) should be treated as a conflict against the higher level policy spec in the specific scope that is rooted at the lower level target, i.e., for the subset of the topology that is affected by both policies. -**Chihiro**: _Policy attachment is more complex than we’d like, yeah._ 😐 -_Look, how about roll back your `baker` change for now? We can get together in -the morning and start sorting this out._ +Conflicts between policies ultimately affecting the same scope MUST be resolved into so-called [*Effective policies*](#effective-policies), according to some defined [*merge strategies*](#merge-strategies). -Ana shakes her head and rolls back her edits to the `baker` Deployment, then -sits looking out over the lake as the deployment progresses. 
+#### Effective policies -**Ana**: _Done. Are things happier now?_ +The DAG that represents the hierarchy of targetable objects works as a map to orderly resolve, for each [effective target](#declared-targets-versus-effective-targets), a combinatorial spec that MUST be computed from the set of policies affecting the target. This combinatorial spec of each effective target is referred to as the *Effective policy*. -**Chihiro**: _Looks like, thanks. Reckon you can get back to your sailboard._ 🙂 +The process of calculating Effective policies consists of walking the hierarchy of target objects, from least specific to most specific (i.e., "top-down" or, equivalently, from the roots towards the leaves of the DAG of target objects) or from most specific to least specific ("bottom-up"), map reducing to a single policy spec each pair of policies adjacent to each other in the hierarchy, by applying at each step one of the supported [*merge strategies*](#merge-strategies) (described below), until no more than one spec remains for each effective target. -Ana sighs. +Example of Effective policies based on a hierarchy of Gateway API resources: -**Ana**: _Wish I could. Wind’s died down, though, and it'll be dark soon. -Just gonna head home._ +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + gc@{ shape: rect, label: "GatewayClass 1" } + g@{ shape: rect, label: "Gateway 1" } + r1@{ shape: rect, label: "Route 1" } + r2@{ shape: rect, label: "Route 2" } + b@{ shape: rect, label: "Backend 1" } + p1@{ shape: stadium, label: "Policy 1" } + p2@{ shape: stadium, label: "Policy 2" } -**Chihiro**: _Ouch. Sorry to hear that._ 😐 + gc --> g + g --> r1 + g --> r2 + r1 --> b + r2 --> b -One more look out at the lake. + p1 -.-> g + p2 -.-> r1 +``` -**Ana**: _Thanks for the help. 
Wish we’d found better answers._ 😢 +The above yields 2 Effective policies: +- For `Route 1`: some combination of `Policy 1` and `Policy 2` +- For `Route 2`: equal to `Policy 1` -### The Problem, restated -What this parable makes clear is that, in the absence of information about what -Policy is affecting an object, it’s very easy to make poor decisions. +#### Merge strategies -It’s critical that this proposal solve the problem of showing up to three things, -listed in increasing order of desirability: +If multiple policies have the same scope (that is, multiple CRs based on the same Policy kind affect the same [effective target](#declared-targets-versus-effective-targets)), this is considered to be a _conflict_. -- _That_ some Policy is affecting a particular object -- _Which_ Policy is (or Policies are) affecting a particular object -- _What_ settings in the Policy are affecting the object. +Conflicts MUST be resolved according to a defined _merge strategy_. A merge strategy is a function that receives two conflicting specs and returns a new spec with the conflict resolved. -In the parable, if Ana and Chihiro had known that there were Policies affecting -the relevant object, then they could have gone looking for the relevant Policies -and things would have played out differently. If they knew which Policies, they -would need to look less hard, and if they knew what the settings being applied -were, then the parable would have been able to be very short indeed. +This GEP defines the following merge strategies, specified in the subsections below: +* None +* Atomic defaults +* Atomic overrides +* Patch defaults +* Patch overrides +* Custom -(There’s also another use case to consider, in that Chihiro should have been able -to see that the Policy on the namespace was in use in many places before deleting -it.) +Policy CRDs MUST implement at least one of the merge strategies listed above. 
-To put this another way, Policy Attachment is effectively adding a fourth Persona, -the Policy Admin, to Gateway API’s persona list, and without a solution to the -discoverability problem, their actions are largely invisible to the Application -Developer. Not only that, but their concerns cut across the previously established -levels. +Policy CRDs that implement more than one merge strategy MUST provide a way for users to select the merge strategy at runtime. This typically involves defining additional fields that users can configure at individual Policy CRs or settings of the controllers implementing the policy. -![Gateway API diagram with Policy Admin](images/713-the-diagram-with-policy-admin.png) +##### Conflict resolution rules +In a conflict resolution scenario between two specs (two policies), one spec MUST be assigned as the _established_ spec and the other one as the _challenger_ spec, according to rules specified in this GEP. -From the Policy Admin’s point of view, they need to know across their whole remit -(which conceivably could be the whole cluster): +Knowing the distinction between _established_ and _challenger_ is useful to determine which merge strategy will be applied and how. For Policy CRDs that let users specify merge strategies at individual Policy CRs, the spec assigned as _established_ MUST dictate the merge strategy to apply to resolve a conflict. -- _What_ Policy has been created -- _Where_ it’s applied -- _What_ the resultant policy is saying +In other words: +- When the Policy CRD allows specifying the merge strategy at individual CRs, then `established ⇒ 𝑓`. +- When a merge strategy `𝑓` is known (e.g., because it is dictated by the _established_ spec or, implicitly, because it is the only strategy supported by the Policy CRD), then `𝑓(established, challenger) ?= 𝑓(challenger, established)`, including occasionally `𝑓(established, challenger) ≠ 𝑓(challenger, established)`. 
-Which again, come down to discoverability, and can probably be addressed in similar -ways at an API level to the Application Developer's concerns. +With the exception of the **None** merge strategy, the following rules, continuing on ties, MUST be followed to assign which spec (which policy object) is the _established_ and which one is the _challenger_: +1. Between two policies targeting at different levels of the hierarchy, the one attached higher (less specific) MUST be assigned as the _established_ one. +2. Between two policies targeting at the same level of the hierarchy, the older policy based on creation timestamp MUST be assigned as the _established_ one. +3. Between two policies targeting at the same level of the hierarchy and identical creation timestamps, the policy appearing first in alphabetical order by `{namespace}/{name}` MUST be assigned as the _established_ one. -An important note here is that a key piece of information for Policy Admins and -Cluster Operators is “How many things does this Policy affect?”. In the parable, -this would have enabled Chihiro to know that deleting the Namespace Policy would -affect many other people than just Ana. +##### Merge strategy: None -### Problems we need to solve +The spec (policy resource) with the oldest creation timestamp MUST be considered the _established_ spec and that spec beats all _challenger_ specs (policy resources with newer creation timestamps). In short: `𝑓(established = oldest, challenger) → established`. -Before we can get into solutions, we need to discuss the problems that solutions -may need to solve, so that we have some criteria for evaluating those solutions. +In case the conflicting policy resources have identical creation timestamps, the one appearing first in alphabetical order by `{namespace}/{name}` MUST be considered as the _established_ one and that spec beats all others (i.e., beats all _challenger_ specs). 
-#### User discoverability +In other words, for the **None** merge strategy, rules ② → ③ of the [Conflict resolution rules](#conflict-resolution-rules) MUST be used to assign the _established_ and _challenger_ specs, and the _established_ spec (policy resource) always wins. All _challenger_ specs (policy resources) MUST be rejected. -Let's go through the various users of Gateway API and what they need to know about -Policy Attachment. +For all policies rejected due to the application of the **None** merge strategy, the [`Accepted`](#policy-status) status condition of the policy SHOULD be set to false. -In all of these cases, we should aim to keep the troubleshooting distance low; -that is, that there should be a minimum of hops required between objects from the -one owned by the user to the one responsible for a setting. +The **None** merge strategy MUST NOT be implemented in combination with any other merge strategy. I.e., if the Policy CRD implements the **None** merge strategy, then the Policy CRD MUST NOT implement any other merge strategy. -Another way to think of the troubleshooting distance in this context is "How many -`kubectl` commands would the user need to do to understand that a Policy is relevant, -which Policy is relevant, and what configuration the full set of Policy is setting?" +Policy kinds that do not specify any merge strategy and only support targeting a single kind, with [Declared target equal to Effective target](#declared-targets-versus-effective-targets), by default MUST implement the **None** merge strategy. (See the definition of [Direct](#direct) class of policies below.) -##### Application Developer Discoverability +##### Merge strategy: Atomic defaults -How does Ana, or any Application Developer who owns one or more Route objects know -that their object is affected by Policy, which Policy is affecting it, and what -the content of the Policy is? 
+Between two specs (two policy resources) in conflict, the _challenger_ spec beats the _established_ one. The conflicting specs MUST be treated as atomic units (indivisible), therefore the effective policy's spec proper MUST be set equal to the winning spec in its entirety (rather than parts of it). In short: `𝑓(established, challenger) → challenger`. -The best outcome is that Ana needs to look only at a specific route to know what -Policy settings are being applied to that Route, and where they come from. -However, some of the other problems below make it very difficult to achieve this. +For example, if two policies are attached at different levels of the hierarchy, e.g. `Gateway` and `HTTPRoute`, by application of the [Conflict resolution rules](#conflict-resolution-rules), the policy attached to the `Gateway` (higher, less specific level) will be considered the _established_ spec, whereas the policy attached to the `HTTPRoute` (lower, more specific level) will be considered the _challenger_ spec. By applying the **Atomic defaults** merge strategy, the effective policy is set equal to the spec proper of the policy attached to the `HTTPRoute`, and the policy attached to the `Gateway` MUST NOT be enforced in the scope of the `HTTPRoute` augmented by the effective policy (although it might be enforced in the scope of other effective targets, i.e., other HTTPRoutes). -##### Policy Admin Discoverability +Policy kinds that do not specify any merge strategy and support targeting multiple effective kinds MUST by default implement the **Atomic defaults** merge strategy. -How does the Policy Admin know what Policy is applied where, and what the content -of that Policy is? -How do they validate that Policy is being used in ways acceptable to their organization? -For any given Policy object, how do they know how many places it's being used? 
+##### Merge strategy: Atomic overrides -##### Cluster Admin Discoverability +Between two specs (two policy resources) in conflict, the _established_ spec beats the _challenger_ one. The conflicting specs MUST be treated as atomic units (indivisible), therefore the effective policy's spec proper MUST be set equal to the winning spec in its entirety (rather than parts of it). In short: `𝑓(established, challenger) → established`. -The Cluster Admin has similar concerns to the Policy Admin, but with a focus on -being able to determine what's relevant when something is broken. +For example, if two policies are attached at different levels of the hierarchy, e.g. `Gateway` and `HTTPRoute`, by application of the [Conflict resolution rules](#conflict-resolution-rules), the policy attached to the `Gateway` (higher, less specific level) will be considered the _established_ spec, whereas the policy attached to the `HTTPRoute` (lower, more specific level) will be considered the _challenger_ spec. By applying the **Atomic overrides** merge strategy, the effective policy is set equal to the spec proper of the policy attached to the `Gateway`, and the policy attached to the `HTTPRoute` MUST NOT be enforced in the scope of the `Gateway` augmented by the effective policy (although it might be enforced in the scope of other effective targets, i.e., other Gateways). -How does the Cluster Admin know what Policy is applied where, and what the content -of that Policy is? +##### Merge strategy: Patch defaults -For any given Policy object, how do they know how many places it's being used? +Between two specs (two policy resources) in conflict, the _challenger_ spec is applied onto the _established_ one in a [JSON Merge Patch (RFC 7386)](https://datatracker.ietf.org/doc/html/rfc7386) operation. 
Therefore, the effective policy's spec proper MUST be set to a combination of both specs where the _challenger_ spec beats the _established_ one only for all conflicting fields, at the scalar level, with non-conflicting fields from both specs occasionally remaining. In short: `𝑓(established, challenger) → rfc7386(target = established, patch = challenger)`. -#### Evaluating and Displaying Resultant Policy +For example, if two policies are attached at different levels of the hierarchy, e.g. `Gateway` and `HTTPRoute`, by application of the [Conflict resolution rules](#conflict-resolution-rules), the policy attached to the `Gateway` (higher, less specific level) will be considered the _established_ spec, whereas the policy attached to the `HTTPRoute` (lower, more specific level) will be considered the _challenger_ spec. By applying the **Patch defaults** merge strategy, the effective policy is set to equal to the spec of the policy attached to the `Gateway` JSON-merge-patched using the spec of the policy attached to the `HTTPRoute`, i.e., with any conflicting fields at the scalar level set to their values as specified in the policy attached to the `HTTPRoute`. -For any given Policy type, whether Direct Attached or Inherited, implementations -will need to be able to _calculate_ the resultant set of Policy to be able to -apply that Policy to the correct parts of their data plane configuration. -However, _displaying_ that resultant set of Policy in a way that is straightforward -for the various personas to consume is much harder. +##### Merge strategy: Patch overrides -The easiest possible option for Application Developers would be for the -implementation to make the full resultant set of Policy available in the status -of objects that the Policy affects. 
However, this runs into a few problems: +Between two specs (two policy resources) in conflict, the _established_ spec is applied onto the _challenger_ one in a [JSON Merge Patch (RFC 7386)](https://datatracker.ietf.org/doc/html/rfc7386) operation. Therefore, the effective policy's spec proper MUST be set to a combination of both specs where the _established_ spec beats the _challenger_ one for every conflicting field, at the scalar level, while non-conflicting fields from both specs are retained. In short: `𝑓(established, challenger) → rfc7386(target = challenger, patch = established)`. -- The status needs to be namespaced by the implementation -- The status could get large if there are a lot of Policy objects affecting an - object -- Building a common data representation pattern that can fit into a single common - schema is not straightforward. -- Updating one Policy object could cause many affected objects to need to be - updated themselves. This sort of fan-out problem can be very bad for apiserver - load, particularly if Policy changes rapidly, there are a lot of objects, or both. +For example, if two policies are attached at different levels of the hierarchy, e.g. `Gateway` and `HTTPRoute`, by application of the [Conflict resolution rules](#conflict-resolution-rules), the policy attached to the `Gateway` (higher, less specific level) will be considered the _established_ spec, whereas the policy attached to the `HTTPRoute` (lower, more specific level) will be considered the _challenger_ spec. By applying the **Patch overrides** merge strategy, the effective policy is set equal to the spec of the policy attached to the `HTTPRoute` JSON-merge-patched using the spec of the policy attached to the `Gateway`, i.e., with any conflicting fields at the scalar level set to their values as specified in the policy attached to the `Gateway`.
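The three merge strategies described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the sample `gateway_spec` and `route_spec` dictionaries are hypothetical, specs are modeled as plain dictionaries, and `json_merge_patch` is a hand-written rendering of the JSON Merge Patch algorithm instead of a call into any particular library.

```python
from copy import deepcopy


def json_merge_patch(target, patch):
    """Apply a JSON Merge Patch to `target` and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return deepcopy(patch)
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value in the patch removes the key from the target.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


def atomic_overrides(established, challenger):
    """f(established, challenger) -> established, as an indivisible unit."""
    return deepcopy(established)


def patch_defaults(established, challenger):
    """Challenger is patched onto established; challenger wins conflicts."""
    return json_merge_patch(established, challenger)


def patch_overrides(established, challenger):
    """Established is patched onto challenger; established wins conflicts."""
    return json_merge_patch(challenger, established)


# Hypothetical specs: `gateway_spec` plays the established role (attached to
# a Gateway) and `route_spec` the challenger role (attached to an HTTPRoute).
gateway_spec = {"colors": {"light": "yellow"}}
route_spec = {"colors": {"dark": "olive", "light": "green"}}

print(atomic_overrides(gateway_spec, route_spec))
print(patch_defaults(gateway_spec, route_spec))
print(patch_overrides(gateway_spec, route_spec))
```

With these inputs, only the patch strategies keep the `dark` entry from the challenger; they differ in which spec supplies the value of the conflicting `light` field.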
-##### Status needs to be namespaced by implementation +##### Custom merge strategies -Because an object can be affected by multiple implementations at once, any status -we add must be namespaced by the implementation. +Implementations MAY specify _custom_ merge strategies. These are implementation-specific strategies where the specs of two policies in conflict are resolved into one following a custom merge algorithm specified by the policy kind. -In Route Parent status, we've used the parentRef plus the controller name for this. +##### Selecting a merge strategy at runtime -For Policy, we can do something similar and namespace by the reference to the -implementation's controller name. +Implementations that support multiple merge strategies associated with a particular Policy kind MUST define how a particular merge strategy can be selected at runtime, i.e., how users can specify the merge strategy used to resolve the conflicts between Policy CRs of that kind. One of the following approaches SHOULD be adopted for this: +- The Policy CRD allows specifying, at any individual Policy CR, one and only one of the merge strategies associated with the Policy CRD, and that specified merge strategy MUST be used to resolve conflicts involving this Policy CR according to the [Conflict resolution rules](#conflict-resolution-rules) specified in this GEP. +- The controller implementing the policy has its own predefined way to determine among multiple implemented merge strategies which merge strategy to apply to resolve the conflicts between the Policy CRs according to the [Conflict resolution rules](#conflict-resolution-rules) specified in this GEP. This approach MAY include configurations of the controller implementing the Policy kind or any mechanism other than specifying the merge strategy at individual Policy CRs. -We can't easily namespace by the originating Policy because the source could be -more than one Policy object.
+Policy CRDs that let users specify at any individual Policy CR one of multiple implemented merge strategies MUST define a clear structure for the users to do so. -##### Creating common data representation patterns +Users MUST NOT be allowed to specify at any individual Policy CR more than one merge strategy at a time. -The problem here is that we need to have a _common_ pattern for including the -details of an _arbitrarily defined_ object, that needs to be included in the base -API. +Two known patterns adopted by Policy implementations that support specifying one of multiple merge strategies in the Policy CRs are: +- The definition of a `strategy` field in the `spec` stanza of the Policy, or equivalently a `mergeType` field. +- The definition of `defaults` and/or `overrides` fields in the `spec` stanza of the policy wrapping the "spec proper" fields. -So we can't use structured data, because we have no way of knowing what the -structure will be beforehand. +Policy CRDs that define a `defaults` field to specify the merge strategy at individual Policy CRs, absent any further indication of a more specific strategy, SHOULD assume the **Atomic Defaults** merge strategy whenever this field is used to determine the merge strategy. -This suggests that we need to use unstructured data for representing the main -body of an arbitrary Policy object. +Policy CRDs that define an `overrides` field to specify the merge strategy at individual Policy CRs, absent any further indication of a more specific strategy, SHOULD assume the **Atomic Overrides** merge strategy whenever this field is used to determine the merge strategy. -Practically, this will need to be a string representation of the YAML form of the -body of the Policy object (absent the metadata part of every Kubernetes object).
+For Policy kinds that implement multiple merge strategies, whenever the merge strategy is not specified, the first of the following merge strategies associated with the Policy kind, in order, SHOULD be assumed: +- Atomic Defaults +- Patch Defaults +- Atomic Overrides +- Patch Overrides +- Custom -Policy Attachment does not mandate anything about the design of the object's top -level except that it must be a Kubernetes object, so the only thing we can rely -on is the presence of the Kubernetes metadata elements: `apiVersion`, `kind`, -and `metadata`. +##### Reflecting the applied merge strategy in the status stanza of the policy -A string representation of the rest of the file is the best we can do here. +Policy implementations SHOULD reflect in the `status` stanza of the policies how the applied merge strategies are altering the effectiveness of the policy spec declared in that particular policy object. Merge strategies referred to in the status message MUST use the same strategy names as defined in this GEP. -##### Fanout status update problems +Whenever possible, the `status` stanza SHOULD explain, for each scope targeted by a policy, how that scope is affected by the policy as a result of applying the merge strategies. -The fanout problem is that, when an update takes place in a single object (a -Policy, or an object with a Policy attached), an implementation may need to -update _many_ objects if it needs to place details of what Policy applies, or -what the resultant set of policy is on _every_ object. +Examples of policy status conditions include whether a policy has been successfully programmed to be enforced or whether it has been overridden, partially or completely, given all the different scopes targeted by the policy and the variations to the spec that result from merging with other policies.
-Historically, this is a risky strategy and needs to be carefully applied, as -it's an excellent way to create apiserver load problems, which can produce a large -range of bad effects for cluster stability. +See the [Policy status](#policy-status) section for more details. -This does not mean that we can't do anything at all that affects multiple objects, -but that we need to carefully consider what information is stored in status so -that _every_ Policy update does not require a status update. +#### Abstract process for calculating Effective policies -#### Solution summary +The following is a description of an abstract process for calculating effective policies. -Because Policy Attachment is a pattern for APIs, not an API, and needs to address -all the problems above, the strategy this GEP proposes is to define a range of -options for increasing the discoverability of Policy resources, and provide -guidelines for when they should be used. +Given: -It's likely that at some stage, the Gateway API CRDs will include some Policy -resources, and these will be designed with all these discoverabiity solutions -in mind. +* the target resource kinds `A`, `B` and `C`, organized in a hierarchy of resource kinds where `A` > `B` > `C`, i.e. 
`A` is the least specific kind (roots of the hierarchical tree) and `C` is the most specific kind (leaves of the tree), without loss of generality for cases where these kinds are not necessarily proper Kubernetes kinds, but possibly named sections of a proper Kubernetes kind or virtual kinds; +* the policy kind `P`, whose instances can target resources of kind `A`, `B` or `C`, ultimately intending to augment the behavior of instances of resource kind `C`; +* the tree of targetable resources `a1` > (`b1` > `c1`, `b2` > (`c1`, `c2`)), where `x` > `Y` represents all the directed relationships between targetable resource `x` of kind `X` and its children, and recursively for `Y`, without loss of generality for any other set of instances of target resources; +* the policy objects `p1` → `a1` and `p2` → `b2`, where `p` → `y` represents the attachment of policy `p` of kind `P` to the target resource `y` of kind `A`, `B` or `C`, without loss of generality for any other set of instances of policies. +Depicted in the following Directed Acyclic Graph (DAG): -### Solution cookbook +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + a1@{ shape: rect } + b1@{ shape: rect } + b2@{ shape: rect } + c1@{ shape: rect } + c2@{ shape: rect } + p1@{ shape: stadium } + p2@{ shape: stadium } -This section contains some required patterns for Policy objects and some -suggestions. Each will be marked as MUST, SHOULD, or MAY, using the standard -meanings of those terms. + a1 --> b1 + a1 --> b2 + b1 --> c1 + b2 --> c1 + b2 --> c2 -Additionally, the status of each solution is noted at the beginning of the section. + p1 -.-> a1 + p2 -.-> b2 +``` + +For each expanded context that is induced by the instances of targetable resources of kind `C` and their relationships given by the hierarchy, i.e. for each of: `a1` > `b1` > `c1`, `a1` > `b2` > `c1`, and `a1` > `b2` > `c2`, stack the policies targeting the context at any level, ordered from the most specific level (i.e.
`C`) to the least specific one (i.e. `A`), applying the [conflict resolution rules](#conflict-resolution-rules) described before if necessary: + +1. Pop two policies from the stack and combine them into one effective policy. +2. Push the calculated effective policy back into the stack. +3. Repeat until there is no more than one policy in the stack. + +The last policy in each stack (if any) specifies the intended augmented behavior for the effective target resource of kind `C` within that corresponding context. + +The following diagram generalizes the described process for calculating Effective policies: + +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + A[Start: Given a DAG of target objects and set of policies] --> sub + sub --> L["Output result: mapping of each leaf of the DAG to its corresponding Effective policy (if any)"] + + subgraph sub["For each path of target objects in the DAG"] + C["Order policies affecting the objects in the path from most specific to least specific, applying conflict resolution if necessary"] --> D["Push ordered policies to stack (least specific policy on top of the stack)"] + D --> E{More than one policy in stack?} + E -- Yes --> F[Pop two policies _pA_ and _pB_] + F --> G["Combine _pA_ and _pB_ into policy _pX_ by applying the merge strategy (predefined or dictated by _pA_ if more than one is supported)"] + G --> H[If more than one merge strategy is supported, make the merge strategy specified by _pB_ the merge strategy of _pX_] + H --> I[Push _pX_ back into the stack] + I --> E + E -- No --> J[Map the end of the path to a single policy remaining in the stack or none] + end +``` -#### Standard label on CRD objects +In the example above, the expected outcome of the process is: -Status: Required +* `c1` is augmented by `p1`, whenever activated in the context of `b1`; +* `c1` is augmented by the combination of `p1` + `p2`, whenever activated in the context of `b2`; +* `c2` is augmented by the combination of 
`p1` + `p2`. -Each CRD that defines a Policy object MUST include a label that specifies that -it is a Policy object, and that label MUST specify the _type_ of Policy attachment -in use. - -The label is `gateway.networking.k8s.io/policy: inherited|direct`. - -This solution is intended to allow both users and tooling to identify which CRDs -in the cluster should be treated as Policy objects, and so is intended to help -with discoverability generally. It will also be used by the forthcoming `kubectl` -plugin. - -##### Design considerations - -This is already part of the API pattern, but is being lifted to more prominence -here. - -#### Standard status struct - -Status: Experimental - -Included in the Direct Policy Attachment GEP. - -Policy objects SHOULD use the upstream `PolicyAncestorStatus` struct in their respective -Status structs. Please see the included `PolicyAncestorStatus` struct, and its use in -the `BackendTLSPolicy` object for detailed examples. Included here is a representative -version. - -This pattern enables different conditions to be set for different "Ancestors" -of the target resource. This is particularly helpful for policies that may be -implemented by multiple controllers or attached to resources with different -capabilities. This pattern also provides a clear view of what resources a -policy is affecting. - -For the best integration with community tooling and consistency across -the broader community, we recommend that all implementations transition -to Policy status with this kind of nested structure. - -This is an `Ancestor` status rather than a `Parent` status, as in the Route status -because for Policy attachment, the relevant object may or may not be the direct -parent. - -For example, `BackendTLSPolicy` directly attaches to a Service, which may be included -in multiple Routes, in multiple Gateways. 
However, for many implementations, -the status of the `BackendTLSPolicy` will be different only at the Gateway level, -so Gateway is the relevant Ancestor for the status. - -Each Gateway that has a Route that includes a backend with an attached `BackendTLSPolicy` -MUST have a separate `PolicyAncestorStatus` section in the `BackendTLSPolicy`'s -`status.ancestors` stanza, which mandates that entries must be distinct using the -combination of the `AncestorRef` and the `ControllerName` fields as a key. - -See [GEP-1897][gep-1897] for the exact details. - -[gep-1897]: ../gep-1897/index.md - -```go -// PolicyAncestorStatus describes the status of a route with respect to an -// associated Ancestor. -// -// Ancestors refer to objects that are either the Target of a policy or above it in terms -// of object hierarchy. For example, if a policy targets a Service, an Ancestor could be -// a Route or a Gateway. - -// In the context of policy attachment, the Ancestor is used to distinguish which -// resource results in a distinct application of this policy. For example, if a policy -// targets a Service, it may have a distinct result per attached Gateway. -// -// Policies targeting the same resource may have different effects depending on the -// ancestors of those resources. For example, different Gateways targeting the same -// Service may have different capabilities, especially if they have different underlying -// implementations. -// -// For example, in BackendTLSPolicy, the Policy attaches to a Service that is -// used as a backend in a HTTPRoute that is itself attached to a Gateway. -// In this case, the relevant object for status is the Gateway, and that is the -// ancestor object referred to in this status. -// -// Note that a Target of a Policy is also a valid Ancestor, so for objects where -// the Target is the relevant object for status, this struct SHOULD still be used. 
-type PolicyAncestorStatus struct { - // AncestorRef corresponds with a ParentRef in the spec that this - // RouteParentStatus struct describes the status of. - AncestorRef ParentReference `json:"ancestorRef"` - - // ControllerName is a domain/path string that indicates the name of the - // controller that wrote this status. This corresponds with the - // controllerName field on GatewayClass. - // - // Example: "example.net/gateway-controller". - // - // The format of this field is DOMAIN "/" PATH, where DOMAIN and PATH are - // valid Kubernetes names - // (https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names). - // - // Controllers MUST populate this field when writing status. Controllers should ensure that - // entries to status populated with their ControllerName are cleaned up when they are no - // longer necessary. - ControllerName GatewayController `json:"controllerName"` - - // Conditions describes the status of the Policy with respect to the given Ancestor. - // - // +listType=map - // +listMapKey=type - // +kubebuilder:validation:MinItems=1 - // +kubebuilder:validation:MaxItems=8 - Conditions []metav1.Condition `json:"conditions,omitempty"` -} - - -// PolicyStatus defines the common attributes that all Policies SHOULD include -// within their status. -type PolicyStatus struct { - // Ancestors is a list of ancestor resources (usually Gateways) that are - // associated with the route, and the status of the route with respect to - // each ancestor. When this route attaches to a parent, the controller that - // manages the parent and the ancestors MUST add an entry to this list when - // the controller first sees the route and SHOULD update the entry as - // appropriate when the relevant ancestor is modified. - // - // Note that choosing the relevant ancestor is left to the Policy designers; - // an important part of Policy design is designing the right object level at - // which to namespace this status. 
- // - // Note also that implementations MUST ONLY populate ancestor status for - // the Ancestor resources they are responsible for. Implementations MUST - // use the ControllerName field to uniquely identify the entries in this list - // that they are responsible for. - // - // A maximum of 32 ancestors will be represented in this list. An empty list - // means the Policy is not relevant for any ancestors. - // - // +kubebuilder:validation:MaxItems=32 - Ancestors []PolicyAncestorStatus `json:"ancestors"` -} +In the most trivial case where policies can only directly target the objects whose behavior they intend to augment (i.e. instances of `C` without any indirections) and no policy specs are merged at all, the outcome of the process of calculating effective policies is simplified to a 1:1 mapping between policy and target object at most, where the declared policy equals the effective one, with no combinatorial specs nor contextual variations. + +### Classes of policies + +While the notion of classes of policy kinds was more central in previous iterations of this GEP (see [GEP-2648](../gep-2648/index.md) and [GEP-2649](../gep-2649/index.md)), it here serves primarily as a communicative and organizational aid. The classification reflects patterns that emerge from the properties and behaviors described in earlier sections, but it does not impose any prescriptive or normative requirements on the implementations. These classes—namely _Direct_ and _Inherited_—remain in the specification to support clarity and shared understanding, especially for implementations and users familiar with earlier versions. + +#### Direct + +* A single kind supported in `spec.targetRefs.kind` +* Effects of the policies do not span across the hierarchy, i.e. 
the _Declared target kind_ is equal to the _Effective target kind_ +* *None* is the only merge strategy supported +* If supported, could typically be implemented by directly extending the API of the target kind with the fields otherwise defined in the policy (e.g. Gateway API xRoute filter) + +#### Inherited + +* Superset of the above +* Any policy kind that does not comply with at least one characteristic of the Direct class of policies + +## End-to-end examples + +This section presents a series of synthetic examples of applications of policies for different kinds of topologies and contexts. + +In all cases, the background of targetable object kinds is assumed to be a hierarchy of network resource kinds `Gateway` (`g`) > `Route` (`r`) > `Backend` (`b`), where `Gateway` is the least specific kind (instances denoted "`gX`") and `Backend` is the most specific kind (instances denoted "`bX`"). + +Moreover, a `ColorPolicy` kind is defined, with its semantics varying across examples to accommodate each case. Instances of the `ColorPolicy` kind (denoted "`pX[spec]`" and referred to simply as "policies") may target one or more kinds of targetable resources, depending on the example. A policy represents an intent to "color" the network traffic that flows through the portion of the network corresponding to the target with a given color or color set that is specified in the policy. + +### Example 1. Direct Policy + +In this example, the `ColorPolicy` policy kind is defined as an instance of the Direct class of policies. Instances of the `ColorPolicy` kind in this example can only target `Backend` resources.
+ +Given: + +the following state of targetable resources: + +* `g1` > `r1` > `b1` +* `g1` > `r2` > `b2` + +and the following state of `ColorPolicy` objects, where `pX[spec]` → `bX` denotes a policy `pX` attached to ("targeting") a `Backend` resource `bX`, intending to augment `bX`'s behavior with `spec`: + +* `p1[color:red]` → `b1` +* `p2[color:blue]` → `b1` (conflicting policy, `p2.creationTimestamp` > `p1.creationTimestamp`) + +Depicted in the following Directed Acyclic Graph (DAG): + +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + g1@{ shape: rect } + r1@{ shape: rect } + r2@{ shape: rect } + b1@{ shape: rect } + b2@{ shape: rect } + p1@{ shape: stadium, label: "**p1**\ncolor:red\ncreationTimestamp:t" } + p2@{ shape: stadium, label: "**p2**\ncolor:blue\ncreationTimestamp:t+Δ" } + + g1 --> r1 + g1 --> r2 + r1 --> b1 + r2 --> b2 + + p1 -.-> b1 + p2 -.-> b1 ``` -##### Design considerations +The expected outcome to be implemented by the controller is: + +1. All traffic directed to `Backend` `b1` must be colored `red`. +2. Status of `Backend` `b1` should be reported as affected by the `ColorPolicy` `p1`. +3. Status of `Backend` `b2` should NOT be reported as affected by any policy. +4. Status of `ColorPolicy` `p1` must be reported as enforced. +5. Status of `ColorPolicy` `p2` must be reported as NOT enforced, due to conflict with `ColorPolicy` `p1`. + +### Example 2. Defaults & Overrides + +In this example, the `ColorPolicy` policy kind is defined as an instance of the Inherited class of policies. Instances of the `ColorPolicy` kind in this example can target resources of the `Gateway` and `Route` kinds, always aiming to augment the behavior of resources of the `Backend` kind in the hierarchy. The policies can specify either `defaults` (assumed unless specified otherwise) or `overrides`, which are always treated at the atomic level.
+ +Given: + +the following state of targetable resources: + +* `g1` > `r1` > `b1` +* `g1` > `r2` > `b1` +* `g2` > `r3` > `b1` +* `g2` > `r4` > `b2` + +and the following state of `ColorPolicy` objects, where `pX[spec]` → `yX` denotes a policy `pX` attached to ("targeting") a resource `yX`, `y` ∈ {`g`, `r`}, intending to augment with `spec` the behavior of `Backend` resources when activated via `yX`: + +* `p1[color:red]` → `g1` +* `p2[color:blue]` → `r1` +* `p3[overrides:{color:yellow}]` → `g2` +* `p4[color:green]` → `r4` + +Depicted in the following Directed Acyclic Graph (DAG): + +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + g1@{ shape: rect } + r1@{ shape: rect } + r2@{ shape: rect } + r3@{ shape: rect } + r4@{ shape: rect } + b1@{ shape: rect } + b2@{ shape: rect } + p1@{ shape: stadium, label: "**p1**\ncolor:red" } + p2@{ shape: stadium, label: "**p2**\ncolor:blue" } + p3@{ shape: stadium, label: "**p3**\noverrides:{color:yellow}" } + p4@{ shape: stadium, label: "**p4**\ncolor:green" } + + g1 --> r1 + r1 --> b1 + g1 --> r2 + r2 --> b1 + g2 --> r3 + r3 --> b1 + g2 --> r4 + r4 --> b2 + + p1 -.-> g1 + p2 -.-> r1 + p3 -.-> g2 + p4 -.-> r4 +``` -This is recommended as the base for Policy object's status. As Policy Attachment -is a pattern, not an API, "recommended" is the strongest we can make this, but -we believe that standardizing this will help a lot with discoverability. +The expected outcome to be implemented by the controller is: + +1. Traffic directed to `g1` > `r1` > `b1` must be colored `blue` (more specific `p2` spec beats less specific defaults at `p1`). +2. Traffic directed to `g1` > `r2` > `b1` must be colored `red` (implicit defaults specified at `p1` not replaced by any other policy). +3. Traffic directed to `g2` > `r3` > `b1` must be colored `yellow` (overrides specified at `p3` not replaced by any other policy). +4. 
Traffic directed to `g2` > `r4` > `b2` must be colored `yellow` (overrides specified at `p3` beat more specific policy `p4`). +5. Status of `Backend` `b1` should be reported as affected by the `ColorPolicy` resources `p1`, `p2` and `p3`. +6. Status of `Backend` `b2` should be reported as affected by the `ColorPolicy` resource `p3`. +7. Status of `ColorPolicy` `p1` must be reported as partially enforced, due to being beaten by `p2` in some cases. +8. Status of `ColorPolicy` `p2` must be reported as enforced. +9. Status of `ColorPolicy` `p3` must be reported as enforced. +10. Status of `ColorPolicy` `p4` must be reported as NOT enforced, due to being overridden by `ColorPolicy` `p3`. + +### Example 3. Merged specs + +In this example, the `ColorPolicy` policy kind is defined as an instance of the Inherited class of policies. Instances of the `ColorPolicy` kind in this example can target resources of the `Gateway` and `Route` kinds, always aiming to augment the behavior of resources of the `Backend` kind in the hierarchy. The policies can specify either `defaults` (assumed unless specified otherwise) or `overrides`. Moreover, policies specify a complex color scheme composed of `dark` and `light` entries, as well as a `strategy` field to specify one of two supported merge strategies, `atomic` (assumed unless specified otherwise) or `patch`.
+ +Given: + +the following state of targetable resources: + +* `g1` > `r1` > `b1` +* `g1` > `r2` > `b1` +* `g2` > `r3` > `b1` +* `g2` > `r4` > `b2` + +and the following state of `ColorPolicy` objects, where `pX[spec]` → `yX` denotes a policy `pX` attached to ("targeting") a resource `yX`, `y` ∈ {`g`, `r`}, intending to augment with `spec` the behavior of `Backend` resources when activated via `yX`: + +* `p1[colors:{dark:brown,light:red},strategy:atomic]` → `g1` +* `p2[colors:{light:blue}]` → `r1` +* `p3[overrides:{colors:{light:yellow},strategy:patch}]` → `g2` +* `p4[colors:{dark:olive,light:green}]` → `r4` + +Depicted in the following Directed Acyclic Graph (DAG): + +```mermaid +--- +config: + look: handDrawn + theme: neutral +--- +graph + g1@{ shape: rect } + r1@{ shape: rect } + r2@{ shape: rect } + r3@{ shape: rect } + r4@{ shape: rect } + b1@{ shape: rect } + b2@{ shape: rect } + p1@{ shape: stadium, label: "**p1**\ncolors:{dark:brown light:red}\nstrategy:atomic" } + p2@{ shape: stadium, label: "**p2**\ncolors:{light:blue}" } + p3@{ shape: stadium, label: "**p3**\noverrides:{\ncolors:{light:yellow}\nstrategy:patch}" } + p4@{ shape: stadium, label: "**p4**\ncolors:{dark:olive light:green}" } + + g1 --> r1 + r1 --> b1 + g1 --> r2 + r2 --> b1 + g2 --> r3 + r3 --> b1 + g2 --> r4 + r4 --> b2 + + p1 -.-> g1 + p2 -.-> r1 + p3 -.-> g2 + p4 -.-> r4 +``` -Note that is likely that all Gateway API tooling will expect policy status to follow -this structure. To benefit from broader consistency and discoverability, we -recommend transitioning to this structure for all Gateway API Policies. +The expected outcome to be implemented by the controller is: -#### Standard status Condition on Policy-affected objects +1. Traffic directed to `g1` > `r1` > `b1` must be colored `dark:UNDEFINED,light:blue` (more specific `p2` spec beats less specific atomic defaults from `p1`). +2.
Traffic directed to `g1` > `r2` > `b1` must be colored `dark:brown,light:red` (implicit atomic defaults specified at `p1` not replaced by any other policy). +3. Traffic directed to `g2` > `r3` > `b1` must be colored `dark:UNDEFINED,light:yellow` (patch overrides specified at `p3` not replaced, nor extended by any other policy). +4. Traffic directed to `g2` > `r4` > `b2` must be colored `dark:olive,light:yellow` (patch overrides specified by `p3` beat more specific policy `p4`, which still extends the spec with a specific value for `dark`). +5. Status of `Backend` `b1` should be reported as affected by the `ColorPolicy` resources `p1`, `p2` and `p3`. +6. Status of `Backend` `b2` should be reported as affected by the `ColorPolicy` resources `p3` and `p4`. +7. Status of `ColorPolicy` `p1` must be reported as partially enforced, due to being atomically beaten by `p2` in some cases. +8. Status of `ColorPolicy` `p2` must be reported as enforced. +9. Status of `ColorPolicy` `p3` must be reported as enforced. +10. Status of `ColorPolicy` `p4` must be reported as partially enforced, due to being partially overridden by `ColorPolicy` `p3`. -Support: Provisional +## Managing metaresources in real life -This solution is IN PROGRESS and so is not binding yet. +### Responsibility -However, a version of this proposal is now included in the Direct Policy -Attachment GEP. +Metaresources and policies are typically implemented and managed by a custom controller. This controller can be the same controller that is responsible for managing the objects that are targeted by the metaresources or another controller specifically responsible for the aspect of the object that the metaresource augments or modifies. For policy kinds of metaresources, this controller is often referred to as the "policy controller". -This solution requires definition in a GEP of its own to become binding.
-[GEP-2923](https://github.com/kubernetes-sigs/gateway-api/issues/2923) has been -opened to cover some aspects of this work. +Ultimately, it is the responsibility of the controller to provide resource owners with enough information to circumvent or mitigate the discoverability problem (described in the next section). This typically involves populating the status stanza of the target objects, although the controller may also resort to additional tools (e.g. CRDs, CLI tools) that help visualize the hierarchical topology of target objects and policies, effective policies, etc. -**The description included here is intended to illustrate the sort of solution -that an eventual GEP will need to provide, _not to be a binding design.** +### The discoverability problem -Implementations that use Policy objects MUST put a Condition into `status.Conditions` -of any objects affected by a Policy. +A well-known problem of declaring specifications in separate objects that ultimately reshape or govern the behavior of the objects they target is the discoverability of metaresources, that is, how an object owner gets to know which metaresource (or set of metaresources) is affecting their object and with what content. -That Condition MUST have a `type` ending in `PolicyAffected` (like -`gateway.networking.k8s.io/PolicyAffected`), -and have the optional `observedGeneration` field kept up to date when the `spec` -of the Policy-attached object changes. +Even though Kubernetes already has analogous problems in its core (the most obvious example being Kubernetes Role Based Access Control, RBAC), the discoverability issue remains a challenging one to address.
To better understand it, consider the following parable described in the context of Gateway API, with thanks to [Flynn](https://github.com/kflynn):
-Implementations SHOULD use their own unique domain prefix for this Condition
-`type` - it is recommended that implementations use the same domain as in the
-`controllerName` field on GatewayClass (or some other implementation-unique
-domain for implementations that do not use GatewayClass).)
+#### The Parable
-For objects that do _not_ have a `status.Conditions` field available (`Secret`
-is a good example), that object MUST instead have an annotation of
-`gateway.networking.k8s.io/PolicyAffected: true` (or with an
-implementation-specific domain prefix) added instead.
+It's a sunny Wednesday afternoon, and the lead microservices developer for Evil Genius Cupcakes is windsurfing. Work has been eating Ana alive for the past two and a half weeks, but after successfully deploying version 3.6.0 of the `baker` service this morning, she's escaped early to try to unwind a bit.
+Her shoulders are just starting to unknot when her phone pings with a text from Chihiro, down in the NOC. Waterproof phones are a blessing, but also a curse.
-##### Design Considerations
-The intent here is to add at least a breadcrumb that leads object owners to have
-some way to know that their object is being affected by another object, while
-minimizing the number of updates necessary.
+**Chihiro**: *Hey Ana. Things are still running, more or less, but latencies on everything in the `baker` namespace are crazy high after your last rollout, and `baker` itself has a weirdly high load. Sorry to interrupt you on the lake but can you take a look? Thanks\!\!*
-Minimizing the object updates is done by only having an update be necessary when
-the affected object starts or stops being affected by a Policy, rather than if
-the Policy itself has been updated.
+Ana stares at the phone for a long moment, heart sinking, then sighs and turns back to shore.
-There is already a similar Condition to be placed on _Policy_ objects, rather
-than on the _targeted_ objects, so this solution is also being included in the
-Conditions section below.
+What she finds when she dries off and grabs her laptop is strange. `baker` does seem to be taking much more load than its clients are sending, and its clients report much higher latencies than they’d expect. She double-checks the Deployment, the Service, and all the HTTPRoutes around `baker`; everything looks good. `baker`’s logs show her mostly failed requests... with a lot of duplicates? Ana checks her HTTPRoute again, though she's pretty sure you can't configure retries there, and finds nothing. But it definitely looks like clients are retrying when they shouldn’t be.
-#### GatewayClass status Extension Types listing
+She pings Chihiro.
-Support: Provisional
+**Ana**: *Hey Chihiro. Something weird is up, looks like requests to `baker` are failing but getting retried??*
-This solution is IN PROGRESS, and so is not binding yet.
+A minute later they answer.
-Each implementation MUST list all relevant CRDs in its GatewayClass status (like
-Policy, and other extension types, like paramsRef targets, filters, and so on).
+**Chihiro**: 🤷 *Did you configure retries?*
-This is going to be tracked in its own GEP, https://github.com/kubernetes-sigs/gateway-api/discussions/2118
-is the initial discussion. This document will be updated with the details once
-that GEP is opened.
+**Ana**: *Dude. I don’t even know how to.* 😂
-##### Design Considerations
+**Chihiro**: *You just attach a RetryPolicy to your HTTPRoute.*
-This solution:
+**Ana**: *Nope.
Definitely didn’t do that.* -- is low cost in terms of apiserver updates (because it's only on the GatewayClass, - and only on implementation startup) -- provides a standard place for all users to look for relevant objects -- ties into the Conformance Profiles design and other efforts about GatewayClass - status +She types `kubectl get retrypolicy -n baker` and gets a permission error. -#### Standard status stanza +**Ana**: *Huh, I actually don’t have permissions for RetryPolicy.* 🤔 -Support: Provisional +**Chihiro**: 🤷 *Feels like you should but OK, guess that can’t be it.* -This solution is IN PROGRESS and so is not binding yet. +Minutes pass while both look at logs. -This solution requires definition in a GEP of its own to become binding. +**Chihiro**: *I’m an idiot. There’s a RetryPolicy for the whole namespace – sorry, too many policies in the dashboard and I missed it. Deleting that since you don’t want retries.* -**The description included here is intended to illustrate the sort of solution -that an eventual GEP will need to provide, _not to be a binding design. THIS IS -AN EXPERIMENTAL SOLUTION DO NOT USE THIS YET.** +**Ana**: *Are you sure that’s a good–* -An implementation SHOULD include the name, namespace, apiGroup and Kind of Policies -affecting an object in the new `effectivePolicy` status stanza on Gateway API -objects. +Ana’s phone shrills while she’s typing, and she drops it. When she picks it up again she sees a stack of alerts. She goes pale as she quickly flips through them: there’s one for every single service in the `baker` namespace. -This stanza looks like this: -```yaml -kind: Gateway -... -status: - effectivePolicy: - - name: some-policy - namespace: some-namespace - apiGroup: implementation.io - kind: AwesomePolicy - ... -``` +**Ana**: *PUT IT BACK\!\!* -##### Design Considerations +**Chihiro**: *Just did. 
Be glad you couldn't hear all the alarms here.* 😕
-This solution is designed to limit the number of status updates required by an
-implementation to when a Policy starts or stops being relevant for an object,
-rather than if that Policy's settings are updated.
+**Ana**: *What the hell just happened??*
-It helps a lot with discoverability, but comes at the cost of a reasonably high
-fanout cost. Implementations using this solution SHOULD ensure that status updates
-are deduplicated and only sent to the apiserver when absolutely necessary.
+**Chihiro**: *At a guess, all the workloads in the `baker` namespace actually fail a lot, but they seem OK because there are retries across the whole namespace?* 🤔
-Ideally, these status updates SHOULD be in a separate, lower-priority queue than
-other status updates or similar solution.
+Ana's blood runs cold.
-#### PolicyBinding resource
+**Chihiro**: *Yeah. Looking a little closer, I think your `baker` rollout this morning would have failed without those retries.* 😕
-Support: Provisional
+There is a pause while Ana's mind races through increasingly unpleasant possibilities.
-This solution is IN PROGRESS and so is not binding yet.
+**Ana**: *I don't even know where to start here. How long ago did that RetryPolicy go in? Is it the only thing like it?*
-This solution requires definition in a GEP of its own to become binding.
+**Chihiro**: *Didn’t look closely before deleting it, but I think it said a few months ago. And there are lots of different kinds of policy and lots of individual policies, hang on a minute...*
-**The description included here is intended to illustrate the sort of solution
-that the eventual GEP will need to provide, _not to be a binding design.
THIS IS -AN EXPERIMENTAL SOLUTION DO NOT USE THIS YET.** +**Chihiro**: *Looks like about 47 for your chunk of the world, a couple hundred system-wide.* -Implementations SHOULD create an instance of a new `gateway.networking.k8s.io/EffectivePolicy` -object when one or more Policy objects become relevant to the target object. +**Ana**: 😱 *Can you tell me what they’re doing for each of our services? I can’t even* look *at these things.* 😕 -The `EffectivePolicy` object MUST be in the same namespace as the object targeted -by the Policy, and must have the _same name_ as the object targeted like the Policy. -This is intended to mirror the Services/Endpoints naming convention, to allow for -ease of discovery. +**Chihiro**: *That's gonna take awhile. Our tooling to show us which policies bind to a given workload doesn't go the other direction.* -The `EffectivePolicy` object MUST set the following information: +**Ana**: *...wait. You have to* build tools *to know if retries are turned on??* -- The name, namespace, apiGroup and Kind of Policy objects affecting the targeted - object. -- The full resultant set of Policy affecting the targeted object. +Pause. -The above details MUST be namespaced using the `controllerName` of the implementation -(could also be by GatewayClass), similar to Route status being namespaced by -`parentRef`. +**Chihiro**: *Policy Attachment is more complex than we’d like, yeah.* 😐 *Look, how about roll back your `baker` change for now? We can get together in the morning and start sorting this out.* -An example `EffectivePolicy` object is included here - this may be superseded by -a later GEP and should be updated or removed in that case. Note that it does -_not_ contain a `spec` and a `status` stanza - by definition this object _only_ -contains `status` information. +Ana shakes her head and rolls back her edits to the `baker` Deployment, then sits looking out over the lake as the deployment progresses. 
-```yaml -kind: EffectivePolicy -apiVersion: gateway.networking.k8s.io/v1alpha2 -metadata: - name: targeted-object - namespace: targeted-object-namespace -policies: -- controllerName: implementation.io/ControllerName - objects: - - name: some-policy - namespace: some-namespace - apiGroup: implementation.io - kind: AwesomePolicy - resultantPolicy: - awesomePolicy: - configitem1: - defaults: - foo: 1 - overrides: - bar: important-setting +**Ana**: *Done. Are things happier now?* -``` +**Chihiro**: *Looks like, thanks. Reckon you can get back to your sailboard.* 🙂 + +Ana sighs. + +**Ana**: *Wish I could. Wind’s died down, though, and it'll be dark soon. Just gonna head home.* + +**Chihiro**: *Ouch. Sorry to hear that.* 😐 + +One more look out at the lake. + +**Ana**: *Thanks for the help. Wish we’d found better answers.* 😢 -Note here that the `resultantPolicy` setting is defined using the same mechanisms -as an `unstructured.Unstructured` object in the Kubernetes Go libraries - it's -effectively a `map[string]struct{}` that is stored as a `map[string]string` - -which allows an arbitrary object to be specified there. +#### The Problem, restated -Users or tools reading the config underneath `resultantPolicy` SHOULD display -it in its encoded form, and not try to deserialize it in any way. +What this parable makes clear is that, in the absence of information about what metaresource is affecting an object, it’s very easy to make poor decisions. -The rendered YAML MUST be usable as the `spec` for the type given. +It’s critical that this proposal solve the problem of showing up to three things, listed in increasing order of desirability: -##### Design considerations +* *That* some metaresource/policy is affecting a particular object +* *Which* metaresource/policy is (or metaresources/policies are) affecting a particular object +* *What* settings in the metaresource/policy are affecting the object. 
-This will provide _full_ visibility to end users of the _actual settings_ being -applied to their object, which is a big discoverability win. +In the parable, if Ana and Chihiro had known that there were policies affecting the relevant object, then they could have gone looking for the relevant policies and things would have played out differently. If they knew which policies, they would need to look less hard, and if they knew what the settings being applied were, then the parable would have been able to be very short indeed. -However, it relies on the establishment and communication of a convention ("An -EffectivePolicy is right next to your affected object"), that may not be desirable. +(There’s also another use case to consider, in that Chihiro should have been able to see that the metaresource on the namespace was in use in many places before deleting it.) -Thus its status as EXPERIMENTAL DO NOT USE YET. +To put this another way, Metaresources and Policy Attachment is effectively adding another persona among the stakeholders, the Policy Admin, and without a solution to the discoverability problem, their actions are largely invisible to the Application Developer. Not only that, but their concerns cut across the previously established levels. -#### Validating Admission Controller to inform users about relevant Policy +![Gateway API diagram with Policy Admin](images/713-the-diagram-with-policy-admin.png) + +From the Policy Admin’s point of view, they need to know across their whole remit (which conceivably could be the whole cluster): -Implementations MAY supply a Validating Admission Webhook that will return a -WARNING message when an applied object is affected by some Policy. +* *What* metaresource/policy has been created +* *Where* it’s applied +* *What* the resultant (effective) metaresource/policy is saying -The warning message MAY include the name, namespace, apiGroup and Kind of relevant -Policy objects. 
+Which again, comes down to discoverability, and can probably be addressed in similar ways at an API level to the Application Developer's concerns. -##### Design Considerations +An important note here is that a key piece of information for Policy Admins and Cluster Operators is "How many things does this Policy affect?". In the parable, this would have enabled Chihiro to know that deleting the Namespace policy would affect many other people than just Ana. -Pro: +#### Gateway API personas and the discoverability problem -- This gives object owners a very clear signal that something some Policy is - going to affect their object, at apply time, which helps a lot with discoverability. +Let's go through the various users of Gateway API and what they need to know about policies affecting their objects. -Cons: +In all of these cases, keeping the troubleshooting distance low is desired; that is, that there should be a minimum of hops required between objects from the one owned by the user to the one responsible for a setting. -- Implementations would have to have a webhook, which is another thing to run. -- The webhook will need to have the same data model that the implementation uses, - and keep track of which GatewayClasses, Gateways, Routes, and Policies are - relevant. Experience suggests this will not be a trivial engineering exercise,and will add a lot of implementation complexity. +Another way to think of the troubleshooting distance in this context is "How many `kubectl` commands would the user need to do to understand that a policy is relevant, which policy is relevant, and what configuration the full set of policy is setting?" -#### `kubectl` plugin or command-line tool -To help improve UX and standardization, a kubectl plugin will be developed that -will be capable of describing the computed sum of policy that applies to a given -resource, including policies applied to parent resources. 
+##### Application Developer Discoverability -Each Policy CRD that wants to be supported by this plugin will need to follow -the API structure defined above and add the [corresponding label](index.md#standard-label-on-crd-objects) -to the CRD. +How does Ana, or any Application Developer who owns one or more Route objects know that their object is affected by a policy, which policy is affecting it, and what the content of the policy is? -### Conditions +The best outcome is that Ana needs to look only at a specific route to know what policy settings are being applied to that Route, and where they come from. However, some of the other problems below make it very difficult to achieve this. -Implementations using Policy objects MUST include a `spec` and `status` stanza, -and the `status` stanza MUST contain a `conditions` stanza, using the standard -Condition format. +##### Policy Admin Discoverability -Policy authors should consider namespacing the `conditions` stanza with a -`controllerName`, as in Route status, if more than one implementation will be -reconciling the Policy type. +How does the Policy Admin know what policy is applied where, and what the content of that policy is? How do they validate that the policy is being used in ways acceptable to their organization? For any given policy object, how do they know how many places it's being used? -#### On `Policy` objects +##### Cluster Admin Discoverability -Controllers using the Gateway API policy attachment model MUST populate the -`Accepted` condition and reasons as defined below on policy resources to provide -a consistent experience across implementations. +The Cluster Admin has similar concerns to the Policy Admin, but with a focus on being able to determine what's relevant when something is broken. -```go -// PolicyConditionType is a type of condition for a policy. -type PolicyConditionType string +How does the Cluster Admin know what policy is applied where, and what the content of that policy is? 
-// PolicyConditionReason is a reason for a policy condition.
-type PolicyConditionReason string
+For any given policy object, how do they know how many places it's being used?
-const (
-	// PolicyConditionAccepted indicates whether the policy has been accepted or rejected
-	// by a targeted resource, and why.
-	//
-	// Possible reasons for this condition to be True are:
-	//
-	// * "Accepted"
-	//
-	// Possible reasons for this condition to be False are:
-	//
-	// * "Conflicted"
-	// * "Invalid"
-	// * "TargetNotFound"
-	//
-	PolicyConditionAccepted PolicyConditionType = "Accepted"
+#### Hinting at a solution for the discoverability problem
-	// PolicyReasonAccepted is used with the "Accepted" condition when the policy has been
-	// accepted by the targeted resource.
-	PolicyReasonAccepted PolicyConditionReason = "Accepted"
+Querying the status of objects stored in the cluster may be the Kubernetes way of knowing the state of the system, in a world where objects are declarative and there are only so many links between objects to hop in between. However, this is still a proxy used to model a real-life problem that can often be thought about in other ways as well.
-	// PolicyReasonConflicted is used with the "Accepted" condition when the policy has not
-	// been accepted by a targeted resource because there is another policy that targets the same
-	// resource and a merge is not possible.
-	PolicyReasonConflicted PolicyConditionReason = "Conflicted"
+In the context of traffic networking, for example, often the question asked by users is *"What happens when a network request X comes in?"*. There is an implicit expectation that a set of Kubernetes resources suffices to represent all the rules for a given workload to be activated and thus process request X, and often that is the case.
For more complex cases, however (e.g., multiple personas, application concerns separated into dedicated resource kinds, interaction between groups of users, etc.), real life can get more complicated than a simple `kubectl get x`, or at least additional steps must be automated to encompass complexity into what can be achieved with a single declarative object.
-	// PolicyReasonInvalid is used with the "Accepted" condition when the policy is syntactically
-	// or semantically invalid.
-	PolicyReasonInvalid PolicyConditionReason = "Invalid"
+With that in mind, a possible solution for the discoverability problem may involve designing tools (e.g. CLI tools/plugins), new CRDs, etc. that let users ask questions in terms of the real-life problems they have to deal with on a daily basis, rather than in terms shaped by the underlying technologies used in the process. For instance, a simple Kubernetes object that is used to declare the rules to process an HTTP request cannot have its status reported simply as Ready/Not ready. Being a complex object composed of multiple routing rules, potentially affected by specifications declared from other objects as well, its status MUST account for that complexity and be structured in such a way that it informs the owner about each possible case, whether induced by the object's own internal specification or by its external relationships.
-	// PolicyReasonTargetNotFound is used with the "Accepted" condition when the policy is attached to
-	// an invalid target resource
-	PolicyReasonTargetNotFound PolicyConditionReason = "TargetNotFound"
-)
-```
+In other words, the discoverability problem exists and must be addressed in light of the complexity associated with the topology of nested contexts induced by a set of hierarchically related resources. One should always have that topology in mind while asking questions regarding the behavior of a given resource, because just like a routing object (e.g.
HTTPRoute) does not exist independently from its parent contexts (e.g. Gateways) or its children (e.g. Backends), any resource in focus may be just a part of a whole.
+
+### Status reporting
+
+#### Policy status
+
+Policy CRDs MUST define a status stanza that allows for reporting the status of the policy with respect to each scope the resource may apply to.
+
+The basic status conditions are:
+
+* **Accepted**: the policy passed both syntactic validation by the API server and semantic validation enforced by the controller, such as whether the target objects exist.
+* **Enforced**: the policy's spec is guaranteed to be fully enforced, to the extent of what the controller can ensure.
+* **PartiallyEnforced**: parts of the policy's spec are guaranteed to be enforced, while other parts are known to have been superseded by other specs, to the extent of what the controller can ensure. The status should include details highlighting which parts of the policy are enforced and which parts have been superseded, with references to all other related policies.
+* **Overridden**: the policy's spec is known to have been fully overridden by other specs. The status should include references to the other related policies.
+
+Policy implementations SHOULD support these basic status conditions.
+
+#### Target object status
+
+Implementations of Policy kinds SHOULD put a condition into `status.Conditions` of any objects affected by the policy.
+
+That condition, if added, MUST be named according to the pattern `<PolicyKind>Affected` (e.g. `colors.controller.k8s.io/ColorPolicyAffected`), and SHOULD include an `observedGeneration` field kept up to date when the spec of the target object changes.
+
+Implementations SHOULD use their own unique domain prefix for this condition type. Gateway API implementations, for instance, SHOULD use the same domain as in the `controllerName` field on `GatewayClass` (or some other implementation-unique domain for implementations that do not use `GatewayClass`).
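+
+As an illustration of the naming pattern above, the status of a `Gateway` affected by a hypothetical `ColorPolicy` might look like the following sketch. The object name, the `reason` and `message` values, and the timestamp are assumptions made for the example; this GEP does not prescribe them.
+
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: my-gateway            # hypothetical target object
+  generation: 3
+status:
+  conditions:
+  - type: colors.controller.k8s.io/ColorPolicyAffected
+    status: "True"
+    observedGeneration: 3     # kept in step with the target object's generation
+    reason: PolicyAffected    # hypothetical reason, not defined by this GEP
+    message: "Affected by ColorPolicy policy-namespace/my-policy"  # hypothetical message
+    lastTransitionTime: "2025-01-01T00:00:00Z"
+```
+
+A condition shaped like this records only that the object is affected, so in principle it needs updating when the object starts or stops being affected rather than on every change to the policy's settings.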
+
+For example, consider a `Gateway` object targeted by a hypothetical `ColorPolicy` object named `policy-namespace/my-policy`, which is owned by a `colors.controller.k8s.io` controller and has status `Enforced` or `PartiallyEnforced`. The controller SHOULD add to the status of the `Gateway` object a condition `colors.controller.k8s.io/ColorPolicyAffected: true`, with a reason ideally referring to `policy-namespace/my-policy` by name.
+
+Similarly, for a hypothetical `ColorPolicy` policy that targets a specific named section of the `Gateway` object (e.g., `http-listener`), the controller SHOULD add to the status of the listener section within the `Gateway` object a condition `colors.controller.k8s.io/ColorPolicyAffected: true`.
+
+For objects that do not have a `status.Conditions` field available (`Secret` is a good example), that object SHOULD have an annotation of `colors.controller.k8s.io/ColorPolicyAffected: true` added instead.
+
+#### Status needs to be namespaced by implementation
+
+Because an object can be affected by multiple implementations at once, any added status MUST be namespaced by the implementation.
+
+In Gateway API's Route Parent status, `parentRef` plus the controller name have been used for this.
+
+For a policy, something similar can be done, namespacing by the reference to the implementation's controller name.
+
+Namespacing by the originating policy cannot easily be done because the source could be more than one policy object.
+
+#### Creating common data representation patterns
+
+Defining a _common_ pattern for including the details of an _arbitrarily defined_ object, to be included in a library for all possible implementations, is challenging, to say the least.
+
+Structured data cannot be used because there is no way of knowing what the structure will be beforehand. This suggests a need to use unstructured data for representing the main body of the arbitrary policy objects.
Practically, this will have to be a string representation of the YAML form (or JSON, equivalently) of the body of the policy object (absent the metadata part of every Kubernetes object).
+
+Metaresources and Policy Attachment does not mandate anything about the design of the object's top level except that it must be a Kubernetes object, so the only possible thing to rely upon here is the presence of the Kubernetes metadata elements: `apiVersion`, `kind`, and `metadata`.
+
+Therefore, a string representation of the rest of the file is likely the best that can be done here.
+
+#### Fanout status update problems
+
+The fanout problem is that, when an update takes place in a single object (a policy, or an object with a policy attached), an implementation may need to update _many_ objects if it needs to place details of what policy applies, or what the resultant set of policies is on _every_ object.
+
+Historically, this is a risky strategy and needs to be carefully applied, as it's an excellent way to create apiserver load problems, which can produce a large range of bad effects for cluster stability.
+
+This does not mean that nothing at all that affects multiple objects can be done, but that careful consideration of what information is stored in status, so that _every_ policy update does not require a corresponding status update, is advised.
+
+## Current use of policies
+
+### Implementations
+
+These are a few known implementations of policies in compliance with this GEP.
+
+Users should refer to the official documentation from each implementation for more up-to-date information.
+
+#### Gateway API (core)
+
+Gateway API defines two kinds of Direct policies, both for augmenting the behavior of Kubernetes `Service` resources:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **BackendTLSPolicy** | TLS configuration of the connection from the Gateway to a backend pod (set of pods) via the Service API object. | Service, _Port_ | None | Direct |
+| **XBackendTrafficPolicy** | Configuration for how traffic to a target backend should be handled (retries and session persistence) | _Port_ | None | Direct |
+
+#### Envoy Gateway
+
+https://gateway.envoyproxy.io/docs/api/extension_types/
+
+Gateway API implementation that defines the following kinds of policies:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **ClientTrafficPolicy** | Configure the behavior of the connection between the downstream client and Envoy Proxy listener. | Gateway, _Listener_ | Atomic defaults | Inherited |
+| **BackendTrafficPolicy** | Configure the behavior of the connection between the Envoy Proxy listener and the backend service. | Gateway, HTTPRoute, GRPCRoute, UDPRoute, TCPRoute, TLSRoute | Patch defaults, Custom | Inherited |
+| **EnvoyExtensionPolicy** | Configure various Envoy extensibility options for the Gateway. | Gateway, HTTPRoute, GRPCRoute, UDPRoute, TCPRoute, TLSRoute | Atomic defaults, Custom | Inherited |
+| **EnvoyPatchPolicy** | Modify the generated Envoy xDS resources by Envoy Gateway using this patch API. | GatewayClass, Gateway | Custom | Inherited |
+| **SecurityPolicy** | Configure various security settings for a Gateway. | Gateway, HTTPRoute, GRPCRoute | Atomic defaults | Inherited |
+
+#### Istio
+
+https://istio.io/latest/docs/reference/config/
+
+Gateway API implementation that defines the following kinds of policies:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **EnvoyFilter** | Customize the Envoy configuration generated by istiod, e.g. modify values for certain fields, add specific filters, or even add entirely new listeners, clusters. | GatewayClass, Gateway, Service, ServiceEntry | Custom | Inherited |
+| **RequestAuthentication** | Define request authentication methods supported by a workload. | GatewayClass, Gateway, Service, ServiceEntry | Custom | Inherited |
+| **AuthorizationPolicy** | Enable access control on workloads in the mesh. | GatewayClass, Gateway, Service, ServiceEntry | Custom | Inherited |
+| **WasmPlugin** | Extend the functionality provided by the Istio proxy through WebAssembly filters. | GatewayClass, Gateway, Service, ServiceEntry | Custom | Inherited |
+| **Telemetry** | Defines how telemetry (metrics, logs and traces) is generated for workloads within a mesh. | GatewayClass, Gateway, Service, ServiceEntry | Custom | Inherited |
+
+#### NGINX Gateway Fabric
+
+https://docs.nginx.com/nginx-gateway-fabric/overview/custom-policies/
+
+Gateway API implementation that supports Gateway API’s `BackendTLSPolicy` as well as the following kinds of policies:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **ClientSettingsPolicy** | Configure connection behavior between client and NGINX. | Gateway, HTTPRoute, GRPCRoute | Patch defaults | Inherited |
+| **ObservabilityPolicy** | Define settings related to tracing, metrics, or logging. | HTTPRoute, GRPCRoute | None | Direct |
+| **UpstreamSettingsPolicy** | Configure connection behavior between NGINX and backend. | Service | None | Direct |
+
+#### Gloo Gateway
+
+https://docs.solo.io/gateway/latest/about/custom-resources/#policies
+
+Gateway API implementation that defines the following kinds of policies:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **ListenerOption** | Augment behavior of one, multiple, or all gateway listeners. | Gateway, _Listener_ | None | Direct |
+| **HTTPListenerOption** | Augment behavior of one, multiple, or all HTTP and HTTPS listeners. | Gateway, _Listener_ | None | Direct |
+| **RouteOption** | Augment behavior of one, multiple, or all routes in an HTTPRoute resource. | HTTPRoute | None | Direct |
+| **VirtualHostOption** | Augment behavior of the hosts on one, multiple, or all gateway listeners. | Gateway, _Listener_ | Atomic defaults | Inherited |
+
+#### Kuadrant
+
+https://docs.kuadrant.io
+
+First Gateway API integration entirely based on the Metaresources and Policy Attachment pattern. Defines the following kinds of policies:
+
+| Policy kind | Description | Target kinds | Merge strategies | Policy class |
+| --- | --- | --- | --- | --- |
+| **DNSPolicy** | Manage the lifecycle of DNS records in external DNS providers such as AWS Route53, Google DNS, and Azure DNS. | Gateway, _Listener_ | Atomic defaults | Inherited |
+| **TLSPolicy** | Manage the lifecycle of TLS certificate configuration on gateways using CertManager. | Gateway, _Listener_ | Atomic defaults | Inherited |
+| **AuthPolicy** | Specify authentication and authorization rules for Gateways and Routes | Gateway, _Listener_, HTTPRoute, HTTPRouteRule | Atomic defaults, Custom | Inherited |
+| **RateLimitPolicy** | Specify rate limiting rules for Gateways and Routes | Gateway, _Listener_, HTTPRoute, HTTPRouteRule | Atomic defaults, Custom | Inherited |
+
+### Other metaresource and policy-like implementations
+
+#### Network Policy API (Working Group, SIG-NETWORK)
+
+https://network-policy-api.sigs.k8s.io/
+
+Defines two kinds of metaresources respectively for specifying *default* and *override* of networking policy rules: **AdminNetworkPolicy** and **BaselineAdminNetworkPolicy**. Builds on top of Kubernetes core `NetworkPolicy` kind.
+
+Although the Network Policy API custom resources do not strictly implement the Metaresources and Policy Attachment pattern, they are based on similar concepts that involve policy rules for augmenting the behavior of other Kubernetes objects (pods), attachment points, nested contexts (through namespaces and pod selectors), and Defaults & Overrides.
+ +#### Open Cluster Management + +https://open-cluster-management.io/docs/getting-started/integration/policy-controllers/policy-framework/ + +Does not implement Metaresources and Policy Attachment. However, defines a virtual policy kind (**ConfigurationPolicy**) and supports distributing other third-party kinds of policies such as Gatekeeper's **ConstraintTemplate** kind, via a **Policy** resource whose targets are nonetheless controlled by a separate set of resources (**Placement** and **PlacementBinding**). + +## Tools + +The following tools can be useful for implementing and supporting policies and policy custom controllers. + +#### gwctl + +https://github.com/kubernetes-sigs/gwctl + +CLI tool for visualizing and managing Gateway API resources in a Kubernetes cluster. Includes commands to visualize effective policies affecting the resources in compliance with the Metaresources and Policy Attachment pattern. + +#### policy-machinery + +https://github.com/Kuadrant/policy-machinery -#### On targeted resources - -(copied from [Standard status Condition on Policy-affected objects](#standard-status-condition-on-policy-affected-objects)) - -This solution requires definition in a GEP of its own to become binding. - -**The description included here is intended to illustrate the sort of solution -that an eventual GEP will need to provide, _not to be a binding design.** - -Implementations that use Policy objects MUST put a Condition into `status.Conditions` -of any objects affected by a Policy. - -That Condition must have a `type` ending in `PolicyAffected` (like -`gateway.networking.k8s.io/PolicyAffected`), -and have the optional `observedGeneration` field kept up to date when the `spec` -of the Policy-attached object changes.
- -Implementations _should_ use their own unique domain prefix for this Condition -`type` - it is recommended that implementations use the same domain as in the -`controllerName` field on GatewayClass (or some other implementation-unique -domain for implementations that do not use GatewayClass).) - -For objects that do _not_ have a `status.Conditions` field available (`Secret` -is a good example), that object MUST instead have an annotation of -`gateway.networking.k8s.io/PolicyAffected: true` (or with an -implementation-specific domain prefix) added instead. - -### Interaction with Custom Filters and other extension points -There are multiple methods of custom extension in the Gateway API. Policy -attachment and custom Route filters are two of these. Policy attachment is -designed to provide arbitrary configuration fields that decorate Gateway API -resources. Route filters provide custom request/response filters embedded inside -Route resources. Both are extension methods for fields that cannot easily be -standardized as core or extended fields of the Gateway API. The following -guidance should be considered when introducing a custom field into any Gateway -controller implementation: - -1. For any given field that a Gateway controller implementation needs, the - possibility of using core or extended should always be considered before - using custom policy resources. This is encouraged to promote standardization - and, over time, to absorb capabilities into the API as first class fields, - which offer a more streamlined UX than custom policy attachment. - -2. Although it's possible that arbitrary fields could be supported by custom - policy, custom route filters, and core/extended fields concurrently, it is - recommended that implementations only use multiple mechanisms for - representing the same fields when those fields really _need_ the defaulting - and/or overriding behavior that Policy Attachment provides. 
For example, a - custom filter that allowed the configuration of Authentication inside a - HTTPRoute object might also have an associated Policy resource that allowed - the filter's settings to be defaulted or overridden. It should be noted that - doing this in the absence of a solution to the status problem is likely to - be *very* difficult to troubleshoot. - -## Removing BackendPolicy -BackendPolicy represented the initial attempt to cover policy attachment for -Gateway API. Although this proposal ended up with a similar structure to -BackendPolicy, it is not clear that we ever found sufficient value or use cases -for BackendPolicy. Given that this proposal provides more powerful ways to -attach policy, BackendPolicy was removed. - -## Alternatives considered - -### 1. ServiceBinding for attaching Policies and Routes for Mesh -A new ServiceBinding resource has been proposed for mesh use cases. This would -provide a way to attach policies, including Routes to a Service. - -Most notably, these provide a way to attach different policies to requests -coming from namespaces or specific Gateways. In the example below, a -ServiceBinding in the consumer namespace would be applied to the selected -Gateway and affect all requests from that Gateway to the foo Service. Beyond -policy attachment, this would also support attaching Routes as policies, in this -case the attached HTTPRoute would split requests between the foo-a and foo-b -Service instead of the foo Service. - -![Simple Service Binding Example](images/713-servicebinding-simple.png) - -This approach can be used to attach a default set of policies to all requests -coming from a namespace. The example below shows a ServiceBinding defined in the -producer namespace that would apply to all requests from within the same -namespace or from other namespaces that did not have their own ServiceBindings -defined. 
- -![Complex Service Binding Example](images/713-servicebinding-complex.png) - -#### Advantages -* Works well for mesh and any use cases where requests don’t always transit - through Gateways and Routes. -* Allows policies to apply to an entire namespace. -* Provides very clear attachment of polices, routes, and more to a specific - Service. -* Works well for ‘shrink-wrap application developers’ - the packaged app does - not need to know about hostnames or policies or have extensive templates. -* Works well for ‘dynamic’ / programmatic creation of workloads ( Pods,etc - see - CertManager) -* It is easy to understand what policy applies to a workload - by listing the - bindings in the namespace. - -#### Disadvantages -* Unclear how this would work with an ingress model. If Gateways, Routes, and - Backends are all in different namespaces, and each of those namespaces has - different ServiceBindings applying different sets of policies, it’s difficult - to understand which policy would be applied. -* Unclear if/how this would interact with existing the ingress focused policy - proposal described below. If both coexisted, would it be possible for a user - to understand which policies were being applied to their requests? -* Route status could get confusing when Routes were referenced as a policy by - ServiceBinding -* Introduces a new mesh specific resource. - -### 2. Attaching Policies for Ingress -An earlier proposal for policy attachment in the Gateway API suggested adding -policy references to each Resource. This works very naturally for Ingress use -cases where all requests follow a path through Gateways, Routes, and Backends. -Adding policy attachment at each level enables different roles to define -defaults and allow overrides at different levels. 
- -![Simple Ingress Attachment Example](images/713-ingress-attachment.png) - -#### Advantages -* Consistent policy attachment at each level -* Clear which policies apply to each component -* Naturally translates to hierarchical Ingress model with ability to delegate - policy decisions to different roles - -#### Disadvantages -* Policy overrides could become complicated -* At least initially, policy attachment on Service would have to rely on Service - annotations or references from policy to Service(s) -* No way to attach policy to other resources such as namespace or ServiceImport -* May be difficult to modify Routes and Services if other components/roles are - managing them (eg Knative) - -### 3. Shared Policy Resource -This is really just a slight variation or extension of the main proposal in this -GEP. We would introduce a shared policy resource. This resource would follow the -guidelines described above, including the `targetRef` as defined as well as -`default` and `override` fields. Instead of carefully crafted CRD schemas for -each of the `default` and `override` fields, we would use more generic -`map[string]string` values. This would allow similar flexibility to annotations -while still enabling the default and override concepts that are key to this -proposal. - -Unfortunately this would be difficult to validate and would come with many of -the downsides of annotations. A validating webhook would be required for any -validation which could result in just as much or more work to maintain than -CRDs. At this point we believe that the best experience will be from -implementations providing their own policy CRDs that follow the patterns -described in this GEP. We may want to explore tooling or guidance to simplify -the creation of these policy CRDs to help simplify implementation and extension -of this API. +Golang library for implementing policy controllers. 
Defines types and functions to build Directed Acyclic Graphs (DAG) to represent hierarchies of targetable resources and attached policies, calculate effective policies based on standard and custom merge strategies, etc. Includes helpers for applications based on Gateway API. ## References diff --git a/geps/gep-713/metadata.yaml b/geps/gep-713/metadata.yaml index afd8c9b75e..48cb894d65 100644 --- a/geps/gep-713/metadata.yaml +++ b/geps/gep-713/metadata.yaml @@ -9,14 +9,15 @@ authors: - kflynn - sanjaypujare - spacewander + - guicassolato relationships: - extendedBy: + obsoletes: - name: Direct Policy Attachment number: 2648 - description: Split out Direct Policy Attachment + description: Former split out of the Direct class of policies (Declined) - name: Inherited Policy Attachment number: 2649 - description: Split out Inherited Policy Attachment + description: Former split out of the Inherited class of policies (Declined) references: - "https://github.com/kubernetes-sigs/gateway-api/issues/611" - "https://docs.google.com/document/d/13fyptUtO9NV_ZAgkoJlfukcBf2PVGhsKWG37yLkppJo/edit?resourcekey=0-Urhtj9gBkGBkSL1gHgbWKw" diff --git a/mkdocs.yml b/mkdocs.yml index a697494d44..53003b608e 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -125,7 +125,6 @@ nav: - Provisional: - geps/gep-1494/index.md - geps/gep-1651/index.md - - geps/gep-2648/index.md - Implementable: - geps/gep-91/index.md - geps/gep-3567/index.md @@ -174,6 +173,8 @@ nav: - Declined: - geps/gep-735/index.md - geps/gep-1282/index.md + - geps/gep-2648/index.md + - geps/gep-2649/index.md - Contributing: - How to Get Involved: contributing/index.md - Developer Guide: contributing/devguide.md