Fix: Missing Kind in preview mode (yaml/v2) #2904

Merged 4 commits from eronwright/fix-missing-kind into master on Mar 27, 2024

Conversation

@EronWright (Contributor) commented Mar 22, 2024

Proposed changes

The yaml/v2 package eagerly resolves kinds in order to apply a default namespace. This PR addresses a problem with the use case where a kind is installed by one ConfigGroup and then used by another: in preview mode, the kind cannot be found by the latter group because the expected side effect (installation of a CRD) doesn't occur.

The solution is to maintain, in the provider, a cache of new CRDs. When the provider executes Check on a CustomResourceDefinition resource, it adds the CRD to the cache. Later, when a component resource must resolve a given kind, it consults the cache. In this way, the cache compensates for the lack of side effects (i.e. API server calls) that would normally allow the kind to be resolved. Overall ordering is established using DependsOn.

It is reasonable to populate the cache during Check because a) Check produces the "planned" state that is needed later without actuating it, and b) Update is called conditionally and thus isn't a reliable alternative.
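As a rough illustration, here is a minimal sketch of such a cache, assuming hypothetical names (crdCache, RecordCheck, Lookup) rather than the provider's actual implementation: Check on a CustomResourceDefinition records the planned CRD, and a later kind lookup falls back to the cache when the API server cannot resolve the kind.

```go
package clients

import (
	"sync"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// crdCache remembers CRDs that Check has seen but that may not exist on the
// cluster yet (e.g. during preview). Sketch only; names are illustrative.
type crdCache struct {
	mu   sync.RWMutex
	crds map[schema.GroupKind]*unstructured.Unstructured
}

// RecordCheck is called from Check on a CustomResourceDefinition: it stores
// the planned CRD keyed by the group/kind it will serve.
func (c *crdCache) RecordCheck(crd *unstructured.Unstructured) {
	group, _, _ := unstructured.NestedString(crd.Object, "spec", "group")
	kind, _, _ := unstructured.NestedString(crd.Object, "spec", "names", "kind")
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.crds == nil {
		c.crds = map[schema.GroupKind]*unstructured.Unstructured{}
	}
	c.crds[schema.GroupKind{Group: group, Kind: kind}] = crd
}

// Lookup is consulted when kind resolution against the API server fails,
// compensating for the missing side effect in preview mode.
func (c *crdCache) Lookup(gk schema.GroupKind) (*unstructured.Unstructured, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	crd, ok := c.crds[gk]
	return crd, ok
}
```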

Implementation Details

The provider has three overlapping modes of operation (preview mode, clusterUnreachable, and yamlRenderMode) that affect the initialization and use of clientSet. Previously, when clusterUnreachable was true, the clientSet was left nil. Now we construct the clientSet itself but leave its various fields nil. This change makes it possible to use the CRD cache in all modes, which was necessary to support yamlRenderMode.
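Concretely, the unreachable-cluster path might look roughly like the following (a sketch assuming the crdCache type above and the DynamicClientSet fields shown in the review diff below; not the exact provider code):

```go
// When clusterUnreachable is true, still construct the client set so the CRD
// cache is available in every mode; the per-cluster clients stay nil.
cs := &DynamicClientSet{
	GenericClient:         nil,
	DiscoveryClientCached: nil,
	RESTMapper:            nil,
	CRDCache:              &crdCache{},
}

// Callers must guard the nil fields, e.g. before using discovery:
if cs.DiscoveryClientCached != nil {
	// only talk to the API server when a cluster connection exists
}
```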

Testing

A new integration test, yamlv2, is added to exercise this scenario.

"Cluster unreachable" mode was tested manually by misconfiguring my local environment to use an invalid kube context. Tested in combination with "yaml rendering mode", which works as expected even when the cluster is unreachable.

Related issues (optional)


Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.


codecov bot commented Mar 23, 2024

Codecov Report

Attention: Patch coverage is 41.79104%, with 39 lines in your changes missing coverage. Please review.

Project coverage is 27.63%. Comparing base (a15641e) to head (28c6912).

Files Patch % Lines
provider/pkg/clients/cache.go 50.00% 13 Missing and 5 partials ⚠️
provider/pkg/clients/clients.go 36.84% 11 Missing and 1 partial ⚠️
provider/pkg/provider/provider.go 22.22% 6 Missing and 1 partial ⚠️
provider/pkg/provider/invoke_decode_yaml.go 0.00% 1 Missing ⚠️
provider/pkg/provider/invoke_helm_template.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2904      +/-   ##
==========================================
+ Coverage   27.55%   27.63%   +0.08%     
==========================================
  Files          53       54       +1     
  Lines        7818     7862      +44     
==========================================
+ Hits         2154     2173      +19     
- Misses       5485     5504      +19     
- Partials      179      185       +6     


@EronWright EronWright marked this pull request as ready for review March 25, 2024 17:12
@EronWright EronWright requested review from rquitales and blampe March 25, 2024 17:12
@EronWright EronWright added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Mar 25, 2024
@blampe (Contributor) left a comment


Mostly just some small questions and clarifications.

Higher level -- I definitely agree this makes for a better user experience, and how you've implemented things matches my understanding of the provider lifecycle (a single provider per operation etc.). I can't think of a better way to accomplish the desired UX given the current engine behavior.

For completeness, the strongest argument I can think of against this is that it's an experience the user will only see once when a CRD is first installed. Does one preview over the lifetime of a stack warrant complexity that bumps into things like the engine/provider contract?

I don't have enough context to feel strongly one way or the other. I do prefer the UX, but my default position is to usually err on the side of simpler-is-better when brushing up against system boundaries like this. If folks from the Platform side are on board with this then it seems like a no brainer.

GenericClient: nil,
DiscoveryClientCached: nil,
RESTMapper: nil,
CRDCache: &CRDCache{},

I would make CRDCache private if there's only meant to be one instance accessed via this client set.

Resolved review thread: provider/pkg/clients/clients_test.go
@@ -257,7 +257,7 @@ func (c *chart) template(clientSet *clients.DynamicClientSet) (string, error) {
installAction.APIVersions = c.opts.APIVersions
}

if clientSet != nil {
if clientSet != nil && clientSet.DiscoveryClientCached != nil {
Contributor:

Is the tl;dr behind clientSet and DiscoveryClientCached that I would use DiscoveryClientCached if I can tolerate stale data and clientSet directly if I want to hit the live API?

@EronWright (Contributor, Author) replied on Mar 26, 2024:

No, the clientSet is a set of clients including a discovery client (that happens to have a built-in cache), a resource client, and a REST Mapper for translating Kinds to API Resources. AFAIK the provider doesn't cache any k8s objects but does cache the discovered type information.
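For context, a minimal client-go sketch (generic usage, not the provider's code) of how a cached discovery client and a REST mapper cooperate to translate a Kind into an API resource:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/discovery/cached/memory"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/restmapper"
	"k8s.io/client-go/tools/clientcmd"
)

// resolveKind maps a Kind (e.g. "Deployment") to its API resource using cached
// discovery data, roughly the job of the REST mapper inside the client set.
func resolveKind(disco discovery.CachedDiscoveryInterface, gk schema.GroupKind) (*meta.RESTMapping, error) {
	mapper := restmapper.NewDeferredDiscoveryRESTMapper(disco)
	return mapper.RESTMapping(gk) // picks the preferred version when none is given
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	cached := memory.NewMemCacheClient(dc) // caches discovered type info, not k8s objects
	_ = dynamic.NewForConfigOrDie(config)  // the resource client lives alongside, separately

	mapping, err := resolveKind(cached, schema.GroupKind{Group: "apps", Kind: "Deployment"})
	if err != nil {
		panic(err)
	}
	fmt.Println(mapping.Resource) // e.g. apps/v1, Resource=deployments
}
```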

Resolved review threads: provider/pkg/provider/provider.go, tests/provider/yamlv2_test.go, provider/pkg/clients/clients_test.go
@EronWright (Contributor, Author) commented:

I had a discussion with @Frassle where he expressed support but advised me to submit some tests into pu/pu to harden the contract.

@Frassle (Member) commented Mar 26, 2024

> I don't have enough context to feel strongly one way or the other. I do prefer the UX, but my default position is to usually err on the side of simpler-is-better when brushing up against system boundaries like this. If folks from the Platform side are on board with this then it seems like a no brainer.

There are a couple of arguments against this (or against adding state anywhere in providers, for that matter).

  1. It makes debug attach trickier, because you probably now need a clear "Reset" signal to tell the long-running attached provider process that it's time to clear its state and start again because a new deployment cycle is happening.
  2. It's less performant, because we can't share the provider process across the preview and up phases of an update (well, I suppose we could if we had a supported Reset method, as in 1).

But I suspect providers can do much smarter previews if they are stateful, and that's probably worth the downsides of the above.

Resolved review threads on provider/pkg/clients/cache.go (one outdated)
@EronWright (Contributor, Author) commented:

Just want to mention that the Cancel RPC method could perhaps serve as a reset signal. Today, Cancel is not called unless the user cancels the deployment, but it would probably be harmless to call it at the end of every run, and it could reset the provider state for another run. To the point about attaching to a long-running provider, cancellation is a further complication to that story unless the provider takes pains to 'reset' (p-k doesn't; it simply closes a channel).

@rquitales (Member) left a comment


LGTM

Resolved review thread: provider/pkg/clients/cache.go
@EronWright EronWright enabled auto-merge (squash) March 27, 2024 21:58
@EronWright EronWright merged commit 76baec9 into master Mar 27, 2024
18 checks passed
@EronWright EronWright deleted the eronwright/fix-missing-kind branch March 27, 2024 22:42
Labels: area/yaml, impact/no-changelog-required (This issue doesn't require a CHANGELOG update)
4 participants