-
Notifications
You must be signed in to change notification settings - Fork 50
Add a bundle deployment reconciler to handle v1alpha2 API #743
Add a bundle deployment reconciler to handle v1alpha2 API #743
Conversation
cc: @perdasilva @ncdc |
@@ -86,6 +86,9 @@ type BundleDeploymentStatus struct { | |||
//+kubebuilder:printcolumn:name=Provisioner,type=string,JSONPath=`.spec.provisionerClassName`,priority=1 | |||
|
|||
// BundleDeployment is the Schema for the bundledeployments API | |||
// TODO(varsha): Marking this as unserved version for now. Remove this definition during cleanup, when the all the | |||
// work on alpha2 is done. | |||
// +kubebuilder:unservedversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not wire in the reconciler to the controller yet. Can make this as a storage version until that is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mark v1alpha1 as unserved is the expectation that conversion from v1alpha1 to v1alpha2 and back exists?
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #743 +/- ##
==========================================
- Coverage 24.67% 23.71% -0.96%
==========================================
Files 14 22 +8
Lines 1143 1923 +780
==========================================
+ Hits 282 456 +174
- Misses 807 1373 +566
- Partials 54 94 +40
☔ View full report in Codecov by Sentry. |
Please review from the second commit. The first one would be rebased on #713 |
|
||
controller crcontroller.Controller | ||
dynamicWatchMutex sync.RWMutex | ||
dynamicWatchGVKs map[schema.GroupVersionKind]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sets.Set[schema.GroupVersionKind]
//+kubebuilder:rbac:groups=core.rukpak.io,resources=bundledeployments,verbs=list;watch | ||
//+kubebuilder:rbac:groups=core.rukpak.io,resources=bundledeployments/status,verbs=update;patch | ||
//+kubebuilder:rbac:groups=core.rukpak.io,resources=bundledeployments/finalizers,verbs=update | ||
//+kubebuilder:rbac:groups=*,resources=*,verbs=* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh oh :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know! But I have no idea on the intention behind this. Copied from
//+kubebuilder:rbac:groups=*,resources=*,verbs=* |
func (b *bundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | ||
log := log.FromContext(ctx) | ||
|
||
existingBD := &v1alpha2.BundleDeployment{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rukpackv1alpha2
|
||
// The controller is not updating spec, we only update the status. Hence sending | ||
// a status update should be enough. | ||
if !equality.Semantic.DeepEqual(existingBD.Status, reconciledBD.Status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use applyconfigurations? I imagine the code generators create them already; new code should be written to use them if possible.
// by the bundle deployment store. These methods facilitate the unpacking and storage | ||
// of contents from all the sources. | ||
type Store interface { | ||
afero.Fs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised you want to inline all the low-level bits from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the sources use limited number of helpers just for creating/removing/verifying contents inside dir. Instead of specifying them separately, inlining of afero.Fs
would make it easier and save us from adding more helpers to this interface. I assume we would be using this only in rukpak, that too to access localdir, so expecting an afero.Fs implementation to be passed seemed reasonable.
internal/v1alpha2/source/image.go
Outdated
return nil, fmt.Errorf("validation unsuccessful during unpacking %v", err) | ||
} | ||
// storage path to store contents in local directory. | ||
storagePath := filepath.Join(store.GetBundleDeploymentName(), filepath.Clean(source.Destination)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You pass store
and source
into i.unpack()
so you can likely inline this logic there instead of passing it in as a parameter
op, err := i.ensureUnpackPod(ctx, store.GetBundleDeploymentName(), *source, pod, opts) | ||
if err != nil { | ||
return nil, err | ||
} else if op == controllerutil.OperationResultCreated || op == controllerutil.OperationResultUpdated || pod.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i.ensureUnpackPod()
did not return Created or Updated, why is a deleting pod a pending unpack?
if !apierrors.IsInvalid(err) { | ||
return controllerutil.OperationResultNone, err | ||
} | ||
if err := i.Client.Delete(ctx, existingPod); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the Pod spec is immutable. What's the meaning of creating a broad apply-config & doing this dance? In what cases is updating the existing pod what you want? Why not delete + create always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is this factored to start with an apply for the happy case where there's nothing to change? I imagine there's a way to compute whether or not a change has occurred on the Bundle object and determine if it's needed to update the pod without a call to the API server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is this factored to start with an apply for the happy case where there's nothing to change?
This has been taken from the current alpha1 implementation. But this is the only reason I see.
I imagine there's a way to compute whether or not a change has occurred on the Bundle object and determine if it's needed to update the pod without a call to the API server
This in general needs to done on a higher level, regardless of the image unpacking using a pod. Right now, even though there is no change, the design unpacks during every reconcile (whether image, git or http) which is unnecessary. Probably a caching mechanism, where we compute a hash of the bdName and bundleSource (which contain image digest, git commit etc) to figure out if there has been a change in the bundle content would be a better way to reduce time and resource during reconcile. I had implemented something similar in #727, but wanted to keep it separate for now. If this is implemented, then we need not patch, instead as you mentioned, delete+create as we would be sure of the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whichever gets you iterating faster is fine. I think a more explicit input check makes the rest of those code simpler & easier to test and reason about, but since you're copying it's moot :)
|
||
// make sure the passed in pod value is updated with the latest | ||
// version of the pod | ||
*pod = *updatedPod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to return the pod instead of mutating in place - is there motivation for the side-effect approach?
// compare existingPod to newPod and return an appropriate | ||
// OperatorResult value. | ||
newPod := updatedPod.DeepCopy() | ||
unsetNonComparedPodFields(existingPod, newPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly hard to follow. If what you want is to compare pod.Spec
, why not just do that?
if err != nil { | ||
return fmt.Errorf("unpack failed: failed to retrieve failed pod logs: %v", err) | ||
} | ||
_ = i.Client.Delete(ctx, pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely want to ignore not found and return the error?
// Options to configure bundleDeploymentReconciler | ||
type Option func(bd *bundleDeploymentReconciler) | ||
|
||
func WithUnpacker(u *source.Unpacker) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is it valid not to have an unpacker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never, which is why this validation: https://github.com/operator-framework/rukpak/pull/743/files#diff-509cbdc8bdbb6d4f857a9b4c4a0205f715c51d4850e57838e3f2141133029c1eR287
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just have this as a required field in the constructor instead of an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just have this as a required field in the constructor instead of an option?
+1. An alternative would be to have a default unpacker that would be changed if this option is specified.
return ctrl.Result{}, err | ||
} | ||
|
||
switch res.State { | ||
case source.StateUnpackPending: | ||
setUnpackStatusPending(&bundleDeployment.Status.Conditions, fmt.Sprintf("pending unpack"), bundleDeployment.Generation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be watching pods with some unpacking label and triggering off of them changing state?
How much of this is straight copies from v1alpha1? Notably I think @everettraven has direct client unpacking merged, are you expecting to use that too? |
@stevekuznetsov The unpacking and deploy logic is copied from v1alpha1 bits. Implementing direct client unpacking is not a now a part of this right now. The design for this was to allow different implementations of store, unpacker and deployer. Some of this are to be passed per bundle deployment (store), whereas unpacker, validator and deployer are passed while creating the controller. I'll look at your comments and address those. |
This commit intends to introduce the new v1alpha2 APIs Signed-off-by: Varsha Prasad Narsing <[email protected]> (cherry picked from commit a0238b9)
This commit adds alpha2 as storage version, and marks alpha1 to be unserved. Signed-off-by: Varsha Prasad Narsing <[email protected]>
This commit: 1. Adds skeleton for controller logic. 2. Adds "Store" interface, implementation for bundledeploymentstore and relevant tests. Signed-off-by: Varsha Prasad Narsing <[email protected]>
This commit adds the default unpackers, with the specific implementation and tests for unpacking from image source. Signed-off-by: Varsha Prasad Narsing <[email protected]>
This commit adds: 1. Logic for validation of helm, registryV1 and plain bundles. 2. Tests for image source and a mock store implementation for use. Signed-off-by: Varsha Prasad Narsing <[email protected]>
This commit introduces: 1. Deployer interface and Helm deployer - to deploy contents using helm 2. Wire deployer logic into reconciler. 3. Add SetupWithManager, so that this can be called from core/main.go in a follow up . Signed-off-by: Varsha Prasad Narsing <[email protected]>
…happy Signed-off-by: Varsha Prasad Narsing <[email protected]>
aa3dfa5
to
1237a3c
Compare
// +optional | ||
Paused bool `json:"paused,omitempty"` | ||
|
||
// Config is helm spcific configuration to load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Config is helm spcific configuration to load | |
// Config is helm specific configuration to load |
// Options to configure bundleDeploymentReconciler | ||
type Option func(bd *bundleDeploymentReconciler) | ||
|
||
func WithUnpacker(u *source.Unpacker) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just have this as a required field in the constructor instead of an option?
+1. An alternative would be to have a default unpacker that would be changed if this option is specified.
// result can be nil, when there is an error during unpacking. This indicates | ||
// that unpacking was failed. Update the status accordingly. | ||
if res == nil || err != nil { | ||
setUnpackStatusFailing(&bundleDeployment.Status.Conditions, fmt.Sprintf("unpack unsuccessful %v", err), bundleDeployment.Generation) | ||
return ctrl.Result{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for res == nil
and err == nil
at the same time? If so, I'd be cautious of lumping them into the same conditional here. From a ux perspective, if we hit the case where res == nil && err == nil
the message in the status is going to be something like unpack unsuccessful <nil>
which is a pretty non-descriptive message.
// Requeing after 5 sec for now since the average time to unpack an registry bundle locally | ||
// was around ~4sec. | ||
// Warning: This could end up requeing indefinitely, till an error has occurred. | ||
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to manually requeue? IIUC we have a watch on Pod
s because the Unpacker uses them to unpack the bundle images. It looks like the v1alpha2 unpacker only has an image
source so it seems like changes to the unpack Pod
should trigger a re-reconciliation of the BundleDeployment
that owns it making this forced requeue unnecessary
case source.StateUnpacking: | ||
setUnpackStatusPending(&bundleDeployment.Status.Conditions, "unpacking in progress", bundleDeployment.Generation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC having a returned state of Unpacking
from the unpacker used to correlate to updating the status condition Unpacked
's reason to Unpacking
. Is there a particular reason to remove it and instead use Pending
with the message unpacking in progress
?
// setUnpgradeStatusFailed sets the installed success to failing as there is an error while patching | ||
// objects on cluster. | ||
func setUnpgradeStatusFailed(conditions *[]metav1.Condition, message string, generation int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// setUnpgradeStatusFailed sets the installed success to failing as there is an error while patching | |
// objects on cluster. | |
func setUnpgradeStatusFailed(conditions *[]metav1.Condition, message string, generation int64) { | |
// setUpgradeStatusFailed sets the installed success to failing as there is an error while patching | |
// objects on cluster. | |
func setUpgradeStatusFailed(conditions *[]metav1.Condition, message string, generation int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this net new or has this code been moved+modified?
} | ||
|
||
// Result conveys the progress information about deploying content. | ||
// TODO: Refactor to use the same result struct for unpacking and deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents us from doing this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of ResolvedSource field. This api was intended to accept multiple resolved sources for a single BD. So that would need more discussion.
// deployContents calls the registered deployer to apply the bundle contents onto the cluster. | ||
func (b *bundleDeploymentReconciler) deployContents(ctx context.Context, store store.Store, bd *v1alpha2.BundleDeployment) (*deployer.Result, error) { | ||
return b.deployer.Deploy(ctx, store, bd) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems unneccessary. We could just use the b.deployer.Deploy()
function directly
switch deployResult.State { | ||
case deployer.StateIntallFailed: | ||
setInstallStatusFailed(&bundleDeployment.Status.Conditions, err.Error(), bundleDeployment.Generation) | ||
return ctrl.Result{}, err | ||
case deployer.StateUnpgradeFailed: | ||
setUnpgradeStatusFailed(&bundleDeployment.Status.Conditions, err.Error(), bundleDeployment.Generation) | ||
return ctrl.Result{}, err | ||
case deployer.StateReconcileFailed: | ||
setReconcileStatusFailed(&bundleDeployment.Status.Conditions, err.Error(), bundleDeployment.Generation) | ||
return ctrl.Result{}, err | ||
case deployer.StateObjectFetchFailed: | ||
setDynamicWatchFailed(&bundleDeployment.Status.Conditions, err.Error(), bundleDeployment.Generation) | ||
return ctrl.Result{}, err | ||
case deployer.StateDeploySuccessful: | ||
setInstallStatusSuccess(&bundleDeployment.Status.Conditions, fmt.Sprintf("installed %s", bundleDeployment.GetName()), bundleDeployment.Generation) | ||
default: | ||
return ctrl.Result{}, fmt.Errorf("unknown deploy state %q for bundle deployment %s: %v", deployResult.State, bundleDeployment.GetName(), bundleDeployment.Generation) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we could make deployContents return the condition that should be set?
Have discussed with @varshaprasad96 @gallettilance @joelanford - we are going to pause on this for the moment. During the pause, we are going to investigate using https://carvel.dev/kapp-controller/ an alternative to continuing to restructure the rukpak APIs. kapp-controller does almost everything rukpak currently does (it doesn't yet support registry+v1 bundles) and it will save us a lot of time if we can switch to using it. Closing this for now. We can reopen depending on the results of our research. |
Haven't seen this in WG discussion. Have I missed a thread? |
No missed threads. This is something I've been talking about with @joelanford and a few other folks. |
This PR does considerable refactoring to the existing code to introduce a new reconciler to handle v1alpha2 bundle deployment API. This does not remove or make any changes to the existing alpha1 code base. Once we have everything related to alpha2, we can cleanup and remove other controllers and manifests that are not required.
Major changes made:
The reconciler is not plugged into
core/main.go
. This PR splits #727 to make review process easier. All the features, not implemented here are tracked in there.