Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Start building out shim layer for Tinkerbell #8

Closed
wants to merge 15 commits into from
Closed

Conversation

detiber
Copy link
Contributor

@detiber detiber commented Dec 10, 2020

Description

  • Adds start of types and other scaffolding for a thinkerbell shim layer

Copy link
Contributor Author

@detiber detiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Hardware type still needs to be built out.
  • Scaffolding for workflow and hardware controllers needs to be built out
  • Tinkerbell informers need to be setup for all types and plumbed through to trigger reconciliations of appropriate resources.
  • Add support/rbac permissions for creating k8s resources to mirror Tinkerbell resources when the k8s resources do not yet exist.

limitations under the License.
*/

package v1alpha3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this mainly for completeness

go.mod Outdated
k8s.io/component-base v0.17.14
k8s.io/klog v1.0.0
k8s.io/utils v0.0.0-20201110183641-67b214c5f920
k8s.io/utils v0.0.0-20201110183641-67b214c5f920 // indirect
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes came from go mod tidy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be committed separately I think, together with go.sum. And there should be a CI check for that.

Edit: #10

Comment on lines 28 to 98
type HardwareSpec struct {
}

// HardwareStatus defines the observed state of Hardware.
type HardwareStatus struct {
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get far enough today to finish out the Hardware types

const (
// TemplateIDAnnotation is used by the controller to store the
// ID assigned to the workflow by Tinkerbell.
TemplateIDAnnotation = "template.tinkerbell.org/id"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I was planning on adding a Spec field for the ID that is stored in Tinkerbell, but then I thought it might be confusing for users, so I figured an annotation might work. We can also add it to the Status to make it more easily visible to the user, but I also wanted to make sure that it was stored in a persistent way for idempotency.

Comment on lines 37 to 40
client.Client
Log logr.Logger
Recorder record.EventRecorder
Scheme *runtime.Scheme
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: need to plumb through a tinkerbell client

Comment on lines 82 to 86
err := ErrNotImplemented

logger.Error(err, "Not yet implemented")

return ctrl.Result{}, err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: use the tinkerbell client to reconcile resource

Comment on lines 90 to 97
logger := r.Log.WithValues("template", template.Name)
err := ErrNotImplemented

logger.Error(err, "Not yet implemented")

// controllerutil.RemoveFinalizer(template, tinkv1alpha1.TemplateFinalizer)

return ctrl.Result{}, err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: use the tinkerbell client to delete resource and remove finalizer

Copy link
Contributor

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to get this through checks added in #9.

Left some thoughts and questions.

go.mod Outdated
k8s.io/component-base v0.17.14
k8s.io/klog v1.0.0
k8s.io/utils v0.0.0-20201110183641-67b214c5f920
k8s.io/utils v0.0.0-20201110183641-67b214c5f920 // indirect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be committed separately I think, together with go.sum. And there should be a CI check for that.

Edit: #10

See the License for the specific language governing permissions and
limitations under the License.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need doc.go file if it contains no doc? 😉 Perhaps we should move the docs from groupversion_info.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily, but we've had issues with web-based tools like godoc in the past with the lack of doc.go files when someone tries to access docs specific to the api types and the project wasn't already indexed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Godoc require at least one file in the package to have documentation, it doesn't necessarily be doc.go file. Though when you have many files in the package, IMO it's good to have a separate file for docs, so no other files becomes "special".

TL;DR I suggest moving the Package v1alpha1 .. comment from groupversion_info.go to doc.go file.

Comment on lines 23 to 24
// HardwareIDAnnotation is used by the controller to store the
// ID assigned to the workflow by Tinkerbell.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// HardwareIDAnnotation is used by the controller to store the
// ID assigned to the workflow by Tinkerbell.
// HardwareIDAnnotation is used by the controller to store the
// hardware ID assigned to the workflow by Tinkerbell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, copypasta issue. Should have been 'ID assigned to the hardware by Tinkerbell'

Comment on lines 60 to 69
before := template.DeepCopy()
controllerutil.AddFinalizer(template, tinkv1alpha1.TemplateFinalizer)

if err := r.Client.Patch(ctx, template, client.MergeFrom(before)); err != nil {
logger.Error(err, "Failed to add finalizer to template")

return ctrl.Result{}, fmt.Errorf("failed to add finalizer: %w", err)
}

return ctrl.Result{}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me that this all could go to separate function to keep Reconcile shorter.

// +kubebuilder:resource:path=hardware,scope=Cluster,categories=tinkerbell
// +kubebuilder:storageversion

// Hardware is the Schema for the Hardware API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, those types will be exposed by Kubernetes API, so cluster API controller can create those instead of Tinkerbell structs directly.

I would like to know:

  • What is the benefit of this indirection? Other than visibility (ability to see Tinkerbell records using Kubernetes API). I guess also separation of concerns as creating Tinkerbell records is now independent from Cluster API.
  • If I understand correctly, Tinkerbell controllers could live as a separate project, to allow interacting with Tinkerbell via Kubernetes API. But for now they will live here? What if we have a separate controller in the future and we would like create entries for multiple Tinkerbell instances. Is this solution extensible for such use case? Perhaps having a separate controller per namespace would allow that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, those types will be exposed by Kubernetes API, so cluster API controller can create those instead of Tinkerbell structs directly.

I would like to know:

  • What is the benefit of this indirection? Other than visibility (ability to see Tinkerbell records using Kubernetes API). I guess also separation of concerns as creating Tinkerbell records is now independent from Cluster API.

It is indeed mostly a separation of concerns. This also aligns with some of the discussions that have taken place upstream about trying to leverage similar tooling for other providers, such as the AWS Controllers for Kubernetes as an implementation detail of cluster-api-provider-aws.

  • If I understand correctly, Tinkerbell controllers could live as a separate project, to allow interacting with Tinkerbell via Kubernetes API. But for now they will live here?

Yes, longer term I would expect this to move out to a separate project. There will need to be some thought put into how to reconcile with similar work for the tinkerbell crossplane provider to avoid too much unnecessary duplication.

What if we have a separate controller in the future and we would like create entries for multiple Tinkerbell instances. Is this solution extensible for such use case? Perhaps having a separate controller per namespace would allow that?

That is definitely something that we need to consider. I created the resources as cluster scoped with the idea of a 1-1 mapping between Cluster API and a Tinkerbell instance, but I think we have a few options available if we want to remove that assumption:

  • Make the resources namespaced, when the controller reconciles a resource it pulls the connection information for the related tinkerbell instance from a secret stored in the namespace
    • this would simplify mapping of names between tinkerbell and the controller, but would complicate the interaction between CAPT and the controllers
  • Have a cluster scoped resource that can be used for storing connection information for multiple tinkerbell instances
    • Each resource can optionally specify which connection information to use
    • If not specified, have some method of falling back to a default
    • Will need to figure out how to handle conflicting naming between k8s resources and tinkerbell resources across multiple tinkerbell instances

The latter approach is what is done by https://github.com/aws/aws-controllers-k8s and likely the cleaner approach, however they are not doing a sync back the other way where naming could conflict. We could potentially avoid the two way sync and instead opt for having a way to import specific pre-existing resources from tinkerbell instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed mostly a separation of concerns. This also aligns with some of the discussions that have taken place upstream about trying to leverage similar tooling for other providers, such as the AWS Controllers for Kubernetes as an implementation detail of cluster-api-provider-aws.

Make sense, thanks for confirmation and insights.

Yes, longer term I would expect this to move out to a separate project. There will need to be some thought put into how to reconcile with similar work for the tinkerbell crossplane provider to avoid too much unnecessary duplication.

I don't know about the crossplane provider, so I trust your judgement here then!

Make the resources namespaced, when the controller reconciles a resource it pulls the connection information for the related tinkerbell instance from a secret stored in the namespace
this would simplify mapping of names between tinkerbell and the controller, but would complicate the interaction between CAPT and the controllers

Yeah, namespacing seemed like an easy solution to me.

It seems to me that Tinkerbell is sort of special, as with cloud providers you usually have just one set of credentials and single API endpoint, while with Tinkerbell, you could have multiple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are similar issues that are being tracked for various cloud providers to support the idea of multiple accounts, which isn't too far removed from what we'd need for tinkerbell, for example here's the tracking issue for cluster-api-provider-aws: kubernetes-sigs/cluster-api-provider-aws#1552

// HardwareSpec defines the desired state of Hardware.
type HardwareSpec struct {
// ID is the ID of the hardware in Tinkerbell
// +kubebuilder:validation:MinLength=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually have a fixed length I think as it's an UUID.

k8s.io/api v0.17.14
github.com/google/uuid v1.1.2
github.com/onsi/gomega v1.10.1
github.com/tinkerbell/tink v0.0.0-20201210163923-6d9159b63857
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should use the same version as is used in sandbox repo right now: https://github.com/tinkerbell/sandbox/blob/master/current_versions.sh#L8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I suppose this version does not have informers implemented yet 😞

Comment on lines 24 to 25
// ErrNotFound is returned if a requested resource is not found.
var ErrNotFound = errors.New("resource not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could follow https://travix.io/errors-derived-from-constants-in-go-fda6748b4072, so we avoid having exported global variable and we can use errors.Is().

Comment on lines 68 to 73
f.Objs[in.Id] = proto.Clone(in).(*hardware.Hardware)

return nil
}

return errors.New("nobody home")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd swap the logic to keep the happy path on the left side instead of indented.

}

// Get gets a Hardware from Tinkerbell.
func (f *Hardware) Get(ctx context.Context, id, ip, mac string) (*hardware.Hardware, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a struct here for selecting the hardware? Otherwise you get this ambigous calls:

f.Get(ctx, "", "", mac)
f.Get(ctx, "", ip, "")

It seems to me that it's easy to mess up the order.

With struct, you would have something like:

f.Get(ctx, &fake.HardwareSelector{Mac: mac})

Also, is it desired to return first met hardware if selectors are contradicting?

}

// Deletion is a noop.
if !hardware.DeletionTimestamp.IsZero() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should log or emit event here?

Comment on lines 87 to 89
logger.Error(err, "Failed to get hardware from Tinkerbell")

return ctrl.Result{}, fmt.Errorf("failed to get hardware from Tinkerbell: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. I think controller manager will log the returned error anyway.

client.Client
TemplateClient templateClient
Log logr.Logger
Scheme *runtime.Scheme
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Scheme field needed here?

Scheme: scheme,
}

got, err := r.reconcileDelete(context.Background(), tt.in)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could just call Reconcile here and avoid having this test as internal? As we set deletion timestamp anyway.

*/

// Package controllers contains controllers for Tinkerbell.
package controllers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this package be called workflow?

"github.com/tinkerbell/cluster-api-provider-tinkerbell/tink/internal/client"
)

func TestHardwareLifecycle(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to break down those super large tests, so we can give them a better name and make them more focused on what we actually test here.

func TestHardwareLifecycle(t *testing.T) {
g := NewWithT(t)
ctx := context.Background()
hardwareClient := client.NewHardwareClient(realHardwareClient(t))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we have a fake client for Tinkerbell, I guess we can use it here so we don't require real tinkerbell server?

BTW, if we want tests with Tinkerbell server, we can probably borrow the setup from https://github.com/tinkerbell/terraform-provider-tinkerbell/tree/master/test.

@@ -0,0 +1,123 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, why tink/internal for directory structure and not internal/tink?

Also, maybe tinkerbell instead of tink?

)

type hardwareClient interface {
// Create(ctx context.Context, h *hardware.Hardware) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to push hardware to Tinkerbell via K8s API would be very useful for testing the CAPT controllers, as then you can push some test hardware via fake client-go.

Without that, one will need to push the hardware to Tinkerbell fake client directly I think.

@invidian invidian mentioned this pull request Jan 6, 2021
3 tasks
templateClient := client.NewTemplateClient(tinkclient.TemplateClient)
workflowClient := client.NewWorkflowClient(tinkclient.WorkflowClient, hwClient)

stopCh := ctrl.SetupSignalHandler()

if webhookPort == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should flip the logic here and remove the else, as this no longer make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually addressed in #24. Hope it gets merged before this PR.

templateChan := make(chan event.GenericEvent)
workflowChan := make(chan event.GenericEvent)

if err := mgr.Add(&sources.TinkEventWatcher{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add those in the loop to deduplicate the exit logic? I'd also move it to separate function to avoid further extension of main() function, as it should be as minimal as possible.

k8s.io/api v0.17.14
github.com/google/uuid v1.1.2
github.com/onsi/gomega v1.10.1
github.com/tinkerbell/tink v0.0.0-20201210163923-6d9159b63857
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I suppose this version does not have informers implemented yet 😞

@@ -104,7 +116,7 @@ func main() {

flag.Parse()

ctrl.SetLogger(zap.New(zap.UseDevMode(true)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included committed explicitly in #24. I guess klog is standard logger we should use.

}
}()

w.Status.Data = tinkWorkflow.GetData()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I guess this is content of rendered workflow, right? If so, it may contain secrets. And it will contain secrets when we use it for CAPT.

Should we then create a secret for each workflow and store the data there? This actually also applies to Templates and hardware selectors :/

"testing"

. "github.com/onsi/gomega"
tinkv1alpha1 "github.com/tinkerbell/cluster-api-provider-tinkerbell/tink/api/v1alpha1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-scoped imports should be separate from 3rd party modules.

@detiber
Copy link
Contributor Author

detiber commented Feb 16, 2021

Closed in favor of #32

@invidian
Copy link
Contributor

@detiber so we close it? :) Has my feedback been addressed in #32 ?

@detiber
Copy link
Contributor Author

detiber commented Feb 16, 2021

@invidian yes, I had intended to close and hit the wrong button. I've created #34 to track making sure feedback on this PR is addressed.

@detiber detiber closed this Feb 16, 2021
@gauravgahlot gauravgahlot mentioned this pull request Mar 5, 2021
@gauravgahlot gauravgahlot mentioned this pull request Mar 23, 2021
3 tasks
@chrisdoherty4 chrisdoherty4 deleted the tinkshim branch March 11, 2024 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants