-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial version of Trillian Kubernetes application #302
Conversation
@khajduczenia: Do you have any initial thoughts? |
k8s/trillian/manifest/config.yaml
Outdated
@@ -0,0 +1,15 @@ | |||
apiVersion: v1 | |||
kind: List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of using the "kind: List" in the resources definitions? I am afraid that without listing the List type explicitly in the Application resource definition, the resource will not be properly removed when the application is removed from a K8s cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes it possible to use YAML anchors to deduplicate labels. They aren't actually used in this file, but they're used in most of the others, so I've used a List here for consistency. Anchors only work within a document, so using the "---" separator instead of List breaks the anchors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't add List to the app resource definition because:
- the set_ownership.py script crashes (because it doesn't have metadata, see Fix crash in set_ownership.py when resource has no metadata marketplace-k8s-app-tools#293)
- the deployer fails with
Error from server (Forbidden): lists is forbidden: User "system:serviceaccount:apptest-2e3bcar0:apptest-2e3bcar0-deployer-sa" cannot list lists at the cluster scope
The problem seems to be the assumption that the List exists as a resource in Kubernetes. Instead, I think the Kubernetes server discards the List once it has processed all of the contained resources, so it doesn't need to be annotated with ownership info or deleted at the end.
I think I'll just duplicate the labels since it appears the marketplace tooling isn't able to handle List properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lists removed.
k8s/trillian/manifest/logserver.yaml
Outdated
app.kubernetes.io/name: "$NAME" | ||
app.kubernetes.io/component: logserver | ||
annotations: | ||
cloud.google.com/load-balancer-type: "Internal" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of Service will not be portable across other K8s implementations than GKE. Are there any other requirements limiting the portability of the application to GKE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so, no. I think this is still compatible with other K8s implementations, you just won't get this feature (which is a nice-to-have, rather than required to use Trillian).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right - if this is not required for Trillian, we are fine with having this annotation here.
k8s/trillian/schema.yaml
Outdated
minimum: 1 | ||
IMAGE_ETCD_OPERATOR: | ||
type: string | ||
default: quay.io/coreos/etcd-operator:v0.9.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss the option to republish the image on GCR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be republished by the click-to-deploy team, or would you prefer the Trillian team to republish this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the easiest option would be to republish the image just as we do it in the Makefile, as for ubuntu image in Elasticsearch app.
k8s/trillian/schema.yaml
Outdated
title: Etcd cluster size | ||
default: 3 | ||
minimum: 3 | ||
ETCD_VERSION: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider having a regex for this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added one to restrict ETCD_VERSION to 3.x.x, since anything other than v3 is likely to be incompatible.
@@ -0,0 +1,176 @@ | |||
# Overview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add sections about 1) backup & restore, 2) upgrades and 3) uninstallation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on adding these sections now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.build/trillian/db_server: .build/var/REGISTRY \ | ||
.build/var/TAG \ | ||
| .build/trillian | ||
docker pull gcr.io/trillian-opensource-ci/db_server:$(TAG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please take a look at the versioning of images at gcr.io/trillian-opensource-ci? It looks like we have there tags matching the commits, not semantic versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have both - there's a tag for every commit to master, as well as a tag for every released version.
@RJPercival could you addressed the comments? |
@wgrzelak: Comments addressed. |
…arketplace Configs adapted from those found in: https://github.com/google/trillian/tree/v1.2.1/examples/deployment/kubernetes
No need to pull an image if it is already cached. This is only helpful if the tag being used is "latest", but we shouldn't recommend that.
Should ensure that the cluster grows to accommodate Trillian regardless of the number of replicas chosen (within reason).
Eliminates the need to communicate a GCP service account's private key to the logserver and logsigner containers. Metrics can still be obtained by setting up a Prometheus server and configuring it to monitor the /metrics endpoints.
A machine with more CPUs than default is required to meet the requirements of some pods. A minimum of 3 nodes is required.
Previously, there was a mix of braces and not-braces. This makes it consistent.
This can be overridden using the TAG parameter.
The recommended cluster size is 3-7, but the FAQ gives failure tolerance information up to 9, so allow that large a cluster (see https://coreos.com/etcd/docs/3.2.13/faq.html#deployment).
A pool of 3 n1-standard-2 nodes are required to host Trillian with the default number of replicas.
The etcd-operator only needs to manage an Etcd cluster for the namespace used by Trillian.
Limit the Etcd version to 3.x.x, as anything other than v3 is likely to be incompatible.
Marketplace tools do not work properly with List, so removing it.
This flag has been removed from the binary.
Makes it more resilient to etcd pods restarting.
This contains things like the MySQL root password, which is still useful after uninstallation because the MySQL database is preserved on a persistent volume that isn't deleted by uninstallation. This makes it possible to re-install Trillian without losing access to the database.
The deployer does not wait for etcd-operator to install the CRD for EtcdCluster before trying to use it, and so fails. Unfortunately, by removing EtcdCluster from componentKinds, ownership information will not be setup on the EtcdCluster and so it won't be uninstalled along with Trillian. GoogleCloudPlatform/marketplace-k8s-app-tools#303 tracks a feature request for supporting Kubernetes operators that install a CRD.
…odes This minimises the number that are lost when a node goes down, improving cluster resilience.
It was missing a "v" prefix.
The documentation never mentions how to create an expanded manifest file, so it shouldn't rely on it. Just direct users to use `kubectl delete` instead.
Otherwise, a single node being restarted (e.g. during a Kubernetes upgrade) can kill the cluster via loss of quorum.
The etcd-operator documentation states that 7 is the maximum: https://github.com/coreos/etcd-operator/blob/16f0e1b3693483061f2a8252c0e361b06d216424/pkg/apis/etcd/v1beta2/cluster.go#L71https://godoc.org/github.com/coreos/etcd-operator/pkg/apis/etcd/v1beta2#ClusterSpec
The etcd-operator creates some services on which it isn't possible to set labels, so an extra command is required to delete them.
It looks like there's still a problem with using etcd-operator, in that the first installation on a fresh cluster will fail because the etcd-operator CRD hasn't been installed at the time the deployer is running. A re-installation works, because the CRD is installed at that point. Presumably my only options are to either customise the deployer or wait for GoogleCloudPlatform/marketplace-k8s-app-tools#303 to be addressed? |
It looks like something has gone wrong with some GitHub triggers? |
/gcbrun |
I've fixed the missing db_server image tag that caused the Cloud Build to fail. |
I'm leaving Google tomorrow so I think this is going to be abandoned. |
I've adapted the Kubernetes manifests from https://github.com/google/trillian/tree/master/examples/deployment/kubernetes to meet the GCP Marketplace requirements. Please take a look and let me know what you think. The readme in particular I think needs more work, and
make app/verify
seems to have a problem deleting the resources at the end. However, I've tested the deployed app and it appears to work correctly according to our integration tests.