CD design consideration

Currently, this repo is targeted at base library creation, and only CI is provided (in the form of a GitHub Action). The production environment is out of the scope of this repo.

For the sake of completeness, this page adds a boilerplate Continuous Delivery (CD) setup for downstream projects.

Key Assumptions:

  • Good-enough principle: don't over-architect it.
  • Public cloud: solely target public cloud deployment; on-premise deployment will not be attempted.
  • Speed: the CD is designed for the shortest delivery time. It will NOT cater for requirements such as manual approval or human-based promotion.
  • Lowest cost: whichever approach costs less goes first.
  • Lowest manpower: no operator is assumed; the deployment should rely on auto-scaling and self-healing.
  • Integration: integrate with the existing CI pipeline.
  • Cloud agnostic: the CD needs to be cloud agnostic.

Not explicitly assumed:

  • Security & compliance: the CD does not guarantee compliance with security best practices in the first delivery.
  • Portability: the CD does not ensure portability to on-premise environments.
  • Build artifacts: the CD does not persist interim build artifacts, except the produced images.

Must-achieve Goals:

  • The full-stack deployment can be used to run various sorts of tests, including failure tests.
  • When there are new versions of underlying components, like Hyperledger Fabric and/or fabric-es applications, they can easily be deployed, un-deployed, and rolled back. (DONE)
  • Configuration and secret management are properly implemented (DONE)
  • The CD caters for Prod, Integration Test, and UAT. (DONE)
  • Use an Infrastructure-as-Code approach, and MUST ensure repeatable, immutable deployments (DONE)
  • Start with GCP, for cost-saving reasons (DONE)
  • Adopt a GitOps approach (DONE)

Nice-to-have Goals:

  • When there is a Hyperledger Fabric network topology change, it can easily be deployed. (DONE)
  • Monitoring and a system dashboard (DONE)
  • Parallel delivery: Multiple-version/multiple-stream/multiple-mode. (Not yet)
  • Adopt Istio service mesh (DONE)
  • Use a sidecar for the logging service (Not yet)
  • Attempt multi-cloud deployment (Not yet)

Technology Suggestions:

  1. Use K8s, Helm, Terraform, and Prometheus for most components
  2. Cloud-based image registry, using GCR (DONE).
  3. Further evaluation: cloud logging and cloud monitoring services
  4. Prefer a declarative rather than imperative approach, and use immutable infrastructure (DONE)
  5. p.s. Both Ansible and Jenkins are probably unnecessary (Confirmed)
  6. Argo CD

Kubernetes Namespace

  • Namespaces are grouped by org (DONE); a sketch follows below
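
A minimal sketch of how the per-org grouping might look; the namespace names and labels here are assumptions for illustration, not the repo's actual manifests.

```yaml
# Hypothetical per-org namespaces; names and labels are illustrative only
apiVersion: v1
kind: Namespace
metadata:
  name: n0               # orderer org (org0)
  labels:
    org: org0
---
apiVersion: v1
kind: Namespace
metadata:
  name: n1               # org1
  labels:
    org: org1
```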

Persistence Volume

  • Each cluster will have ONE PV (DONE)
  • Will use dynamic provisioning (DONE)
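
A minimal sketch of what dynamic provisioning could look like on GCP, using the GCE persistent-disk provisioner; the StorageClass name, disk type, namespace, and size are assumptions.

```yaml
# Assumed GCE-PD StorageClass for dynamic provisioning
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fabric-standard          # illustrative name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
reclaimPolicy: Retain            # keep ledger data even if the claim is deleted
volumeBindingMode: WaitForFirstConsumer
---
# One claim per cluster; the provisioner creates the PV on demand
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fabric-pvc
  namespace: n1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fabric-standard
  resources:
    requests:
      storage: 10Gi
```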

Fabric CA

  • v1.4.7
  • Changed from sqlite3 to persistent PostgreSQL (DONE); see the config sketch below
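
The switch amounts to pointing the db section of fabric-ca-server-config.yaml at PostgreSQL; the host, credentials, and database name below are placeholders.

```yaml
# fabric-ca-server-config.yaml fragment (placeholder values, not real credentials)
db:
  type: postgres
  datasource: host=postgres-fabric-ca port=5432 user=postgres password=<secret> dbname=fabric_ca sslmode=disable
```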

Cloud Networking

  • Tried an L7 LB / nginx ingress controller, but it was not successful: the "broadcast" and "deliver" calls do not pass through. (DONE)
  • Attempted an L4 TCP LB, also not successful. (DONE)
  • The Istio service mesh approach works; "broadcast" and "deliver" also work. However, it introduces a new issue with the chaincode container (solved by the external chaincode launcher below). (DONE)
  • Istio service mesh is a convincing approach, with a medium learning curve. Note that it is not very mature, and the information/support from cloud vendors does not suffice. A sketch of the working setup follows.
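
For illustration, a TLS-passthrough Gateway/VirtualService pair for the orderer's gRPC endpoint might look like the sketch below; the hostname, namespace, and service name are assumptions.

```yaml
# Istio ingress for orderer gRPC, TLS passthrough (names are assumptions)
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: orderer-gateway
  namespace: n0
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: tls-orderer
        protocol: TLS
      tls:
        mode: PASSTHROUGH        # the orderer terminates its own TLS
      hosts:
        - orderer0.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orderer0
  namespace: n0
spec:
  hosts:
    - orderer0.example.com
  gateways:
    - orderer-gateway
  tls:
    - match:
        - port: 443
          sniHosts:
            - orderer0.example.com
      route:
        - destination:
            host: orderer0.n0.svc.cluster.local
            port:
              number: 7050
```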

External chaincode launcher

(DONE) In the CD environment, docker-in-docker chaincode creates orphan containers in k8s, so it is necessary to switch to the external chaincode launcher. I find this easy to implement and a convenient approach, without any side effects so far.
For the dev-net, orphan containers do not matter; developers can remove them manually, so there is no urgent need for the external chaincode launcher there.
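
The switch mostly amounts to registering an external builder in the peer's core.yaml; the builder name and path below are illustrative, and the directory is expected to provide bin/detect, bin/build, and bin/release scripts.

```yaml
# Peer core.yaml fragment enabling an external chaincode builder
# (builder name and path are assumptions)
chaincode:
  externalBuilders:
    - name: k8s-builder
      path: /opt/external-builder   # must contain bin/detect, bin/build, bin/release
```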

GUpload

(Done) In the documented Hyperledger procedure, there are a number of out-of-band communication processes across orgs. To maintain speedy delivery, we cannot afford a slow, manual process of passing the "unsigned block" from its source to the location of signing. In the production environment, the running peer is the only location which gives signatures.
GUpload is a gRPC-based uploading utility with which one peer can directly send files to another org's uploading area for pickup. It shares the same passage as the orderer-to-peer connection, like "gossip". This mechanism bypasses the manual step and sends the file almost instantly.

Argo CD

  • (Done) Argo CD is deployed in-cluster; the previous shell-script deployment is migrated to Argo CD.
  • (Done) Argo CD is integrated with helm-secrets and GCP KMS, so that the encrypted 'secrets.yaml' is stored on GitHub.com and Argo CD can decrypt it before cloud deployment (sketched below).
  • (Done) All commits are PGP signed
  • (Done) Connect to GitHub via SSH
  • (Done) ArgoCD Server REST API
  • TLS enabled
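
For illustration, an Argo CD Application wired to a Helm chart whose secrets.yaml is decrypted at render time might look like the sketch below; the repo URL, chart path, and destination namespace are assumptions, and helm-secrets itself has to be installed into the argocd-repo-server separately.

```yaml
# Sketch of an Argo CD Application (repo, path, and namespace are assumptions)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gw-org1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@github.com:example/fabric-cd.git   # connected via SSH
    targetRevision: HEAD
    path: charts/gw-org1
    helm:
      valueFiles:
        - values.yaml
        - secrets.yaml        # sops-encrypted with GCP KMS; decrypted by helm-secrets
  destination:
    server: https://kubernetes.default.svc
    namespace: n1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true          # auto-pilot: drift is corrected automatically
```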

Argo Workflow

Argo Workflow is another component used to achieve auto-pilot, for example:

  • (Done) Artifacts from workflows are stored in GCP storage, for example genesis.block and channel.tx
  • (Done) Network bootstrapping - Org0/org1
  • (Done) Argo Server REST API
  • (Done) Workflows are migrated to WorkflowTemplates
  • (Done) Add new org workflow
  • TLS enabled
  • Upgrade chaincode
  • Backup / Restore

Except for the bootstrapping step, the ongoing administrative and operational tasks will be performed by Argo, via the Argo REST API. I plan to remove the use of shell scripts as much as possible, and to minimise the direct use of "kubectl". A sketch of a WorkflowTemplate with a GCP storage artifact follows.
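
A minimal sketch of a WorkflowTemplate that generates the bootstrap artifacts and persists them to GCP storage; the image tag, configtx profiles, channel IDs, bucket, and credentials secret are assumptions, and the MSP material the CLI needs is omitted for brevity.

```yaml
# Sketch only: profiles, channel IDs, bucket, and secret names are assumptions
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: bootstrap-artifacts
spec:
  entrypoint: configtxgen
  templates:
    - name: configtxgen
      container:
        image: hyperledger/fabric-tools:2.2
        command: [sh, -c]
        args:
          - >-
            configtxgen -profile OrgsOrdererGenesis -channelID ordererchannel
            -outputBlock /var/artifacts/genesis.block &&
            configtxgen -profile OrgsChannel -channelID loanapp
            -outputCreateChannelTx /var/artifacts/channel.tx
      outputs:
        artifacts:
          - name: bootstrap
            path: /var/artifacts
            gcs:
              bucket: fabric-cd-artifacts        # assumed bucket
              key: bootstrap
              serviceAccountKeySecret:
                name: gcs-credentials            # assumed secret
                key: serviceAccountKey
```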

Chaincode Lifecycle

The chaincode manipulates the eventstore. Because it does not carry organisation- or use-case-specific domain model information, only org1 needs to update/upgrade the chaincode; the OrgX's are not required to.

  • Only org1 can commit chaincode
  • OrgX simply approves the pre-existing/committed chaincode for itself
  • Therefore, there is no need to gather a "Majority" of signatures (see the approval sketch below)
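
A sketch of how an OrgX approval step might look as an Argo WorkflowTemplate; the channel ID, chaincode name, version, and sequence are placeholders, and the peer CLI's MSP environment is omitted for brevity.

```yaml
# Sketch only: OrgX approves itself to the already-committed definition
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: approve-chaincode
spec:
  entrypoint: approve
  arguments:
    parameters:
      - name: package-id        # package id of the installed chaincode
  templates:
    - name: approve
      inputs:
        parameters:
          - name: package-id
      container:
        image: hyperledger/fabric-tools:2.2
        command: [sh, -c]
        args:
          - >-
            peer lifecycle chaincode approveformyorg
            --channelID loanapp --name eventstore --version 1.0
            --sequence 1 --package-id {{inputs.parameters.package-id}}
```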

Tools

  • (Done) in-cluster EFK deployment
  • Enhance Fluent Bit with a unified logging format (see the config sketch below)
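
A sketch of a possible unified-format setup, carried as a ConfigMap for the in-cluster Fluent Bit DaemonSet; the namespace, service names, and field choices are assumptions.

```yaml
# Assumed Fluent Bit config: tail container logs, enrich with k8s
# metadata, and ship to Elasticsearch in one unified shape
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Parser            docker
        Tag               kube.*

    [FILTER]
        Name              kubernetes
        Match             kube.*
        Merge_Log         On

    [OUTPUT]
        Name              es
        Match             *
        Host              elasticsearch
        Port              9200
        Logstash_Format   On
```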

Application Component

  • (Done) rediSearch, gw-orgX, auth-server, ui-control
  • (Done) Expose gw-org1 and ui-org1 via Istio to the public internet; a sketch follows
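
For illustration, the public exposure of gw-org1 could be a plain HTTP Gateway/VirtualService pair like the sketch below; the hostname and service port are assumptions (TLS via Let's Encrypt is a v0.3 goal, see below).

```yaml
# Sketch: expose gw-org1 via the Istio ingress gateway (names assumed)
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: gw-org1-gateway
  namespace: n1
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - gw.org1.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: gw-org1
  namespace: n1
spec:
  hosts:
    - gw.org1.example.com
  gateways:
    - gw-org1-gateway
  http:
    - route:
        - destination:
            host: gw-org1              # the gw-org1 Service
            port:
              number: 4001             # assumed application port
```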

Next Step

Prerequisite tasks before starting v0.3

  1. The ui-org1 image cannot register a new user; this seems to be a bug in the image itself. The UI layer needs further enhancement to adapt to the new production-grade CD, and an automated UI tester is also required.
  2. It is not known whether gw-org works as expected. In v0.1-dev, the smoke test was made directly on fabric-cqrs. The integration tester does not work yet; now awaiting Paul's robustness test images.

Goal for v0.3-dev

  1. Extend to Org2
  2. Multi-organisation tests
  3. Fine-tune the resources
  4. Correct formatting in Fluent Bit
  5. Enable TLS for Argo and Argo CD
  6. Use Let's Encrypt to replace the self-signed certs