diff --git a/README.md b/README.md index 271cbbf..664cb5d 100644 --- a/README.md +++ b/README.md @@ -28,10 +28,9 @@ There's a [Troubleshooting Cheat Sheet](resources/troubleshooting_cheat_sheet.md ## Appendices -1. [etcd Scaleup](appendices/01_etcd_scaleup.md) -2. [Monitoring with Prometheus](appendices/02_prometheus.md) -3. [Useful Internet Resources](appendices/03_internet_resources.md) - +1. [Monitoring with Prometheus](appendices/01_prometheus.md) +2. [Useful Internet Resources](appendices/02_internet_resources.md) +3. [Using AWS EFS Storage](appendices/03_aws_storage.md) ## License diff --git a/appendices/01_etcd_scaleup.md b/appendices/01_etcd_scaleup.md deleted file mode 100644 index 8a67b0a..0000000 --- a/appendices/01_etcd_scaleup.md +++ /dev/null @@ -1,48 +0,0 @@ -# Appendix 2: etcd Scaleup - -This appendix is going to show you how to do a scaleup of etcd hosts. - - -## Adapt Inventory - -Uncomment the new etcd hosts in the Ansible inventory in the (`[new_etcd]`) section. -``` -... -[etcd] -master0.user[X].lab.openshift.ch - -[new_etcd] -master1.user[X].lab.openshift.ch -master2.user[X].lab.openshift.ch -... -``` - -## Scaleup - -Execute the playbook responsible for the etcd scaleup: -``` -ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/scaleup.yml -``` - - -## Verification and Finalization - -Verify your installation now consists of the original cluster plus one new etcd member: -``` -sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health -``` - -Move the now functional etcd members from the group `[new_etcd]` to `[etcd]` in your Ansible inventory at `/etc/ansible/hosts` so the group looks like: -``` -... -[etcd] -master0.user[X].lab.openshift.ch -master1.user[X].lab.openshift.ch -master2.user[X].lab.openshift.ch -``` - - ---- - -[← back to the labs overview](../README.md) - diff --git a/appendices/01_prometheus.md b/appendices/01_prometheus.md new file mode 100644 index 0000000..4cdb979 --- /dev/null +++ b/appendices/01_prometheus.md @@ -0,0 +1,225 @@ +# Prometheus +Source: https://github.com/prometheus/prometheus + +Visit [prometheus.io](https://prometheus.io) for the full documentation, +examples and guides. + +Prometheus, a [Cloud Native Computing Foundation](https://cncf.io/) project, is a systems and service monitoring system. It collects metrics +from configured targets at given intervals, evaluates rule expressions, +displays the results, and can trigger alerts if some condition is observed +to be true. 
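To make "evaluates rule expressions" a little more concrete, here is a minimal, purely illustrative alerting rule. The rule group, alert name and threshold are invented for this appendix (they are not part of the shipped cluster monitoring stack), and the exact metric names depend on the node_exporter version in use:

```yaml
groups:
- name: example.rules                     # hypothetical rule group
  rules:
  - alert: NodeMemoryAlmostFull           # hypothetical alert name
    # fires when a node has used more than 90% of its memory for 10 minutes
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
    for: 10m
    labels:
      severity: warning
    annotations:
      message: "Node {{ $labels.instance }} is using more than 90% of its memory."
```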
+ +Prometheus' main distinguishing features as compared to other monitoring systems are: + +- a **multi-dimensional** data model (timeseries defined by metric name and set of key/value dimensions) +- a **flexible query language** to leverage this dimensionality +- no dependency on distributed storage; **single server nodes are autonomous** +- timeseries collection happens via a **pull model** over HTTP +- **pushing timeseries** is supported via an intermediary gateway +- targets are discovered via **service discovery** or **static configuration** +- multiple modes of **graphing and dashboarding support** +- support for hierarchical and horizontal **federation** + +## Prometheus overview +The following diagram shows the general architectural overview of Prometheus: + +![Prometheus Architecture](../resources/images/prometheus_architecture.png) + +## Monitoring use cases +Starting with OpenShift 3.11, Prometheus is installed by default to **monitor the OpenShift cluster** (depicted in the diagram below on the left side: *Kubernetes Prometheus deployment*). This installation is managed by the "Cluster Monitoring Operator" and not intended to be customized (we will do it anyway). + +To **monitor applications** or **define custom Prometheus configurations**, the Tech Preview feature [Operator Lifecycle Manager (OLM)](https://docs.openshift.com/container-platform/3.11/install_config/installing-operator-framework.html]) can be used to install the Prometheus Operator which in turn allows to define Prometheus instances (depicted in the diagram below on the right side: *Service Prometheus deployment*). These instances are fully customizable with the use of *Custom Ressource Definitions (CRD)*. + +![Prometheus Overview](../resources/images/prometheus_use-cases.png) + +(source: https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/) + +# Cluster Monitoring Operator + +![Cluster Monitoring Operator components](../resources/images/prometheus_cmo.png) + + +## Installation + + + +From OpenShift 3.11 onwards, the CMO is installed per default. To customize the installation you can set the following variables in inventory (small cluster) + +```ini +openshift_cluster_monitoring_operator_install=true # default value +openshift_cluster_monitoring_operator_prometheus_storage_enabled=true +openshift_cluster_monitoring_operator_prometheus_storage_capacity=50Gi +openshift_cluster_monitoring_operator_prometheus_storage_class_name=[tbd] +openshift_cluster_monitoring_operator_alertmanager_storage_enabled=true +openshift_cluster_monitoring_operator_alertmanager_storage_capacity=2Gi +openshift_cluster_monitoring_operator_alertmanager_storage_class_name=[tbd] +openshift_cluster_monitoring_operator_alertmanager_config=[tbd] +``` + +Run the installer + +``` +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-monitoring/config.yml +``` + +### Access Prometheus + +You can login with the cluster administrator `sheriff` on: +https://prometheus-k8s-openshift-monitoring.app[X].lab.openshift.ch/ + +- Additional targets: `Status` -> `Targets` +- Scrape configuration: `Status` -> `Configuration` +- Defined rules: `Status` -> `Rules` +- Service Discovery: `Status` -> `Service Discovery` + + +### Configure Prometheus +Let Prometheus scrape service labels in different namespaces + +``` +[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-reader -z prometheus-k8s -n openshift-monitoring +``` + +To modify the Prometheus configuration - e.g. 
retention time, change the ConfigMap `cluster-monitoring-config` as described here: + + +``` +[ec2-user@master0 ~]$ oc edit cm cluster-monitoring-config -n openshift-monitoring +``` + +Unfortunately, changing the default scrape config is not supported with the Cluster Monitoring Operator. + +#### etcd monitoring + +To add etcd monitoring, follow this guide: + + +## Additional services: CRD type ServiceMonitor (unsupported by Red Hat) + +Creating additional ServiceMonitor objects is not supported by Red Hat. See [Supported Configuration](https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html#supported-configuration) for details. + +We will do it anyway :sunglasses:. + +In order for the custom services to be added to the managed Prometheus instance, the label `k8s-app` needs to be present in the "ServiceMonitor" *Custom Ressource (CR)* + +See example for *Service Monitor* `router-metrics`: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + generation: 1 + labels: + k8s-app: router-metrics + name: router-metrics + namespace: "" +spec: + endpoints: + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + honorLabels: true + interval: 30s + port: 1936-tcp + scheme: https + tlsConfig: + caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt + insecureSkipVerify: true + namespaceSelector: + matchNames: + - default + selector: + matchLabels: + router: router +``` + +### Router Monitoring + +Create the custom cluster role `router-metrics` and add it to the Prometheus service account `prometheus-k8s`, for Prometheus to be able to read the router metrics. +First you need to check, what labels your routers are using. + +``` +[ec2-user@master0 ~]$ oc get endpoints -n default --show-labels +NAME ENDPOINTS AGE LABELS +router 172.31.43.147:1936,172.31.47.59:1936,172.31.47.64:1936 + 6 more... 1h router=router +``` + +Add the `prometheus-k8s` service account to the `router-metrics` cluster role +``` +[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user router-metrics system:serviceaccount:openshift-monitoring:prometheus-k8s +``` + +Set the router label as parameter and create the service monitor +``` +[ec2-user@master0 ~]$ oc project openshift-monitoring +[ec2-user@master0 ~]$ oc process -f resource/templates/template-router.yaml -p ROUTER_LABEL="router" | oc apply -f - +``` + +### Logging Monitoring +Just works on clustered ElasticSearch, the OPStechlab runs because of lack of ressources on a single node ES. +The Service `logging-es-prometheus` needs to be labeled and the following RoleBinding applied, for Prometheus to be able to get the metrics. + +``` +[ec2-user@master0 ~]$ oc label svc logging-es-prometheus -n openshift-logging scrape=prometheus +[ec2-user@master0 ~]$ oc create -f resource/templates/template-rolebinding.yaml -n openshift-logging +[ec2-user@master0 ~]$ oc process -f resource/templates/template-logging.yaml | oc apply -f - +``` + +## Additional rules: CRD type PrometheusRule + +Take a look at the additional ruleset, that we suggest to use monitoring OpenShift. +``` +[ec2-user@master0 ~]$ less resource/templates/template-k8s-custom-rules.yaml +``` + +Add the custom rules from the template folder to Prometheus: + +``` +[ec2-user@master0 ~]$ oc process -f resource/templates/template-k8s-custom-rules.yaml -p SEVERITY_LABEL="critical" | oc apply -f - +``` + +## AlertManager + +Configuring Alertmanager with the Red Hat Ansible playbooks. 
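A sketch of what the `openshift_cluster_monitoring_operator_alertmanager_config` variable from the installation section above could contain, following the format shown in the 3.11 documentation. The block-style YAML value assumes the variable is set in a YAML vars file rather than inline in the INI inventory, and the receivers below are placeholders, not a working notification setup:

```yaml
openshift_cluster_monitoring_operator_alertmanager_config: |+
  global:
    resolve_timeout: 5m
  route:
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: default
    routes:
    - match:
        alertname: DeadMansSwitch
      repeat_interval: 5m
      receiver: deadmansswitch
  receivers:
  - name: default
  - name: deadmansswitch
```

The change is then applied by re-running the `openshift-monitoring/config.yml` playbook shown in the installation section.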
+ + +By hand + +``` +[ec2-user@master0 ~]$ oc delete secret alertmanager-main +[ec2-user@master0 ~]$ oc create secret generic alertmanager-main --from-file=resource/templates/alertmanager.yaml +``` + +Follow these guides: + + +Check if the new configuration is in place: https://alertmanager-main-openshift-monitoring.app[X].lab.openshift.ch/#/status + +## Additional configuration + +### Add view role for developers + +Let non OpenShift admins access Prometheus: +``` +[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-monitoring-view [user] +``` + +### Add metrics reader service account to access Prometheus metrics + +You can create a service account to access Prometheus through the API +``` +[ec2-user@master0 ~]$ oc create sa prometheus-metrics-reader -n openshift-monitoring +[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-monitoring-view -z prometheus-metrics-reader -n openshift-monitoring +``` + +Access the API with a simple `curl` +``` +[ec2-user@master0 ~]$ export TOKEN=`oc sa get-token prometheus-metrics-reader -n openshift-monitoring` +[ec2-user@master0 ~]$ curl https://prometheus-k8s-openshift-monitoring.app[X].lab.openshift.ch/api/v1/query?query=ALERTS -H "Authorization: Bearer $TOKEN" +``` + +### Allow Prometheus to scrape your metrics endpoints (if using ovs-networkpolicy plugin) + +Create an additional network-policy. + +``` +[ec2-user@master0 ~]$ oc create -f resource/templates/networkpolicy.yaml -n [namespace] +``` diff --git a/appendices/03_internet_resources.md b/appendices/02_internet_resources.md similarity index 85% rename from appendices/03_internet_resources.md rename to appendices/02_internet_resources.md index 75303e3..e5a6af0 100644 --- a/appendices/03_internet_resources.md +++ b/appendices/02_internet_resources.md @@ -1,4 +1,4 @@ -# Appendix 3: Useful Internet Resources +# Appendix 2: Useful Internet Resources This appendix is a small collection of rather useful online resources containing scripts and documentation as well as Ansible roles and playbooks and more. @@ -7,7 +7,7 @@ This appendix is a small collection of rather useful online resources containing - Red Hat Communities of Practice: https://github.com/redhat-cop - Red Hat Consulting DevOps and OpenShift Playbooks: http://v1.uncontained.io/ - APPUiO OpenShift resources: https://github.com/appuio/ - +- Knowledge Base: https://kb.novaordis.com/index.php/OpenShift --- diff --git a/appendices/02_prometheus.md b/appendices/02_prometheus.md deleted file mode 100644 index a696301..0000000 --- a/appendices/02_prometheus.md +++ /dev/null @@ -1,141 +0,0 @@ -# Appendix 1: Monitoring with Prometheus - -This appendix is going to show you how to install Prometheus on OpenShift 3.7. - - -## Installation - -OpenShift 3.7 was the first release to make it possible to install Prometheus via playbooks. 
We set the Ansible inventory variables, run the playbook to perform the actual installation and add the following components to the installation: -- Monitor router endpoints -- Deploy node-exporter DaemonSet -- Deploy kube-state-metrics - -Uncomment the following part in your Ansible inventory at `/etc/ansible/hosts`: -``` -openshift_hosted_prometheus_deploy=true -openshift_prometheus_node_selector={"region":"infra"} -openshift_prometheus_additional_rules_file=/usr/share/ansible/prometheus/prometheus_configmap_rules.yaml -``` - -Execute the playbook to install Prometheus: -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-prometheus.yml -``` - -### Monitor OpenShift routers with Prometheus - -Get the router password for basic authentication to scrape information from the router healthz endpoint: -``` -[ec2-user@master0 ~]$ oc get dc router -n default -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="STATS_PASSWORD")].value}{"\n"}' -``` - -Add router scrape configuration and add the output from the command above to `[ROUTER_PW]`: -``` -[ec2-user@master0 ~]$ oc edit configmap prometheus -n openshift-metrics - scrape_configs: -... - - job_name: 'openshift-routers' - metrics_path: '/metrics' - scheme: http - basic_auth: - username: admin - password: [ROUTER_PW] - static_configs: - - targets: ['router.default.svc.cluster.local:1936'] -... - alerting: - alertmanagers: -``` - -### Deploy node-exporter - -Delete the project node-selector, grant prometheus-node-exporter serviceaccount hostaccess and deploy the node-exporter DaemonSet. -``` -[ec2-user@master0 ~]$ oc annotate namespace openshift-metrics openshift.io/node-selector="" --overwrite -[ec2-user@master0 ~]$ oc adm policy add-scc-to-user -z prometheus-node-exporter -n openshift-metrics hostaccess -[ec2-user@master0 ~]$ oc create -f resource/node-exporter.yaml -n openshift-metrics -``` - -Add scrape configuration for node-exporter: -``` -[ec2-user@master0 ~]$ oc edit configmap prometheus -n openshift-metrics - scrape_configs: -... - - job_name: 'node-exporters' - tls_config: - ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt - insecure_skip_verify: true - kubernetes_sd_configs: - - role: node - relabel_configs: - - action: labelmap - regex: __meta_kubernetes_node_label_(.+) - - source_labels: [__meta_kubernetes_role] - action: replace - target_label: kubernetes_role - - source_labels: [__address__] - regex: '(.*):10250' - replacement: '${1}:9100' - target_label: __address__ -... - alerting: - alertmanagers: -``` - -Check port for Prometheus node-exporter: -``` -[ec2-user@master0 ~]$ ansible nodes -m iptables -a "chain=OS_FIREWALL_ALLOW protocol=tcp destination_port=9100 jump=ACCEPT comment=node-exporter" -``` - -### Deploy kube-state-metrics - -Documentation: https://github.com/kubernetes/kube-state-metrics -``` -[ec2-user@master0 ~]$ oc create -f resource/kube-state-metrics.yaml -n openshift-metrics -``` - -Add kube-state-metric scrape configuration. -``` -[ec2-user@master0 ~]$ oc edit configmap prometheus -n openshift-metrics - scrape_configs: -... - - job_name: 'kube-state-metrics' - metrics_path: '/metrics' - scheme: http - static_configs: - - targets: ['kube-state.openshift-metrics.svc.cluster.local:80'] -... - alerting: - alertmanagers: -``` - -### Restart Prometheus - -Delete the Prometheus pod load the changed configuration. 
-``` -[ec2-user@master0 ~]$ oc get pods -n openshift-metrics -NAME READY STATUS RESTARTS AGE -kube-state-2718312193-kgs9w 1/1 Running 0 24s 10.131.2.14 node4.user8.lab.openshift.ch -prometheus-0 5/5 Running 0 4m 10.129.2.88 node2.user8.lab.openshift.ch -prometheus-node-exporter-22hwn 1/1 Running 0 37s 172.31.39.136 master2.user8.lab.openshift.ch -prometheus-node-exporter-2hq7j 1/1 Running 0 37s 172.31.35.184 node2.user8.lab.openshift.ch -prometheus-node-exporter-2rfj8 1/1 Running 0 37s 172.31.41.6 node1.user8.lab.openshift.ch -prometheus-node-exporter-995tx 1/1 Running 0 37s 172.31.36.128 master0.user8.lab.openshift.ch -prometheus-node-exporter-c4jlz 1/1 Running 0 37s 172.31.46.123 node3.user8.lab.openshift.ch -prometheus-node-exporter-c7v76 1/1 Running 0 37s 172.31.40.35 master1.user8.lab.openshift.ch -prometheus-node-exporter-jk7q7 1/1 Running 0 37s 172.31.43.182 node0.user8.lab.openshift.ch -prometheus-node-exporter-sgpmm 1/1 Running 0 37s 172.31.41.93 node4.user8.lab.openshift.ch - -[ec2-user@master0 ~]$ oc delete pod prometheus-0 -n openshift-metrics -pod "prometheus-0" deleted -``` - -### Access Prometheus - -This creates a new project called `openshift-metrics`. As soon as the pod is running you will be able to access it with the user `cheyenne`. -https://prometheus-openshift-metrics.app[X].lab.openshift.ch/ - ---- - -[← back to the labs overview](../README.md) - diff --git a/appendices/03_aws_storage.md b/appendices/03_aws_storage.md new file mode 100644 index 0000000..5504ac8 --- /dev/null +++ b/appendices/03_aws_storage.md @@ -0,0 +1,156 @@ +# Appendix 3: Using AWS EBS and EFS Storage +This appendix is going to show you how to use AWS EBS and EFS Storage on OpenShift 3.11. + +## Installation +:information_source: To access the efs-storage at aws, you will need an fsid. Please ask your instructor to get one. + +Uncomment the following part in your Ansible inventory and set the fsid: +``` +[ec2-user@master0 ~]$ sudo vi /etc/ansible/hosts +``` + +# EFS Configuration +``` +openshift_provisioners_install_provisioners=True +openshift_provisioners_efs=True +openshift_provisioners_efs_fsid="[provided by instructor]" +openshift_provisioners_efs_region="eu-central-1" +openshift_provisioners_efs_nodeselector={"beta.kubernetes.io/os": "linux"} +openshift_provisioners_efs_aws_access_key_id="[provided by instructor]" +openshift_provisioners_efs_aws_secret_access_key="[provided by instructor]" +openshift_provisioners_efs_supplementalgroup=65534 +openshift_provisioners_efs_path=/persistentvolumes +``` + +For detailed information about provisioners take a look at https://docs.openshift.com/container-platform/3.11/install_config/provisioners.html#provisioners-efs-ansible-variables + +Execute the playbook to install the provisioner: +``` +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-provisioners/config.yml +``` + +Check if the pv was created: +``` +[ec2-user@master0 ~]$ oc get pv + +NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE +provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 22h +``` + + +:warning: The external provisioner for AWS EFS on OpenShift Container Platform 3.11 is still a Technology Preview feature. 
+https://docs.openshift.com/container-platform/3.11/install_config/provisioners.html#overview + +#### Create StorageClass + +To enable dynamic provisioning, you need to crate a storageclass: +``` +[ec2-user@master0 ~]$ cat << EOF > aws-efs-storageclass.yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1beta1 +metadata: + name: nfs +provisioner: openshift.org/aws-efs +EOF +[ec2-user@master0 ~]$ oc create -f aws-efs-storageclass.yaml +``` + +Check if the storage class has been created: +``` +[ec2-user@master0 ~]$ oc get sc + +NAME PROVISIONER AGE +glusterfs-storage kubernetes.io/glusterfs 23h +nfs openshift.org/aws-efs 23h +``` + +#### Create PVC + +Now we create a little project and claim a volume from EFS. + +``` +[ec2-user@master0 ~]$ oc new-project quotatest +[ec2-user@master0 ~]$ oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git +[ec2-user@master0 ~]$ cat << EOF > test-pvc.yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: quotatest +spec: + accessModes: + - ReadWriteOnce + volumeMode: Filesystem + resources: + requests: + storage: 10Mi + storageClassName: nfs +EOF +[ec2-user@master0 ~]$ oc create -f test-pvc.yaml +[ec2-user@master0 ~]$ oc set volume dc/ruby-ex --add --overwrite --name=v1 --type=persistentVolumeClaim --claim-name=quotatest --mount-path=/quotatest +``` + +Check if we can see our pvc: +``` +[ec2-user@master0 ~]$ oc get pvc + +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE +quotatest Bound pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 10Mi RWX nfs 17m +``` + +We will now try to write 40Mi in the 10Mi claim to demonstrate, that PVs do not enforce quotas +``` +[ec2-user@master0 ~]$ oc get pods +NAME READY STATUS RESTARTS AGE +ruby-ex-2-zwnws 1/1 Running 0 1h +[ec2-user@master0 ~]$ oc rsh ruby-ex-2-zwnws +$ df -h /quotatest +Filesystem Size Used Avail Use% Mounted on +fs-4f7f2916.efs.eu-central-1.amazonaws.com:/persistentvolumes/provisioners-efs-pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 8.0E 0 8.0E 0% /quotatest +$ dd if=/dev/urandom of=/quotatest/quota bs=4096 count=10000 +$ $ du -hs /quotatest/ +40M /quotatest/ +``` + +#### Delete EFS Volumes +When you delete the PVC, the PV and the corresponding data gets deleted. 
+The default RECLAIM POLICY is set to 'Delete': +``` +[ec2-user@master0 ~]$ oc get pv +NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE +provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 23m +pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 10Mi RWX Delete Bound test/provisioners-efs nfs 17m +registry-volume 5Gi RWX Retain Bound default/registry-claim 13m +``` + +Rundown the application and delete the pvc: +``` +[ec2-user@master0 ~]$ oc scale dc/ruby-ex --replicas=0 +[ec2-user@master0 ~]$ oc delete pvc quotatest +``` + +Check if the pv was deleted: +``` +[ec2-user@master0 ~]$ oc get pv +NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE +provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 23m +registry-volume 5Gi RWX Retain Bound default/registry-claim 13m +``` + +Check if the efs-provisioner cleans up the NFS Volume: +``` +[ec2-user@master0 ~]$ oc project openshift-infra +[ec2-user@master0 ~]$ oc get pods +NAME READY STATUS RESTARTS AGE +provisioners-efs-1-l75qr 1/1 Running 0 1h +[ec2-user@master0 ~]$ oc rsh provisioners-efs-1-l75qr +sh-4.2# df /persistentvolumes +Filesystem 1K-blocks Used Available Use% Mounted on +fs-4f7f2916.efs.eu-central-1.amazonaws.com:/persistentvolumes 9007199254739968 0 9007199254739968 0% /persistentvolumes +sh-4.2# ls /persistentvolumes +sh-4.2# +``` + +--- + +[← back to the labs overview](../README.md) + diff --git a/labs/11_overview.md b/labs/11_overview.md index 53fe705..35b15a8 100644 --- a/labs/11_overview.md +++ b/labs/11_overview.md @@ -5,18 +5,20 @@ This is the environment we will build and work on. It is deployed on Amazon AWS. ![Lab OpenShift Cluster Overview](../resources/11_ops-techlab.png) Our lab installation consists of the following components: -1. Two Load Balancers - 1. LB app[X]: Used for load balancing requests to the routers (*.app[X].lab.openshift.ch) - 1. LB user[X]-console: Used for load balancing reqeusts to the master APIs (console.user[X].lab.openshift.ch) +1. Three Load Balancers + 1. Application Load Balancer app[X]: Used for load balancing requests to the routers (*.app[X].lab.openshift.ch) + 1. Application Load Balancer console[X]: Used for load balancing reqeusts to the master APIs (console.user[X].lab.openshift.ch) + 1. Classic Load Balancer console[X]-internal: Used for internal load balancing reqeusts to the master APIs (internalconsole.user[X].lab.openshift.ch) 1. Two OpenShift masters, one will be added later - 1. etcd is already installed on all three masters -1. Two infra nodes, where the following components are running: +1. Two etcd, one will be added later +1. Three infra nodes, where the following components are running: 1. Container Native Storage (Gluster) 1. Routers 1. Metrics 1. Logging + 1. Monitoring (Prometheus) 1. One app node, one will be added later -1. For now, we are going to use the first master as a bastion host (bastion.user[X].lab.openshift.ch) +1. 
We are going to use the jump host as a bastion host (jump.lab.openshift.ch) --- diff --git a/labs/12_access_environment.md b/labs/12_access_environment.md index c74c096..6ffbe25 100644 --- a/labs/12_access_environment.md +++ b/labs/12_access_environment.md @@ -8,7 +8,7 @@ https://console.user[X].lab.openshift.ch ``` to ``` -https://console.user1.lab.openshift.ch +https://console.user[X].lab.openshift.ch ``` diff --git a/labs/21_ansible_inventory.md b/labs/21_ansible_inventory.md index 9967158..fd844c9 100644 --- a/labs/21_ansible_inventory.md +++ b/labs/21_ansible_inventory.md @@ -9,8 +9,8 @@ Take a look at the prepared inventory file: Download the default example hosts file from the OpenShift GitHub repository and compare it to the prepared inventory for the lab. ``` -[ec2-user@master0 ~]$ wget https://raw.githubusercontent.com/openshift/openshift-ansible/release-3.6/inventory/byo/hosts.ose.example -[ec2-user@master0 ~]$ vimdiff hosts.ose.example /etc/ansible/hosts +[ec2-user@master0 ~]$ wget https://raw.githubusercontent.com/openshift/openshift-ansible/release-3.11/inventory/hosts.example +[ec2-user@master0 ~]$ vimdiff hosts.example /etc/ansible/hosts ``` --- diff --git a/labs/22_installation.md b/labs/22_installation.md index 4a7b64c..b832e28 100644 --- a/labs/22_installation.md +++ b/labs/22_installation.md @@ -19,32 +19,42 @@ Now we run the prepare_hosts_for_ose.yml playbook. This will do the following: Run the installation 1. Install OpenShift. This takes a while, get a coffee. ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml ``` -2. Deploy the OpenShift metrics +2. Add the cluster-admin role to the "sheriff" user. ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml +[ec2-user@master0 ~]$ oc adm policy --as system:admin add-cluster-role-to-user cluster-admin sheriff ``` -3. Deploy the OpenShift logging +3. Now open your browser and access the master API with the user "sheriff": +``` +https://console.user[X].lab.openshift.ch/console/ ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml +Password is documented in the Ansible inventory: +``` +[ec2-user@master0 ~]$ grep keepass /etc/ansible/hosts ``` -4. Add the cluster-admin role to the "sheriff" user. +4. Deploy the APPUiO openshift-client-distributor. This provides the correct oc client in a Pod and can then be obtained via the OpenShift GUI. For this to work, the Masters must have the package `atomic-openshift-clients-redistributable` installed. In addition the variable `openshift_web_console_extension_script_urls` must be defined in the inventory. ``` -[ec2-user@master0 ~]$ oc adm policy --as system:admin add-cluster-role-to-user cluster-admin sheriff +[ec2-user@master0 ~]$ grep openshift_web_console_extension_script_urls /etc/ansible/hosts +openshift_web_console_extension_script_urls=["https://client.app1.lab.openshift.ch/cli-download-customization.js"] +[ec2-user@master0 ~]$ ansible masters -m shell -a "rpm -qi atomic-openshift-clients-redistributable" ``` -5. Now open your browser and access the master API with the user "sheriff". Password is documented in the Ansible inventory. 
+Deploy the openshift-client-distributor. ``` -https://console.user[X].lab.openshift.ch/console/ +[ec2-user@master0 ~]$ sudo yum install python-openshift +[ec2-user@master0 ~]$ git clone https://github.com/appuio/openshift-client-distributor +[ec2-user@master0 ~]$ cd openshift-client-distributor +[ec2-user@master0 ~]$ ansible-playbook playbook.yml -e 'openshift_client_distributor_hostname=client.app[X].lab.openshift.ch' ``` -6. You can download the client binary and use it from your local workstation. The binary is available for Linux, macOS and Windows. (optional) +5. You can now download the client binary and use it from your local workstation. The binary is available for Linux, macOS and Windows. (optional) ``` -https://console.user[X].lab.openshift.ch/console/extensions/clients/ +https://console.user[X].lab.openshift.ch/console/command-line ``` --- diff --git a/labs/23_verification.md b/labs/23_verification.md index c97ac7f..e3afa90 100644 --- a/labs/23_verification.md +++ b/labs/23_verification.md @@ -16,18 +16,18 @@ Check if all pvc are bound and glusterfs runs fine [ec2-user@master0 ~]$ oc get pvc --all-namespaces ``` -Check the etcd health status. Do not forget to change the *[X]* of *user[X]* with your number. +Check the etcd health status. ``` -[ec2-user@master0 ~]# sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379,https://master2.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health -member 3f511408a118b9fd is healthy: got healthy result from https://172.31.37.59:2379 -member 50953a25943f54a8 is healthy: got healthy result from https://172.31.35.180:2379 -member ec41afe89f86deaf is healthy: got healthy result from https://172.31.35.199:2379 +[ec2-user@master0 ~]$ sudo -i +[root@master0 ~]# source /etc/etcd/etcd.conf +[root@master0 ~]# etcdctl2 cluster-health +member 16682006866446bb is healthy: got healthy result from https://172.31.45.211:2379 +member 5c619e4b51953519 is healthy: got healthy result from https://172.31.44.160:2379 cluster is healthy -[ec2-user@master0 ~]# sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379,https://master2.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key member list -3f511408a118b9fd: name=ip-172-31-37-59.eu-central-1.compute.internal peerURLs=https://172.31.37.59:2380 clientURLs=https://172.31.37.59:2379 isLeader=true -50953a25943f54a8: name=master0.user2.lab.openshift.ch peerURLs=https://172.31.35.180:2380 clientURLs=https://172.31.35.180:2379 isLeader=false -ec41afe89f86deaf: name=master1.user2.lab.openshift.ch peerURLs=https://172.31.35.199:2380 clientURLs=https://172.31.35.199:2379 isLeader=false +[root@master0 ~]# etcdctl2 member list +16682006866446bb: name=master1.user7.lab.openshift.ch peerURLs=https://172.31.45.211:2380 clientURLs=https://172.31.45.211:2379 isLeader=false +5c619e4b51953519: name=master0.user7.lab.openshift.ch peerURLs=https://172.31.44.160:2380 clientURLs=https://172.31.44.160:2379 isLeader=true ``` Create a project, run a build, push/pull from the internal registry and deploy a test application. 
diff --git a/labs/31_user_management.md b/labs/31_user_management.md index bd1964c..85fd02b 100644 --- a/labs/31_user_management.md +++ b/labs/31_user_management.md @@ -8,9 +8,9 @@ Before you begin with this lab, make sure you roughly understand the authorizati ### Add User to Project First we create a user and give him the admin role in the openshift-infra project. -Login to the master and create the local user with ansible on all masters (replace ``````): +Login to the master and create the local user with ansible on all masters (replace ```[password]```): ``` -[ec2-user@master0 ~]$ ansible masters -a "htpasswd -b /etc/origin/master/htpasswd cowboy " +[ec2-user@master0 ~]$ ansible masters -a "htpasswd -b /etc/origin/master/htpasswd cowboy [password]" ``` Add the admin role to the newly created user, but only for the project `openshift-infra`: @@ -79,6 +79,8 @@ Groups can be created manually or synchronized from an LDAP directory. So let's [ec2-user@master0 ~]$ oc login -u sheriff [ec2-user@master0 ~]$ oc adm groups new deputy-sheriffs cowboy +group.user.openshift.io/deputy-sheriffs created +[ec2-user@master0 ~]$ oc get groups NAME USERS deputy-sheriffs cowboy ``` @@ -114,39 +116,15 @@ Who can create configmaps in the `default` project: oc policy who-can create configmaps -n default ``` -You can also get a description of all available clusterPolicies and clusterPoliciesBindings with the following oc command: -``` -[ec2-user@master0 ~]$ oc describe clusterPolicy default -Name: default -Created: 4 hours ago -Labels: -Last Modified: 2015-06-10 17:22:25 +0000 UTC -admin Verbs Resources Resource Names Non-Resource URLs Extension - [create delete get list update watch] [pods/proxy projects resourcegroup:exposedkube resourcegroup:exposedopenshift resourcegroup:granter secrets] [][] - [get list watch] [pods/exec pods/portforward resourcegroup:allkube resourcegroup:allkube-status resourcegroup:allopenshift-status resourcegroup:policy] [][] - [get update] [imagestreams/layers] [][] -basic-user Verbs Resources Resource Names Non-Resource URLs Extension - [get] [users] -... - - -[ec2-user@master0 ~]$ oc describe clusterPolicyBindings :default -Name: :default -Created: 4 hours ago -Labels: -Last Modified: 2015-06-10 17:22:26 +0000 UTC -Policy: -RoleBinding[basic-users]: - Role: basic-user - Users: [] - Groups: [system:authenticated] -RoleBinding[cluster-admins]: - Role: cluster-admin - Users: [] - Groups: [system:cluster-admins] -... +You can also get a description of all available clusterroles and clusterrolebinding with the following oc command: +``` +[ec2-user@master0 ~]$ oc describe clusterrole.rbac ``` +``` +[ec2-user@master0 ~]$ oc describe clusterrolebinding.rbac +``` +https://docs.openshift.com/container-platform/3.11/admin_guide/manage_rbac.html ### Cleanup diff --git a/labs/32_update_hosts.md b/labs/32_update_hosts.md index b597878..ee62528 100644 --- a/labs/32_update_hosts.md +++ b/labs/32_update_hosts.md @@ -27,34 +27,47 @@ These excludes are set by using the OpenShift Ansible playbooks or when using th ### Apply OS Patches to Masters and Nodes +If you don't know if you're cluster-admin or not. +Query all users with rolebindings=cluster-admin: +``` +oc get clusterrolebinding -o json | jq '.items[] | select(.metadata.name | startswith("cluster-admin")) | .userNames' +``` + +Hint: root on master-node always is system:admin (don't use it for ansible-tasks). But you're able to grant permissions to other users. 
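A quicker self-check is to ask the API whether your current user is allowed to perform a cluster-scoped action, assuming the `oc auth can-i` subcommand that ships with the 3.11 client:

```
[ec2-user@master0 ~]$ oc auth can-i create clusterrolebindings
yes
```

Only users with cluster-admin-like permissions get a `yes` here; regular project admins get `no`.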
+ First, login as cluster-admin and drain the first app-node (this deletes all pods so the OpenShift scheduler creates them on other nodes and also disables scheduling of new pods on the node). ``` [ec2-user@master0 ~]$ oc get nodes [ec2-user@master0 ~]$ oc adm drain app-node0.user[X].lab.openshift.ch --ignore-daemonsets --delete-local-data ``` -After draining a node, only the DaemonSet (`logging-fluentd`) should remain on the node: +After draining a node, only pods from DaemonSets should remain on the node: ``` [ec2-user@master0 ~]$ oc adm manage-node app-node0.user[X].lab.openshift.ch --list-pods + Listing matched pods on node: app-node0.user[X].lab.openshift.ch -NAME READY STATUS RESTARTS AGE -logging-fluentd-s2k2j 1/1 Running 0 1h +NAMESPACE NAME READY STATUS RESTARTS AGE +openshift-logging logging-fluentd-lfjnc 1/1 Running 0 33m +openshift-monitoring node-exporter-czhr2 2/2 Running 0 36m +openshift-node sync-rhh8z 1/1 Running 0 46m +openshift-sdn ovs-hz9wj 1/1 Running 0 46m +openshift-sdn sdn-49tpr 1/1 Running 0 46m ``` Scheduling should now be disabled for this node: ``` [ec2-user@master0 ~]$ oc get nodes ... -app-node0.user[X].lab.openshift.ch Ready,SchedulingDisabled 2d v1.6.1+5115d708d7 +app-node0.user[X].lab.openshift.ch Ready,SchedulingDisabled compute 2d v1.11.0+d4cacc0 ... ``` If everything looks good, you can update the node and reboot it. The first command can take a while and doesn't output anything until it's done: ``` -[ec2-user@master0 ~]$ ansible app_nodes[0] -m yum -a "name='*' state=latest exclude='atomic-openshift-* openshift-* docker-*'" -[ec2-user@master0 ~]$ ansible app_nodes[0] --poll=0 --background=1 -m shell -a 'sleep 2 && reboot' +[ec2-user@master0 ~]$ ansible app-node0.user[X].lab.openshift.ch -m yum -a "name='*' state=latest exclude='atomic-openshift-* openshift-* docker-*'" +[ec2-user@master0 ~]$ ansible app-node0.user[X].lab.openshift.ch --poll=0 --background=1 -m shell -a 'sleep 2 && reboot' ``` After the node becomes ready again, enable schedulable anew. Do not do this before the node has rebooted (it takes a while for the node's status to change to `Not Ready`): @@ -69,8 +82,13 @@ Check that pods are correctly starting: Listing matched pods on node: app-node0.user[X].lab.openshift.ch -NAME READY STATUS RESTARTS AGE -logging-fluentd-s2k2j 1/1 Running 1 1h +NAMESPACE NAME READY STATUS RESTARTS AGE +dakota ruby-ex-1-6lc87 1/1 Running 0 12m +openshift-logging logging-fluentd-lfjnc 1/1 Running 1 43m +openshift-monitoring node-exporter-czhr2 2/2 Running 2 47m +openshift-node sync-rhh8z 1/1 Running 1 56m +openshift-sdn ovs-hz9wj 1/1 Running 1 56m +openshift-sdn sdn-49tpr 1/1 Running 1 56m ``` Since we want to update the whole cluster, **you will need to repeat these steps on all servers**. Masters do not need to be drained because they do not run any pods (unschedulable by default). diff --git a/labs/34_renew_certificates.md b/labs/34_renew_certificates.md index 7d084b5..d79343d 100644 --- a/labs/34_renew_certificates.md +++ b/labs/34_renew_certificates.md @@ -14,12 +14,12 @@ These are the certificates that need to be maintained. 
For each component there To check all your certificates, run the playbook `certificate_expiry/easy-mode.yaml`: ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/certificate_expiry/easy-mode.yaml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-checks/certificate_expiry/easy-mode.yaml ``` The playbook will generate the following reports with the information of each certificate in JSON and HTML format: ``` -/tmp/cert-expiry-report.html -/tmp/cert-expiry-report.json +grep -A2 summary $HOME/cert-expiry-report*.json +$HOME/cert-expiry-report*.html ``` @@ -33,42 +33,104 @@ First, we check the current etcd certificates creation time: ``` [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2 Validity - Not Before: Mar 23 12:50:41 2018 GMT - Not After : Mar 22 12:50:41 2023 GMT + Not Before: Jun 4 15:45:00 2019 GMT + Not After : Jun 2 15:45:00 2024 GMT + [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2 Validity - Not Before: Mar 23 12:51:34 2018 GMT - Not After : Mar 22 12:51:35 2020 GMT + Not Before: Jun 4 15:45:00 2019 GMT + Not After : Jun 2 15:45:00 2024 GMT + ``` Note the value for "Validity Not Before:". We will later compare this timestamp with the freshly deployed certificates. Redeploy the CA certificate of the etcd servers: ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-ca.yml ``` Check the current etcd CA certificate creation time: ``` [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2 Validity - Not Before: Mar 26 06:22:41 2018 GMT - Not After : Mar 25 06:22:41 2023 GMT + Not Before: Jun 6 12:58:04 2019 GMT + Not After : Jun 4 12:58:04 2024 GMT + [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2 Validity - Not Before: Mar 23 12:51:34 2018 GMT - Not After : Mar 22 12:51:35 2020 GMT + Not Before: Jun 4 15:45:00 2019 GMT + Not After : Jun 2 15:45:00 2024 GMT ``` The etcd CA certificate has been generated, but etcd is still using the old server certificates. We will replace them with the `redeploy-etcd-certificates.yml` playbook. **Warning:** This will again lead to a restart of etcd and master services and consequently cause an outage for a few seconds of the OpenShift API. ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-certificates.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-certificates.yml ``` Check if the server certificate has been replaced: ``` [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2 + Validity + Not Before: Jun 6 12:58:04 2019 GMT + Not After : Jun 4 12:58:04 2024 GMT + [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2 + Validity + Not Before: Jun 6 13:28:36 2019 GMT + Not After : Jun 4 13:28:36 2024 GMT +``` +### Redeploy nodes Certificates + +1. 
Create a new bootstrap.kubeconfig for nodes (MASTER nodes will just copy admin.kubeconfig):" +``` +[ec2-user@master0 ~]$ sudo oc serviceaccounts create-kubeconfig node-bootstrapper -n openshift-infra --config /etc/origin/master/admin.kubeconfig > /tmp/bootstrap.kubeconfig +``` + +2. Distribute ~/bootstrap.kubeconfig from step 1 to infra and compute nodes replacing /etc/origin/node/bootstrap.kubeconfig +``` +[ec2-user@master0 ~]$ ansible nodes -m copy -a 'src=/tmp/bootstrap.kubeconfig dest=/etc/origin/node/bootstrap.kubeconfig' +``` + +3. Move node.kubeconfig and client-ca.crt. These will get recreated when the node service is restarted: +``` +[ec2-user@master0 ~]$ ansible nodes -m shell -a 'mv /etc/origin/node/client-ca.crt{,.old}' +[ec2-user@master0 ~]$ ansible nodes -m shell -a 'mv /etc/origin/node/node.kubeconfig{,.old}' +``` +4. Remove contents of /etc/origin/node/certificates/ on app-/infra-nodes: +``` +[ec2-user@master0 ~]$ ansible nodes -m shell -a 'rm -rf /etc/origin/node/certificates' --limit 'nodes:!master*' +``` +5. Restart node service on app-/infra-nodes: +:warning: restart atomic-openshift-node will fail, until CSR's are approved! Approve (Task 6) the CSR's and restart the Services again. +``` +[ec2-user@master0 ~]$ ansible nodes -m service -a "name=atomic-openshift-node state=restarted" --limit 'nodes:!master*' +``` +6. Approve CSRs, 2 should be approved for each node: +``` +[ec2-user@master0 ~]$ oc get csr -o name | xargs oc adm certificate approve +``` +7. Check if the app-/infra-nodes are READY: +``` +[ec2-user@master0 ~]$ oc get node +[ec2-user@master0 ~]$ for i in `oc get nodes -o jsonpath=$'{range .items[*]}{.metadata.name}\n{end}'`; do oc get --raw /api/v1/nodes/$i/proxy/healthz; echo -e "\t$i"; done +``` +8. Remove contents of /etc/origin/node/certificates/ on master-nodes: +``` +[ec2-user@master0 ~]$ ansible masters -m shell -a 'rm -rf /etc/origin/node/certificates' +``` +9. Restart node service on master-nodes: +``` +[ec2-user@master0 ~]$ ansible masters -m service -a "name=atomic-openshift-node state=restarted" +``` +10. Approve CSRs, 2 should be approved for each node: +``` +[ec2-user@master0 ~]$ oc get csr -o name | xargs oc adm certificate approve +``` +11. Check if the master-nodes are READY: +``` +[ec2-user@master0 ~]$ oc get node +[ec2-user@master0 ~]$ for i in `oc get nodes -o jsonpath=$'{range .items[*]}{.metadata.name}\n{end}' | grep master`; do oc get --raw /api/v1/nodes/$i/proxy/healthz; echo -e "\t$i"; done ``` @@ -79,18 +141,23 @@ Use the following playbooks to replace the certificates of the other main compon **Warning:** Do not yet replace the router certificates with the corresponding playbook as it will break your routers running on OpenShift 3.6. If you want to, replace the router certificates after upgrading to OpenShift 3.7. 
(Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1490186) - masters (API server and controllers) - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-master-certificates.yml + - /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml + - etcd - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-certificates.yml -- nodes - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-node-certificates.yml + - /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-ca.yml + - /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-certificates.yml + - registry - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-registry-certificates.yml + - /usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/redeploy-registry-certificates.yml + - router - - /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-router-certificates.yml - - + - /usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/redeploy-router-certificates.yml + + **Warning:** The documented redeploy-certificates.yml for Nodes doesn't exists anymore! (since 3.10) + This is already reported: Red Hat Bugzilla – Bug 1635251. + Red Hat provided this KCS: https://access.redhat.com/solutions/3782361 + +- nodes (manual steps needed!) --- **End of Lab 3.4** diff --git a/labs/35_add_new_node_and_master.md b/labs/35_add_new_node_and_master.md index 81cd5ac..f4bbdb0 100644 --- a/labs/35_add_new_node_and_master.md +++ b/labs/35_add_new_node_and_master.md @@ -2,8 +2,8 @@ In this lab we will add a new node and a new master to our OpenShift cluster. - -### Add a New Node + +### Lab 3.5.1: Add a New Node Uncomment the new node (`app-node1.user...`) in the Ansible inventory and also uncomment the `new_nodes` group in the "[OSEv3:children]" section. ``` @@ -11,12 +11,12 @@ Uncomment the new node (`app-node1.user...`) in the Ansible inventory and also u ... glusterfs bastion -new_masters +#new_masters new_nodes ... [new_nodes] -app-node1.user[X].lab.openshift.ch openshift_hostname=app-node1.user[X].lab.openshift.ch openshift_public_hostname=app-node1.user[X].lab.openshift.ch openshift_node_labels="{'region': 'primary', 'zone': 'default'}" +app-node1.user7.lab.openshift.ch openshift_node_group_name='node-config-compute' ... ``` @@ -32,7 +32,7 @@ Test the ssh connection and run the pre-install playbook: Now add the new node with the scaleup playbook: ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-node/scaleup.yml ``` Check if the node is ready: @@ -76,7 +76,8 @@ app-node1.user[X].lab.openshift.ch openshift_hostname=app-node1.user[X].lab.open ... ``` -### Add a New Master + +### Lab 3.5.2: Add a New Master Uncomment the new master inside the Ansible inventory. It needs to be in both the `[new_nodes]` and the `[new_masters]` groups. 
``` @@ -104,21 +105,22 @@ Check if the host is accessible and run the pre-install playbook: Now we can add the new master: ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-master/scaleup.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/scaleup.yml ``` Let's check if the node daemon on the new master is ready: ``` [ec2-user@master0 ~]$ oc get nodes -NAME STATUS AGE VERSION -app-node0.user2.lab.openshift.ch Ready 3h v1.6.1+5115d708d7 -app-node1.user2.lab.openshift.ch Ready 14m v1.6.1+5115d708d7 -infra-node0.user2.lab.openshift.ch Ready 4h v1.6.1+5115d708d7 -infra-node1.user2.lab.openshift.ch Ready 4h v1.6.1+5115d708d7 -infra-node2.user2.lab.openshift.ch Ready 4h v1.6.1+5115d708d7 -master0.user2.lab.openshift.ch Ready,SchedulingDisabled 4h v1.6.1+5115d708d7 -master1.user2.lab.openshift.ch Ready,SchedulingDisabled 4h v1.6.1+5115d708d7 -master2.user2.lab.openshift.ch Ready,SchedulingDisabled 1m v1.6.1+5115d708d7 +NAME STATUS ROLES AGE VERSION +app-node0.user7.lab.openshift.ch Ready compute 1d v1.11.0+d4cacc0 +app-node1.user7.lab.openshift.ch Ready compute 1d v1.11.0+d4cacc0 +infra-node0.user7.lab.openshift.ch Ready infra 1d v1.11.0+d4cacc0 +infra-node1.user7.lab.openshift.ch Ready infra 1d v1.11.0+d4cacc0 +infra-node2.user7.lab.openshift.ch Ready infra 1d v1.11.0+d4cacc0 +master0.user7.lab.openshift.ch Ready master 1d v1.11.0+d4cacc0 +master1.user7.lab.openshift.ch Ready master 1d v1.11.0+d4cacc0 +master2.user7.lab.openshift.ch Ready master 6m v1.11.0+d4cacc0 + ``` Check if the old masters see the new one: @@ -214,7 +216,8 @@ This means we now have an empty `[new_nodes]` and `[new_masters]` groups. ``` -### Fix Logging + +### Lab 3.5.3: Fix Logging The default logging stack on OpenShift mainly consists of Elasticsearch, fluentd and Kibana, where fluentd is a DaemonSet. This means that a fluentd pod is automatically deployed on every node, even if scheduling is disabled for that node. The limiting factor for the deployment of DaemonSet pods is the node selector which is set by default to the label `logging-infra-fluentd=true`. The logging playbook attaches this label to all nodes by default, so if you wanted to prevent the deployment of fluentd on certain hosts you had to add the label `logging-infra-fluentd=false` in the inventory. As you may have seen, we do not specify the label specifically in the inventory, which means: - Every node gets the `logging-infra-fluentd=true` attached by the logging playbook @@ -228,7 +231,7 @@ oc get nodes --show-labels Then we correct it either by executing the logging playbook or by manually labelling the nodes with `oc`. Executing the playbook takes quite some time but we leave this choice to you: - So either execute the playbook: ``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml ``` - Or label the nodes manually with `oc`: diff --git a/labs/41_out_of_resource_handling.md b/labs/41_out_of_resource_handling.md index e77f735..c6d62a5 100644 --- a/labs/41_out_of_resource_handling.md +++ b/labs/41_out_of_resource_handling.md @@ -21,7 +21,7 @@ An OpenShift node recovers from out of memory conditions by killing containers o The order in which containers and pods are killed is determined by their Quality of Service (QoS) class. 
The QoS class in turn is defined by resource requests and limits developers configure on their containers. -For more information see [Quality of Service Tiers](https://docs.openshift.com/container-platform/3.6/dev_guide/compute_resources.html#quality-of-service-tiers). +For more information see [Quality of Service Tiers](https://docs.openshift.com/container-platform/3.11/dev_guide/compute_resources.html#quality-of-service-tiers). ### Out of Memory Killer in Action @@ -30,7 +30,7 @@ To observe how the OOM killer in action create a container which allocates all m ``` [ec2-user@master0 ~]$ oc new-project out-of-memory -[ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.6/resources/membomb/pod_oom.yaml +[ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.11/resources/membomb/pod_oom.yaml ``` Wait and watch till the container is up and being killed. `oc get pods -o wide -w` will then show: @@ -100,7 +100,7 @@ Soft evictions allow the threshold to be exceeded for a configurable grace perio To observe a pod eviction create a container which allocates memory till it is being evicted: ``` -[ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.6/resources/membomb/pod_eviction.yaml +[ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.11/resources/membomb/pod_eviction.yaml ``` Wait till the container gets evicted. Run `oc describe pod -l app=membomb` to see the reason for the eviction: @@ -137,16 +137,18 @@ This is usually to low to trigger pod eviction before the OOM killer hits. We re threshold of **500Mi**. If you keep to see lots of OOM killed containers consider increasing the hard eviction threshold or adding a soft eviction threshold. But remember that hard eviction thresholds are subtracted from the nodes allocatable resources. -You can configure reserves and eviction thresholds in the `openshift_node_kubelet_args` key of your Ansible inventory, e.g.: +You can configure reserves and eviction thresholds in the node configuration, e.g.: ``` -openshift_node_kubelet_args='{"kube-reserved":["cpu=200m,memory=1G"],"system-reserved":["cpu=200m,memory=1G"],"eviction-hard":["memory.available<500Mi"]}' +kubeletArguments: + kube-reserved: + - "cpu=200m,memory=512Mi" + system-reserved: + - "cpu=200m,memory=512Mi" ``` -Then run the config playbook to apply the settings to the cluster. - -See [Allocating Node Resources](https://docs.openshift.com/container-platform/3.6/admin_guide/allocating_node_resources.html) -and [Out of Resource Handling](https://docs.openshift.com/container-platform/3.6/admin_guide/out_of_resource_handling.html) for more information. +See [Allocating Node Resources](https://docs.openshift.com/container-platform/3.11/admin_guide/allocating_node_resources.html) +and [Out of Resource Handling](https://docs.openshift.com/container-platform/3.11/admin_guide/out_of_resource_handling.html) for more information. 
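Below is a sketch of how the recommended **500Mi** hard eviction threshold could be added next to the reserves. In 3.11 the node configuration is normally maintained in the node group ConfigMaps (for example `node-config-compute` in the `openshift-node` project), and the values below are examples rather than tuned recommendations for this lab:

```yaml
kubeletArguments:
  kube-reserved:
  - "cpu=200m,memory=512Mi"
  system-reserved:
  - "cpu=200m,memory=512Mi"
  eviction-hard:
  - "memory.available<500Mi"
```

The sync pods in the `openshift-node` project then roll the changed ConfigMap out to the matching nodes.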
--- diff --git a/labs/42_outgoing_http_proxies.md b/labs/42_outgoing_http_proxies.md index 00311da..b8cd221 100644 --- a/labs/42_outgoing_http_proxies.md +++ b/labs/42_outgoing_http_proxies.md @@ -108,7 +108,9 @@ If you use Java base images other than the ones provided by Red Hat you have to To apply the outgoing HTTP proxy configuration to the cluster you have to run the master and node config playbooks: ``` -[ec2-user@master0 ~]$ ansible-playbook --tags master,node /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-node/bootstrap.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/config.yml +[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/additional_config.yml ``` diff --git a/labs/51_backup.md b/labs/51_backup.md index 41733e2..5953d02 100644 --- a/labs/51_backup.md +++ b/labs/51_backup.md @@ -2,6 +2,7 @@ In this techlab you will learn how to create a new backup and which files are important. The following items should be backuped: +- Cluster data files - etcd data on each master - API objects (stored in etcd, but it's a good idea to regularly export all objects) - Docker registry storage @@ -10,56 +11,118 @@ In this techlab you will learn how to create a new backup and which files are im - Ansible hosts file -### Master Backup Files +### Lab 5.1.1: Master Backup Files The following files should be backuped on all masters: - Ansible inventory file (contains information about the cluster): `/etc/ansible/hosts` - Configuration files (for the master), certificates and htpasswd: `/etc/origin/master/` +- Docker configurations: `/etc/sysconfig/docker` `/etc/sysconfig/docker-network` `/etc/sysconfig/docker-storage` - -### Node Backup Files +### Lab 5.1.2: Node Backup Files Backup the following folders on all nodes: - Node Configuration files: `/etc/origin/node/` - Certificates for the docker-registry: `/etc/docker/certs.d/` +- Docker configurations: `/etc/sysconfig/docker` `/etc/sysconfig/docker-network` `/etc/sysconfig/docker-storage` - -### Application Backup +### Lab 5.1.3: Application Backup To backup the data in persistent volumes, you should mount them somewhere. If you mount a Glusterfs volume, it is guaranteed to be consistent. The bricks directly on the Glusterfs servers can contain small inconsistencies that Glusterfs hasn't synced to the other instances yet. -### Project Backup +### Lab 5.1.4: Project Backup It is advisable to regularly backup all project data. -The following script on the first master will export all the OpenShift API Objects (in json) of all projects and save them to the filesystem. +We will set up a cronjob in a project called "project-backup" which hourly writes all resources on OpenShift to a PV. 
+Let's fetch the backup script:
+```
+[ec2-user@master0 ~]$ sudo yum install git python-openshift -y
+[ec2-user@master0 ~]$ git clone https://github.com/mabegglen/openshift-project-backup
```
-[ec2-user@master0 ~]$ /home/ec2-user/resource/openshift-project-backup.sh
-[ec2-user@master0 ~]$ ls -al /home/ec2-user/openshift_backup_*/projects
+Now we create the cronjob on the first master:
```
+[ec2-user@master0 ~]$ cd openshift-project-backup
+[ec2-user@master0 ~]$ ansible-playbook playbook.yml \
+-e openshift_project_backup_job_name="cronjob-project-backup" \
+-e "openshift_project_backup_schedule=\"0 6,18 * * *\"" \
+-e openshift_project_backup_job_service_account="project-backup" \
+-e openshift_project_backup_namespace="project-backup" \
+-e openshift_project_backup_image="registry.access.redhat.com/openshift3/jenkins-slave-base-rhel7" \
+-e openshift_project_backup_image_tag="v3.11" \
+-e openshift_project_backup_storage_size="1G" \
+-e openshift_project_backup_deadline="3600" \
+-e openshift_project_backup_cronjob_api="batch/v1beta1"
+```
+Details: https://github.com/mabegglen/openshift-project-backup
+
+If you want to verify that the backup job works, you can temporarily reschedule it to run every minute:

-### Create etcd Backup

+Change the value of `schedule:` to "*/1 * * * *":
+```
+[ec2-user@master0 ~]$ oc project project-backup
+[ec2-user@master0 ~]$ oc get cronjob
+[ec2-user@master0 ~]$ oc edit cronjob cronjob-project-backup
+```

-To ensure a consistent etcd backup, we need to stop the daemon. Since there are 3 etcd servers, there is no downtime. All the new data that gets written during this period gets synced after the etcd daemon is started again.

+Check whether the cronjob is active:
```
-[ec2-user@master0 ~]$ sudo systemctl stop etcd.service
-[ec2-user@master0 ~]$ sudo etcdctl backup --data-dir /var/lib/etcd/ --backup-dir etcd.bak
-[ec2-user@master0 ~]$ sudo cp /var/lib/etcd/member/snap/db etcd.bak/member/snap/
-[ec2-user@master0 ~]$ sudo systemctl start etcd.service
+[ec2-user@master0 openshift-project-backup]$ oc get cronjob
+NAME                     SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
+cronjob-project-backup   */1 * * * *   False     1         1m              48m
```

-Check if the etcd cluster is healthy.

+Check whether a backup pod was launched:
```
-[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379,https://master2.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key cluster-health
-member 3f511408a118b9fd is healthy: got healthy result from https://172.31.37.59:2379
-member 50953a25943f54a8 is healthy: got healthy result from https://172.31.35.180:2379
-member ec41afe89f86deaf is healthy: got healthy result from https://172.31.35.199:2379
-cluster is healthy
+[ec2-user@master0 openshift-project-backup]$ oc get pods
+NAME                                      READY     STATUS    RESTARTS   AGE
+cronjob-project-backup-1561384620-kjm6v   1/1       Running   0          47s
+
```
+Check the logs while the backup job is running:
+```
+[ec2-user@master0 openshift-project-backup]$ oc logs -f
+```
+Once the backup job runs as expected, don't forget to set the schedule back to something sensible, e.g. "0 22 * * *":
+```
+[ec2-user@master0 ~]$ oc edit cronjob cronjob-project-backup
+```
+If you want to restore a project, proceed to [Lab 5.2.1](52_restore.md#5.2.1).
+
+
+### Lab 5.1.5: Create etcd Backup
+We are going to create a backup of etcd. Once the backup has been created, we will restore it on master1/master2 and scale the etcd cluster out from 1 to 3 nodes.
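+
+Before creating the snapshot it is worth confirming that the etcd static pod is actually running on this master. A quick, optional check (the `master-etcd-<hostname>` pod naming follows the output shown later in this techlab):
+```
+[root@master0 ~]# oc get pods -n kube-system -o wide | grep etcd
+```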
+
+First, we create a snapshot of our etcd cluster:
+```
+[root@master0 ~]# export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
+[root@master0 ~]# export ETCD_EP=$(grep https ${ETCD_POD_MANIFEST} | cut -d '/' -f3)
+[root@master0 ~]# export ETCD_POD=$(oc get pods -n kube-system | grep -o -m 1 '\S*etcd\S*')
+[root@master0 ~]# oc project kube-system
+Now using project "kube-system" on server "https://internalconsole.user[x].lab.openshift.ch:443".
+[root@master0 ~]# oc exec ${ETCD_POD} -c etcd -- /bin/bash -c "ETCDCTL_API=3 etcdctl \
+    --cert /etc/etcd/peer.crt \
+    --key /etc/etcd/peer.key \
+    --cacert /etc/etcd/ca.crt \
+    --endpoints $ETCD_EP \
+    snapshot save /var/lib/etcd/snapshot.db"
+
+ Snapshot saved at /var/lib/etcd/snapshot.db
+```
+Check the file size of the created snapshot:
+```
+[root@master0 ~]# ls -hl /var/lib/etcd/snapshot.db
+-rw-r--r--. 1 root root 21M Jun 24 16:44 /var/lib/etcd/snapshot.db
+```
+
+Copy the snapshot and the current db file to the /tmp directory for further use:
+```
+[root@master0 ~]# cp /var/lib/etcd/snapshot.db /tmp/snapshot.db
+[root@master0 ~]# cp /var/lib/etcd/member/snap/db /tmp/db
+```
+If you want to restore etcd, proceed to [Lab 5.2.2](52_restore.md#5.2.2).

---

diff --git a/labs/52_restore.md b/labs/52_restore.md
index bcf4ddf..6c3c4cf 100644
--- a/labs/52_restore.md
+++ b/labs/52_restore.md
@@ -1,142 +1,149 @@
## Lab 5.2: Restore

-### Restore a Project
+
+### Lab 5.2.1: Restore a Project

-We will now delete the logging project and try to restore it from the backup.
+We will now delete the initially created `dakota` project and try to restore it from the backup.
```
-[ec2-user@master0 ~]$ oc delete project logging
+[ec2-user@master0 ~]$ oc delete project dakota
```

Check whether the project is being deleted:
```
-[ec2-user@master0 ~]$ oc get project logging
+[ec2-user@master0 ~]$ oc get project dakota
```

-Restore the logging project from the backup. Some objects still exist, because they are not namespaced and therefore not deleted. You will see during the restore, that these object will not be replaced.
+Restore the `dakota` project from the backup:
```
-[ec2-user@master0 ~]$ oc adm new-project logging --node-selector=""
-[ec2-user@master0 ~]$ oc project logging
-
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/serviceaccount.json
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/secret.json
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/configmap.json
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/rolebindings.json
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/project.json
-[ec2-user@master0 ~]$ oc create -f /home/ec2-user/openshift_backup_[date]/projects/logging/daemonset.json
+[ec2-user@master0 ~]$ oc new-project dakota
+[ec2-user@master0 ~]$ oc project project-backup
+[ec2-user@master0 ~]$ oc debug `oc get pods -o jsonpath='{.items[*].metadata.name}' | awk '{print $1}'`
+sh-4.2# tar -xvf /backup/backup-201906131343.tar.gz -C /tmp/
+sh-4.2# oc apply -f /tmp/dakota/
```

-Scale the logging components.
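+
+Before triggering a new build, it can help to verify that the objects from the backup were actually recreated. A quick, optional check:
+```
+[ec2-user@master0 ~]$ oc get all -n dakota
+```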
+Start a build and push the image to the registry:
```
-[ec2-user@master0 ~]$ oc get dc
-NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
-logging-curator       5          1         1         config
-logging-es-a4nhrowo   5          1         1         config
-logging-kibana        7          1         0         config
-
-
-[ec2-user@master0 ~]$ oc scale dc logging-kibana --replicas=0
-[ec2-user@master0 ~]$ oc scale dc logging-curator --replicas=0
-[ec2-user@master0 ~]$ oc scale dc logging-es-[HASH] --replicas=0
-[ec2-user@master0 ~]$ oc scale dc logging-kibana --replicas=1
-[ec2-user@master0 ~]$ oc scale dc logging-curator --replicas=1
-[ec2-user@master0 ~]$ oc scale dc logging-es-[HASH] --replicas=1
+[ec2-user@master0 ~]$ oc start-build ruby-ex -n dakota
```

-Check if the pods are coming up again
+Check whether the pods become ready again:
```
-[ec2-user@master0 ~]$ oc get pods -w
+[ec2-user@master0 ~]$ oc get pods -w -n dakota
```

-If all the pods are ready, Kibana should be receiving logs again.
+
+### Lab 5.2.2: Restore the etcd Cluster ###
+
+:warning: Before you proceed, make sure you've already added master2 in [Lab 3.5.2](35_add_new_node_and_master.md#3.5.2).
+
+Copy the snapshot to master1.user[x].lab.openshift.ch, stop the node and Docker services on all etcd hosts and clear the old etcd data:
```
-https://logging.app[X].lab.openshift.ch
+[ec2-user@master0 ~]$ userid=[x]
+[ec2-user@master0 ~]$ scp /tmp/snapshot.db master1.user$userid.lab.openshift.ch:/tmp/snapshot.db
+[ec2-user@master0 ~]$ ansible etcd -m service -a "name=atomic-openshift-node state=stopped"
+[ec2-user@master0 ~]$ ansible etcd -m service -a "name=docker state=stopped"
+[ec2-user@master0 ~]$ ansible etcd -a "rm -rf /var/lib/etcd"
+[ec2-user@master0 ~]$ ansible etcd -a "mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.bak"
```
+Switch to the root user and restore the etcd database.

-### Restore the etcd Cluster
-
-First, we need to stop all etcd:
+:warning: Run these steps on ALL masters (master0, master1):
```
-[ec2-user@master0 ~]$ ansible etcd -m service -a "name=etcd state=stopped"
+[ec2-user@master0 ~]$ sudo -i
+[root@master0 ~]# yum install etcd-3.2.22-1.el7.x86_64
+[root@master0 ~]# rmdir /var/lib/etcd
+[root@master0 ~]# mv /etc/etcd/etcd.conf.bak /etc/etcd/etcd.conf
+[root@master0 ~]# source /etc/etcd/etcd.conf
+[root@master0 ~]# export ETCDCTL_API=3
+[root@master0 ~]# ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
+    --name $ETCD_NAME \
+    --initial-cluster $ETCD_INITIAL_CLUSTER \
+    --initial-cluster-token $ETCD_INITIAL_CLUSTER_TOKEN \
+    --initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS \
+    --data-dir /var/lib/etcd
+[root@master0 ~]# restorecon -Rv /var/lib/etcd
```

-The cluster is now down and you can't get any resources through the console. We are now copying the files back from the backup and set the right permissions.
+Now that etcd has been restored on all masters, we can start the services again:
```
-[ec2-user@master0 ~]$ ETCD_DIR=/var/lib/etcd/
-[ec2-user@master0 ~]$ sudo mv $ETCD_DIR /var/lib/etcd.orig
-[ec2-user@master0 ~]$ sudo cp -Rp etcd.bak $ETCD_DIR
-[ec2-user@master0 ~]$ sudo chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
-[ec2-user@master0 ~]$ sudo chown -R etcd:etcd $ETCD_DIR
+[ec2-user@master0 ~]$ ansible etcd -m service -a "name=docker state=started"
+[ec2-user@master0 ~]$ ansible etcd -m service -a "name=atomic-openshift-node state=started"
```

-Add the "--force-new-cluster" parameter to the etcd unit file, start etcd and check if it's running. This is needed, because initially it will create a new cluster with the existing data from the backup.
+#### Check etcd cluster health ####
```
-[ec2-user@master0 ~]$ sudo cp /usr/lib/systemd/system/etcd.service /etc/systemd/system
-[ec2-user@master0 ~]$ sudo sed -i '/ExecStart/s/"$/ --force-new-cluster"/' /etc/systemd/system/etcd.service
-[ec2-user@master0 ~]$ sudo systemctl daemon-reload
-[ec2-user@master0 ~]$ sudo systemctl start etcd
-[ec2-user@master0 ~]$ sudo systemctl status etcd
+[root@master0 ~]# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
+[root@master0 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint status --write-out=table
+[root@master0 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint health
```

-The cluster is now initialized, so we need to remove the "--force-new-cluster" parameter again and restart etcd.
+### Scale up the etcd Cluster ###
+Add the third etcd host master2.user[X].lab.openshift.ch to the etcd cluster.
+We add the third node (master2) to the `[new_etcd]` group and activate the group by uncommenting it:
```
-[ec2-user@master0 ~]$ sudo rm /etc/systemd/system/etcd.service
-[ec2-user@master0 ~]$ sudo systemctl daemon-reload
-[ec2-user@master0 ~]$ sudo systemctl restart etcd
-[ec2-user@master0 ~]$ sudo systemctl status etcd
+[OSEv3:children]
+...
+new_etcd
+
+[new_etcd]
+master2.user[X].lab.openshift.ch
```

-Check if etcd is healthy and check if "/openshift.io" exists in etcd
+:warning: The scaleup playbook provided by Red Hat doesn't restart the masters seamlessly. If you have to scale up in production, do this in a maintenance window.
+
+Run the scaleup playbook to scale up the etcd cluster:
+
```
-[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key cluster-health
-member 92c764a37c90869 is healthy: got healthy result from https://127.0.0.1:2379
-cluster is healthy
-[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key ls /
-/openshift.io
+[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/scaleup.yml
```

-We need to change the peerURL of the etcd to it's private ip. Make sure to correctly copy the **member_id** and **private_ip**.
+#### Check etcd cluster health ####
+```
+[root@master0 ~]# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
+[root@master0 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint status --write-out=table
+[root@master0 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint health
```
-[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key member list
-[member_id]: name=master0.user[X].lab.openshift.ch peerURLs=https://localhost:2380 clientURLs=https://[private_ip]:2379 isLeader=true
-[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key member update [member_id] https://[private_ip]:2380
-Updated member with ID [member_id] in cluster

+:information_source: Don't get confused by the 4 entries:
Master0 will show up twice with the same id -[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/etcd/ca.crt --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key member list -[member_id]: name=master0.user[X].lab.openshift.ch peerURLs=https://172.31.46.201:2380 clientURLs=https://172.31.46.201:2379 isLeader=true -``` +You should now get an output like this. -Add the second etcd `master1.user[X].lab.openshift.ch` to the etcd cluster ``` -[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key member add master1.user[X].lab.openshift.ch https://[IP_OF_MASTER1]:2380 -Added member named master1.user[X].lab.openshift.ch with ID aadb46077a7f58a to cluster ++---------------------------------------------+------------------+---------+---------+-----------+-----------+------------+ +| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX | ++---------------------------------------------+------------------+---------+---------+-----------+-----------+------------+ +| https://master0.user1.lab.openshift.ch:2379 | a8e78dd0690640cb | 3.2.22 | 26 MB | false | 2 | 9667 | +| https://172.31.42.95:2379 | 1ab823337d6e84bf | 3.2.22 | 26 MB | false | 2 | 9667 | +| https://172.31.38.22:2379 | 56f5e08139a21df3 | 3.2.22 | 26 MB | true | 2 | 9667 | +| https://172.31.46.194:2379 | a8e78dd0690640cb | 3.2.22 | 26 MB | false | 2 | 9667 | ++---------------------------------------------+------------------+---------+---------+-----------+-----------+------------+ -ETCD_NAME="master1.user[X].lab.openshift.ch" -ETCD_INITIAL_CLUSTER="master0.user[X].lab.openshift.ch=https://172.31.37.65:2380,master1.user[X].lab.openshift.ch=https://172.31.32.131:2380" -ETCD_INITIAL_CLUSTER_STATE="existing" +https://172.31.46.194:2379 is healthy: successfully committed proposal: took = 2.556091ms +https://172.31.42.95:2379 is healthy: successfully committed proposal: took = 2.018976ms +https://master0.user1.lab.openshift.ch:2379 is healthy: successfully committed proposal: took = 2.639024ms +https://172.31.38.22:2379 is healthy: successfully committed proposal: took = 1.666699ms ``` -Login to `master1.user[X].lab.openshift.ch` and edit the etcd configuration file using the environment variables provided above. Then remove the etcd data directory and restart etcd. -``` -[ec2-user@master1 ~]$ sudo vi /etc/etcd/etcd.conf -[ec2-user@master1 ~]$ sudo rm -rf /var/lib/etcd/member -[ec2-user@master1 ~]$ sudo systemctl restart etcd -``` +#### move new etcd-member in /etc/ansible/hosts #### + +Move the now functional etcd members from the group `[new_etcd]` to `[etcd]` in your Ansible inventory at `/etc/ansible/hosts` so the group looks like: -Login to `master0.user[X].lab.openshift.ch` again and check the etcd cluster health. 
-``` -[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key member list -633a80df3001: name=master0.user[X].lab.openshift.ch peerURLs=https://172.31.37.65:2380 clientURLs=https://172.31.37.65:2379 isLeader=true -aadb46077a7f58a: name=master1.user[X].lab.openshift.ch peerURLs=https://172.31.32.131:2380 clientURLs=https://172.31.32.131:2379 isLeader=false -[ec2-user@master0 ~]$ sudo etcdctl -C https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health -member 633a80df3001 is healthy: got healthy result from https://172.31.37.65:2379 -member aadb46077a7f58a is healthy: got healthy result from https://172.31.32.131:2379 -cluster is healthy ``` +... +#new_etcd + +#[new_etcd] -Try to restore the last etcd on master2.user[X] the same way you did for master1.user[X]. +... + +[etcd] +master0.user[X].lab.openshift.ch +master1.user[X].lab.openshift.ch +master2.user[X].lab.openshift.ch +``` --- diff --git a/labs/61_monitoring.md b/labs/61_monitoring.md index a788643..fd49d84 100644 --- a/labs/61_monitoring.md +++ b/labs/61_monitoring.md @@ -18,19 +18,19 @@ In order to answer this first question, we check the state of different vital co Check the masters' health state with a HTTP request: ``` -$ curl -v https://console.user[X].lab.openshift.ch/healthz +[ec2-user@master0 ~]$ curl -v https://console.user[X].lab.openshift.ch/healthz ``` As long as the response is a 200 status code at least one of the masters is still working and the API is accessible via Load Balancer (if there is one). **etcd** also exposes a similar health endpoint at https://`openshift_master_cluster_public_hostname`:2379/health, though it is only accessible using the client certificate and corresponding key stored on the masters at `/etc/origin/master/master.etcd-client.crt` and `/etc/origin/master/master.etcd-client.key`. ``` -$ sudo curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key https://master0.user[X].lab.openshift.ch:2379/health +[ec2-user@master0 ~]$ sudo curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key https://master0.user[X].lab.openshift.ch:2379/health ``` The **HAProxy router pods** are responsible for getting application traffic into OpenShift. Similar to the masters, HAProxy also exposes a /healthz endpoint on port 1936 which can be checked with e.g.: ``` -$ curl -v http://router.app[X].lab.openshift.ch:1936/healthz +[ec2-user@master0 ~]$ curl -v http://router.app[X].lab.openshift.ch:1936/healthz ``` Using the wildcard domain to access a router's health page results in a positive answer if at least one router is up and running and that's all we want to know right now. @@ -48,19 +48,19 @@ First, let's look at how to use above checks to answer this second question. The health endpoint exposed by **masters** was accessed via load balancer in the first category in order to find out if the API is generally available. 
This time however we want to find out if at least one of the master APIs is unavailable, even if there still are some that are accessible. So we check every single master endpoint directly instead of via load balancer: ``` -$ for i in {0..2}; do curl -v https://master${i}.user[X].lab.openshift.ch/healthz; done +[ec2-user@master0 ~]$ for i in {0..2}; do curl -v https://master${i}.user[X].lab.openshift.ch/healthz; done ``` The **etcd** check above is already run against single members of the cluster and can therefore be applied here in the exact same form. The difference only is that we want to make sure every single member is running, not just the number needed to have quorum. The approach used for the masters also applies to the **HAProxy routers**. A router pod is effectively listening on the node's interface it is running on. So instead of connecting via load balancer, we use the nodes' IP addresses the router pods are running on. In our case, these are nodes 0 and 1: ``` -$ for i in {0..2}; do curl -v http://infra-node${i}.user[X].lab.openshift.ch:1936/healthz; done +[ec2-user@master0 ~]$ for i in {0..2}; do curl -v http://infra-node${i}.user[X].lab.openshift.ch:1936/healthz; done ``` As already mentioned, finding out if our cluster will remain in an operational state in the near future also includes some better known checks we could call a more conventional **components monitoring**. -Next to the usual monitoring of storage per partition/logical volume, there's one logical volume on each node of special interest to us: the **Docker storage**. The Docker storage contains images and container filesystems of running containers. Monitoring the available space of this logical volume is important in order to tune garbage collection. Garbage collection is done by the **kubelets** running on each node. The available garbage collection kubelet arguments can be found in the [official documentation](https://docs.openshift.com/container-platform/3.7/admin_guide/garbage_collection.html). +Next to the usual monitoring of storage per partition/logical volume, there's one logical volume on each node of special interest to us: the **Docker storage**. The Docker storage contains images and container filesystems of running containers. Monitoring the available space of this logical volume is important in order to tune garbage collection. Garbage collection is done by the **kubelets** running on each node. The available garbage collection kubelet arguments can be found in the [official documentation](https://docs.openshift.com/container-platform/3.11/admin_guide/garbage_collection.html). Speaking of garbage collection, there's another component that needs frequent garbage collection: the registry. Contrary to the Docker storage on each node, OpenShift only provides a command to prune the registry but does not offer a means to execute it on a regular basis. Until it does, setup the [appuio-pruner](https://github.com/appuio/appuio-pruner) as described in its README. @@ -71,7 +71,7 @@ Besides the obvious components that need monitoring like CPU, memory and storage But let's first get an overview of available resources using tools you might not have heard about before. One such tool is [Cockpit](http://cockpit-project.org/). Cockpit aims to ease administration tasks of Linux servers by making some basic tasks available via web interface. It is installed by default on every master by the OpenShift Ansible playbooks and listens on port 9090. 
We don't want to expose the web interface to the internet though, so we are going to use SSH port forwarding to access it: ``` -$ ssh ec2-user@jump.lab.openshift.ch -L 9090:master0.user[X].lab.openshift.ch:9090 +[ec2-user@master0 ~]$ ssh ec2-user@jump.lab.openshift.ch -L 9090:master0.user[X].lab.openshift.ch:9090 ``` After the SSH tunnel has been established, open http://localhost:9090 in your browser and log in using user `ec2-user` and the password provided by the instructor. Explore the different tabs and sections of the web interface. @@ -83,7 +83,7 @@ oc create sa kube-ops-view oc adm policy add-scc-to-user anyuid -z kube-ops-view oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:ocp-ops-view:kube-ops-view oc apply -f https://raw.githubusercontent.com/raffaelespazzoli/kube-ops-view/ocp/deploy-openshift/kube-ops-view.yaml -oc expose svc kube-ops-view +oc create route edge --service kube-ops-view oc get route | grep kube-ops-view | awk '{print $2}' ``` diff --git a/labs/62_logs.md b/labs/62_logs.md index 073b519..83a7c88 100644 --- a/labs/62_logs.md +++ b/labs/62_logs.md @@ -7,45 +7,56 @@ As soon as basic functionality of OpenShift itself is reduced or not working at **Note:** While it is convenient to use the EFK stack to analyze log messages in a central place, be aware that depending on the problem, relevant log messages might not be received by Elasticsearch (e.g. SDN problems). -### Services Overview +### OpenShift Components Overview -The master usually houses two to three master-specific services: -* `atomic-openshift-master` (in a single-master setup) -* `atomic-openshift-master-api` and `atomic-openshift-master-controllers` (in a multi-master setup) -* `etcd` (usually installed on a master, also possible externally) +The master usually houses three master-specific containers: +* `master-api` in OpenShift project `kube-system` +* `master-controllers` in OpenShift project `kube-system` +* `master-etcd` in OpenShift project `kube-system` (usually installed on all masters, also possible externally) + +The node-specific containers can also be found on a master: +* `sync` in OpenShift project `openshift-node` +* `sdn` and `ovs` in OpenShift project `openshift-sdn` The node-specific services can also be found on a master: * `atomic-openshift-node` (in order for the master to be part of the SDN) * `docker` General services include the following: -* `openvswitch` * `dnsmasq` * `NetworkManager` -* `iptables` +* `firewalld` ### Service States -Check different service states from the first master using ansible. Check the OpenShift master services first: +Check etcd and master states from the first master using ansible. 
Check the OpenShift master container first: ``` -$ ansible masters -a "systemctl is-active atomic-openshift-master-*" +[ec2-user@master0 ~]$ oc get pods -n kube-system -o wide +NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE +master-api-master0.user7.lab.openshift.ch 1/1 Running 9 1d 172.31.44.160 master0.user7.lab.openshift.ch +master-api-master1.user7.lab.openshift.ch 1/1 Running 7 1d 172.31.45.211 master1.user7.lab.openshift.ch +master-api-master2.user7.lab.openshift.ch 1/1 Running 0 4m 172.31.35.148 master2.user7.lab.openshift.ch +master-controllers-master0.user7.lab.openshift.ch 1/1 Running 7 1d 172.31.44.160 master0.user7.lab.openshift.ch +master-controllers-master1.user7.lab.openshift.ch 1/1 Running 6 1d 172.31.45.211 master1.user7.lab.openshift.ch +master-controllers-master2.user7.lab.openshift.ch 1/1 Running 0 4m 172.31.35.148 master2.user7.lab.openshift.ch +master-etcd-master0.user7.lab.openshift.ch 1/1 Running 6 1d 172.31.44.160 master0.user7.lab.openshift.ch +master-etcd-master1.user7.lab.openshift.ch 1/1 Running 4 1d 172.31.45.211 master1.user7.lab.openshift.ch ``` -etcd: +Depending on the outcome of the above commands we have to get a closer look at specific container. This can either be done the conventional way, e.g. the 30 most recent messages for etcd on the first master: + ``` -$ ansible masters -a "systemctl is-active etcd" +[ec2-user@master0 ~]$ oc logs master-etcd-master0.user7.lab.openshift.ch -n kube-system --tail=30 ``` -As we can see, the command `systemctl is-active` only outputs the status but not the service name. That's why we execute above commands for each service separately. - There is also the possibility of checking etcd's health using `etcdctl`: ``` -# etcdctl --cert-file=/etc/etcd/peer.crt \ - --key-file=/etc/etcd/peer.key \ - --ca-file=/etc/etcd/ca.crt \ - --peers="https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379" - cluster-health +[root@master0 ~]# etcdctl2 --cert-file=/etc/etcd/peer.crt \ + --key-file=/etc/etcd/peer.key \ + --ca-file=/etc/etcd/ca.crt \ + --peers="https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379" \ + cluster-health ``` As an etcd cluster needs a quorum to update its state, `etcdctl` will output that the cluster is healthy even if not every member is. @@ -54,15 +65,15 @@ Back to checking services with systemd: Master-specific services only need to be atomic-openshift-node: ``` -$ ansible nodes -a "systemctl is-active atomic-openshift-node" +[ec2-user@master0 ~]$ ansible nodes -a "systemctl is-active atomic-openshift-node" ``` Above command applies to all the other node services (`docker`, `dnsmasq` and `NetworkManager`) with which we get an overall overview of OpenShift-specific service states. -Depending on the outcome of the above commands we have to get a closer look at specific services. This can either be done the conventional way, e.g. the 30 most recent messages for etcd on the first master: +Depending on the outcome of the above commands we have to get a closer look at specific services. This can either be done the conventional way, e.g. the 30 most recent messages for atomic-openshift-node on the first master: ``` -$ ansible masters[0] -a "journalctl -u etcd -n 30" +[ec2-user@master0 ~]$ ansible masters[0] -a "journalctl -u atomic-openshift-node -n 30" ``` Or by searching Elasticsearch: After logging in to https://logging.app[X].lab.openshift.ch, make sure you're on Kibana's "Discover" tab. 
Then choose the `.operations.*` index by clicking on the arrow in the dark-grey box on the left to get a list of all available indices. You can then create search queries such as `systemd.t.SYSTEMD_UNIT:atomic-openshift-node.service` in order to filter for all messages from every running OpenShift node service. @@ -73,6 +84,6 @@ Or if we wanted to filter for error messages we could simply use "error" in the **End of Lab 6.2** -

Upgrade OpenShift from 3.6 to 3.7 →

+

Upgrade OpenShift from 3.11.88 to 3.11.104 →

[← back to the Chapter Overview](60_monitoring_troubleshooting.md) diff --git a/labs/70_upgrade.md b/labs/70_upgrade.md index 9816d2c..bb908bc 100644 --- a/labs/70_upgrade.md +++ b/labs/70_upgrade.md @@ -1,17 +1,16 @@ -# Lab 7: Upgrade OpenShift from 3.6 to 3.7 +# Lab 7: Upgrade OpenShift from 3.11.88 to 3.11.104 -In this chapter, we will upgrade OpenShift 3.6 to 3.7. +In this chapter, we will do a minor upgrade from OpenShift 3.11.88 to 3.11.104. ## Chapter Content -* [7.1: Upgrade to the Latest Version of OpenShift 3.6](71_upgrade_openshift36.md) -* [7.2: Upgrade to OpenShift 3.7](72_upgrade_openshift37.md) -* [7.3: Verify the Upgrade](73_upgrade_verification.md) +* [7.1: Upgrade OpenShift 3.11.88 to 3.11.104](71_upgrade_openshift3.11.104.md) +* [7.2: Verify the Upgrade](72_upgrade_verification.md) --- -

7.1 Upgrade to the Latest Version of OpenShift 3.6 →

+

7.1 Upgrade to OpenShift 3.11.104 →

[← back to the Labs Overview](../README.md) diff --git a/labs/71_upgrade_openshift3.11.104.md b/labs/71_upgrade_openshift3.11.104.md new file mode 100644 index 0000000..171423f --- /dev/null +++ b/labs/71_upgrade_openshift3.11.104.md @@ -0,0 +1,125 @@ +## Lab 7.1: Upgrade OpenShift 3.11.88 to 3.11.104 + +### Upgrade Preparation + +We first need to make sure our lab environment fulfills the requirements mentioned in the official documentation. We are going to do an "[Automated In-place Cluster Upgrade](https://docs.openshift.com/container-platform/3.11/upgrading/automated_upgrades.html)" which lists part of these requirements and explains how to verify the current installation. Also check the [Prerequisites](https://docs.openshift.com/container-platform/3.11/install/prerequisites.html#install-config-install-prerequisites) of the new release. + +Conveniently, our lab environment already fulfills all the requirements, so we can move on to the next step. + +#### 1. Ensure the openshift_deployment_type=openshift-enterprise #### +``` +[ec2-user@master0 ~]$ grep -i openshift_deployment_type /etc/ansible/hosts +``` + +#### 2. disable rolling, full system restarts of the hosts #### +``` +[ec2-user@master0 ~]$ ansible masters -m shell -a "grep -i openshift_rolling_restart_mode /etc/ansible/hosts" +``` +in our lab environment this parameter isn't set, so let's do it on all master-nodes: +``` +[ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_rolling_restart_mode" line="openshift_rolling_restart_mode=services" state="present"' +``` +#### 3. change the value of openshift_pkg_version to 3.11.104 in /etc/ansible/hosts #### +``` +[ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_pkg_version" line="openshift_pkg_version=-3.11.104" state="present"' +``` +#### 4. upgrade the nodes #### + +##### 4.1 prepare nodes for upgrade ##### +``` +[ec2-user@master0 ~]$ ansible all -a 'subscription-manager refresh' +[ec2-user@master0 ~]$ ansible all -a 'subscription-manager repos --enable="rhel-7-server-ose-3.11-rpms" --enable="rhel-7-server-rpms" --enable="rhel-7-server-extras-rpms" --enable="rhel-7-server-ansible-2.6-rpms" --enable="rhel-7-fast-datapath-rpms" --disable="rhel-7-server-ose-3.10-rpms" --disable="rhel-7-server-ansible-2.4-rpms"' +[ec2-user@master0 ~]$ ansible all -a 'yum clean all' +[ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_certificate_expiry_fail_on_warn" line="openshift_certificate_expiry_fail_on_warn=False" state="present"' +``` +##### 4.2 prepare your upgrade-host ##### +``` +[ec2-user@master0 ~]$ sudo -i +[ec2-user@master0 ~]# yum update -y openshift-ansible +``` + +##### 4.3 upgrade the control plane ##### + +Upgrade the so-called control plane, consisting of: + +- etcd +- master components +- node services running on masters +- Docker running on masters +- Docker running on any stand-alone etcd hosts + +``` +[ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible +[ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml +``` + +##### 4.4 upgrade the nodes manually (one by one) ##### + +Upgrade node by node manually because we need to make sure, that the nodes running GlusterFS in container have enough time to replicate to the other nodes. 
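+
+Before starting, it can help to note which nodes the GlusterFS pods are running on, so you know which node upgrades need the extra heal check afterwards. For example (assuming the GlusterFS pods live in the `glusterfs` project, as used below):
+```
+[ec2-user@master0 ~]$ oc get pods -n glusterfs -o wide | grep glusterfs
+```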
+
+Upgrade `infra-node0.user[X].lab.openshift.ch`:
+```
+[ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
+    --extra-vars openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node0.user[X].lab.openshift.ch"
+```
+Wait until all GlusterFS pods are ready again and check whether the GlusterFS volumes have heal entries:
+```
+[ec2-user@master0 ~]$ oc project glusterfs
+[ec2-user@master0 ~]$ oc get pods -o wide | grep glusterfs
+[ec2-user@master0 ~]$ oc rsh
+sh-4.2# for vol in `gluster volume list`; do gluster volume heal $vol info; done | grep -i "number of entries"
+Number of entries: 0
+```
+If all volumes have `Number of entries: 0`, we can proceed with the next node and repeat the GlusterFS check.
+
+Upgrade `infra-node1` and `infra-node2` the same way as you did the first one:
+```
+[ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
+    --extra-vars openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node1.user[X].lab.openshift.ch"
+```
+
+After upgrading the infra nodes, you need to upgrade the compute nodes:
+```
+[ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
+    --extra-vars openshift_upgrade_nodes_label="node-role.kubernetes.io/compute=true" \
+    --extra-vars openshift_upgrade_nodes_serial="1"
+```
+
+#### 5. Upgrading the EFK Logging Stack ####
+
+**Note:** Setting `openshift_logging_install_logging=true` enables you to upgrade the logging stack.
+
+```
+[ec2-user@master0 ~]$ grep openshift_logging_install_logging /etc/ansible/hosts
+[ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible/playbooks
+[ec2-user@master0 ~]$ ansible-playbook openshift-logging/config.yml
+[ec2-user@master0 ~]$ oc delete pod --selector="component=fluentd" -n logging
+```
+
+#### 6. Upgrading Cluster Metrics ####
+```
+[ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible/playbooks
+[ec2-user@master0 ~]$ ansible-playbook openshift-metrics/config.yml
+```
+
+#### 7. Update the oc binary ####
+The `atomic-openshift-clients-redistributable` package which provides the `oc` binary for different operating systems needs to be updated separately:
+```
+[ec2-user@master0 ~]$ ansible masters -a "yum install --assumeyes --disableexcludes=all atomic-openshift-clients-redistributable"
+```
+
+#### 8. Update oc binary on client ####
+Update the `oc` binary on your own client. As before, you can get it from:
+```
+https://client.app[X].lab.openshift.ch
+```
+
+**Note:** You should tell all users of your platform to update their client. Client and server version differences can lead to compatibility issues.
+
+---
+
+**End of Lab 7.1**
+
+

7.2 Verify the Upgrade →

+ +[← back to the Chapter Overview](70_upgrade.md) diff --git a/labs/71_upgrade_openshift36.md b/labs/71_upgrade_openshift36.md deleted file mode 100644 index 9836b1a..0000000 --- a/labs/71_upgrade_openshift36.md +++ /dev/null @@ -1,14 +0,0 @@ -## Lab 7.1: Upgrade to the Latest Version of OpenShift 3.6 - -Before upgrading to the next release, you need to upgrade to the latest minor version of the actual release. For OpenShift 3.6, you can find the different errata updates at https://docs.openshift.com/container-platform/3.6/release_notes/ocp_3_6_release_notes.html#ocp-36-asynchronous-errata-updates. - -The procedure to upgrade to the latest minor version is the same as upgrading to the next release. As our lab installation is already on the latest minor version of OpenShift 3.6, we can upgrade directly to OpenShift 3.7. - - ---- - -**End of Lab 7.1** - -

7.2 Upgrade to OpenShift 3.7 →

- -[← back to the Chapter Overview](70_upgrade.md) diff --git a/labs/72_upgrade_openshift37.md b/labs/72_upgrade_openshift37.md deleted file mode 100644 index a93faa9..0000000 --- a/labs/72_upgrade_openshift37.md +++ /dev/null @@ -1,116 +0,0 @@ -## Lab 7.2: Upgrade to OpenShift 3.7 - -### Upgrade Preparation - -We first need to make sure our lab environment fulfills the requirements mentioned in the official documentation. We are going to do an "[Automated In-place Cluster Upgrade](https://docs.openshift.com/container-platform/3.7/upgrading/automated_upgrades.html#install-config-upgrading-automated-upgrades)" which lists part of these requirements and explains how to verify the current installation. Also check the [Prerequisites](https://docs.openshift.com/container-platform/3.7/install_config/install/prerequisites.html#install-config-install-prerequisites) of the new release. - -Conveniently, our lab environment already fulfills all the requirements, so we can move on to the next step. Let's attach the repositories for the new OpenShift release: -``` -[ec2-user@master0 ~]$ ansible all -a "subscription-manager refresh" -[ec2-user@master0 ~]$ ansible all -a 'subscription-manager repos --disable="rhel-7-server-ose-3.6-rpms" --enable="rhel-7-server-ose-3.7-rpms" --enable="rhel-7-fast-datapath-rpms"' -[ec2-user@master0 ~]$ ansible all -a "yum clean all" -``` - -Next we need to upgrade `atomic-openshift-utils` to version 3.7 on our first master. -``` -[ec2-user@master0 ~]$ sudo yum update -y atomic-openshift-utils -.... -Updating - atomic-openshift-utils noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 374 k -Updating for dependencies: - openshift-ansible noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 349 k - openshift-ansible-callback-plugins noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 340 k - openshift-ansible-docs noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 362 k - openshift-ansible-filter-plugins noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 354 k - openshift-ansible-lookup-plugins noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 330 k - openshift-ansible-playbooks noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 440 k - openshift-ansible-roles noarch 3.7.72-1.git.0.5c45a8a.el7 rhel-7-server-ose-3.7-rpms 2.0 M -.... -``` - -Change the following Ansible variables in our OpenShift inventory: -``` -[ec2-user@master0 ~]$ sudo vim /etc/ansible/hosts -.... -openshift_image_tag=v3.7 -openshift_release=v3.7 -openshift_pkg_version=-3.7.72 -... -openshift_logging_image_version=v3.7 -``` - - -### Upgrade Procedure - -1. Upgrade the so-called control plane, consisting of: - - etcd - - master components - - node services running on masters - - Docker running on masters - - Docker running on any stand-alone etcd hosts - -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade_control_plane.yml -... -``` -2. Upgrade node by node manually because we need to make sure, that the nodes running GlusterFS in container have enough time to replicate to the other nodes. -Upgrade "infra-node0.user[X].lab.openshift.ch": -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade_nodes.yml --extra-vars openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node0.user[X].lab.openshift.ch" -... 
-``` - -Wait until all GlusterFS Pods are ready again and check if GlusterFS volumes have heal entries. -``` -[ec2-user@master0 ~]$ oc project glusterfs -[ec2-user@master0 ~]$ oc get pods -o wide | grep glusterfs -glusterfs-storage-b9xdl 1/1 Running 0 23m 172.31.33.43 infra-node0.user6.lab.openshift.ch -glusterfs-storage-lll7g 1/1 Running 0 23m 172.31.43.209 infra-node1.user6.lab.openshift.ch -glusterfs-storage-mw5sz 1/1 Running 0 23m 172.31.34.222 infra-node2.user6.lab.openshift.ch -[ec2-user@master0 ~]$ oc rsh -sh-4.2# for vol in `gluster volume list`; do gluster volume heal $vol info; done | grep -i "number of entries" -Number of entries: 0 -``` - -If all volumes have "Number of entries: 0", we can proceed with the next node and repeat the check of GlusterFS. - -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade_nodes.yml -e openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node1.user[X].lab.openshift.ch" -... -``` -3. Upgrading the EFK Logging Stack -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -[ec2-user@master0 ~]$ oc delete pod --selector="component=fluentd" -n logging -``` - -4. Upgrading Cluster Metrics -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml -``` - -5. The `atomic-openshift-clients-redistributable` package which provides the `oc` binary for different operating systems needs to be updated separately: -``` -[ec2-user@master0 ~]$ ansible masters -a "yum install --assumeyes --disableexcludes=all atomic-openshift-clients-redistributable" -``` - -6. To finish the upgrade, it is best practice to run the config playbook: -``` -[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml -``` - -7. Update the `oc` binary on your own client. As before, you can get it from: -``` -https://console.user[X].lab.openshift.ch/console/extensions/clients/ -``` - -**Note:** You should tell all users of your platform to update their client. Client and server version differences can lead to compatibility issues. - - ---- - -**End of Lab 7.2** - -

7.3 Verify the Upgrade →

- -[← back to the Chapter Overview](70_upgrade.md) diff --git a/labs/73_upgrade_verification.md b/labs/72_upgrade_verification.md similarity index 100% rename from labs/73_upgrade_verification.md rename to labs/72_upgrade_verification.md diff --git a/renovate.json b/renovate.json new file mode 100644 index 0000000..5db72dd --- /dev/null +++ b/renovate.json @@ -0,0 +1,6 @@ +{ + "$schema": "https://docs.renovatebot.com/renovate-schema.json", + "extends": [ + "config:recommended" + ] +} diff --git a/resources/11_ops-techlab.png b/resources/11_ops-techlab.png index 86ee615..5bbcf0c 100644 Binary files a/resources/11_ops-techlab.png and b/resources/11_ops-techlab.png differ diff --git a/resources/11_ops-techlab.xml b/resources/11_ops-techlab.xml index 5bbccb5..61cd37d 100644 --- a/resources/11_ops-techlab.xml +++ b/resources/11_ops-techlab.xml @@ -1 +1 @@ -7V1tk5s4Ev41U3V3VXHphRfxMTN5ua1KUqmduru5+4aNbLPBxoeZZLK/fiVAGNSyzdiCweM4tTu2ANno6X661d0SN/Ru9fQxCzfLz2nEkxuCoqcb+u6GEII8T/yRLT/LFowxKlsWWRxVbbuG+/hPXjWq0x7jiG9bJ+ZpmuTxpt04S9drPstbbWGWpT/ap83TpP2tm3DBQcP9LExg63/iKF9WrZ7r7A78k8eLZfXVhNLqlqfh7NsiSx/X1RfeEDovXuXhVag6q+50uwyj9Eejib6/oXdZmublu9XTHU/k6KpxK6/7sOdo/cMzvs67XMBIecX3MHmsbv5xy7Oqn23+Uw1JcU9cXoVu6O2PZZzz+004k0d/CCkQbct8lYhPWLzd5ln6rR46cVO38zhJ7tIkzYre6DyY+7Kz23m6ziv8sSs+h0m8WIsPCZ/ndUeNC6lHAyov/M6zPBaIva0uyFP5G+DtV3ciT+dPjaZqOD7ydMXz7Kc4RYkvoeUlSniJW37+sZMExqq2ZUMIHNUYVtK3qPveASDeVBjswYMCPG6Il8ixiOLvsvPqfkXr/x+llNwWQ1V/Eu8W8q9E8UFdKr61uLo81DO2vHiNEltBROwouJSSnsB1oLLhJ/H531/vFFLTTKFE5ZH7x+ma5/AgbCk6evflXv7udJsLNAn6X7rmhmsL4QlXckyrztD2cRqlqzBeS9bmCV+EeZyubQoK1gRlLX+b+AFZGMUCUK35iOj0JiAOaouHsmQN8cAORQb5QOh8+Qg8IB9fUmkIz8IhCrfL4lwdAsnDrvx3gq56xatfXaVtXXUohbqKDLqKqQ0sXIDF51DoVdYjGpHLWeScgAYjUzowGqKlKxrEAhoOQOO39TwL3/SsH1HI2Xx2in7MGJ/OB0WEOh31I7CgHmy/duj25jNfpaJz+lac73w0mK67r/+qjop316lehLYNj5hDQafTYybt8s8HU0279mnXIUTZQURJiah+/F28/VadgOT16G93X+7/fpVqrCNv0mIz8jY8DkXiZ083+pIU6dTuEZZ9PrHNyc6F+k66UJlcJ7NQEQvTHDjLWRW2AU0shBb2cbo+kRDHAj9Cvn8AoJk4n2cHGR/Odi3AQwON7QmZwHkoJoZ5hmdB6aEvVeKDf+FTDTKt8XgZhODcIy7s8VqwbC9apOyjASWfhx5HJ6FUW89BtIi6g2IE5+rhZtMbQkeNzQEUlLkxgDst/g2FkEODIRHyD2hRL1x3zMvsoCmHFLAnrnM0rhtYjzAku9/TRzlOVvWHzfhsD8OJ8Z0WCvJcR27KXMdFdnBwEdZRABi4BghsOGwYctnvfBFvVTfXCwJjBlXw+oIBEtZn0Uc8OzMmMigKvZCU7zgaLv6QuAQAFwDIdhlu5Nt5wp/eygy0uHG+jqq372ZJuN3Gs8O4lHliOPb8Kc4fKqjl+//K9xNigGXvUPOolfCGA32EZVRbxpMwj7/zVuemsa2+4Wsai1+yS68w3bF2ca1hqptt+pjNeHXlDiXQmefBzvyJr/WWh9mC56C3AvP6/rtNa2GUrCgZeNyIRhn/AEIhpDtvQ57xbfxnOC1OkHBu5O8qfql7e+O+M/kKepBgFUeRvP42Cac8ua3rDbpIkkE+lGzrulgXV1S/9qZZn2DSUTTBjPktPN44VqTmDdGBxq4uNOl8vuVnIwwjF3cyo3rd5Osyg4fYjXyd88nX4MX/It+T1MjXI80YnUq9zNe6QoHelUXehbmmHe+WWadLZN5asMfMvFjjXdwL71JT/inO47AoK+SbJP254pUcWSPifUgdn6yj4tUX3YoZeWvMPQcNOR2nMCP0NcykTKRz2bX4T4iUVDnxd/pCkLTSMgPjQ4OXxQdG798/bZJwba7YgkTYHi5VbwUrs7rToQnlthx08EOfrSZMgwGbYHBMXokNFKBXoqK/fUQWh4/+tjTMAlwwpTJwMJhCL6LMepFXkfWyjRfz2h7ewAkwCkMuv2Zigvu7ROr7mokpLm2Mv5zZ3Fcf0yxfpot0HSbvd62aHWigUc+ryrlUPbUqpm7ilzWPyc/q4B88z39WkISPeSqadt/8KZUVFCXah5Hbi0Y5BVJCX7aVc5l229FJXWfXu/P4Q8t/jYkSEAseMFHiQ7Mv5mrbNOGlEZkIl3iSbvh6u4zn+WS2PA+a48to5nMyO6WuLRJAup4l0x4wzbSr6EADkbrGsAmJjTptH9r1f0yEL3adWDBf84qZazLbJhNhBYzBMiWdhrphSFp2BJdfqcCk1VG1ENGbUOkUqwEjxeH9sO0smW7IOhkZJcBNI4NxVyszTOgwwG1v0Ee03UXnwCHTY5B+f4FDTGFpgfz+D46ssf2AA7Vo87ICh7WSnR04fCMjh8p8qsghPk9irAYGmak8eXx8gtp8gnVCcc5mk4Ozuhab+CY2cUbFJhhpy+w8VUz3XDrBiGhrYIg3Ya6PHVL+P+iNW1wYs3ZU6f5l8Qnb52Wcwieu72rrxMZEJ9BXbLrvD1fpMgp2eDnvnUGHsfDerxQKqtWf7XHfXcPKDStrXiGnvay5rU0q0y3q5HwPvbNNZUYP3R2XTcW+blO1PjrbVMdt91T33IcVhQ7ehVrRWndsWFEP+Voh6oisaDA2p3y/4+1OmDfcTF4ZszZPeOPiCYq0CJFuODrzhBD5pq9N2xLrUDLZ+eFMmaw+ZvkwKPs
KZvmBTa+cIH9UDAIRe/v1N4DSOWkM7kUo4if4c0HgYeaa8x+R69MgsuPrUc3KEq9ijwHSGAGcB92JYcrSJLGcTRo9DA5iR2EwLZS2AgOc/FydGnhYVwPTyo7eFj4hOOO5Vk3wPFdL45mx6EsbMIJ+5WWpg/UaHH1TmAFNBFYbB126ZlgHhelKMpzBwAjuDsnzWXRBcNgw2QwfRaA/tYCu6x+Pq81VhiupfzRwjE1A2AhV1hnyZh30p9tynyBZ/nENAOhpedc3hYuZ1xcEhlJ0BUFRDvWmyq5cAxY+0augsEkfTAXpdsCAtQ4tMB6uCQyMmR7Dhhtg9YgF3E+hyU0PVwEBpe0laUOTEwOjbKVa2ddTSI2iEB9UmQ1ZrVxTwGl5Kuv1yni4fROeV72DmyipUp5DSQQhS4MlG2ur2kRxXPU7juNOaOPVNnseah3U9yftvruDvuo8IBN2oGObGQUouBealcT29nd4I3D1NI9/TEkFbNifY4x0QzS+CQDdoJPqBRtkU1LP6XRDRkU3kAkwOUwFXTnGR+1+tMgOHYxwHCi6l0o4tRZaIRxXW2x9ZnGyVuNsmX9Mj24ZAf8cXvyg048zCYKzvZ3OJRN1GKdJP/6o6Cdwgonn+MhlTK410wTSx4LRAwdjvzrnNCYKiBj3Ri9wi4Hq24tztGppm54PzMG8glqKnWraICUUqD0YxrhkAhu2d74AGoIFnmjANVhGGhpXgecRhrBDQz469CWOcLtqGpRn9UdD5HXSkFJNKzVdnkJ5nN4QjIGW+208FGstBAYCk7dEAqFDeVk7b+wlmDOWYBDXEK5WhGR9A2bDFntyI5uegbrELecBUNSQe+4RKFi3B4DCtoG6yJ3nAVCO6Ul+/QE1tsh37wtpDu7Z0tEHM6UvyMjK54m2L1Id73h++bzek9/jtA4Sx8XGl2wGtImv8o/jmLsZHsTWfhyXuLlYMHl5B+oZXNKB05/H1brsOU17qpnMj9Y6v/UcQ3UpiXBtZZ6xTApDyrKSBjdsrfqKLVFr9zAt7+I/IyfimSzRuMKSuA5Q6I+MfL4lQm0BJb62u4tNSwSD5K9hZk+tZT1kHMfVNnMeVXzRgYVmL8sonfZksZVlPbGoAxm2ZSGj4hNG2kHt+qFAz6UTV1+10OPSTwcGmQzSeAkEUmvV1aVNHTg3GQWfHK4Se2E+UROFMW9J4egTXHTi0yb0jgjtzz9xYOTmUhlF6dXVMYra0e6qV0n52h5rAy+sdS9+pZrthYOBzmJDLltzYVHA7lmoveyAf4nPQtU3vacuHfZJqL8ehVpoir75kXGb+950Bfqjn9LFIl4vLgiFXtQj0DaXHvbpqA4sTfv1SAgxbdcpi3RDxMIDIbALp98AjxGEgsZf8Vrb55FM3TAivlZz753xtEAMnv3T67bf7qvcEGynbVdQxOqOrYCiU5DZ1tJB3CSWM9fyqNT/SJiFYS3MfGrWytcek1g/NrEHQlEG9BWUT9SK9drWA4qPWSpLIHanC3dv+VnMb+UZfwE= \ No newline at end of file +7V1tc9o6Fv41zOzuTBhLsiz7Y0jabmebTudmX7L7zWABvjWYa0yb3F+/kt/AOjIQkB0TmkwbLNsC9JzznKNzjuQBuVs8f0r81fwhDng0wFbwPCD3A4yxRWzxR7a85C0IOUXLLAmDom3b8Bj+yYtGq2jdhAFf1y5M4zhKw1W9cRIvl3yS1tr8JIl/1i+bxlH9XVf+jIOGx4kfwdb/hEE6L1odam9P/J2Hs3nx1pgQJz8z9iffZ0m8WRZvOMBkmv3kpxd+2VnxTddzP4h/7jSRDwNyl8Rxmr9aPN/xSI5uOW75fR8bzlYfPOHL9JgbXJzf8cOPNsWX36x5UvSzTl/KIcm+E5d3WQMy+jkPU/648ify7E8hBaJtni4icYTEy3WaxN+roRNfajQNo+gujuIk641MvSmTnY2m8TIt8EdUHPtROFuKg4hP06qjnRuJQzwib/zBkzQUiN0WN6Sx/Azw6xffRF7On3eaiuH4xOMFT5MXcUkpvpjkt5TCi2l+/HMrCa5btM13hICiotEvpG9W9b0FQLwoMGjAgwA8BtiJ5FgE4Q/ZefF9ResfGyklo2yoqiPxaib/ShSfylvFu2Z356daxpZnP73EFiHkHgQXM9wSuDZUNvQsjv/97a5EapyUKBF55nEzXvIUnoQtWUf3Xx/l547XqUATW/+Ll1xzbyY8/kKOadGZtd6Mg3jhh0vJ2jziMz8N46VJQUGKoCzlZxMfIPGDUACqNB8QndYExLbq4kEsIB6IMAvKh+2458uH5wD5+BpLQ3gWDoG/nmfXqhBIHqby9wRddbKfdnWV1HXVdqCuIlejq5VSn4UFBVg8+EKvkhbRCCh3A/sENFw8Jh2jgW1yJBoEG0DDBmh8Xk4T/6Zl/Qh87k4np+jHxOXjaaeIEHakfniWAT+lWTtUe/PAF7HonNyK6+1PGtN19+1fxVnx6jrVC5O64cEltrtOZ2li6trFzgeznHY1adc+RN29iOIcUfX8fbj+Xlxgyfutv9x9ffzrVaqxirxOi/XIWwbUGCGA/GnTjbYkRTq1DcLS5BObnOxcqO+kCpXOddILlQnXCc5yFpltsIYGQgtNnK5OJMQ5jwUWY3sAmojrebKX8eFs1wA8xFPYntpDje3GmnmGY0DpoS+V44N+4VMMMsElHm+DEJx7hJk9XgqWbUWLSvuoQYlx3+HWSShV1rMTLSJupxjBubq/WrWG0EFjsweF0txowB1nv10hZDPUJUJsjxa1wnWHvMwjNGWfArbEdbbCdR3rEYJk91u8keNkVH/cCZ80MJwY33GmIK915MYutallBgdqIRUFgAHVQGDCYUOQy37js3BddnO1INjY0qiC0xYMkLAeRB/h5MyYSKcotEJSzLbruCCvS1w8gAsAZD33V/LlNOLPtzIDLb44XwbFy/tJ5K/X4WQ/LnmeGI49fw7TpwJq+fq/8vUQa2BpHGoe1BLecKAPsEzZlvDIT8MfvNa5bmyLd/gWh+KTbNMrrupYU1ppWNnNOt4kE17cuUUJdOY4oDPbGzKlt9RPZjwFvWWYV9//uGktjJJlJQOblWiU8Q8gFEK60zrkCV+Hf/rj7AIJ50p+ruyT0tGA3ut8BTVIsAiDQN4/ivwxj0ZVvcExkqSRj1K2VV2siiuKTzvYrU/Q6ag1RK7Lanjc2Eak5garQCOqCk08na752QjDyMWdzKheN/lSV+MhHke+9vnkq/Hif5HvSWrE1EizjU+lXpepXSG1K4O8C3NNW97Ns06iC55Ogksk4Eq++0zAxKvTr6LXRsiX6JJQYRr6WW0hX0Xxy4IXwmSMjZtwOjxjt7KftjhXTMtrI+4w3OWcnMC00Dc/kRIRT2XX4p8QKKl34u/4jSCp5WY6xod4b4sPDOF/eF5F/lJftgVpsD5cZdEVLM86ngx1KNfl4Ahn9NVq4iow2DoYbJ1rYgIF6JqUIeA2wovdh4BrGmYALphX6TgiTKArkae+8LtIfZnGy3Xqbl7HWTAC4y6/
pmOC+48J17c1HSu5dGf85fTmsTiMk3Qez+KlH33Ytip2YAeNanKVT6iq+VU2fxOfbPecPC5P/s7T9KWAxN+ksWjavvOXWJZR5GjvR64RjXweNKgFBfIJTb3t4MzuaMf76PGHlv8asyVqQLjLbAmDZl/M1NZxxHMjMhQu8TBe8eV6Hk7T4WR+HjSH19JMp3hySnFbIICkjqEgkcJKZYBgd10F0gBCDNgJBq3634bCE7tOJFxW94kZYjqjrTMQRsDoLFly1FDvmJGaFUH5W5ZgkuJsuRbRGRLpEpcDhrPTzbBt7Zhqxo4yMaUA75oYhI61Md1EDz1U9wUZOTV26KqxQ6+92CEisLpAvv9HWy5f+Ii8ct3mZQUNKyU7O2h4I6OGFNcQuUHnSYzRsKCrq1DuH59YdT5BKqHYZ7PJ3jldjU2Yjk3sXrEJspSVdk5ZT/daOkEWVpbBYGfoUoZsnP/vtcYtFEas7XJV1GXxidvkZZzCJ5RRZalYn+gE+oq7zvvTVbqMwtdQfMYOvXcXOoyZ936lUGBan9k2uO9Us3jDBBoe5LS3NbeVSXVVizo830NHBy26OLh/rh29DF7h2btaz572yxYjptpipY+jbbFN6z1VPbdhfaFjeKHWt9I5E9bXsZhSSNwj6+v1zZlvdtjp0HW6iwCURrDOE06/eIJYSrbVwSfyhBD5XR+d1CXWduzh1n93qbKzhsnoAAzlvoPogGfSm8cW6xWDQMRuv30GKJ2T/OBOYAX8BD/Q8xzkygs1WZOAMuIFZnxEolhZUq5a6CD54cH5050YpiSOIsM5qN7DYFvuQRh0a6yNwAAnTVenBo4SkRaDopkptbZmyoIzpWvVBMehSl2PHou2tAFZ0K+8LHUwXrmj1np3aCKQhd+HZhgHxaUHQWlPReDGktqS+f7CYcJku+ggAu2pBXRdf98sVlcZ5qx2OWoOOCMdECZCnFVmfbd6+sso32JIlo1cAwBqOt+xdGFm12kLAljGdltCUORjxOeXU3Hx/92X0U7zTSgn0Uu5LOT9w0Qdde0dcjwAk7ba0wxOsHxCxelJj9PTVeEEMmgdwwQ3cNhltKergKDaJPSNKM0Fo2ykMprtSVgxUNPWZWV0xQ6nZbeM10aj7jZqaLVWiA6FLBlLbZ6Tvqzs9C7C/aoksm06JDs/dWvp4dpJdbPU47eaAGYY20N3T88mkxRQqi800YnM7TZxYw2Jo0wi+pSnqBZF9Y+L2D4y8gAXWaeVLuLXkJG8bntKHpxPU7hXNKVhELqfQY7lJmbhWj9KINbpjKdsKPGXylOV8hrhKaosFj+zvFqp0jZMW7rnz/SCtvYt31BZyx563tke1CvoB0P6Yb2iH8/2ho7NLOq6mFCsCCSz8VBcghArrjmNiTzhE+32ArdIKN49u0ap9zbpMMFs0Duo6tiqpglSsjxSny33atEH0uxRfQE0BEtUrQ5XkWlpqF+lpgcYwgwNMWvfm9jC7apoUF7VHg3h90lDpWoaqS5zSpT76Q3BuGq+X8hTtlpEYCAwucUSCBXKy9o5pJFgXhMCV5aAY1ezbqEkJOO7SGv2CZQb8bQM1CXumw+AsjVZ8BaBghWEAChkGqiL3D4fAuXBp0K1CFTfoumXvRRIny3BPavxx8qWT6hcG/76Gn918yirxTp+yCkXG3oyGSLHrEx39mNap9nj8x1TSm0bK1QjlDwIfyRvODre6Fd8CVUzTfUBhq/nDavOG8RqcXcQGO18D1M0Yix8LSjEo8qewr0KFNmwQKknjGIsk9atl2Jpdg/5sfj6Nf3+4Z9/jO4eVp8+/2OFRvYNcvrluri4HtCsnmrzWgYCi5ip4k2bzJnBAINGgC+BcypFvLqUWekhX3VlPVP28+l4MRa9+NUNphebeI7qRnW41IHC9M320Xut7LV8iY/eU7dXJi7V6EyLD9779eS9TFPUDTO0Gyq3piswdPElns3C5eyCUGhFPTxlI9NuH8ZnwyKCX5uPCydbpazungSFKHSWAR79KwroYW1SZZ97MnVDFmZKdaRzxsOpEHjKRKtbzNJ3uYnMVtuuoNyI9i3VlTY/+6CVlSP6RUYdVmuX4eKeMJKLlGDSqeFspjzIC3vtEZEDsyuXmgWjJrNgvV4o0hBbhVCWy2nfxbMmNLgC1miONTmHnzVBNVURJpZ5NsDVt30NT1t4aCg/0WgeDmcnFPo/aEv2a49mB/S3MiiOp6z+QYowHu3dWl69I7e17ETD4OJLtjJHG5S66O1T+32mp9d7oTd8KRhK15mei92e+yzTg4hdDxb2wPb0balE+2V6+w3MyalxQ8aH6IwPfUvjgzBR0qmnWh8khHtIPMZw8atYtXKxRWfGCKYwrtYYNfg3W2PU663B99eYKHumnWdqerLa4ixDRLx6IL7KgHeROWzACsZBe2KI2ivF6rcl0hSO4re0Q7ZaXH6qGQIdtfeciYahhVlbjbRfhdlhh8zOJRVzNXzHI9ID51ggdit/Dc1Qqu13q23QNIZBYxfUOMIRhkEcJnGc7o62+Krzhzjg8or/Aw== \ No newline at end of file diff --git a/resources/images/prometheus_architecture.png b/resources/images/prometheus_architecture.png new file mode 100644 index 0000000..1610bc0 Binary files /dev/null and b/resources/images/prometheus_architecture.png differ diff --git a/resources/images/prometheus_cmo.png b/resources/images/prometheus_cmo.png new file mode 100644 index 0000000..7315f1e Binary files /dev/null and b/resources/images/prometheus_cmo.png differ diff --git a/resources/images/prometheus_use-cases.png b/resources/images/prometheus_use-cases.png new file mode 100644 index 0000000..8f76101 Binary files /dev/null and b/resources/images/prometheus_use-cases.png differ