Configure Renovate #87

Status: Open. Wants to merge 108 commits into base branch `release-3.6`.

Commits (108):
- 438049b Update 21_ansible_inventory.md (gerald-eggenberger, Jun 5, 2019)
- 9779058 changed user[x] for OpenShift console URL (gerald-eggenberger, Jun 5, 2019)
- 1ae4c18 specified where to find the inventory-file and grep for password (gerald-eggenberger, Jun 5, 2019)
- cc48b3e new textorder for console access, password (gerald-eggenberger, Jun 5, 2019)
- 076de8d changed from bastion to jump-host (gerald-eggenberger, Jun 5, 2019)
- b8546e0 new picture with jumphost (gerald-eggenberger, Jun 5, 2019)
- 92f7d13 changed user add cowboy (gerald-eggenberger, Jun 5, 2019)
- 7063e2b changed password-grep from jumphost to master0 (gerald-eggenberger, Jun 5, 2019)
- fd2e456 since 3.11 policies are replaced by rbac. changed oc describe command… (gerald-eggenberger, Jun 5, 2019)
- 91c3582 added query for cluster-admin users, which is needed for patching. (gerald-eggenberger, Jun 5, 2019)
- 802dc67 correct query with Rolebindings (gerald-eggenberger, Jun 5, 2019)
- 7b3199c changed node1 to infra-node1 (gerald-eggenberger, Jun 5, 2019)
- e45dcab changed output of oc get nodes infra-node1 (gerald-eggenberger, Jun 5, 2019)
- faa06cb changed output of running pods on infra-node1 (gerald-eggenberger, Jun 5, 2019)
- 6da58f8 remove install of jq package, because we deploy it now as prereq. (gerald-eggenberger, Jun 6, 2019)
- 2640dbf Update 32_update_hosts.md (gerald-eggenberger, Jun 6, 2019)
- e81146e Merge branch 'release-3.6' into release-3.11 (gerald-eggenberger, Jun 6, 2019)
- bb7c572 Merge branch 'release-3.11' of github.com:gerald-eggenberger/ops-tech… (gerald-eggenberger, Jun 6, 2019)
- 86bbfdb Merge branch 'release-3.6' into release-3.11 (Jun 6, 2019)
- e80ef13 Merge branch 'release-3.6' into release-3.11 (Jun 6, 2019)
- 47ac3b4 Fix changes for release-3.11 (Jun 6, 2019)
- b6d3878 Merge pull request #88 from gerald-eggenberger/release-3.11 (mebagel, Jun 6, 2019)
- c89da3d easy-mode.yaml has been moved to subdirectory openshift-checks (gerald-eggenberger, Jun 6, 2019)
- a83564e location for reports changed from /tmp to $HOME (gerald-eggenberger, Jun 6, 2019)
- 7a93175 location changed for redeploy-ca.yml (gerald-eggenberger, Jun 6, 2019)
- 0a394b9 changed path of redeploy-certificates.yml and verification output (gerald-eggenberger, Jun 6, 2019)
- fc45500 Updated some different playbook locations and added warning Bug 1635251 (gerald-eggenberger, Jun 6, 2019)
- 03e06ab changed location for router and registry deployment-playbook (gerald-eggenberger, Jun 6, 2019)
- 104e20b fixed location for playbooks/openshift-node/scaleup.yml (gerald-eggenberger, Jun 7, 2019)
- de633e4 Merge pull request #89 from gerald-eggenberger/release-3.11 (mebagel, Jun 11, 2019)
- ac10365 Check the etcd health status with etcdctl2 command (gerald-eggenberger, Jun 11, 2019)
- c0b7035 fixed new_node entry for app-node1 (gerald-eggenberger, Jun 11, 2019)
- 559ba7a fixed app_nodes to app-node (gerald-eggenberger, Jun 11, 2019)
- 1904cb9 changed path for master scaleup.yml (gerald-eggenberger, Jun 13, 2019)
- 6b71022 changed output of oc get nodes (gerald-eggenberger, Jun 13, 2019)
- 30076a2 Update lab to OSE311 (Jun 13, 2019)
- 7d8653c udate output oc get nodes (gerald-eggenberger, Jun 13, 2019)
- ce43b0a Update logs lab to OSE311 (Jun 13, 2019)
- b9750cc Fix markup prefix (Jun 13, 2019)
- 3200cf2 added manual steps for nodes certificate replacement (gerald-eggenberger, Jun 13, 2019)
- 0696fbf Update etcd appendix to OSE311 (Jun 13, 2019)
- 5ea8cba Fix appendices numbering (Jun 13, 2019)
- 1b837ba Adapt overview to ose311 installation (Jun 13, 2019)
- a8e8415 added console-prefix to adhoc-statements (gerald-eggenberger, Jun 13, 2019)
- de57db4 Update arch-overview to ose311 setup (Jun 13, 2019)
- f8d4701 changed path of bootstrap.kubeconfig (gerald-eggenberger, Jun 13, 2019)
- c4434fa changed path of bootstrap.kubeconfig ins src-path (gerald-eggenberger, Jun 13, 2019)
- 51252b1 removed sudo for config deployment (gerald-eggenberger, Jun 13, 2019)
- 6c3f7a0 removed sudo for ansible statements (gerald-eggenberger, Jun 13, 2019)
- 7c80c46 fixed missing } in ansible statement (gerald-eggenberger, Jun 13, 2019)
- 6e5c775 splitted certificates deployment in app/infra and master-nodes (gerald-eggenberger, Jun 13, 2019)
- 82dc7c0 fixed missing code-block on task 11. (gerald-eggenberger, Jun 13, 2019)
- 6b0724f Remove etcd scaleup appendix (Jun 13, 2019)
- 4eb56fa renamed some md-files and created new index in 70_upgrade.md (gerald-eggenberger, Jun 13, 2019)
- fdec169 fixed href in 70_upgrade.md (gerald-eggenberger, Jun 13, 2019)
- e0a9bf5 Fix appendix order (Jun 13, 2019)
- 4eb7a1f added new md-file 71_upgrade_openshift3.11.98.md (gerald-eggenberger, Jun 13, 2019)
- 052c0e0 initial version of upgrade-procedure 3.11.98 (gerald-eggenberger, Jun 13, 2019)
- 5bba900 Update chapter four to ose311 setup (Jun 13, 2019)
- 3bf9f4a Add project backup (Jun 13, 2019)
- f8096c7 Use first debug pod to restore (Jun 14, 2019)
- 4555d2b described the upgrade procedure (gerald-eggenberger, Jun 14, 2019)
- 06f0da5 fixed typo (gerald-eggenberger, Jun 14, 2019)
- d2eb707 fixed .yaml to yml (gerald-eggenberger, Jun 14, 2019)
- 5c69ffc changed value from system to services to prevent system-reboot on hos… (gerald-eggenberger, Jun 14, 2019)
- 35a7294 changed enabled to disabled according openshift_rolling_restart_mode … (gerald-eggenberger, Jun 14, 2019)
- 6afafc9 changed node-update to one by one (gerald-eggenberger, Jun 14, 2019)
- 425a994 formatting issue 4.4 (gerald-eggenberger, Jun 14, 2019)
- 6e6d8ac changed doku from 3.11.98 to 3.11.104 (gerald-eggenberger, Jun 14, 2019)
- 44a131c fixed upgrade-statement for infra-node1.user (gerald-eggenberger, Jun 14, 2019)
- dbc40f9 changed some formatting and removed old instructions for 3.7 (gerald-eggenberger, Jun 14, 2019)
- 0866a92 fixed formatting by inserting newline (gerald-eggenberger, Jun 14, 2019)
- 1f7e0a7 Add link to etcd documentation (Jun 14, 2019)
- a26476c changed instructions for EFK logging Stack (gerald-eggenberger, Jun 14, 2019)
- edf9076 added note for openshift_logging_install_logging parameter (gerald-eggenberger, Jun 14, 2019)
- 2b7c85d changed notes to note: (gerald-eggenberger, Jun 14, 2019)
- bc138ca changed upgrade instructions for metrics upgrade (gerald-eggenberger, Jun 14, 2019)
- d545646 removed chapter 9 (gerald-eggenberger, Jun 14, 2019)
- 86466d2 Use snapshot for etcd backup (Jun 17, 2019)
- 9a3ab0a Merge pull request #90 from gerald-eggenberger/release-3.11 (mebagel, Jun 17, 2019)
- 399f3dc Remove upgrade node reboot (Jun 17, 2019)
- 4d235b4 Execute command with sudo (bliemli, Jun 18, 2019)
- 9a7594c Limit query to masters (the quick&dirty way) (bliemli, Jun 18, 2019)
- 9cabde1 Fix command (bliemli, Jun 18, 2019)
- e39dfa8 Update expected output when listing pods (bliemli, Jun 20, 2019)
- 30bef68 Add lab for prometheus operator (Jun 26, 2019)
- ddeb76d Adapt prometheus operator to opstechlab (Jun 26, 2019)
- 4c7aa6f Fix markup (Jun 26, 2019)
- 48dcd78 Remove AWS Storage (Jun 26, 2019)
- 7828392 Add explanation to GUI (Jun 27, 2019)
- 1d0375d Add curl example to access API (Jun 27, 2019)
- 45f1d7a Release 3.11 backup (#91) (gerald-eggenberger, Jun 27, 2019)
- 5290990 Fix list for gui navigation (Jun 27, 2019)
- 7eae593 Merge branch 'release-3.11' of https://github.com/appuio/ops-techlab … (Jun 27, 2019)
- 5e3ca3e Add note for single es setup (Jun 27, 2019)
- a18333c Fix next chapter label (Jun 30, 2019)
- 1357bad Add missing worker nodes (Jun 30, 2019)
- dd588b1 Fix client url (Jun 30, 2019)
- 55082da Add oc client distributor (Jul 1, 2019)
- c355058 Release 3.11 aws (#92) (gerald-eggenberger, Jul 1, 2019)
- f01ff3b Fix quotes on install oc client (Jul 1, 2019)
- 17fe537 Add get groups (Jul 1, 2019)
- 3e2f25b Removed wrong character in command line (#93) (tobiasdenzler, Jul 1, 2019)
- 6f0fe4c Release 3.11 aws (#94) (gerald-eggenberger, Jul 1, 2019)
- d50fcfb Comment new_masters while node scaleup (Jul 1, 2019)
- 76f4e39 Fix logging playbook (Jul 2, 2019)
- 1274390 Fix typo (Aug 8, 2019)
- 77bb734 Add renovate.json (renovate[bot], Dec 15, 2023)
7 changes: 3 additions & 4 deletions README.md
```diff
@@ -28,10 +28,9 @@ There's a [Troubleshooting Cheat Sheet](resources/troubleshooting_cheat_sheet.md
 
 ## Appendices
 
-1. [etcd Scaleup](appendices/01_etcd_scaleup.md)
-2. [Monitoring with Prometheus](appendices/02_prometheus.md)
-3. [Useful Internet Resources](appendices/03_internet_resources.md)
-
+1. [Monitoring with Prometheus](appendices/01_prometheus.md)
+2. [Useful Internet Resources](appendices/02_internet_resources.md)
+3. [Using AWS EFS Storage](appendices/03_aws_storage.md)
 
 ## License
```
48 changes: 0 additions & 48 deletions appendices/01_etcd_scaleup.md

This file was deleted.

225 changes: 225 additions & 0 deletions appendices/01_prometheus.md
@@ -0,0 +1,225 @@
# Prometheus
Source: https://github.com/prometheus/prometheus

Visit [prometheus.io](https://prometheus.io) for the full documentation,
examples and guides.

Prometheus, a [Cloud Native Computing Foundation](https://cncf.io/) project, is a systems and service monitoring system. It collects metrics
from configured targets at given intervals, evaluates rule expressions,
displays the results, and can trigger alerts if some condition is observed
to be true.

Prometheus' main distinguishing features as compared to other monitoring systems are:

- a **multi-dimensional** data model (timeseries defined by metric name and set of key/value dimensions)
- a **flexible query language** to leverage this dimensionality
- no dependency on distributed storage; **single server nodes are autonomous**
- timeseries collection happens via a **pull model** over HTTP
- **pushing timeseries** is supported via an intermediary gateway
- targets are discovered via **service discovery** or **static configuration**
- multiple modes of **graphing and dashboarding support**
- support for hierarchical and horizontal **federation**
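
As an illustration of that dimensionality, a single metric name can be aggregated along any of its label dimensions with PromQL. The metric and label names below are generic placeholders, not specific to this lab:

```
# Per-pod request rate over the last 5 minutes, aggregated from one
# multi-dimensional metric by its "pod" label
sum(rate(http_requests_total{namespace="default"}[5m])) by (pod)
```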

## Prometheus overview
The following diagram shows the general architectural overview of Prometheus:

![Prometheus Architecture](../resources/images/prometheus_architecture.png)

## Monitoring use cases
Starting with OpenShift 3.11, Prometheus is installed by default to **monitor the OpenShift cluster** (depicted in the diagram below on the left side: *Kubernetes Prometheus deployment*). This installation is managed by the "Cluster Monitoring Operator" and is not intended to be customized (we will do it anyway).

To **monitor applications** or **define custom Prometheus configurations**, the Tech Preview feature [Operator Lifecycle Manager (OLM)](https://docs.openshift.com/container-platform/3.11/install_config/installing-operator-framework.html) can be used to install the Prometheus Operator, which in turn makes it possible to define Prometheus instances (depicted in the diagram below on the right side: *Service Prometheus deployment*). These instances are fully customizable through *Custom Resource Definitions (CRD)*.

![Prometheus Overview](../resources/images/prometheus_use-cases.png)

(source: https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/)
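
To give an idea of what such a CRD-based definition looks like, here is a minimal sketch of a `Prometheus` custom resource as consumed by the Prometheus Operator. The name, namespace and selector labels are hypothetical placeholders, not part of this lab:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: service-prometheus      # placeholder name
  namespace: app-monitoring     # hypothetical namespace
spec:
  replicas: 1
  serviceAccountName: prometheus
  serviceMonitorSelector:       # picks up ServiceMonitor CRs by label
    matchLabels:
      team: frontend
```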

# Cluster Monitoring Operator

![Cluster Monitoring Operator components](../resources/images/prometheus_cmo.png)
<https://github.com/openshift/cluster-monitoring-operator/tree/release-3.11>

## Installation

<https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html>

From OpenShift 3.11 onwards, the CMO is installed by default. To customize the installation, you can set the following variables in the inventory (values suitable for a small cluster):

```ini
openshift_cluster_monitoring_operator_install=true # default value
openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
openshift_cluster_monitoring_operator_prometheus_storage_capacity=50Gi
openshift_cluster_monitoring_operator_prometheus_storage_class_name=[tbd]
openshift_cluster_monitoring_operator_alertmanager_storage_enabled=true
openshift_cluster_monitoring_operator_alertmanager_storage_capacity=2Gi
openshift_cluster_monitoring_operator_alertmanager_storage_class_name=[tbd]
openshift_cluster_monitoring_operator_alertmanager_config=[tbd]
```

Run the installer

```
[ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-monitoring/config.yml
```

### Access Prometheus

You can log in as the cluster administrator `sheriff` at:
https://prometheus-k8s-openshift-monitoring.app[X].lab.openshift.ch/

- Additional targets: `Status` -> `Targets`
- Scrape configuration: `Status` -> `Configuration`
- Defined rules: `Status` -> `Rules`
- Service Discovery: `Status` -> `Service Discovery`


### Configure Prometheus
Let Prometheus discover and scrape services in other namespaces by granting the `prometheus-k8s` service account the `cluster-reader` role:

```
[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-reader -z prometheus-k8s -n openshift-monitoring
```

To modify the Prometheus configuration (e.g. the retention time), change the ConfigMap `cluster-monitoring-config` as described here:
<https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/Documentation/user-guides/configuring-cluster-monitoring.md>

```
[ec2-user@master0 ~]$ oc edit cm cluster-monitoring-config -n openshift-monitoring
```
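
For example, to change the retention time, the ConfigMap could look like this (a sketch based on the linked guide; the retention value is an example only):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h    # example value; pick what fits your storage
```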

Unfortunately, changing the default scrape config is not supported with the Cluster Monitoring Operator.

#### etcd monitoring

To add etcd monitoring, follow this guide:
<https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html#configuring-etcd-monitoring>

## Additional services: CRD type ServiceMonitor (unsupported by Red Hat)

Creating additional ServiceMonitor objects is not supported by Red Hat. See [Supported Configuration](https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html#supported-configuration) for details.

We will do it anyway :sunglasses:.

In order for custom services to be added to the managed Prometheus instance, the label `k8s-app` needs to be present on the *ServiceMonitor* *Custom Resource (CR)*.

See the following example of a *ServiceMonitor* named `router-metrics`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
generation: 1
labels:
k8s-app: router-metrics
name: router-metrics
namespace: ""
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
interval: 30s
port: 1936-tcp
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
insecureSkipVerify: true
namespaceSelector:
matchNames:
- default
selector:
matchLabels:
router: router
```

### Router Monitoring

Create the custom cluster role `router-metrics` and add it to the Prometheus service account `prometheus-k8s` so that Prometheus is able to read the router metrics.
First you need to check which labels your routers are using.

```
[ec2-user@master0 ~]$ oc get endpoints -n default --show-labels
NAME ENDPOINTS AGE LABELS
router 172.31.43.147:1936,172.31.47.59:1936,172.31.47.64:1936 + 6 more... 1h router=router
```

Add the `prometheus-k8s` service account to the `router-metrics` cluster role
```
[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user router-metrics system:serviceaccount:openshift-monitoring:prometheus-k8s
```
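
The `router-metrics` cluster role itself is assumed to exist at this point. If you need to create it yourself, a minimal sketch could look like the following; the `routers/metrics` resource reflects how the HAProxy router delegates metrics authorization, which is an assumption here rather than part of this lab:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: router-metrics
rules:
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics    # virtual resource guarding the router stats endpoint
  verbs:
  - get
```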

Set the router label as a parameter and create the ServiceMonitor
```
[ec2-user@master0 ~]$ oc project openshift-monitoring
[ec2-user@master0 ~]$ oc process -f resource/templates/template-router.yaml -p ROUTER_LABEL="router" | oc apply -f -
```

### Logging Monitoring
This only works with a clustered Elasticsearch; due to a lack of resources, the OPStechlab runs a single-node ES.
The service `logging-es-prometheus` needs to be labeled and the following RoleBinding applied for Prometheus to be able to get the metrics.

```
[ec2-user@master0 ~]$ oc label svc logging-es-prometheus -n openshift-logging scrape=prometheus
[ec2-user@master0 ~]$ oc create -f resource/templates/template-rolebinding.yaml -n openshift-logging
[ec2-user@master0 ~]$ oc process -f resource/templates/template-logging.yaml | oc apply -f -
```

## Additional rules: CRD type PrometheusRule

Take a look at the additional ruleset that we suggest using to monitor OpenShift.
```
[ec2-user@master0 ~]$ less resource/templates/template-k8s-custom-rules.yaml
```

Add the custom rules from the template folder to Prometheus:

```
[ec2-user@master0 ~]$ oc process -f resource/templates/template-k8s-custom-rules.yaml -p SEVERITY_LABEL="critical" | oc apply -f -
```
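
For reference, a `PrometheusRule` CR has roughly the following shape. The alert below is purely illustrative and not the content of the template; the `prometheus: k8s` and `role: alert-rules` labels are an assumption about what the managed Prometheus selects:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: k8s-custom-rules-example
  namespace: openshift-monitoring
  labels:
    prometheus: k8s      # assumed selector labels of the managed Prometheus
    role: alert-rules
spec:
  groups:
  - name: example.rules
    rules:
    - alert: NodeDown
      expr: up{job="kubelet"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        message: Kubelet on {{ $labels.instance }} has been unreachable for more than 5 minutes.
```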

## AlertManager

Alertmanager can be configured with the Red Hat Ansible playbooks:
<https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html#configuring-alertmanager>

Or by hand

```
[ec2-user@master0 ~]$ oc delete secret alertmanager-main
[ec2-user@master0 ~]$ oc create secret generic alertmanager-main --from-file=resource/templates/alertmanager.yaml
```
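
The secret wraps a plain Alertmanager configuration file. A minimal, hypothetical `alertmanager.yaml` could look like this; all receiver details are placeholders:

```yaml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  repeat_interval: 12h
  receiver: default
receivers:
- name: default
  email_configs:
  - to: ops@example.com               # placeholder address
    from: alertmanager@example.com    # placeholder sender
    smarthost: smtp.example.com:25    # placeholder mail relay
```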

Follow this guide:
<https://github.com/openshift/cluster-monitoring-operator/blob/release-3.11/Documentation/user-guides/configuring-prometheus-alertmanager.md>

Check if the new configuration is in place: https://alertmanager-main-openshift-monitoring.app[X].lab.openshift.ch/#/status

## Additional configuration

### Add view role for developers

Let users who are not OpenShift admins access Prometheus:
```
[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-monitoring-view [user]
```

### Add metrics reader service account to access Prometheus metrics

You can create a service account to access Prometheus through the API
```
[ec2-user@master0 ~]$ oc create sa prometheus-metrics-reader -n openshift-monitoring
[ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-monitoring-view -z prometheus-metrics-reader -n openshift-monitoring
```

Access the API with a simple `curl`
```
[ec2-user@master0 ~]$ export TOKEN=$(oc sa get-token prometheus-metrics-reader -n openshift-monitoring)
[ec2-user@master0 ~]$ curl https://prometheus-k8s-openshift-monitoring.app[X].lab.openshift.ch/api/v1/query?query=ALERTS -H "Authorization: Bearer $TOKEN"
```
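
A successful call returns the standard Prometheus HTTP API envelope; the result entry below is purely illustrative:

```
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"alertname":"ExampleAlert","severity":"warning"},"value":[1561630000,"1"]}]}}
```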

### Allow Prometheus to scrape your metrics endpoints (if using ovs-networkpolicy plugin)

Create an additional NetworkPolicy.

```
[ec2-user@master0 ~]$ oc create -f resource/templates/networkpolicy.yaml -n [namespace]
```
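
The template is assumed to contain a policy along these lines. This is a sketch only: the namespace selector label and the metrics port are assumptions and depend on how your namespaces and applications are set up:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scraping
spec:
  podSelector: {}                      # applies to all pods in the namespace
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: openshift-monitoring  # assumes the namespace carries this label
    ports:
    - protocol: TCP
      port: 8080                       # hypothetical metrics port
```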
```diff
@@ -1,4 +1,4 @@
-# Appendix 3: Useful Internet Resources
+# Appendix 2: Useful Internet Resources
 
 This appendix is a small collection of rather useful online resources containing scripts and documentation as well as Ansible roles and playbooks and more.
```

```diff
@@ -7,7 +7,7 @@ This appendix is a small collection of rather useful online resources containing
 - Red Hat Communities of Practice: https://github.com/redhat-cop
 - Red Hat Consulting DevOps and OpenShift Playbooks: http://v1.uncontained.io/
 - APPUiO OpenShift resources: https://github.com/appuio/
-
+- Knowledge Base: https://kb.novaordis.com/index.php/OpenShift
 
 ---
```