
Connect gitlab runners to prometheus #1264

Open
glogiotatidis opened this issue Apr 15, 2020 · 7 comments

@glogiotatidis
Contributor

glogiotatidis commented Apr 15, 2020

https://docs.gitlab.com/runner/monitoring/

@glogiotatidis
Contributor Author

@duallain How can we connect the Runners to Prom?

I understand we need the following:

  • Set up the runners so they expose metrics on port 9252 (see the config.toml sketch below)
  • Open a network flow from the Prom IP to the Runner IPs/port
  • Make Prom discover the Runners

Can you advise on how to automate the second and third points?
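For the first point, the monitoring docs linked above boil down to one line in the global section of each runner's config.toml (or the equivalent --listen-address flag on gitlab-runner run); a minimal sketch, not tested here:

  # config.toml, global section — exposes the runner's built-in Prometheus metrics listener
  listen_address = ":9252"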

@duallain
Contributor

I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

Prom isn't a first-class object from the POV of networking; it's 'just' part of the k8s cluster. I think we should make a 'k8s-accessor' security group in each AWS region that we expect to be attached to the clusters (both existing and EKS). The gitlab runners could then authorize access with something like an inbound rule allowing port 9252 from the k8s-accessor sg.
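Sketching that rule in CloudFormation just for concreteness (the SGs may well be managed some other way; K8sAccessorSG, RunnerSG and VpcId are placeholders):

  # Sketch only — an SG for the k8s nodes, plus an ingress rule on the runners' SG
  K8sAccessorSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Attached to k8s nodes so Prom can reach scrape targets
      VpcId: !Ref VpcId

  RunnerMetricsIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref RunnerSG                      # the gitlab-runner instances' SG
      IpProtocol: tcp
      FromPort: 9252
      ToPort: 9252
      SourceSecurityGroupId: !Ref K8sAccessorSG   # only the k8s nodes may scrape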

3a.
For automatic discovery, we can use the EC2 service discovery (ec2_sd). The Prom pods will need EC2 permissions to list/describe the instances, but that should be no problem (we likely just give the k8s nodes those perms).
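Concretely, the permission ec2_sd needs is ec2:DescribeInstances; a minimal policy statement (wherever it ends up attached) would look something like:

  # IAM policy document, shown as CloudFormation-style YAML for readability
  # ec2:DescribeInstances is what ec2_sd needs in order to list the instances
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - ec2:DescribeInstances
        Resource: '*'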

The pattern I've used in the past is to attach some special tags to the EC2 instance, something like prom_scrape:true, prom_port:9252, prom_path:/metrics, sometimes with a number in there to allow an instance to be scraped multiple times (prom1_scrape). It's possible to have an arbitrary number of scrape points, but it doesn't really seem worth the effort in my opinion (especially since node-exporter has a textfile collector, so if you had many things emitting metrics it could be used as an intermediate collector). So instead, we copy a single scrape job and just increment it a few (3?) times to allow multiple endpoints on one instance to be scraped.

Example config, using the labels from above as a starting point.

  - job_name: 'node'
    ec2_sd_configs:
      - refresh_interval: 120s
    relabel_configs:
      # Only scrape instances with the prom1_scrape tag
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
      # Not tested at all, but the goal is to use the port tag + private IP to set what Prom will scrape
      - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prom1_port]
        regex: '(.*);(.*)'
        target_label: __address__
        replacement: '${1}:${2}'
      # Also not tested, but this sets the magic __metrics_path__ to the value of the ec2 tag
      - source_labels: [__meta_ec2_tag_prom1_path]
        regex: '(.*)'            # This is the default regex.
        target_label: __metrics_path__
        replacement: '${1}'

3b.
Then we need to feed that to the Prom deployment. As an example from the prom_sauron deployment: https://github.com/mozmeao/infra-services/blob/master/prom_sauron/helm/helm_configs/server.yml#L4 The tl;dr is: take the config above, add it to a helm values file like ^, then wire that yml file into the bash script that deploys Prom (hopefully traceable if you look at the server.yml references in prom_sauron).
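A sketch of that wiring, assuming the chart accepts something like the upstream prometheus chart's extraScrapeConfigs key (check what server.yml actually exposes):

  # helm_configs/server.yml — sketch only; key name follows the upstream prometheus chart
  extraScrapeConfigs: |
    - job_name: 'gitlab-runners'
      ec2_sd_configs:
        - refresh_interval: 120s
      relabel_configs:
        # same keep/address/path relabels as in the example config above
        - source_labels: [__meta_ec2_tag_prom1_scrape]
          regex: 'true'
          action: keep

The deploy script then just passes that file along, e.g. helm upgrade ... -f helm_configs/server.yml.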

Full reference:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config
Very simple 'keep' label:
https://www.robustperception.io/automatically-monitoring-ec2-instances
Good example of ec2 magic labels:
https://www.robustperception.io/controlling-the-instance-label

@glogiotatidis
Contributor Author

> I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

@duallain
Contributor

> I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations
>
> A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

Ahh, classic case of multiple things with the same name tripping me up. Glad you saw the list and used it.

@duallain
Contributor

We could consider installing node-exporter and exposing it as well. That would give us metrics on disk space and other node-related items.
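node-exporter listens on port 9100 by default, so it could reuse the same opt-in tag with a second scrape job; untested sketch:

  - job_name: 'gitlab-runner-node'
    ec2_sd_configs:
      - refresh_interval: 120s
    relabel_configs:
      # reuse the prom1_scrape opt-in tag from the runner job
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
      - source_labels: [__meta_ec2_private_ip]
        regex: '(.*)'
        target_label: __address__
        replacement: '${1}:9100'   # node-exporter's default port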

@glogiotatidis
Contributor Author

Good point on node-exporter. Will PR

(My other half says to just create a GitLab scheduled job that curls a DeadMansSnitch to make sure everything on the runners actually works :))
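That scheduled job would be tiny; a sketch of the .gitlab-ci.yml (the snitch URL is a placeholder):

  # runs only from a pipeline schedule and checks in with Dead Man's Snitch
  heartbeat:
    image: curlimages/curl:latest
    script:
      - curl --fail --silent https://nosnch.in/REPLACE_ME
    only:
      - schedules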

@glogiotatidis
Contributor Author

> We could consider installing node-exporter and exposing it as well. That would give us metrics on disk space and other node-related items.

https://github.com/mozmeao/infra-services/pull/52
