
Connect gitlab runners to prometheus #1264

Open
glogiotatidis opened this issue Apr 15, 2020 · 7 comments

@glogiotatidis
Contributor

glogiotatidis commented Apr 15, 2020

https://docs.gitlab.com/runner/monitoring/

@glogiotatidis
Contributor Author

@duallain How can we connect the Runners to Prom?

I understand we need the following:

  • Set up the runners so they expose metrics on port 9252 (see the config.toml sketch below)
  • Open a network flow from the Prom IP to the Runner IPs/port
  • Make Prom discover the Runners

Can you advise on how to automate the second and third points?
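For the first point, the monitoring docs linked above boil down to one line in the global section of each runner's config.toml (or the equivalent --listen-address flag on gitlab-runner run); a minimal sketch, not tested here:

  # config.toml, global section — exposes the runner's built-in Prometheus metrics listener
  listen_address = ":9252"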

@duallain
Contributor

I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

Prom isn't a first-class object from the POV of networking; it's 'just' part of the k8s cluster. I think we should make a 'k8s-accessor' security group in each AWS region that we expect to be attached to the clusters (both existing and EKS). The gitlab runners could then authorize access with something like an inbound rule allowing port 9252 from the k8s-accessor sg.
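Sketching that rule in CloudFormation just for concreteness (the SGs may well be managed some other way; K8sAccessorSG, RunnerSG and VpcId are placeholders):

  # Sketch only — an SG for the k8s nodes, plus an ingress rule on the runners' SG
  K8sAccessorSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Attached to k8s nodes so Prom can reach scrape targets
      VpcId: !Ref VpcId

  RunnerMetricsIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref RunnerSG                      # the gitlab-runner instances' SG
      IpProtocol: tcp
      FromPort: 9252
      ToPort: 9252
      SourceSecurityGroupId: !Ref K8sAccessorSG   # only the k8s nodes may scrape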

3a.
For automatic discovery, we can use the EC2 service discovery (ec2_sd). The Prom pods will need EC2 permissions to list/describe the instances, but that should be no problem (we likely just give the k8s nodes those perms).
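Concretely, the permission ec2_sd needs is ec2:DescribeInstances; a minimal policy statement (wherever it ends up attached) would look something like:

  # IAM policy document, shown as CloudFormation-style YAML for readability
  # ec2:DescribeInstances is what ec2_sd needs in order to list the instances
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - ec2:DescribeInstances
        Resource: '*'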

The pattern I've used in the past is to attach some special tags to the EC2 instance, something like prom_scrape:true, prom_port:9252, prom_path:/metrics, sometimes with a number in there to allow an instance to be scraped multiple times (prom1_scrape). It's possible to have an arbitrary number of scrape points, but it doesn't really seem worth the effort in my opinion (especially since node-exporter has a textfile collector, so if you had many things emitting metrics it could be used as an intermediate collector). So instead, we copy a single scrape job and just increment it a few (3?) times to allow multiple endpoints on one instance to be scraped.

Example config, using the labels from above as a starting point.

  - job_name: 'node'
    ec2_sd_configs:
      - refresh_interval: 120s
    relabel_configs:
      # Only scrape instances with the prom1_scrape tag
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
      # Not tested at all, but the goal is to use the port tag + private IP to set what Prom will scrape
      - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prom1_port]
        regex: '(.*);(.*)'
        target_label: __address__
        replacement: '${1}:${2}'
      # Also not tested, but this sets the magic __metrics_path__ to the value of the ec2 tag
      - source_labels: [__meta_ec2_tag_prom1_path]
        regex: '(.*)'            # This is the default regex.
        target_label: __metrics_path__
        replacement: '${1}'

3b.
Then we need to feed that to the Prom deployment. As an example from the prom_sauron deployment: https://github.com/mozmeao/infra-services/blob/master/prom_sauron/helm/helm_configs/server.yml#L4 The tl;dr is: take the config above, add it to a helm values file like ^, then wire that yml file into the bash script that deploys Prom (hopefully traceable if you look at the server.yml references in prom_sauron).
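A sketch of that wiring, assuming the chart accepts something like the upstream prometheus chart's extraScrapeConfigs key (check what server.yml actually exposes):

  # helm_configs/server.yml — sketch only; key name follows the upstream prometheus chart
  extraScrapeConfigs: |
    - job_name: 'gitlab-runners'
      ec2_sd_configs:
        - refresh_interval: 120s
      relabel_configs:
        # same keep/address/path relabels as in the example config above
        - source_labels: [__meta_ec2_tag_prom1_scrape]
          regex: 'true'
          action: keep

The deploy script then just passes that file along, e.g. helm upgrade ... -f helm_configs/server.yml.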

Full reference:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config
Very simple 'keep' label:
https://www.robustperception.io/automatically-monitoring-ec2-instances
Good example of ec2 magic labels:
https://www.robustperception.io/controlling-the-instance-label

@glogiotatidis
Contributor Author

> I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

@duallain
Contributor

> I found this page with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations
>
> A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

Ahh, classic case of multiple things with the same name tripping me up. Glad you saw the list and used it.

@duallain
Contributor

We could consider installing node-exporter and exposing it as well. That would give us metrics on disk space and other node-related items.
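node-exporter listens on port 9100 by default, so it could reuse the same opt-in tag with a second scrape job; untested sketch:

  - job_name: 'gitlab-runner-node'
    ec2_sd_configs:
      - refresh_interval: 120s
    relabel_configs:
      # reuse the prom1_scrape opt-in tag from the runner job
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
      - source_labels: [__meta_ec2_private_ip]
        regex: '(.*)'
        target_label: __address__
        replacement: '${1}:9100'   # node-exporter's default port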

@glogiotatidis
Contributor Author

Good point on node-exporter. Will PR

(My other half says to just create a GitLab scheduled job that curls a DeadMansSnitch to make sure everything on the runners actually works :))
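That scheduled job would be tiny; a sketch of the .gitlab-ci.yml (the snitch URL is a placeholder):

  # runs only from a pipeline schedule and checks in with Dead Man's Snitch
  heartbeat:
    image: curlimages/curl:latest
    script:
      - curl --fail --silent https://nosnch.in/REPLACE_ME
    only:
      - schedules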

@glogiotatidis
Contributor Author

> We could consider installing node-exporter and exposing it as well. That would give us metrics on disk space and other node-related items.

https://github.com/mozmeao/infra-services/pull/52
