feat: add node_exporter metrics #1627

Open
wants to merge 19 commits into from

avilagaston9
Collaborator

@avilagaston9 avilagaston9 commented Dec 16, 2024

Add node_exporter metrics

Description

Adds node_exporter for deployment on the Aligned servers.

  • Creates a script to install node_exporter.
  • Creates an Ansible playbook to install node_exporter on the servers.
  • Uses the node_exporter_full Grafana dashboard.

How to Test

  1. Install node_exporter:
make install_node_exporter
  2. Run node_exporter:
node_exporter
  3. Go to localhost:9100 and you should see the exposed metrics.
  4. Run metrics:
make run_metrics
  5. Go to localhost:3000 and you should see the new dashboard Node Exporter Full with your metrics displayed.

Note

If you are on macOS, some of the metrics will appear with no data.

  6. Ask for a server and test the playbook.
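
Step 3 can also be checked from a script rather than a browser. The following is a minimal sketch and not part of this PR: it assumes node_exporter serves the Prometheus text format at its default /metrics path on port 9100, and the helper names (metric_names, missing_metrics, fetch_metrics) are made up for illustration.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that are absent, sorted."""
    return sorted(set(expected) - metric_names(exposition_text))


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Download the raw metrics text (assumes node_exporter is running locally)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With node_exporter running, `missing_metrics(fetch_metrics(), ["node_cpu_seconds_total", "node_load1"])` should return an empty list if both metrics are exposed. Which metrics to expect is a choice for whoever runs the check; as the note above says, some are absent on macOS.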

Type of change

  • New feature

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to Github Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds compatibility for the operator for both versions and does not change batcher/docs/examples
    • This PR updates batcher and docs/examples to the newer version. This requires that operators are already updated to be compatible

@avilagaston9 avilagaston9 self-assigned this Dec 17, 2024
@avilagaston9 avilagaston9 marked this pull request as ready for review December 17, 2024 18:57
@PatStiles
Contributor

Ran locally and it worked fine. Did not have access to a server to test the playbook yet.

Collaborator

@MarcosNicolau MarcosNicolau left a comment

Tested it locally. I only see the node_exporter dashboard for the batcher, is that alright? Shouldn't I also see the aggregator dashboard?

Contributor

@uri-99 uri-99 left a comment

Looks fine, but this PR needs approval from DevOps.

Collaborator

@JulianVentura JulianVentura left a comment

Ran it locally and everything seems to work fine, but macOS doesn't show many graphs. I left it running for a few minutes, and when I chose the 24h time frame in Grafana some new graphs appeared.
Anyway, I think this PR should really be tested on a server so that all graphs are shown and we test the Ansible script.

@avilagaston9
Collaborator Author

Tested it locally. I only see the node_exporter dashboard for the batcher, is that alright? Shouldn't I also see the aggregator dashboard?
@MarcosNicolau

If you tested it locally, then the metrics displayed were from your computer, not specifically from the batcher. Did it say 'Batcher' anywhere?

- name: Node Exporter Setup
  hosts: "{{ host }}"

  tasks:
Collaborator

@samoht9277 samoht9277 Dec 20, 2024

This Ansible task assumes the repo is already cloned, which might not be the case.

Also, there are some sender servers, like aligned-holesky-stage-1-sender, which @JuArce told me to test this PR on. Why are we running this Ansible playbook inside the Aggregator and Batcher? Is this whole Node Exporter thing an add-on to these components, or should it be a separate component with its own server?

Collaborator Author

Yes, I intentionally designed the playbook to run after the repo is cloned, so we could include it in the Aggregator and Batcher playbooks. But I made a mistake when adding it to the others: the repository hadn't been cloned yet at that step 🤣. I'll fix that in a moment.

Collaborator Author

Addressed in #7923c1a!

Collaborator

Unless this file is for deploying prometheus locally, we should also add this scrape target to the actual telemetry servers, right?

In my PR adding Ansible to the whole Telemetry, I have a config file I copy to prometheus.

Collaborator Author

Yes, we need to add the two new jobs to that file as well
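
For reference, the shape of such job entries can be sketched in Python. This is only an illustration under stated assumptions: job_name, static_configs and targets are standard Prometheus scrape_configs keys and 9100 is node_exporter's default port, but the helper names, job names and hostnames used here are hypothetical and do not come from this PR or from the telemetry config file.

```python
NODE_EXPORTER_PORT = 9100  # node_exporter's default listen port


def node_exporter_job(job_name, hosts, port=NODE_EXPORTER_PORT):
    """Build one scrape_configs entry with static targets (hypothetical helper)."""
    return {
        "job_name": job_name,
        "static_configs": [{"targets": [f"{host}:{port}" for host in hosts]}],
    }


def add_jobs(config, jobs):
    """Return a copy of config with jobs appended, skipping duplicate job names."""
    scrape_configs = list(config.get("scrape_configs", []))
    seen = {job["job_name"] for job in scrape_configs}
    for job in jobs:
        if job["job_name"] not in seen:
            scrape_configs.append(job)
            seen.add(job["job_name"])
    # The input dict is left untouched; only the returned copy has the new jobs.
    return {**config, "scrape_configs": scrape_configs}
```

For example, `add_jobs(config, [node_exporter_job("batcher-node-exporter", ["batcher.example"])])` would append one job targeting batcher.example:9100. The real file is YAML, so the resulting structure would still have to be written out in that format.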

Collaborator

If you found this script online, we should credit the original author.

Please add a comment below the shebang (#!/bin/bash) with a link to the original version.

Collaborator Author

I actually used install_aligned.sh as a reference.

@JuArce JuArce enabled auto-merge January 16, 2025 21:55
@JuArce JuArce added this pull request to the merge queue Jan 16, 2025
Any commits made after this event will not be merged.
8 participants