Qumulo Monitoring Dashboard

This dashboard is a monitoring and alerting solution for Qumulo clusters. This solution uses the Qumulo OpenMetrics API with a Prometheus time-series database and Grafana monitoring software and includes a set of dashboards and alerts that you can customize or use as templates.

For detailed information about available metrics, see Qumulo OpenMetrics API Specification on the Qumulo Documentation Portal.

Initial Configuration

This section explains the initial configuration of the Qumulo Monitoring Dashboard.

Prerequisites

Before you begin, ensure that you have the following minimum software versions:

  • Git
  • Docker Engine 1.13
  • Docker Compose 1.11
  • Qumulo Core 5.3.0

Step 1: Clone This Repository to Your Docker Host

  1. Log in to your Docker host.

  2. Use the git CLI to clone this repository.

    For more information, see Cloning a repository in the GitHub documentation.

  3. Navigate to the qumulo-monitoring-dashboard directory.

Step 2: Create a Service Account and Access Token on your Qumulo Clusters

For this section, follow Working with Qumulo Access Tokens on the Qumulo Documentation Portal.

  1. Create a service account.

  2. Assign a role with only PRIVILEGE_METRICS_READ to the service account.

  3. Create an access token for the service account.

  4. Save the bearer token temporarily.

    A bearer token is a value in the Authorization HTTP header that acts as the authentication mechanism for the Qumulo REST API.
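
To illustrate how the bearer token is used, the following minimal sketch builds the Authorization header and wraps a test request in a function. The cluster name, the token value, and the metrics path are assumptions for illustration and don't come from this repository. Confirm the exact endpoint in the Qumulo OpenMetrics API Specification.

```shell
#!/bin/sh
# Placeholder values (hypothetical); replace them with your own.
CLUSTER='cluster.example.com'
TOKEN='access-v1:EXAMPLE-TOKEN'

# The bearer token travels in the Authorization HTTP header.
AUTH_HEADER="Authorization: Bearer ${TOKEN}"

# Hypothetical helper: requests metrics from the cluster on port 8000.
# -k skips certificate verification (for a self-signed certificate).
# The metrics path is an assumption; confirm it in the API specification.
fetch_metrics() {
  curl -sk -H "${AUTH_HEADER}" \
    "https://${CLUSTER}:8000/v2/metrics/endpoints/default/data"
}

# Print the header so that you can inspect what would be sent.
echo "${AUTH_HEADER}"
```

After you replace the placeholders, calling fetch_metrics from the Docker host is one way to check that the token and the network path both work before you configure Prometheus.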

Step 3: Configure Prometheus

  1. To let Prometheus read metrics from your clusters, update the Prometheus configuration in prometheus.yml.

    ⚠️ Important: Perform the following step for each of your clusters.

  2. Into the scrape_configs section, copy the qumulo-cluster job and fill in the following:

    • job_name: A unique name for labeling the cluster's metrics.

    • In the static_configs block, for targets: A list that contains the cluster's DNS name or IP address.

      ℹ️ Notes:

      • To specify a port, you must append :8000 to the DNS name or IP address.

      • Because the cluster's metrics are labeled with the target variable, use a DNS name rather than an IP address.

      • To allow monitoring to continue to work if a node goes offline, use floating IP addresses rather than DHCP or static IP addresses.

    • In the authorization block, for credentials: The bearer token for the service account.

    • In the tls_config block, for insecure_skip_verify: If the Qumulo cluster uses the default, self-signed SSL certificate, set this value to true.

  3. (Optional) Before you start Prometheus and Grafana, change the default administrator credentials in docker-compose.yml.

    • For ADMIN_USER and ADMIN_PASSWORD, specify credentials that conform to your security policies.

      ⚠️ Important: If you don't specify your credentials, Docker uses the default values.

    • For ADMIN_PASSWORD_HASH, specify the password hash. To generate the hash for your password, install and run the version of caddy that matches the version of the container that the Qumulo Monitoring Dashboard uses (2.6.2).

    ℹ️ Note: Alternatively, you can set your credentials as environment variables:

    • On Linux:

      a. To set the environment variables, use the export command.

      export ADMIN_USER='<username>'
      export ADMIN_PASSWORD='<password>'
      export ADMIN_PASSWORD_HASH='<password-hash>'

      b. To verify the environment variables, use the printenv command.

    • On Windows:

      a. To set the environment variables, use the setx command.

      setx ADMIN_USER "<username>"
      setx ADMIN_PASSWORD "<password>"
      setx ADMIN_PASSWORD_HASH "<password-hash>"

      b. To verify the environment variables, open a new Command Prompt window (setx doesn't change the current session) and use the set command.
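
Because the scrape job in step 2 must be repeated for each cluster, a small generator can help keep the entries consistent. The following is a sketch only: the print_scrape_job helper and the example names are hypothetical, and the output contains only the fields that step 2 names. Keep any other keys from the qumulo-cluster template job in prometheus.yml unchanged.

```shell
#!/bin/sh
# Hypothetical helper: prints one scrape job for the scrape_configs section.
# Arguments: job name, cluster DNS name, bearer token.
# Only the fields named in step 2 appear here; other keys that the
# template job contains are omitted and must be kept from the template.
print_scrape_job() {
  cat <<EOF
  - job_name: '$1'
    static_configs:
      - targets: ['$2:8000']
    authorization:
      credentials: '$3'
    tls_config:
      insecure_skip_verify: true
EOF
}

# Example call with placeholder values (all hypothetical).
print_scrape_job 'qumulo-cluster-a' 'cluster-a.example.com' 'access-v1:EXAMPLE-TOKEN'
```

Set insecure_skip_verify to true only if the cluster uses the default, self-signed SSL certificate, as described in step 2.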

Step 4: Start Prometheus and Grafana

To start Prometheus and Grafana on the Docker host, run the following command.

ℹ️ Note: The -d flag runs the container in the background.

docker-compose up -d

Step 5: Verify Your Prometheus Configuration

This section explains how to verify that Prometheus can gather metrics from your Qumulo clusters.

  1. Connect to the Prometheus server at http://<docker-host-ip>:9090.

  2. Log in with the admin username and admin password.

  3. On the top menu bar, select Status > Targets.

  4. On the Targets page, find the job name that you defined in the prometheus.yml file and then confirm that the State is Up.

    If the State isn't Up, check the Error column.

    Common problems include:

    • In the static_configs block, for targets, a mistake in the DNS name or IP address.

    • Inability to connect to the cluster from the machine that runs the containers.

      Test the connection by using the qq CLI from that machine.

    • In the static_configs block, for targets, a missing :8000 port specification.

    • In the tls_config block, insecure_skip_verify not set to true when the Qumulo cluster uses a self-signed SSL certificate.
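
You can make the same check from a command line by using the Prometheus HTTP API, which lists targets at /api/v1/targets. The following is a rough sketch: the host address, the default admin credentials, and both helper functions are assumptions for illustration, and the text match relies on Prometheus emitting compact JSON. A JSON parser is a more robust choice.

```shell
#!/bin/sh
# Placeholder values (hypothetical); replace them with your own.
PROM_URL='http://203.0.113.1:9090'
PROM_CREDENTIALS='admin:admin'

# Hypothetical helper: fetches the target list from the Prometheus HTTP API.
fetch_targets() {
  curl -s -u "${PROM_CREDENTIALS}" "${PROM_URL}/api/v1/targets"
}

# Hypothetical helper: reads a targets response on stdin and prints the
# number of targets whose health is "down". It assumes compact JSON
# ("health":"down" with no spaces).
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Example with a canned response instead of a live server.
printf '%s\n' '{"data":{"activeTargets":[{"health":"up"},{"health":"down"}]}}' \
  | count_down_targets
```

Against a running server, fetch_targets | count_down_targets prints the number of down targets. A value other than 0 suggests that one of the common problems above applies.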

Step 6: Verify Your Grafana Configuration

This section explains how to verify that Grafana can query Prometheus and display metrics:

  1. Connect to the Grafana server at http://<docker-host-ip>:3000.

  2. Log in with the admin username and admin password.

  3. On the Welcome to Grafana page, enter a new, secure password and click Submit.

  4. On the left menu bar, click Dashboards.

  5. On the Dashboards page, click the Qumulo folder and then click Cluster Overview.

  6. On the Qumulo / Cluster Overview page, next to the cluster label filter, select your cluster from the list.

    Metrics for your cluster begin to populate graphs.

    ℹ️ Note: The Cluster Info and Node Info panels may take up to 24 hours to fully populate after initial setup.

Step 7: Configure Grafana Alert Notifications

This section explains how to configure Grafana alerts to notify you through email, Slack, or an alerting tool.

  1. On the left menu, click Alerting.

  2. On the Alerting page, click Contact Points.

  3. In the Contact Points section, on the right, click + New contact point.

  4. On the New contact point page, do the following.

    a. Enter a Name for the contact point.

    b. Select a Contact point type and fill in the fields that appear for that type.

    c. To test the contact point, click Test.

    ℹ️ Note: The test message might take a few minutes to arrive.

  5. Click Save contact point.

    Grafana begins to use the contact point to deliver alerts.

To configure an SMTP email server, add your email server information to grafana.ini. For more information, see Configuring Grafana - SMTP configuration in the Grafana documentation.
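
As a minimal sketch of what the SMTP settings might look like, the following helper prints an [smtp] section that uses the enabled, host, and from_address keys from the Grafana documentation. The print_smtp_section helper, the server name, and the sender address are hypothetical. If your grafana.ini already contains an [smtp] section, edit that section rather than adding a second one.

```shell
#!/bin/sh
# Hypothetical helper: prints an [smtp] section for grafana.ini.
# Arguments: SMTP host and port (host:port), sender address.
print_smtp_section() {
  cat <<EOF
[smtp]
enabled = true
host = $1
from_address = $2
EOF
}

# Example call with placeholder values (both hypothetical).
print_smtp_section 'smtp.example.com:587' 'grafana@example.com'
```

If your mail server requires authentication, the same section also accepts user and password keys. Restart or reload Grafana after you change grafana.ini.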

Updating Your Qumulo Monitoring Dashboard Configuration

This section explains how to update the configuration of the Qumulo Monitoring Dashboard for your system.

Updating Your Prometheus Configuration

This section explains how to update the Prometheus configuration for your system.

While Prometheus runs, it doesn't apply configuration changes automatically. To reload the configuration, you must do one of the following:

  • To stop and restart the container in which Prometheus runs on your Docker host, run the following commands.

    ℹ️ Note: The -d flag runs the container in the background.

    docker-compose down
    docker-compose up -d

  • To make an HTTP POST call, use the curl command. For example:

    curl -X POST http://admin:admin@203.0.113.1:9090/-/reload

Updating Your Grafana Configuration

This section explains how to update the Grafana configuration for your system. To update the built-in Grafana alerts, you must modify their configuration files. To create new alerts, use the Grafana web UI.

⚠️ Important: Because the functionality of certain vendors' disks degrades before reaching 0% endurance, by default, the Disk endurance low (low_disk_endurance) alert notifies when 20% endurance remains. For endurance information, check your disks' vendor documentation.

⚠️ Important: Because of dependencies on caddy and container network components, we don't recommend changing the port that you use to access Grafana (port 3000). Before you make any changes, back up grafana.ini.

For information about working with Grafana dashboards, see Create a dashboard in the Grafana documentation.

For information about configuring Grafana by using grafana.ini, see Configuring Grafana in the Grafana documentation.

While Grafana runs, it doesn't apply alert or any other configuration changes automatically. To reload the configuration, you must do one of the following:

  • To stop and restart the container in which Grafana runs on your Docker host, run the following commands.

    ℹ️ Note: The -d flag runs the container in the background.

    docker-compose down
    docker-compose up -d

  • To make an HTTP POST call, use the curl command. For example:

    curl -X POST --user admin:admin http://203.0.113.1:3000/api/admin/provisioning/alerting/reload