Additional recommended alerts #1135

Open
@ravindk89

Summary

From an internal discussion, we should expand the alerting page to include the following list of recommended metrics:

| Metric | Description |
| --- | --- |
| `minio_node_drive_free_bytes` | Total storage available on a drive. |
| `minio_node_drive_free_inodes` | Total free inodes. |
| `minio_node_drive_latency_us` | Average last-minute latency in µs for drive API storage operations. |
| `minio_node_drive_offline_total` | Total drives offline in this node. |
| `minio_node_drive_online_total` | Total drives online in this node. |
| `minio_node_drive_total` | Total drives in this node. |
| `minio_node_drive_total_bytes` | Total storage on a drive. |
| `minio_node_drive_used_bytes` | Total storage used on a drive. |
| `minio_node_drive_errors_timeout` | Total number of drive timeout errors since server start. |
| `minio_node_drive_errors_availability` | Total number of drive I/O errors, permission-denied errors, and timeouts since server start. |
| `minio_node_drive_io_waiting` | Total number of I/O operations waiting on the drive. |
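
As a starting point for the examples, here is a rough, untested sketch of what rules for a few of these metrics might look like in a Prometheus rules file. The group name, alert names, thresholds, `for` durations, and severities are all placeholders I made up for illustration, not recommended values. The first rule also assumes that `minio_node_drive_free_bytes` and `minio_node_drive_total_bytes` carry identical label sets so that the division matches one-to-one.

```yaml
# Untested sketch. All names, thresholds, and durations below are
# illustrative assumptions, not values recommended by MinIO.
groups:
- name: minio-drive-alerts          # hypothetical group name
  rules:
  # Fires when a drive has stayed below 10% free space for 10 minutes.
  # Assumes both metrics expose the same labels for each drive.
  - alert: DriveSpaceLow
    expr: minio_node_drive_free_bytes / minio_node_drive_total_bytes < 0.10
    for: 10m
    labels:
      severity: warn
    annotations:
      summary: "Low free space on a MinIO drive"
      description: "A drive on {{ $labels.instance }} has less than 10% free space"

  # Fires when a node has reported at least one offline drive for 5 minutes.
  - alert: DrivesOffline
    expr: minio_node_drive_offline_total > 0
    for: 5m
    labels:
      severity: warn
    annotations:
      summary: "Drive offline in MinIO node"
      description: "{{ $labels.instance }} reports {{ $value }} offline drive(s)"

  # The error metrics count since server start, so alert on recent growth
  # rather than on the absolute value.
  - alert: DriveTimeoutErrors
    expr: increase(minio_node_drive_errors_timeout[10m]) > 0
    labels:
      severity: warn
    annotations:
      summary: "Drive timeout errors on MinIO node"
      description: "{{ $labels.instance }} logged new drive timeout errors in the last 10 minutes"
```

The same `increase()` pattern should apply to `minio_node_drive_errors_availability`, while gauge-style metrics such as `minio_node_drive_latency_us` and `minio_node_drive_io_waiting` would take a plain threshold comparison like the first two rules.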

There are a lot of metrics here and the page already has some examples, so I'm thinking we can use a tab setup, something like:

| Example Alerts | Recommended Alerts |

This would help constrain the default length of the procedure.

Goals

  • Add alert examples matching the metrics above
  • Possibly tab out or otherwise organize page for readability

Non-Goals

  • Extensive testing of Prometheus + Alertmanager with the above metrics

