Description
Summary
From an internal discussion, we should expand the alerting page to include the following list of recommended metrics:
| Metric | Description |
|---|---|
| `minio_node_drive_free_bytes` | Total storage available on a drive. |
| `minio_node_drive_free_inodes` | Total free inodes. |
| `minio_node_drive_latency_us` | Average last minute latency in µs for drive API storage operations. |
| `minio_node_drive_offline_total` | Total drives offline in this node. |
| `minio_node_drive_online_total` | Total drives online in this node. |
| `minio_node_drive_total` | Total drives in this node. |
| `minio_node_drive_total_bytes` | Total storage on a drive. |
| `minio_node_drive_used_bytes` | Total storage used on a drive. |
| `minio_node_drive_errors_timeout` | Total number of drive timeout errors since server start. |
| `minio_node_drive_errors_availability` | Total number of drive I/O errors, permission denied errors, and timeouts since server start. |
| `minio_node_drive_io_waiting` | Total number of I/O operations waiting on drive. |
There are a lot of metrics here and the page already has some examples, so to keep the default length of the procedure in check I'm thinking we can use a tab setup, something like:
| Example Alerts | Recommended Alerts |
Goals
- Add alert examples matching the metrics above
- Possibly tab out or otherwise organize page for readability
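As a starting point for the first goal, here is a rough sketch of what a few of the recommended alert rules could look like as a Prometheus rule file. The metric names come from the table above; everything else is an assumption for illustration, including the group name, alert names, thresholds, `for` durations, and the `severity` label. The free-space ratio also assumes both metrics are exported with identical label sets per drive, and the `increase()` rule assumes the timeout metric behaves as a cumulative counter, which "since server start" suggests but which should be confirmed.

```yaml
groups:
  - name: minio-drive-alerts          # hypothetical group name
    rules:
      # Any drive reported offline on a node.
      - alert: MinioDriveOffline      # hypothetical alert name
        expr: minio_node_drive_offline_total > 0
        for: 5m                       # illustrative duration
        labels:
          severity: warning
        annotations:
          summary: "Drive(s) offline on {{ $labels.instance }}"

      # Less than 10% free space on a drive (illustrative threshold).
      # Assumes both series carry the same labels so they match one-to-one.
      - alert: MinioDriveLowFreeSpace
        expr: minio_node_drive_free_bytes / minio_node_drive_total_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low free space on a drive on {{ $labels.instance }}"

      # Average drive latency above 100 ms (100000 µs, illustrative threshold).
      - alert: MinioDriveHighLatency
        expr: minio_node_drive_latency_us > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High drive latency on {{ $labels.instance }}"

      # New drive timeout errors in the last 5 minutes.
      # Assumes the metric is a counter that only resets on server restart.
      - alert: MinioDriveTimeoutErrors
        expr: increase(minio_node_drive_errors_timeout[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Drive timeout errors on {{ $labels.instance }}"
```

The remaining metrics (`minio_node_drive_free_inodes`, `minio_node_drive_io_waiting`, `minio_node_drive_errors_availability`) could follow the same pattern once we agree on sensible thresholds.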
Non-Goals
- Extensive testing of Prometheus + Alertmanager with the above metrics