
Staging - [Alerting] High Number of Machines With Low Disk Space in Some Queue(s) #4697

Closed
dotnet-eng-status-staging bot opened this issue Dec 21, 2024 · 8 comments
Labels
Grafana Alert (Issues opened by Grafana) · Inactive Alert (Issues from Grafana alerts that are now "OK") · Ops - First Responder · Staging (Tied to the Staging environment, as opposed to Production)

Comments

@dotnet-eng-status-staging

💔 Metric state changed to alerting

At least one queue has 20% or more of its agents reporting low disk space.

  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open.rt-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open.svc-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.rt-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.svc-dougbu-remonitor.azure} 1

Go to rule
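The rule text above describes a simple per-queue threshold check. The sketch below is a hypothetical illustration of that condition, not the actual Grafana rule definition. It assumes that "Percentage Low Disk" is reported per queue as a fraction in [0, 1] (so the value 1 shown above would mean every agent in the queue reported low disk space), and the helper names are invented for illustration.

```python
# Hypothetical sketch of the alert condition: "at least one queue has 20% or
# more of its agents reporting low disk space".
# Assumption: "Percentage Low Disk" is a per-queue fraction in [0, 1].

LOW_DISK_THRESHOLD = 0.20  # "20% or more of its agents"


def percentage_low_disk(low_disk_agents: int, total_agents: int) -> float:
    """Fraction of a queue's agents reporting low disk space (0.0 for an empty queue)."""
    if total_agents == 0:
        return 0.0
    return low_disk_agents / total_agents


def queues_in_alert(metrics: dict[str, float], threshold: float = LOW_DISK_THRESHOLD) -> list[str]:
    """Queues whose low-disk fraction meets or exceeds the threshold, sorted by name."""
    return sorted(queue for queue, value in metrics.items() if value >= threshold)


def alert_state(metrics: dict[str, float]) -> str:
    """'alerting' if at least one queue breaches the threshold, otherwise 'ok'."""
    return "alerting" if queues_in_alert(metrics) else "ok"


if __name__ == "__main__":
    metrics = {
        # Value taken from the alert above.
        "pr-azurelinux.3.amd64-dougbu-remonitor.azure": 1.0,
        # Hypothetical healthy queue: 1 of 10 agents low on disk (10%).
        "example-healthy-queue": percentage_low_disk(1, 10),
    }
    print(alert_state(metrics))      # alerting
    print(queues_in_alert(metrics))  # ['pr-azurelinux.3.amd64-dougbu-remonitor.azure']
```

Because the check is "any queue at or above the threshold", a single breaching queue is enough to flip the state to alerting, which matches how later comments in this thread fire with only one queue listed.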

@dotnet/dnceng, @dotnet/prodconsvcs, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-2ca5b0285c1e4179b621f916b8b5e75f

dotnet-eng-status-staging bot added the Active Alert, Grafana Alert, Ops - First Responder, and Staging labels and removed the Active Alert label on Dec 21, 2024

@dotnet-eng-status-staging (Author)

💚 Metric state changed to ok

At least one queue has 20% or more of its agents reporting low disk space.

Go to rule

dotnet-eng-status-staging bot added the Inactive Alert and Active Alert labels and removed the Inactive Alert label on Dec 21, 2024

@dotnet-eng-status-staging (Author)

💔 Metric state changed to alerting

At least one queue has 20% or more of its agents reporting low disk space.

  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.rt-dougbu-remonitor.azure} 1

Go to rule

dotnet-eng-status-staging bot removed the Active Alert label on Dec 21, 2024

@dotnet-eng-status-staging (Author)

💚 Metric state changed to ok

At least one queue has 20% or more of its agents reporting low disk space.

Go to rule

dotnet-eng-status-staging bot added and removed the Inactive Alert label on Dec 21, 2024

@dotnet-eng-status-staging (Author)

💔 Metric state changed to alerting

At least one queue has 20% or more of its agents reporting low disk space.

  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open.rt-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.open.svc-dougbu-remonitor.azure} 1
  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64.svc-dougbu-remonitor.azure} 1

Go to rule

dotnet-eng-status-staging bot added the Active Alert and Inactive Alert labels and removed the Active Alert label on Dec 22, 2024

@dotnet-eng-status-staging (Author)

💚 Metric state changed to ok

At least one queue has 20% or more of its agents reporting low disk space.

Go to rule

dotnet-eng-status-staging bot added the Active Alert label and removed the Inactive Alert label on Dec 22, 2024

@dotnet-eng-status-staging (Author)

💔 Metric state changed to alerting

At least one queue has 20% or more of its agents reporting low disk space.

  • Percentage Low Disk {Queue=pr-azurelinux.3.amd64-dougbu-remonitor.azure} 1

Go to rule

dotnet-eng-status-staging bot removed the Active Alert label on Dec 25, 2024

@dotnet-eng-status-staging (Author)

💚 Metric state changed to ok

At least one queue has 20% or more of its agents reporting low disk space.

Go to rule

dotnet-eng-status-staging bot added the Inactive Alert label on Dec 25, 2024

@ilyas1974 (Contributor)

System is currently healthy. Closing alert.
