Temporal Cloud: Updated the dashboards, alerts and config files. #2600

Closed
wants to merge 2 commits into from
4 changes: 2 additions & 2 deletions alert-policies/temporal-cloud/FailedWorkflows.yml
@@ -3,12 +3,12 @@ name: Failed Workflows

# Description and details
description: |+
-  This alert is triggered if the Temporal cloud workflows fail once within a 5-minute window.
+  This alert is triggered if the Temporal Cloud workflows fail once within a 5-minute window.
type: STATIC

# NRQL query
nrql:
-  query: "FROM temporalCloudWorkflowFailed SELECT latest(`data.result-value1`) FACET `data.result-metric-__name__`"
+  query: "FROM temporalCloudWorkflowFailed SELECT max(`data.result-value1`)AS 'Failed Workflows' FACET `data.result-metric-operation` AS operation, `data.result-metric-temporal_namespace` AS namespace"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE
30 changes: 30 additions & 0 deletions alert-policies/temporal-cloud/RPSLimitErrors.yml
@@ -0,0 +1,30 @@
# Name of the alert
name: Resource Exhausted Errors

# Description and details
description: |+
  This alert is triggered if the Temporal Cloud resource exhausted errors exceeds 1 for 5 minutes.
# Type of alert
type: STATIC

# NRQL query
nrql:
  query: "SELECT max(`data.result-value1`) AS resource_exhausted_errors FROM temporalCloudResourceExhaustedErrors FACET `data.result-metric-operation` AS operation, `data.result-metric-temporal_namespace` AS namespace"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL
# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
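
For reviewers, here is a rough Python sketch of how a term like the one above (`operator: ABOVE`, `threshold: 1`, `thresholdDuration: 300`, `thresholdOccurrences: ALL`) could be read. It is not New Relic's implementation. The `Term` class, the `violates` helper, and the 60-second aggregation window are hypothetical and not part of this PR; the sketch only illustrates that `ALL` requires every data point in the duration to be above the threshold.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Term:
    """Hypothetical mirror of one entry under `terms:` (ABOVE operator only)."""
    threshold: float        # terms[].threshold
    duration_seconds: int   # terms[].thresholdDuration
    occurrences: str        # terms[].thresholdOccurrences: "ALL" or "AT_LEAST_ONCE"


def violates(points: Sequence[float], term: Term, window_seconds: int = 60) -> bool:
    """Return True if the most recent points breach the term.

    `points` holds one aggregated value per window, oldest first.
    The 60-second window is an assumption, not a value from the PR.
    """
    needed = term.duration_seconds // window_seconds
    if needed <= 0 or len(points) < needed:
        return False  # not enough data to cover the whole duration
    recent = points[-needed:]
    above = [value > term.threshold for value in recent]
    if term.occurrences == "ALL":
        return all(above)
    if term.occurrences == "AT_LEAST_ONCE":
        return any(above)
    raise ValueError(f"unsupported occurrences value: {term.occurrences}")


critical = Term(threshold=1, duration_seconds=300, occurrences="ALL")
print(violates([0, 2, 3, 2, 4, 5], critical))  # True: the last five points are all above 1
print(violates([2, 3, 1, 4, 5], critical))     # False: one point equals 1, which is not above
```

Under this reading, a single window with exactly one resource exhausted error does not open a violation, because `ABOVE` is a strict comparison against `threshold: 1`.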
8 changes: 4 additions & 4 deletions alert-policies/temporal-cloud/ServiceLatency.yml
@@ -1,15 +1,15 @@
# Name of the alert
-name: Service Latency
+name: Start Workflow Execution Latency

# Description and details
description: |+
-  This alert is triggered if the Temporal cloud service latency exceeds 5 seconds for 5 minutes.
+  This alert is triggered if the Temporal Cloud service latency exceeds 5 seconds for 5 minutes.
# Type of alert
type: STATIC

# NRQL query
nrql:
-  query: "FROM temporalCloudWorkflowFailed SELECT latest(`data.result-value1`) FACET `data.result-metric-__name__`"
+  query: "SELECT max(`data.result-value1`) FROM temporalCloudStartWorkflowExecutionLatencyP95 FACET `data.result-metric-operation` AS operation, `data.result-metric-temporal_namespace` AS namespace"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE
@@ -20,7 +20,7 @@ terms:
     # Operator used to compare against the threshold.
     operator: ABOVE
     # Value that triggers a violation
-    threshold: 1
+    threshold: 5
     # Time in seconds; 120 - 3600
     thresholdDuration: 300
     # How many data points must be in violation for the duration
Binary file modified dashboards/temporal-cloud/temporal-cloud-01.png
Binary file modified dashboards/temporal-cloud/temporal-cloud-02.png