The purpose of this doc is to describe the schema for Canary Configurations using Markdown Syntax for Object Notation (MSON).
The canary config object is how users of the Kayenta API describe how they want Kayenta to compare the canary metrics against the baseline metrics for their applications.
id
some-custom-id (string, optional) - You can supply a custom string here. If not supplied, a GUID will be generated for you. The id is used when you call Kayenta to trigger a canary execution, if you do not want to supply the config as part of the request.
name
my-app golden signals canary config (string, required) - Name for the canary configuration.
description
Canary config for my-app (string, required) - Description for the canary configuration.
applications
(array[string], required) - A list of applications that the canary is for. You can have a list with the single item ad-hoc as the entry, unless you are storing the configuration in Kayenta and sharing it.
judge
(CanaryJudgeConfig, required) - Judge configuration.
metrics
(array[CanaryMetricConfig]) - List of metrics to analyze.
templates
(map<string, string>, optional) - Templates allow you to compose and parameterize advanced queries against your telemetry provider. Parameterized queries are hydrated by values provided in the canary stage. The project, resourceType, scope, and location variable bindings are implicitly available. For example, you can interpolate project using the following syntax: ${project}.
classifier
(CanaryClassifierConfig, required) - The classification configuration, such as group weights.
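To make the fields above concrete, here is a minimal sketch in Python that assembles a canary config as a dictionary and serializes it to JSON. Only the field names come from the schema; the template name `http_errors_filter`, the template text, the binding values, and the shape of the `classifier` value are assumptions for illustration. The `hydrate` helper is hypothetical and uses Python's `string.Template` purely to imitate the `${project}` syntax; it is not Kayenta's own template engine.

```python
import json
from string import Template

# Hypothetical example values; only the field names come from the schema above.
canary_config = {
    "id": "some-custom-id",  # optional; a GUID is generated when omitted
    "name": "my-app golden signals canary config",
    "description": "Canary config for my-app",
    "applications": ["ad-hoc"],
    "judge": {"name": "NetflixACAJudge-v1.0", "judgeConfigurations": {}},
    "metrics": [],  # CanaryMetricConfig objects would go here
    "templates": {
        # Hypothetical template name and query text.
        "http_errors_filter": "project=${project} AND location=${location}",
    },
    # Assumed shape: metrics group name mapped to its weight.
    "classifier": {"groupWeights": {"Latency": 100}},
}


def hydrate(template_text, bindings):
    """Illustrative stand-in for template hydration (not Kayenta's engine)."""
    return Template(template_text).substitute(bindings)


if __name__ == "__main__":
    print(json.dumps(canary_config, indent=2))
    print(hydrate(canary_config["templates"]["http_errors_filter"],
                  {"project": "my-project", "location": "us-east1"}))
```

Keeping the config as plain data makes it easy to round-trip through JSON before sending it as part of a request or storing it.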
Currently there is one judge (NetflixACAJudge-v1.0) and this object should be static across the configuration (see the above examples).
name
NetflixACAJudge-v1.0 (string, required) - Judge to use. As of right now, the only judge is NetflixACAJudge-v1.0.
judgeConfigurations
{} (object, required) - Map<string, object> of judgement configuration. As of right now, this should always be an empty object.
Describes a metric that will be used in determining the health of a canary.
name
http errors (string, required) - Human readable name of the metric under test.
query
(enum[CanaryMetricSetQueryConfig], required) - Query config object for your metric source type.
groups
(array[string], required) - List of metrics groups that this metric will belong to.
analysisConfigurations
(AnalysisConfiguration, required) - Analysis configuration, describes how to judge a given metric.
scopeName
(enum[string], required)
default
- The only accepted value here.
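As a rough illustration of the required fields listed above, the following Python sketch checks a metric definition before it is added to a config. The helper `validate_metric` and the sample metric are hypothetical, and the `query` value is left empty because its contents depend on the metric source and are not described here.

```python
# Field names taken from the CanaryMetricConfig description above.
REQUIRED_METRIC_FIELDS = (
    "name", "query", "groups", "analysisConfigurations", "scopeName",
)


def validate_metric(metric):
    """Return a list of problems found in a CanaryMetricConfig-like dict."""
    problems = [
        "missing required field: " + field
        for field in REQUIRED_METRIC_FIELDS
        if field not in metric
    ]
    # "default" is the only accepted scopeName value.
    if "scopeName" in metric and metric["scopeName"] != "default":
        problems.append("scopeName must be 'default'")
    if "groups" in metric and not isinstance(metric["groups"], list):
        problems.append("groups must be a list of strings")
    return problems


# Hypothetical sample metric; the query body is provider-specific and omitted.
sample_metric = {
    "name": "http errors",
    "query": {},
    "groups": ["Errors"],
    "analysisConfigurations": {"canary": {"direction": "increase"}},
    "scopeName": "default",
}

if __name__ == "__main__":
    print(validate_metric(sample_metric))
```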
Metric source interface for describing how to query for a given metric or metric source.
- One of
- AtlasCanaryMetricSetQueryConfig
- DatadogCanaryMetricSetQueryConfig
- GraphiteCanaryMetricSetQueryConfig
- InfluxdbCanaryMetricSetQueryConfig
- NewRelicInsightsCanaryMetricSetQueryConfig
- PrometheusCanaryMetricSetQueryConfig
- SignalFxCanaryMetricSetQueryConfig
- StackdriverCanaryMetricSetQueryConfig
- WavefrontCanaryMetricSetQueryConfig
Wrapper object that includes the Canary Analysis Configuration and describes how to judge a given metric.
canary
(CanaryAnalysisConfiguration)
Describes how to judge a metric. See the Netflix Automated Canary Analysis Judge for more information.
direction
(enum[string], required) - Which direction of statistical change triggers the metric to fail.
increase
- Use when you want the canary to fail only if it is significantly higher than the baseline (error counts, memory usage, etc., where a decrease is not a failure).
decrease
- Use when you want the canary to fail only if it is significantly lower than the baseline (success counts, etc., where a larger number is not a failure).
either
- Use when you want the canary to fail if it is significantly higher or lower than the baseline.
nanStrategy
(enum[string], optional) - How to handle NaN values, which can occur if the metric does not return data for a particular time interval.
remove
(default) - Use when you expect a metric to always have data and you want the NaNs removed from your data set (usage metrics).
replace
- Use when you expect a metric to return no data in certain use cases and you want the NaNs replaced with zeros (for example, count metrics: if no errors happened, the metric will return no data for that time interval).
critical
false (boolean, optional) - Use to fail the entire canary if this metric fails (recommended for important metrics that signal service outages or severe problems).
muted
false (boolean, optional) - Use to mute the metric result in the total score computation.
mustHaveData
false (boolean, optional) - Use to fail a metric if data is missing.
effectSize
(EffectSize, optional) - Controls how different the metric needs to be to fail or fail critically.
outliers
(Outliers, optional) - Controls how to classify and handle outliers.
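The two nanStrategy options described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (drop NaNs versus replace them with zeros); the function name `apply_nan_strategy` is hypothetical and this is not the judge's actual implementation.

```python
import math


def apply_nan_strategy(values, strategy="remove"):
    """Handle NaN data points the way the nanStrategy options describe."""
    if strategy == "remove":
        # Drop intervals that returned no data.
        return [v for v in values if not math.isnan(v)]
    if strategy == "replace":
        # Treat "no data" as zero, e.g. for error counts.
        return [0.0 if math.isnan(v) else v for v in values]
    raise ValueError("unknown nanStrategy: " + repr(strategy))


if __name__ == "__main__":
    samples = [1.0, float("nan"), 3.0]
    print(apply_nan_strategy(samples, "remove"))
    print(apply_nan_strategy(samples, "replace"))
```

With `remove` the data set shrinks, while `replace` keeps one value per interval, which matters for count-style metrics where silence means zero.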
Controls the degree of statistical significance the metric needs to fail or fail critically.
Metrics marked as critical can also define criticalIncrease and criticalDecrease.
See the Netflix Automated Canary Analysis Judge and Mann Whitney Classifier classes for more information.
measure
meanRatio (enum[string], optional) - Specifies how effect size is measured.
cles
- Common Language Effect Size
meanRatio
- Ratio of means
For the meanRatio measure:
allowedIncrease
1.1 (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to fail. This example makes the metric fail when the metric has increased 10% from the baseline.
allowedDecrease
0.90 (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to fail. This example makes the metric fail when the metric has decreased 10% from the baseline.
criticalIncrease
5.0 (number, optional) - Defaults to 1. The multiplier increase that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 5x increase.
criticalDecrease
0.5 (number, optional) - Defaults to 1. The multiplier decrease that must be met for the metric to be a critical failure and fail the entire analysis with a score of 0. This example makes the canary fail critically if there is a 50% decrease.
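The meanRatio thresholds above can be read as bounds on the ratio of the canary mean to the baseline mean. The Python sketch below illustrates that reading only. It is a simplification under stated assumptions: it skips the statistical significance test the judge performs, it treats the threshold as exceeded only when the ratio is strictly past it, and the function name `mean_ratio_fails` is hypothetical.

```python
from statistics import mean


def mean_ratio_fails(baseline, canary, direction,
                     allowed_increase=1.0, allowed_decrease=1.0):
    """Return True if the ratio of means is past the allowed threshold.

    Assumption: strict comparison, and no significance test is applied.
    """
    base_mean = mean(baseline)
    if base_mean == 0:
        raise ValueError("baseline mean is zero; ratio of means is undefined")
    ratio = mean(canary) / base_mean
    too_high = direction in ("increase", "either") and ratio > allowed_increase
    too_low = direction in ("decrease", "either") and ratio < allowed_decrease
    return too_high or too_low


if __name__ == "__main__":
    baseline = [100, 100, 100]
    # Ratio 1.2 is past allowedIncrease = 1.1, so this prints True.
    print(mean_ratio_fails(baseline, [120, 120, 120], "increase", 1.1, 0.9))
```

The same function can be called with criticalIncrease and criticalDecrease values to sketch the critical-failure check for metrics marked as critical.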
For the cles measure:
allowedIncrease
(number, optional) - Defaults to 0.5.
allowedDecrease
(number, optional) - Defaults to 0.5.
criticalIncrease
(number, optional) - Defaults to 0.5.
criticalDecrease
(number, optional) - Defaults to 0.5.
The CLES reports the probability that a value from one group will be greater than a value from the other group. A value of 0.50 indicates that the two groups are stochastically equal. A value of 1 indicates that the first group shows complete stochastic domination over the other group, and a value of 0 indicates the complete stochastic domination by the second group.
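The definition above can be made concrete with a brute-force pairwise comparison. The Python sketch below estimates the probability that a value from the first group exceeds a value from the second. Counting ties as half a win is a common convention and an assumption here; the text does not say how the judge treats ties, and the function name `cles` is hypothetical.

```python
def cles(first, second):
    """Estimate P(x > y) for x drawn from first and y drawn from second.

    Assumption: ties contribute 0.5, a common convention for CLES.
    """
    if not first or not second:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for x in first:
        for y in second:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(first) * len(second))


if __name__ == "__main__":
    print(cles([1, 2, 3], [1, 2, 3]))  # stochastically equal groups
    print(cles([4, 5, 6], [1, 2, 3]))  # first group dominates
```

Identical groups give 0.5, and a first group that always exceeds the second gives 1, matching the interpretation described above.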
Controls how to classify and handle outliers. Outliers are determined based on the interquartile range (IQR), which spans the middle 50% of the data. Conceptually, the IQR measures where the bulk of the values lie within a distribution.
The low outlier boundary is calculated using the equation: Q1 - factor * IQR.
The high outlier boundary is calculated using the equation: Q3 + factor * IQR.
Q1 and Q3 are the data values of the lower and upper quartiles respectively.
factor is outlierFactor as explained below.
IQR is Q3 - Q1.
Given a latency graph where Q1 is 50 ms, Q3 is 100 ms, and the factor is 3: IQR is Q3 - Q1, which is 100 - 50 = 50. The low outlier boundary is Q1 - factor * IQR, which is 50 - 3 * 50 = -100. The high outlier boundary is Q3 + factor * IQR, which is 100 + 3 * 50 = 250. So for a data point to be classified as an outlier in this example, it must be less than -100 ms or greater than 250 ms.
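The arithmetic in the latency example can be checked with a short Python sketch. The function name `outlier_bounds` is hypothetical; the formulas are the two boundary equations given above.

```python
def outlier_bounds(q1, q3, factor=3.0):
    """Return (low, high) outlier boundaries from the quartiles and factor."""
    iqr = q3 - q1  # interquartile range
    low = q1 - factor * iqr
    high = q3 + factor * iqr
    return low, high


if __name__ == "__main__":
    # Latency example: Q1 = 50 ms, Q3 = 100 ms, factor = 3.
    low, high = outlier_bounds(50, 100, 3)
    print(low, high)  # -100 250
```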
Visual example (image not reproduced here): a crop of Boxplot_vs_PDF.svg by Jhguch at en.wikipedia, licensed under CC BY-SA 2.5.
See the Netflix Automated Canary Analysis Judge and Interquartile Range Detector classes for more information.
strategy
(enum[string], optional) - Remove or keep outliers.
remove
- Use when you want to classify and remove outliers.
keep
(default) - Use when you want to keep outliers.
outlierFactor
(number, optional) - Defaults to 3.0. The degree of significance a data point has to differ from other observations to be considered an outlier. If Q1 and Q3 are the lower and upper quartiles respectively, then the values which fall below Q1 - factor * IQR or above Q3 + factor * IQR are considered outliers.
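Putting strategy and outlierFactor together, the following Python sketch applies the two options to a list of data points. It is an illustration only: quartiles are estimated with `statistics.quantiles`, whose interpolation method may differ from the one the judge uses, and the function name `handle_outliers` is hypothetical.

```python
from statistics import quantiles


def handle_outliers(values, strategy="keep", outlier_factor=3.0):
    """Keep all values, or remove those outside the IQR-based boundaries."""
    if strategy == "keep":
        return list(values)
    if strategy != "remove":
        raise ValueError("unknown strategy: " + repr(strategy))
    # quantiles(n=4) returns the three quartile cut points (needs >= 2 values).
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low = q1 - outlier_factor * iqr
    high = q3 + outlier_factor * iqr
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    latencies = [10, 11, 12, 13, 14, 15, 16, 1000]
    print(handle_outliers(latencies, "keep"))
    print(handle_outliers(latencies, "remove"))
```

In this sample the single 1000 ms spike falls far above the high boundary and is dropped under `remove`, while `keep` leaves the data untouched.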
groupWeights
(enum[string], required)
groups
"Latency" : 33 (object, required) - List of each metrics group along with its corresponding weight. Weights must total 100.