forked from OpenTSDB/opentsdb.net
-
Notifications
You must be signed in to change notification settings - Fork 0
/
query-execution.html
151 lines (137 loc) · 6.72 KB
/
query-execution.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>Query Execution - OpenTSDB - A Distributed, Scalable Monitoring System</title>
<link rel="stylesheet" href="css/style.css" type="text/css" media="screen"/>
<!--[if lte IE 8]>
<link rel="stylesheet" href="css/ie.css" type="text/css" media="screen"/>
<![endif]-->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-18339382-1']);
_gaq.push(['_setDomainName', 'none']);
_gaq.push(['_setAllowLinker', true]);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<header><div id="headerbar">
<h1><a href="index.html">OpenTSDB</a></h1>
<nav><ul id="navbar">
<li><a href="overview.html">Overview</a></li>
<li><a href="getting-started.html">Getting Started</a></li>
<li><a href="manual.html">Manual</a></li>
<li><a href="faq.html">FAQ</a></li>
</ul></nav>
</div></header>
<!--[if lte IE 8]>
<div class="iesucks">Warning: You're using an unsupported, archaic browser.
Get a better, modern browsing experience with
<a href="http://www.google.com/chrome">Chrome</a> or
<a href="http://www.mozilla.com/firefox">Firefox</a>.</div>
<![endif]-->
<section id="content">
<section id="queryexec">
<h2>OpenTSDB Query Execution</h2>
This page sheds some light on how OpenTSDB retrieves and processes data when
it serves a query. It is useful to understand what the results mean and how
to craft queries that yield something meaningful.
<h3>Life of a Query: Overview</h3>
All queries have:
<ul>
<li>A metric name for which to retrieve data;</li>
<li>A start time;</li>
<li>A stop time (optional, if not set, assumed to be "now");</li>
<li>A possibly empty set of tags to filter the data
(e.g. <code>host=foo</code>, or wildcards such as <code>host=*</code>);</li>
<li>An aggregation function (e.g. <code>sum</code> or <code>avg</code>);</li>
<li>Whether or not to get the "rate of change" the data (in mathematical
terms: the first derivative).</li>
<li>Optionally: a downsampling interval (e.g. 10 minutes) and downsampling
function (e.g. <code>avg</code>)</li>
</ul>
Here's an overview of the steps taken to serve a query:
<ol>
<li>Open a scanner, set with a start key composed of the metric requested and
the start time.</li>
<li>The scanner is configured to stop at a key corresponding to the stop time
and to filter out rows containing data with tags that don't match the tags
we're looking for.</li>
<li>If any multiple choice tags (e.g. <code>host=foo|bar</code>) or wildcard
tags are used (e.g. <code>host=*</code>, which is akin to a <code>GROUP
BY</code> in SQL), sort the rows in groups accordingly.<br/>
For each group, repeated the remaining steps:</li>
<li>Apply the downsample function, if there is one. For instance
<code>10m-avg</code> will collapse each consecutive chunk of 600 second worth
of data down to one data point using the average.</li>
<li>Aggregate the values of the different time series together (for instance
<code>sum</code> will sum up all the time series that wound up being together
in this group &ldash; this requires that you understand how to perform such
operations on time series in a sound fashion, see below).</li>
<li>If the rate of change was requested, compute that using the previous value
returned.</li>
</ol>
The entire process has O(n) complexity, "n" being the number of data points,
and in particular the code tries hard to make only a single-pass.
<h3 id="aggregation">Time Series Aggregation</h3>
This step deserves a bit more explanations. Given two (or more) time series,
what is the meaning of "the sum of the time series"? Or their average? Or
the max or min?
<p>
The max seems fairly intuitive: at each time <i>T</i> of each data point, look
at the value that all other time series have, and pick whichever one has the
highest value. The sum (and other functions) works in a similar way: at each
time <i>T</i> of each data point of the time series we're trying to sum up,
take the values of all the time series at time <i>T</i>, and sum their values.
This is fairly intuitive again, it's akin to "stacking up" each line of the
graph.
<p>
However not all time series may have a data point at time <i>T</i>, because
they don't have to progress in lockstep. This is why
<strong>interpolation</strong> is needed. When a time series doesn't have a
data point at time <i>T</i>, we need to find a sensible value for this time
series at time <i>T</i> that can be aggregated with other time series. The
following graph illustrates this quite well.
<p>
The "sum of the m" is the blue line at the top. It's made of the sum
of the red line (for <code>host=foo</code>) and the green line (for
<code>host=bar</code>):<br/>
<img src="img/with-lerp.png" alt="Aggregation by sum with interpolation" />
<br/>
It seems intuitive from the image above that if you "stack up" the red line
and the green line, you'd get the blue line. At any discrete point in time,
the blue line has a value that is equal to the sum of the value of the red
line at that time, and the value of the green line at that time. Without
interpolation, you get something rather unintuitive that is harder to make
sense of, and which is also a lot less meaningful and useful:<br/>
<img src="img/without-lerp.png" alt="Aggregation by sum with interpolation" />
<br/>
No need to be a mathematician or to have taken advanced maths classes to see
that interpolation is needed to properly aggregate multiple time series
together and get meaningful results.
<p>
Note that at the moment OpenTSDB only supports
<a href="http://en.wikipedia.org/wiki/Linear_interpolation">linear interpolation</a>
(sometimes shortened "lerp") for sake of simplicity. Patches are welcome for
those who would like to add other interpolation methods.
<p>
Here is another slightly more complicated example that came from the mailing
list, depicting how multiple time series are aggregated by average:<br/>
<img src="img/aggregation-average.png" alt="Aggregation by average with interpolation" />
<br/>
As we can see, the resulting time series has one data point at each timestamp
of all the underlying time series it aggregates, and that data point is
computed by taking the average of the values of all the time series at that
timestamp. This is also true for the lonely data point of the squared-purple
time series, that temporarily boosted the average until the next data point.
</section>
<footer>
© 2010–2013 The OpenTSDB Authors.
</footer>
</section></body></html>