forked from elastic/elasticsearch-definitive-guide
1 parent 9986c1f, commit 2e192df
Showing 12 changed files with 335 additions and 313 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
|
||
=== Adding a metric to the mix | ||
|
||
The previous example told us how many documents were in each bucket, which is | ||
useful. But often, our applications require more sophisticated _metrics_ about | ||
the documents. For example, what is the average price of cars in each bucket? | ||
|
||
// "nesting"-> need to tell Elasticsearch which metrics to calculate, and on which fields. | ||
To get this information, we need to start nesting metrics inside the buckets. | ||
A metric calculates a mathematical statistic, such as an average, minimum, or | ||
maximum, from the field values of the documents within a particular bucket. | ||
|
||
Let's go ahead and add an `avg` metric to our car example: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /cars/transactions/_search?search_type=count | ||
{ | ||
"aggs": { | ||
"colors": { | ||
"terms": { | ||
"field": "color" | ||
}, | ||
"aggs": { <1> | ||
"avg_price": { <2> | ||
"avg": { | ||
"field": "price" <3> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// SENSE: 300_Aggregations/20_basic_example.json | ||
<1> We add a new `aggs` level to hold the metric | ||
<2> We then give the metric a name: "avg_price" | ||
<3> And finally define it as an `avg` metric over the "price" field | ||
|
||
As you can see, we took the previous example and tacked on a new `aggs` level. | ||
This new aggregation level allows us to nest the `avg` metric inside the | ||
`terms` bucket. Effectively, this means we will generate an average for each | ||
color. | ||
|
||
Just like the "colors" example, we need to name our metric ("avg_price") so we | ||
can retrieve the values later. Finally, we specify the metric itself (`avg`) | ||
and what field we want the average to be calculated on (`price`). | ||
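To make the nesting concrete, here is a rough pure-Python sketch of what a `terms` bucket with a nested `avg` metric computes. It is only an illustration, not how Elasticsearch implements aggregations. The `cars` list is hypothetical sample data chosen to be consistent with the averages in the response for this example, and `avg_price_by_color` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical sample documents (an assumption, not read from a real index).
cars = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "red", "price": 20000},
    {"color": "red", "price": 80000},
    {"color": "blue", "price": 15000},
    {"color": "blue", "price": 25000},
    {"color": "green", "price": 30000},
    {"color": "green", "price": 12000},
]


def avg_price_by_color(docs):
    """Bucket documents by color, then average the price inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["color"]].append(doc["price"])
    return {
        color: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for color, prices in buckets.items()
    }


result = avg_price_by_color(cars)
```

The outer grouping plays the role of the `terms` bucket, and the per-group average plays the role of the nested `avg` metric.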
|
||
// Delete this para | ||
The response is, not surprisingly, nearly identical to the previous response...except | ||
there is now a new "avg_price" element added to each color bucket: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
... | ||
"aggregations": { | ||
"colors": { | ||
"buckets": [ | ||
{ | ||
"key": "red", | ||
"doc_count": 4, | ||
"avg_price": { <1> | ||
"value": 32500 | ||
} | ||
}, | ||
{ | ||
"key": "blue", | ||
"doc_count": 2, | ||
"avg_price": { | ||
"value": 20000 | ||
} | ||
}, | ||
{ | ||
"key": "green", | ||
"doc_count": 2, | ||
"avg_price": { | ||
"value": 21000 | ||
} | ||
} | ||
] | ||
} | ||
} | ||
... | ||
} | ||
-------------------------------------------------- | ||
<1> New "avg_price" element in response | ||
|
||
// Would love to have a graph under each example showing how the data can be displayed (later, i know) | ||
Although the response has changed minimally, the data we get out of it has grown | ||
substantially. Before, we knew there were four red cars. Now we know that the | ||
average price of red cars is $32,500. This is something that you can plug directly | ||
into reports or graphs. |
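As a sketch of how the response might feed a report, the snippet below walks the `colors` buckets and pulls out the count and the average price for each color. The `response` dict simply mirrors the aggregations portion of the response above; `color_report` is a hypothetical helper name, and a real application would obtain the dict from whatever client library it uses.

```python
# Aggregations portion of the response, copied from the example above.
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {"key": "red", "doc_count": 4, "avg_price": {"value": 32500}},
                {"key": "blue", "doc_count": 2, "avg_price": {"value": 20000}},
                {"key": "green", "doc_count": 2, "avg_price": {"value": 21000}},
            ]
        }
    }
}


def color_report(resp, agg_name="colors", metric_name="avg_price"):
    """Return one (color, doc_count, average price) row per bucket."""
    rows = []
    for bucket in resp["aggregations"][agg_name]["buckets"]:
        rows.append((bucket["key"], bucket["doc_count"], bucket[metric_name]["value"]))
    return rows


rows = color_report(response)
for color, count, avg in rows:
    # One line per color, ready to paste into a plain-text report.
    print(f"{color:<6} {count:>2} cars  avg ${avg:,.0f}")
```

The metric is looked up by the name given in the request ("avg_price"), which is why naming each aggregation matters.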
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
|
||
=== Buckets inside of buckets | ||
|
||
The true power of aggregations becomes apparent once you start playing with | ||
different nesting schemes. In the previous examples, we saw how you could nest | ||
a metric inside a bucket, which is already quite powerful. | ||
|
||
But the really exciting analytics come from nesting buckets inside _other buckets_. | ||
This time, we want to find out the distribution of car manufacturers for each | ||
color: | ||
|
||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /cars/transactions/_search?search_type=count | ||
{ | ||
"aggs": { | ||
"colors": { | ||
"terms": { | ||
"field": "color" | ||
}, | ||
"aggs": { | ||
"avg_price": { <1> | ||
"avg": { | ||
"field": "price" | ||
} | ||
}, | ||
"make": { <2> | ||
"terms": { | ||
"field": "make" <3> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// SENSE: 300_Aggregations/20_basic_example.json | ||
<1> Notice that we can leave the previous "avg_price" metric in place | ||
<2> Another aggregation named "make" is added to the "colors" bucket | ||
<3> This aggregation is a `terms` bucket and will generate unique buckets for | ||
each car make | ||
|
||
A few interesting things happened here. First, you'll notice that the previous | ||
"avg_price" metric is left entirely intact. Each "level" of an aggregation can | ||
have many metrics or buckets. The "avg_price" metric tells us the average price | ||
for each car color. This is independent of other buckets and metrics which | ||
are also being built. | ||
|
||
This is very important for your application, since there are often many related, | ||
but entirely distinct, metrics which you need to collect. Aggregations allow | ||
you to collect all of them in a single pass over the data. | ||
|
||
The other important thing to note is that the aggregation we added, "make", is | ||
a `terms` bucket (nested inside the "colors" `terms` bucket). This means we will | ||
generate a (color, make) tuple for every unique combination in your dataset. | ||
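The idea of one bucket per (color, make) combination can be sketched in plain Python with a `Counter`. This is only an analogy for what the nested `terms` buckets produce. The `cars` list is hypothetical: the red rows match the response shown for this example, while the makes assigned to the blue and green cars are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical sample documents; makes for blue and green cars are assumed.
cars = [
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "bmw"},
    {"color": "blue", "make": "toyota"},
    {"color": "blue", "make": "ford"},
    {"color": "green", "make": "ford"},
    {"color": "green", "make": "toyota"},
]

# Each distinct (color, make) pair becomes one nested bucket with its own count.
combo_counts = Counter((car["color"], car["make"]) for car in cars)

# Only combinations that actually occur in the data get a bucket.
red_makes = {make: n for (color, make), n in combo_counts.items() if color == "red"}
```

Combinations that never occur, such as a green BMW in this sample, produce no bucket at all.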
|
||
Let's take a look at the response (truncated for brevity, since it is now | ||
growing quite long): | ||
|
||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
... | ||
"aggregations": { | ||
"colors": { | ||
"buckets": [ | ||
{ | ||
"key": "red", | ||
"doc_count": 4, | ||
"make": { <1> | ||
"buckets": [ | ||
{ | ||
"key": "honda", <2> | ||
"doc_count": 3 | ||
}, | ||
{ | ||
"key": "bmw", | ||
"doc_count": 1 | ||
} | ||
] | ||
}, | ||
"avg_price": { | ||
"value": 32500 <3> | ||
} | ||
}, | ||
... | ||
} | ||
-------------------------------------------------- | ||
<1> Our new aggregation is nested under each color bucket, as expected | ||
<2> We now see a breakdown of car makes for each color | ||
<3> Finally, you can see that our previous "avg_price" metric is still intact | ||
|
||
The response tells us: | ||
|
||
- There are four red cars | ||
- The average price of a red car is $32,500 | ||
- Three of the red cars are made by Honda, and one is a BMW | ||
- Similar analytics are generated for other colors and makes |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
|
||
|
||
==== One final modification | ||
|
||
Just to drive the point home, let's make one final modification to our example | ||
before moving on to new topics. Let's add two metrics to calculate the min and | ||
max price for each make: | ||
|
||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /cars/transactions/_search?search_type=count | ||
{ | ||
"aggs": { | ||
"colors": { | ||
"terms": { | ||
"field": "color" | ||
}, | ||
"aggs": { | ||
"avg_price": { "avg": { "field": "price" } | ||
}, | ||
"make" : { | ||
"terms" : { | ||
"field" : "make" | ||
}, | ||
"aggs" : { <1> | ||
"min_price" : { "min": { "field": "price"} }, <2> | ||
"max_price" : { "max": { "field": "price"} } <3> | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
// SENSE: 300_Aggregations/20_basic_example.json | ||
|
||
// Careful with the "no surprise", it makes it sound like you're bored :) | ||
|
||
<1> No surprise...we need to add another "aggs" level for nesting | ||
<2> Then we include a `min` metric | ||
<3> And a `max` metric | ||
|
||
Which gives us the following output (again, truncated): | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
... | ||
"aggregations": { | ||
"colors": { | ||
"buckets": [ | ||
{ | ||
"key": "red", | ||
"doc_count": 4, | ||
"make": { | ||
"buckets": [ | ||
{ | ||
"key": "honda", | ||
"doc_count": 3, | ||
"min_price": { | ||
"value": 10000 <1> | ||
}, | ||
"max_price": { | ||
"value": 20000 <1> | ||
} | ||
}, | ||
{ | ||
"key": "bmw", | ||
"doc_count": 1, | ||
"min_price": { | ||
"value": 80000 | ||
}, | ||
"max_price": { | ||
"value": 80000 | ||
} | ||
} | ||
] | ||
}, | ||
"avg_price": { | ||
"value": 32500 | ||
} | ||
}, | ||
... | ||
-------------------------------------------------- | ||
<1> The `min` and `max` metrics that we added now appear under each "make" | ||
|
||
With those two metrics, we've expanded the information derived from this query | ||
to include: | ||
|
||
// Nice, but "Similar analytics.." -> "etc."? | ||
- There are four red cars | ||
- The average price of a red car is $32,500 | ||
- Three of the red cars are made by Honda, and one is a BMW | ||
- The cheapest Honda is $10,000 | ||
- The most expensive Honda is $20,000 | ||
- Similar analytics are generated for all other colors and makes |
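A nested response like this is often easier to chart once it is flattened into rows. The following is a minimal sketch of that step, assuming the response has been parsed into a Python dict. Only the red bucket from the truncated output above is included, and `flatten_color_make` is a hypothetical helper name.

```python
# The red bucket from the truncated response above; other colors are omitted.
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": {
                        "buckets": [
                            {
                                "key": "honda",
                                "doc_count": 3,
                                "min_price": {"value": 10000},
                                "max_price": {"value": 20000},
                            },
                            {
                                "key": "bmw",
                                "doc_count": 1,
                                "min_price": {"value": 80000},
                                "max_price": {"value": 80000},
                            },
                        ]
                    },
                    "avg_price": {"value": 32500},
                }
            ]
        }
    }
}


def flatten_color_make(resp):
    """Turn the two-level bucket tree into one flat row per (color, make)."""
    rows = []
    for color in resp["aggregations"]["colors"]["buckets"]:
        for make in color["make"]["buckets"]:
            rows.append({
                "color": color["key"],
                "make": make["key"],
                "doc_count": make["doc_count"],
                "min_price": make["min_price"]["value"],
                "max_price": make["max_price"]["value"],
                # The color-level metric is repeated on each of its make rows.
                "color_avg_price": color["avg_price"]["value"],
            })
    return rows


rows = flatten_color_make(response)
```

Each level of the loop corresponds to one `aggs` level in the request, so adding another nested bucket would mean adding another inner loop.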