Rearrange file/dir structure
polyfractal committed May 30, 2014
1 parent 9986c1f commit 2e192df
Showing 12 changed files with 335 additions and 313 deletions.
21 changes: 0 additions & 21 deletions 300_Aggregations.asciidoc

This file was deleted.

93 changes: 93 additions & 0 deletions 300_Aggregations/21_add_metric.asciidoc
@@ -0,0 +1,93 @@

=== Adding a metric to the mix

The previous example told us how many documents were in each bucket, which is
useful. But often, our applications require more sophisticated _metrics_ about
the documents. For example, what is the average price of cars in each bucket?

// "nesting"-> need to tell Elasticsearch which metrics to calculate, and on which fields.
To get this information, we need to start nesting metrics inside of the buckets.
Metrics will calculate some kind of mathematical statistic based on the values
in the documents residing within a particular bucket.

Let's go ahead and add an `avg` (average) metric to our car example:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs": {
        "colors": {
            "terms": {
                "field": "color"
            },
            "aggs": { <1>
                "avg_price": { <2>
                    "avg": {
                        "field": "price" <3>
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/20_basic_example.json
<1> We add a new `aggs` level to hold the metric
<2> We then give the metric a name: "avg_price"
<3> And finally define it as an `avg` metric over the "price" field

As you can see, we took the previous example and tacked on a new `aggs` level.
This new aggregation level allows us to nest the `avg` metric inside the
`terms` bucket. Effectively, this means we will generate an average for each
color.

Just like the "colors" example, we need to name our metric ("avg_price") so we
can retrieve the values later. Finally, we specify the metric itself (`avg`)
and what field we want the average to be calculated on (`price`).
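
To make the nesting concrete, here is a minimal Python sketch of what a `terms` bucket with a nested `avg` metric conceptually computes: group the documents by one field, then average another field within each group. The `cars` list and the `terms_with_avg` helper are hypothetical, introduced only for illustration; the prices are assumed values, and this is not how Elasticsearch implements aggregations internally.

```python
from collections import defaultdict

# Hypothetical sample documents; the individual prices are assumptions
# made for this illustration.
cars = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "red", "price": 20000},
    {"color": "red", "price": 80000},
    {"color": "blue", "price": 15000},
    {"color": "blue", "price": 25000},
    {"color": "green", "price": 12000},
    {"color": "green", "price": 30000},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Group docs by bucket_field, then average metric_field per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


# One bucket per color, each carrying its own average price.
result = terms_with_avg(cars, "color", "price")
```

The outer grouping plays the role of the "colors" `terms` bucket, and the per-group average plays the role of the nested "avg_price" metric.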

// Delete this para
The response is, not surprisingly, nearly identical to the previous response...except
there is now a new "avg_price" element added to each color bucket:

[source,js]
--------------------------------------------------
{
...
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "avg_price": { <1>
                        "value": 32500
                    }
                },
                {
                    "key": "blue",
                    "doc_count": 2,
                    "avg_price": {
                        "value": 20000
                    }
                },
                {
                    "key": "green",
                    "doc_count": 2,
                    "avg_price": {
                        "value": 21000
                    }
                }
            ]
        }
    }
...
}
--------------------------------------------------
<1> New "avg_price" element in response

// Would love to have a graph under each example showing how the data can be displayed (later, i know)
Although the response has changed minimally, the data we get out of it has grown
substantially. Before, we knew there were four red cars. Now we know that the
average price of red cars is $32,500. This is something that you can plug directly
into reports or graphs.
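
As a rough sketch of plugging these values into a report, the Python snippet below walks the `aggregations` portion of the response and pulls out the average price for each color. The `response` dictionary copies only the fields shown in the response above; `avg_price_by_color` is a hypothetical helper name, not part of any client library.

```python
# The "aggregations" portion of the response, as a plain dictionary.
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {"key": "red", "doc_count": 4, "avg_price": {"value": 32500}},
                {"key": "blue", "doc_count": 2, "avg_price": {"value": 20000}},
                {"key": "green", "doc_count": 2, "avg_price": {"value": 21000}},
            ]
        }
    }
}


def avg_price_by_color(resp):
    """Map each color bucket key to the value of its nested avg_price metric."""
    buckets = resp["aggregations"]["colors"]["buckets"]
    return {bucket["key"]: bucket["avg_price"]["value"] for bucket in buckets}


report = avg_price_by_color(response)
for color, price in report.items():
    print(f"{color}: ${price:,}")
```

The lookup path mirrors the names chosen in the request: "colors" for the bucket and "avg_price" for the metric, which is why naming aggregations matters.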
101 changes: 101 additions & 0 deletions 300_Aggregations/22_nested_bucket.asciidoc
@@ -0,0 +1,101 @@

=== Buckets inside of buckets

The true power of aggregations becomes apparent once you start playing with
different nesting schemes. In the previous examples, we saw how you could nest
a metric inside a bucket, which is already quite powerful.

But the really exciting analytics come from nesting buckets inside _other buckets_.
This time, we want to find out the distribution of car manufacturers for each
color:


[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs": {
        "colors": {
            "terms": {
                "field": "color"
            },
            "aggs": {
                "avg_price": { <1>
                    "avg": {
                        "field": "price"
                    }
                },
                "make": { <2>
                    "terms": {
                        "field": "make" <3>
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/20_basic_example.json
<1> Notice that we can leave the previous "avg_price" metric in place
<2> Another aggregation named "make" is added to the "colors" bucket
<3> This aggregation is a `terms` bucket and will generate unique buckets for
each car make

A few interesting things happened here. First, you'll notice that the previous
"avg_price" metric is left entirely intact. Each "level" of an aggregation can
have many metrics or buckets. The "avg_price" metric tells us the average price
for each car color. This is independent of other buckets and metrics which
are also being built.

This is very important for your application, since there are often many related,
but entirely distinct, metrics which you need to collect. Aggregations allow
you to collect all of them in a single pass over the data.

The other important thing to note is that the aggregation we added, "make", is
a `terms` bucket (nested inside the "colors" `terms` bucket). This means we will
generate a (color, make) tuple for every unique combination in your dataset.
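
The idea of a bucket inside a bucket can be sketched in plain Python as a two-level grouping: first by color, then by make within each color. The `cars` list and the `nested_terms` helper are hypothetical and exist only for this illustration; the makes assigned to the blue and green cars are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; makes for blue and green cars are assumed.
cars = [
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "bmw"},
    {"color": "blue", "make": "ford"},
    {"color": "blue", "make": "toyota"},
    {"color": "green", "make": "ford"},
    {"color": "green", "make": "toyota"},
]


def nested_terms(docs, outer_field, inner_field):
    """Count docs per (outer, inner) combination, nested as outer -> inner -> count."""
    nested = defaultdict(Counter)
    for doc in docs:
        nested[doc[outer_field]][doc[inner_field]] += 1
    return {outer: dict(inner) for outer, inner in nested.items()}


# Only combinations that actually occur in the data produce an entry.
makes_by_color = nested_terms(cars, "color", "make")
```

Each inner count corresponds to the `doc_count` of a "make" bucket nested under a color bucket.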

Let's take a look at the response (truncated for brevity, since it is now
growing quite long):


[source,js]
--------------------------------------------------
{
...
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": { <1>
                        "buckets": [
                            {
                                "key": "honda", <2>
                                "doc_count": 3
                            },
                            {
                                "key": "bmw",
                                "doc_count": 1
                            }
                        ]
                    },
                    "avg_price": {
                        "value": 32500 <3>
                    }
                },
...
}
--------------------------------------------------
<1> Our new aggregation is nested under each color bucket, as expected
<2> We now see a breakdown of car makes for each color
<3> Finally, you can see that our previous "avg_price" metric is still intact

The response tells us:

- There are four red cars
- The average price of a red car is $32,500
- Three of the red cars are made by Honda, and one is a BMW
- Similar analytics are generated for other colors and makes
97 changes: 97 additions & 0 deletions 300_Aggregations/23_extra_metrics.asciidoc
@@ -0,0 +1,97 @@


==== One final modification

Just to drive the point home, let's make one final modification to our example
before moving on to new topics. Let's add two metrics to calculate the min and
max price for each make:


[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs": {
        "colors": {
            "terms": {
                "field": "color"
            },
            "aggs": {
                "avg_price": { "avg": { "field": "price" }
                },
                "make" : {
                    "terms" : {
                        "field" : "make"
                    },
                    "aggs" : { <1>
                        "min_price" : { "min": { "field": "price"} }, <2>
                        "max_price" : { "max": { "field": "price"} } <3>
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/20_basic_example.json

// Careful with the "no surprise", it makes it sound like you're bored :)

<1> No surprise...we need to add another "aggs" level for nesting
<2> Then we include a `min` metric
<3> And a `max` metric

Which gives us the following output (again, truncated):

[source,js]
--------------------------------------------------
{
...
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": {
                        "buckets": [
                            {
                                "key": "honda",
                                "doc_count": 3,
                                "min_price": {
                                    "value": 10000 <1>
                                },
                                "max_price": {
                                    "value": 20000 <1>
                                }
                            },
                            {
                                "key": "bmw",
                                "doc_count": 1,
                                "min_price": {
                                    "value": 80000
                                },
                                "max_price": {
                                    "value": 80000
                                }
                            }
                        ]
                    },
                    "avg_price": {
                        "value": 32500
                    }
                },
...
--------------------------------------------------
<1> The `min` and `max` metrics that we added now appear under each "make"

With those two metrics, we've expanded the information derived from this query
to include:

// Nice, but "Similar analytics.." -> "etc."?
- There are four red cars
- The average price of a red car is $32,500
- Three of the red cars are made by Honda, and one is a BMW
- The cheapest Honda is $10,000
- The most expensive Honda is $20,000
- Similar analytics are generated for all other colors and makes
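
To show how the nested response could feed a table or chart, here is a minimal Python sketch that flattens it into one row per (color, make) pair. Only the "red" bucket is reproduced, because the response above is truncated after it; `color_buckets` and `flatten` are hypothetical names used for illustration.

```python
# Only the "red" bucket is included; the source response is truncated after it.
color_buckets = [
    {
        "key": "red",
        "doc_count": 4,
        "make": {
            "buckets": [
                {"key": "honda", "doc_count": 3,
                 "min_price": {"value": 10000}, "max_price": {"value": 20000}},
                {"key": "bmw", "doc_count": 1,
                 "min_price": {"value": 80000}, "max_price": {"value": 80000}},
            ]
        },
        "avg_price": {"value": 32500},
    }
]


def flatten(buckets):
    """Turn nested color -> make buckets into one flat row per (color, make)."""
    rows = []
    for color in buckets:
        for make in color["make"]["buckets"]:
            rows.append({
                "color": color["key"],
                "make": make["key"],
                "count": make["doc_count"],
                "min_price": make["min_price"]["value"],
                "max_price": make["max_price"]["value"],
                # The average belongs to the color level, so it repeats per make.
                "color_avg_price": color["avg_price"]["value"],
            })
    return rows


rows = flatten(color_buckets)
```

Note that "avg_price" sits at the color level while "min_price" and "max_price" sit at the make level, so the flattened rows repeat the color average for every make.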