A common requirement is to present search results grouped by a particular field. For example, we might want to return the most relevant blog posts grouped by the user’s name. Grouping by name implies the need for a terms aggregation. To be able to group on the user’s whole name, the name field should be available in its original not_analyzed form, as explained in [aggregations-and-analysis]:
PUT /my_index/_mapping/blogpost
{
  "properties": {
    "user": {
      "properties": {
        "name": { (1)
          "type": "string",
          "fields": {
            "raw": { (2)
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
(1) The user.name field will be used for full-text search.
(2) The user.name.raw field will be used for grouping with the terms aggregation.
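To see why grouping needs the raw sub-field, here is a rough Python sketch (not Elasticsearch code) that mimics what a terms aggregation would count on an analyzed field versus a not_analyzed one. The simple_analyze helper is a hypothetical stand-in for a real analyzer and only lowercases and splits on whitespace.

```python
from collections import Counter


def simple_analyze(text):
    # Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


def terms_counts(values, analyzed):
    # Count documents per term, roughly the way a terms aggregation buckets them.
    counts = Counter()
    for value in values:
        # An analyzed field contributes one term per token;
        # a not_analyzed field contributes the whole value as a single term.
        terms = set(simple_analyze(value)) if analyzed else {value}
        counts.update(terms)
    return dict(counts)


names = ["John Smith", "Alice John"]
analyzed_buckets = terms_counts(names, analyzed=True)
raw_buckets = terms_counts(names, analyzed=False)
print(analyzed_buckets)
print(raw_buckets)
```

On the analyzed field, the two users would share a single john bucket, which is why the aggregation targets user.name.raw instead.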
Then add some data:
PUT /my_index/user/1
{
  "name": "John Smith",
  "email": "[email protected]",
  "dob": "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title": "Relationships",
  "body": "It's complicated...",
  "user": {
    "id": 1,
    "name": "John Smith"
  }
}

PUT /my_index/user/3
{
  "name": "Alice John",
  "email": "[email protected]",
  "dob": "1979/01/04"
}

PUT /my_index/blogpost/4
{
  "title": "Relationships are cool",
  "body": "It's not complicated at all...",
  "user": {
    "id": 3,
    "name": "Alice John"
  }
}
Now we can run a query looking for blog posts about relationships, by users called John, and group the results by user, thanks to the top_hits aggregation:
GET /my_index/blogpost/_search?search_type=count (1)
{
  "query": { (2)
    "bool": {
      "must": [
        { "match": { "title": "relationships" }},
        { "match": { "user.name": "John" }}
      ]
    }
  },
  "aggs": {
    "users": {
      "terms": {
        "field": "user.name.raw", (3)
        "order": { "top_score": "desc" } (4)
      },
      "aggs": {
        "top_score": { "max": { "script": "_score" }}, (4)
        "blogposts": { "top_hits": { "_source": "title", "size": 5 }} (5)
      }
    }
  }
}
(1) The blog posts that we are interested in are returned under the blogposts aggregation, so we can disable the usual search hits by setting search_type=count.
(2) The query returns blog posts about relationships by users named John.
(3) The terms aggregation creates a bucket for each user.name.raw value.
(4) The top_score aggregation orders the terms in the users aggregation by the top-scoring document in each bucket.
(5) The top_hits aggregation returns just the title field of the five most relevant blog posts for each user.
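The grouping logic described above can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch implements it: collapse_by_field is a hypothetical helper, and the hits (including the third post and all scores) are made-up sample data.

```python
from collections import defaultdict


def collapse_by_field(hits, field, size=5):
    # Bucket scored hits by a field value, like the terms aggregation.
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit[field]].append(hit)

    results = []
    for key, docs in buckets.items():
        # Within a bucket, keep the best-scoring documents first (top_hits).
        docs.sort(key=lambda d: d["score"], reverse=True)
        results.append({
            "key": key,
            "doc_count": len(docs),
            "top_score": docs[0]["score"],  # like the max-of-_score sub-aggregation
            "top_hits": docs[:size],
        })

    # Order buckets by their best document, like "order": {"top_score": "desc"}.
    results.sort(key=lambda b: b["top_score"], reverse=True)
    return results


# Illustrative hits only; the scores are invented for the example.
hits = [
    {"title": "Relationships", "user": "John Smith", "score": 0.35},
    {"title": "Relationships are cool", "user": "Alice John", "score": 0.22},
    {"title": "More on relationships", "user": "John Smith", "score": 0.10},
]

for bucket in collapse_by_field(hits, "user", size=1):
    print(bucket["key"], [doc["title"] for doc in bucket["top_hits"]])
```

With size=1, each user contributes only their single best post, and users appear in order of that post's score.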
The abbreviated response is shown here:
...
"hits": {
  "total": 2,
  "max_score": 0,
  "hits": [] (1)
},
"aggregations": {
  "users": {
    "buckets": [
      {
        "key": "John Smith", (2)
        "doc_count": 1,
        "blogposts": {
          "hits": { (3)
            "total": 1,
            "max_score": 0.35258877,
            "hits": [
              {
                "_index": "my_index",
                "_type": "blogpost",
                "_id": "2",
                "_score": 0.35258877,
                "_source": {
                  "title": "Relationships"
                }
              }
            ]
          }
        },
        "top_score": { (4)
          "value": 0.3525887727737427
        }
      },
...
(1) The hits array is empty because we set search_type=count.
(2) There is a bucket for each user who appeared in the top results.
(3) Under each user bucket there is a blogposts.hits array containing the top results for that user.
(4) The user buckets are sorted by the user’s most relevant blog post.
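On the client side, a response of this shape can be flattened into a per-user list of titles. The following is a minimal sketch that assumes the response has already been parsed into a Python dict; titles_by_user is a hypothetical helper, and the sample dict reproduces only the single bucket shown in the abbreviated response.

```python
def titles_by_user(response):
    # Walk the "users" buckets in the order Elasticsearch returned them.
    grouped = []
    for bucket in response["aggregations"]["users"]["buckets"]:
        # The top_hits sub-aggregation was named "blogposts" in the request.
        hits = bucket["blogposts"]["hits"]["hits"]
        titles = [hit["_source"]["title"] for hit in hits]
        grouped.append((bucket["key"], bucket["top_score"]["value"], titles))
    return grouped


# Only the bucket visible in the abbreviated response; the rest is elided there.
response = {
    "hits": {"total": 2, "max_score": 0, "hits": []},
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "John Smith",
                    "doc_count": 1,
                    "blogposts": {
                        "hits": {
                            "total": 1,
                            "max_score": 0.35258877,
                            "hits": [
                                {
                                    "_index": "my_index",
                                    "_type": "blogpost",
                                    "_id": "2",
                                    "_score": 0.35258877,
                                    "_source": {"title": "Relationships"},
                                }
                            ],
                        }
                    },
                    "top_score": {"value": 0.3525887727737427},
                }
            ]
        }
    },
}

for user, score, titles in titles_by_user(response):
    print(user, score, titles)
```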
Using the top_hits aggregation is the equivalent of running a query to return the names of the users with the most relevant blog posts, and then running the same query again for each user to get their best blog posts, but it is much more efficient.
The top hits returned in each bucket are the result of running a lightweight mini-query based on the original main query. The mini-query supports the usual features that you would expect from search, such as highlighting and pagination.
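As a sketch of what highlighting and pagination inside top_hits might look like, the snippet below builds the aggregation body as a Python dict that could be serialized and sent as part of the search request. The top_hits_agg helper and its parameter names are hypothetical; the from, size, _source, and highlight keys are options accepted by the top_hits aggregation.

```python
import json


def top_hits_agg(source_field, size=5, offset=0, highlight_field=None):
    # Build the body of a top_hits aggregation.
    body = {
        "_source": source_field,  # return only this field from each hit
        "size": size,             # hits per bucket
        "from": offset,           # skip this many hits per bucket (pagination)
    }
    if highlight_field is not None:
        # Request highlighted snippets for the given field.
        body["highlight"] = {"fields": {highlight_field: {}}}
    return {"top_hits": body}


# Second "page" of five titles per user, with highlighting on the title.
blogposts = top_hits_agg("title", size=5, offset=5, highlight_field="title")
print(json.dumps({"blogposts": blogposts}, indent=2))
```

The resulting dict would take the place of the blogposts entry in the aggs section of the earlier request.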