Queries can find only terms that actually exist in the inverted index, so it is important to ensure that the same analysis process is applied both to the document at index time, and to the query string at search time so that the terms in the query match the terms in the inverted index.
Although we say document, analyzers are determined per field. Each field can have a different analyzer, either by configuring a specific analyzer for that field or by falling back on the type, index, or node defaults. At index time, a field’s value is analyzed by using the configured or default analyzer for that field.
For instance, let’s add a new field to my_index
:
PUT /my_index/_mapping/my_type
{
"my_type": {
"properties": {
"english_title": {
"type": "string",
"analyzer": "english"
}
}
}
}
Now we can compare how values in the english_title
field and the title
field are
analyzed at index time by using the analyze
API to analyze the word Foxes
:
GET /my_index/_analyze
{
"field": "my_type.title", (1)
"text": "Foxes"
}
GET /my_index/_analyze
{
"field": "my_type.english_title", (2)
"text": "Foxes"
}
-
Field
title
, which uses the defaultstandard
analyzer, will return the termfoxes
. -
Field
english_title
, which uses theenglish
analyzer, will return the termfox
.
This means that, were we to run a low-level term
query for the exact term
fox
, the english_title
field would match but the title
field would
not.
High-level queries like the match
query understand field mappings and can
apply the correct analyzer for each field being queried. We can see this
in action with the validate-query
API:
GET /my_index/my_type/_validate/query?explain
{
"query": {
"bool": {
"should": [
{ "match": { "title": "Foxes"}},
{ "match": { "english_title": "Foxes"}}
]
}
}
}
which returns this explanation
:
(title:foxes english_title:fox)
The match
query uses the appropriate analyzer for each field to ensure
that it looks for each term in the correct format for that field.
While we can specify an analyzer at the field level, how do we determine which analyzer is used for a field if none is specified at the field level?
Analyzers can be specified at three levels: per-field, per-index or the global default. Elasticsearch works through each level until it finds an analyzer that it can use. At index time, the order is as follows:
-
The
analyzer
defined in the field mapping, else -
The analyzer named
default
in the index settings, which defaults to -
The
standard
analyzer
At search time, the sequence is slightly different:
-
The
analyzer
defined in the query itself, else -
The
analyzer
defined in the field mapping, else -
The analyzer named
default
in the index settings, which defaults to -
The
standard
analyzer
Occasionally, it makes sense to use a different analyzer at index and search
time. For instance, at index time we may want to index synonyms (for example, for every
occurrence of quick
, we also index fast
, rapid
, and speedy
). But at
search time, we don’t need to search for all of these synonyms. Instead we
can just look up the single word that the user has entered, be it quick
,
fast
, rapid
, or speedy
.
To enable this distinction, Elasticsearch also supports an
optional search_analyzer
mapping which will only be used at search-time (analyzer
is still used for indexing). There is also an equivalent default_search
mapping
for configuring the default at the index-level.
Taking these extra parameters into account, the full sequence at search time:
-
The
analyzer
defined in the query itself, else -
The
search_analyzer
defined in the field mapping, else -
The
analyzer
defined in the field mapping, else -
The analyzer named
default_search
in the index settings, which defaults to -
The analyzer named
default
in the index settings, which defaults to -
The
standard
analyzer
The sheer number of places where you can specify an analyzer is quite overwhelming. In practice, though, it is pretty simple.
Most of the time, you will know what fields your documents will contain ahead of time. The simplest approach is to set the analyzer for each full-text field when you create your index or add type mappings. While this approach is slightly more verbose, it enables you to easily see which analyzer is being applied to each field.
Typically, most of your string fields will be exact-value not_analyzed
fields such as tags or enums, plus a handful of full-text fields that will
use some default analyzer like standard
or english
or some other language.
Then you may have one or two fields that need custom analysis: perhaps the
title
field needs to be indexed in a way that supports find-as-you-type.
You can set the default
analyzer in the index to the analyzer you want to
use for almost all full-text fields, and just configure the specialized
analyzer on the one or two fields that need it.
Note
|
A common work flow for time based data like logging is to create a new index per day on the fly by just indexing into it. While this work flow prevents you from creating your index up front, you can still use {ref}/indices-templates.html[index templates] to specify the settings and mappings that a new index should have. |