The `bool` query is the mainstay of multiclause queries. It works well for many cases, especially when you are able to map different query strings to individual fields.
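As a rough illustration of that multifield-form case, the sketch below builds a `bool` request body in which each form input maps to its own `match` clause. The helper name `build_bool_query` and the `title` and `author` fields are assumptions made for this example, not something defined in the text above.

```python
import json


def build_bool_query(title_terms: str, author_terms: str) -> dict:
    """Build a bool query body with one match clause per form field.

    The field names (title, author) are hypothetical placeholders.
    """
    return {
        "query": {
            "bool": {
                # Each should clause contributes to the score when it matches.
                "should": [
                    {"match": {"title": title_terms}},
                    {"match": {"author": author_terms}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to a search endpoint.
    print(json.dumps(build_bool_query("War and Peace", "Leo Tolstoy"), indent=2))
```

Because each input already belongs to a known field, the application does not have to guess which words go where.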
The problem is that, these days, users expect to be able to type all of their search terms into a single field, and expect that the application will figure out how to give them the right results. It is ironic that the multifield search form is known as Advanced Search—it may appear advanced to the user, but it is much simpler to implement.
There is no simple one-size-fits-all approach to multiword, multifield queries. To get the best results, you have to know your data and know how to use the appropriate tools.
When your only user input is a single query string, you will encounter three scenarios frequently:
- Best fields

  When searching for words that represent a concept, such as "brown fox," the words mean more together than they do individually. Fields like the `title` and `body`, while related, can be considered to be in competition with each other. Documents should have as many words as possible in the same field, and the score should come from the best-matching field.

- Most fields

  A common technique for fine-tuning relevance is to index the same data into multiple fields, each with its own analysis chain.

  The main field may contain words in their stemmed form, synonyms, and words stripped of their diacritics, or accents. It is used to match as many documents as possible.

  The same text could then be indexed in other fields to provide more-precise matching. One field may contain the unstemmed version, another the original word with accents, and a third might use shingles to provide information about word proximity.

  These other fields act as signals to increase the relevance score of each matching document. The more fields that match, the better.

- Cross fields

  For some entities, the identifying information is spread across multiple fields, each of which contains just a part of the whole:

  - Person: `first_name` and `last_name`
  - Book: `title`, `author`, and `description`
  - Address: `street`, `city`, `country`, and `postcode`

  In this case, we want to find as many words as possible in any of the listed fields. We need to search across multiple fields as if they were one big field.
All of these are multiword, multifield queries, but each requires a different strategy. We will examine each strategy in turn in the rest of this chapter.
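As a preview, one way Elasticsearch exposes these three strategies is the `type` parameter of the `multi_match` query, which accepts `best_fields`, `most_fields`, and `cross_fields`, among other values. The following is a minimal sketch of the request bodies only. The helper `build_multi_match` and the subfield names `title.std` and `title.shingles` are hypothetical, and the sample query strings are placeholders.

```python
import json
from typing import Sequence

# The three strategies discussed above, as multi_match type values.
STRATEGIES = ("best_fields", "most_fields", "cross_fields")


def build_multi_match(query_string: str, fields: Sequence[str], match_type: str) -> dict:
    """Build a multi_match request body for a single user query string.

    Hypothetical helper: it only assembles the JSON body and does not
    contact a cluster.
    """
    if match_type not in STRATEGIES:
        raise ValueError(f"unsupported strategy: {match_type!r}")
    return {
        "query": {
            "multi_match": {
                "query": query_string,
                "type": match_type,
                "fields": list(fields),
            }
        }
    }


# Best fields: competing fields, score taken from the best-matching one.
best = build_multi_match("brown fox", ["title", "body"], "best_fields")

# Most fields: the same text indexed several ways (assumed subfield names).
most = build_multi_match(
    "jumping rabbits", ["title", "title.std", "title.shingles"], "most_fields"
)

# Cross fields: one entity spread across several fields.
cross = build_multi_match("peter smith", ["first_name", "last_name"], "cross_fields")

if __name__ == "__main__":
    for body in (best, most, cross):
        print(json.dumps(body, indent=2))
```

The bodies differ only in `type` and in the field list, which is why knowing your data matters more than the query syntax itself.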