This simple project is more of a note to self, written to understand how the Elasticsearch Percolator functionality works under the hood.
The Elasticsearch Percolator functionality can be found under the percolator module.
The basic idea behind the percolator, as described in the documentation of the percolate query:
It can be used to match queries stored in an index. The percolate query itself contains the document that will be used as query to match with the stored queries.
Instead of naively executing all the queries stored in the index against the percolated document, Elasticsearch will filter out the queries that don't match the document by doing a search on top of the stored queries. Only the queries that are matching the document (or the ones that can't be evaluated in bulk) will be finally executed against a memory index that contains only the document to percolate.
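The two phases described above can be sketched in plain Java. This is a toy model written for this note, not Elasticsearch's code: the class PercolatorSketch, its register and percolate methods, and the idea of passing the extracted terms by hand are all assumptions made for illustration. Phase 1 looks up the document's terms in an inverted index built over the stored queries to find candidates; phase 2 executes only those candidates, plus the queries from which no terms could be extracted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Toy two-phase percolator: cheap candidate selection, then exact verification. */
class PercolatorSketch {
    // Phase-2 logic: the actual query, evaluated against the document's terms.
    private final Map<String, Predicate<Set<String>>> queries = new HashMap<>();
    // Phase-1 structure: extracted term -> ids of the stored queries that mention it.
    private final Map<String, Set<String>> termToQueryIds = new HashMap<>();
    // Queries without extractable terms cannot be filtered and are always verified.
    private final Set<String> alwaysVerify = new HashSet<>();
    // Number of queries executed in phase 2 by the last percolate call.
    int verified;

    /** Registers a query together with the terms extracted from it (hypothetical API). */
    void register(String id, Predicate<Set<String>> query, String... extractedTerms) {
        queries.put(id, query);
        if (extractedTerms.length == 0) {
            alwaysVerify.add(id);
        }
        for (String term : extractedTerms) {
            termToQueryIds.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
    }

    /** Returns the sorted ids of the stored queries that match the document text. */
    Set<String> percolate(String text) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        // Phase 1: a stored query is a candidate if it shares a term with the document.
        Set<String> candidates = new HashSet<>(alwaysVerify);
        for (String term : docTerms) {
            candidates.addAll(termToQueryIds.getOrDefault(term, Collections.emptySet()));
        }
        verified = candidates.size();
        // Phase 2: execute only the candidates against the document.
        Set<String> matches = new TreeSet<>();
        for (String id : candidates) {
            if (queries.get(id).test(docTerms)) {
                matches.add(id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PercolatorSketch p = new PercolatorSketch();
        p.register("q1", t -> t.contains("elastic"), "elastic");
        p.register("q2", t -> t.contains("lucene") && t.contains("kafka"), "lucene", "kafka");
        p.register("q3", t -> t.contains("solr"), "solr");
        p.register("q4", t -> true); // nothing extractable, so always verified
        System.out.println(p.percolate("Elastic is built on Lucene")); // [q1, q4]
        System.out.println(p.verified); // 3, because q3 was never executed
    }
}
```

In this sketch q2 is a false positive of phase 1 (it shares the term lucene with the document) that phase 2 rejects, while q3 is filtered out without ever being executed.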
The Elasticsearch in Action book describes the percolate query as "turning search upside down", for the following reasons:
- You index queries instead of documents. This registers the query in memory, so it can be quickly run later.
- You send a document to Elasticsearch instead of a query. This is called percolating a document, basically indexing it into a small, in-memory index. Registered queries are run against the small index, so Elasticsearch finds out which queries match.
- You get back a list of queries matching the document, instead of the other way around like a regular search.
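The three points above can be illustrated with a minimal Java sketch. It is a simplified stand-in written for this note, not the Elasticsearch or Lucene implementation: UpsideDownSearchSketch, SingleDocIndex, hasTerm and hasPhrase are hypothetical names, and the tokenizer is a plain lowercase split. Queries are registered first, each percolated document is indexed into a throwaway single-document index, and the ids of the matching queries are returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy "search upside down": queries are registered, documents are percolated. */
class UpsideDownSearchSketch {

    /** A tiny in-memory index holding exactly one document: term -> token positions. */
    static final class SingleDocIndex {
        private final Map<String, List<Integer>> positions = new HashMap<>();

        SingleDocIndex(String text) {
            String[] tokens = text.toLowerCase().split("\\W+");
            for (int i = 0; i < tokens.length; i++) {
                positions.computeIfAbsent(tokens[i], t -> new ArrayList<>()).add(i);
            }
        }

        boolean hasTerm(String term) {
            return positions.containsKey(term);
        }

        /** True if 'second' occurs directly after 'first' (a two-word phrase). */
        boolean hasPhrase(String first, String second) {
            List<Integer> next = positions.getOrDefault(second, Collections.emptyList());
            for (int pos : positions.getOrDefault(first, Collections.emptyList())) {
                if (next.contains(pos + 1)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Registered queries, kept in registration order.
    private final Map<String, Predicate<SingleDocIndex>> registered = new LinkedHashMap<>();

    /** "Indexes" a query: it is only stored, nothing is executed yet. */
    void register(String id, Predicate<SingleDocIndex> query) {
        registered.put(id, query);
    }

    /** Indexes the document into a one-document index and runs every registered query on it. */
    List<String> percolate(String document) {
        SingleDocIndex index = new SingleDocIndex(document);
        List<String> matching = new ArrayList<>();
        registered.forEach((id, query) -> {
            if (query.test(index)) {
                matching.add(id);
            }
        });
        return matching;
    }

    public static void main(String[] args) {
        UpsideDownSearchSketch search = new UpsideDownSearchSketch();
        search.register("term-elastic", idx -> idx.hasTerm("elastic"));
        search.register("phrase-in-action", idx -> idx.hasPhrase("in", "action"));
        System.out.println(search.percolate("Elasticsearch in Action")); // [phrase-in-action]
    }
}
```

Note that this sketch runs every registered query against the small index, which is exactly the naive approach; it only shows the flipped roles of queries and documents.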
The percolator analyses the queries and creates appropriate data structures in order to be able to filter only the relevant search queries that need to be executed on the memory index containing the percolated document:
- For a term query (e.g. search for all the documents containing "elastic"), an org.apache.lucene.search.TermScorer is used; it relies on Lucene's inverted index to retrieve the documents that contain the searched term.
- For an int field (e.g. search for all the real estate objects with 4 rooms), the range of the percolated document (e.g. 4 TO 4) is intersected with the KD tree containing the ranges of the stored queries (e.g. 0 TO 3, 4 TO *, 2 TO 5, etc.).
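The range intersection for the int field can be sketched as follows. This is an illustration only: RangeCandidateSketch, IntRange and candidatesFor are hypothetical names, and a linear scan over the stored ranges stands in for the KD tree, which reaches the same answer without visiting every range. An open upper bound such as 4 TO * is modelled here with Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy candidate selection for range queries on an int field. */
class RangeCandidateSketch {

    /** Closed integer interval [min, max]. */
    static final class IntRange {
        final int min;
        final int max;

        IntRange(int min, int max) {
            this.min = min;
            this.max = max;
        }

        /** Two closed intervals intersect if each one starts before the other ends. */
        boolean intersects(IntRange other) {
            return min <= other.max && other.min <= max;
        }
    }

    // Stored range queries, kept in registration order.
    private final Map<String, IntRange> storedRanges = new LinkedHashMap<>();

    void register(String queryId, int min, int max) {
        storedRanges.put(queryId, new IntRange(min, max));
    }

    /** The percolated value v becomes the degenerate range [v, v]. */
    List<String> candidatesFor(int value) {
        IntRange point = new IntRange(value, value);
        List<String> ids = new ArrayList<>();
        // Linear scan used for clarity; a KD tree would prune whole subtrees instead.
        storedRanges.forEach((id, range) -> {
            if (range.intersects(point)) {
                ids.add(id);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        RangeCandidateSketch rooms = new RangeCandidateSketch();
        rooms.register("0 TO 3", 0, 3);
        rooms.register("4 TO *", 4, Integer.MAX_VALUE);
        rooms.register("2 TO 5", 2, 5);
        System.out.println(rooms.candidatesFor(4)); // [4 TO *, 2 TO 5]
    }
}
```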
This very simple project contains a series of tests in the CandidateQueryTests.java test class that showcase how the percolator actually works behind the scenes.