Here's a simplified explanation:

In Qdrant, payload filtering is integrated into the graph search. When a user queries with a filter, the graph is traversed with that filter applied, so every vector found during traversal matches the filter and every retrieved point is relevant.

In ElasticSearch, filtering is not part of the graph traversal. ES first searches the graph for the best results across the whole collection, without taking the filter into account, so many of the retrieved points may not match the filter. The results are then filtered afterwards to drop every point that does not match. Because the search only returned a limited set of candidates, this can leave fewer results than requested and miss relevant points, which gives lower-quality results.

Both use forms of (approximate) nearest neighbor search, so it matters that matching (relevant) points are actually reached during graph traversal. If they are not, for example when the filter is not applied during traversal, other points that match the filter well may never be reached.

Please read this article for more details: https://qdrant.tech/articles/filtrable-hnsw/

This problem occurs because a (graph-based) index is used for searching. It would not happen with a full-scan search, but a full scan is very slow on large collections.

Note that it is called post-filtering because the filter is not part of the graph traversal; it is applied after the traversal instead, hence "post-filtering".

I hope that makes sense. 😄
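To make the difference concrete, here is a toy sketch in plain Python. It is not Qdrant's or Elasticsearch's API, and it does not implement HNSW: the hypothetical `unfiltered_ann` function simply stands in for an index that ignores payloads and only ever surfaces its `ef` best candidates, while `filter_aware_search` considers only points that match the filter (done as an exact scan here for simplicity). The data is made up so that the filter-matching points rank poorly overall.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def unfiltered_ann(points, query, ef):
    """Hypothetical stand-in for an index that ignores payloads:
    it only ever returns the `ef` points most similar to the query."""
    ranked = sorted(points, key=lambda p: cosine(p["vector"], query), reverse=True)
    return ranked[:ef]


def post_filter_search(points, query, k, ef, predicate):
    """Post-filtering: search first, then drop candidates that fail the filter."""
    candidates = unfiltered_ann(points, query, ef)
    return [p for p in candidates if predicate(p)][:k]


def filter_aware_search(points, query, k, predicate):
    """Filter applied during search: only matching points are ever candidates.
    (Exact scan for simplicity; a real index would traverse a graph instead.)"""
    matching = [p for p in points if predicate(p)]
    matching.sort(key=lambda p: cosine(p["vector"], query), reverse=True)
    return matching[:k]


# Toy collection: 100 unit vectors fanning away from the query direction.
# Every 10th point (ids 9, 19, ..., 99) has payload color="red".
points = [
    {
        "id": i,
        "vector": [math.cos(0.01 * i), math.sin(0.01 * i)],
        "payload": {"color": "red" if i % 10 == 9 else "blue"},
    }
    for i in range(100)
]
query = [1.0, 0.0]


def is_red(p):
    return p["payload"]["color"] == "red"


post = post_filter_search(points, query, k=10, ef=20, predicate=is_red)
aware = filter_aware_search(points, query, k=10, predicate=is_red)

print([p["id"] for p in post])   # [9, 19]
print([p["id"] for p in aware])  # [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]
```

Asking for 10 red points, post-filtering returns only 2, because the other 8 red points never made it into the 20 unfiltered candidates. In this toy, raising `ef` to 100 would recover all ten, but that amounts to a full scan of the collection.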
On the README it mentions:
Can you clarify what this means? Under what circumstances does ElasticSearch post-filtering not return all relevant vectors?