This repository has been archived by the owner on Apr 20, 2020. It is now read-only.

Expose searching as bool query, rather than isolated _aknn_search #3

Open
henrywallace opened this issue Jun 20, 2018 · 2 comments

Comments

@henrywallace

The motivation is that I would like to combine the KNN search with other bool clauses, or combine the score with other queries. Preferably, I would like to do something like

GET localhost:9200/twitter_images/twitter_image/_search
{
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {"term": {"name": "Some Name"}},
                {"aknn": {
                    "vec": [0.1, 0.2, ...],
                    "k1": 1000
                }}
            ]
        }
    }
}
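A minimal sketch of how the two `should` clauses above could contribute to one score, assuming the scores of matching clauses are summed as in a standard bool query. The `aknn` clause does not exist in the plugin; here it is modeled, purely as an assumption, as "match the `k1` nearest documents and score them by `1 / (1 + distance)`". The in-memory `docs` list, the function names, and the fixed term score of 1.0 are all hypothetical stand-ins.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_scores(docs, name, vec, k1):
    """Rank docs as a bool/should of [term, aknn] might (illustrative only)."""
    # Hypothetical aknn clause: matches only the k1 documents nearest to vec.
    nearest = sorted(docs, key=lambda d: euclidean(d["vec"], vec))[:k1]
    knn_score = {d["id"]: 1.0 / (1.0 + euclidean(d["vec"], vec)) for d in nearest}

    scored = []
    for d in docs:
        # Stand-in for the term clause: a fixed score of 1.0 on exact match.
        term_score = 1.0 if d["name"] == name else 0.0
        total = term_score + knn_score.get(d["id"], 0.0)
        if total > 0:  # at least one should clause matched
            scored.append((d["id"], total))
    return sorted(scored, key=lambda s: s[1], reverse=True)


docs = [
    {"id": "a", "name": "Some Name", "vec": [0.0, 0.0]},
    {"id": "b", "name": "Other", "vec": [0.1, 0.2]},
    {"id": "c", "name": "Other", "vec": [5.0, 5.0]},
]
ranked = should_scores(docs, "Some Name", [0.1, 0.2], k1=2)
```

Document `a` matches both clauses and ranks first, `b` matches only the vector clause, and `c` matches neither and is dropped.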
@alexklibisz
Owner

Hi @henrywallace, thanks for your interest in the project. I agree this is a cool feature and have also thought about the potential integration of multiple queries. I'm not sure about the best design though, as I've only ever used Elasticsearch for this specific project.

More generally to your issues: I've begun a re-write of the plugin on the dev branches over at https://github.com/alexklibisz/elasticsearch-aknn. I've also considered rewriting the whole thing in Scala as I find the syntax and terseness more appealing. But this is fairly low-priority at the moment as I'm currently in the middle of job interviews. I'm hoping to spend a couple weekends hacking on the re-write in about a month.

@alexklibisz
Owner

alexklibisz commented Jul 22, 2018

@henrywallace I'm starting to think a little bit about how to implement this part. I searched around and I don't see any way to implement a custom bool clause within a plugin.

I think rescoring might be doable, though. There is an example rescoring plugin in the ES repo: https://github.com/elastic/elasticsearch/tree/master/plugins/examples/rescore. I've never used rescoring, but conceptually it makes sense to first find all the docs which match a standard query, and then score/rerank them based on similarity to a query vector.

Another option (which is not as sleek but I know how to implement it) would be to allow passing an arbitrary query to the _aknn_search endpoint (like your bool query minus the aknn field), then execute a search using the given query, and finally constrain the aknn query to only consider documents which were returned from the first search. So the query would look like:

GET localhost:9200/twitter_images/twitter_image/_aknn_search
{
  "pre_query": {
      "size": 10,
      "query": {
          "bool": {
              "should": [
                  {"term": {"name": "Some Name"}}
              ]
          }
      }
  }
... other aknn search params ...
}
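The two-stage flow described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the plugin's implementation: documents live in an in-memory list, the pre-query evaluator understands only a bool/should of term clauses, and the `vec` and `k` request fields are hypothetical stand-ins for the unspecified aknn search params.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches_pre_query(doc, pre_query):
    """Tiny stand-in for query execution: bool/should of term clauses only."""
    clauses = pre_query["query"]["bool"]["should"]
    return any(
        doc.get(field) == value
        for clause in clauses
        for field, value in clause["term"].items()
    )


def aknn_search(docs, body):
    """Stage 1: keep docs matching pre_query. Stage 2: rank them by distance."""
    candidates = [d for d in docs if matches_pre_query(d, body["pre_query"])]
    ranked = sorted(candidates, key=lambda d: euclidean(d["vec"], body["vec"]))
    return [d["id"] for d in ranked[: body["k"]]]


docs = [
    {"id": "a", "name": "Some Name", "vec": [3.0, 3.0]},
    {"id": "b", "name": "Some Name", "vec": [0.1, 0.2]},
    {"id": "c", "name": "Other", "vec": [0.1, 0.2]},
]
body = {
    "pre_query": {"query": {"bool": {"should": [{"term": {"name": "Some Name"}}]}}},
    "vec": [0.1, 0.2],  # hypothetical aknn parameter
    "k": 2,             # hypothetical aknn parameter
}
result = aknn_search(docs, body)
```

Note that `c` is excluded even though its vector is an exact match, because it never passes the pre-query. That is the main behavioral difference from scoring the vector clause as one `should` among others.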

Thoughts?


Edit: After some more research:

It looks like rescoring actually involves using the Lucene API and specifying details about shards in the query. I don't know what kind of access that will give me to the document source and to the elasticsearch Java API, and operating with shards as a detail could complicate the search interface.

The second approach I mentioned (passing a query as part of the _aknn_search request) is also used by the popular carrot2 clustering plugin. They expose an endpoint called _search_with_clusters which takes a search_request parameter. Their HTML docs include an example of the request body, and there is Java code that appears to parse the search_request.
