
[FEATURE] Apply min_score to normalized/combined results in hybrid query #1164

Open
martin-gaievski opened this issue Feb 3, 2025 · 0 comments
Labels: backlog, enhancement

Comments

@martin-gaievski
Member

Is your feature request related to a problem?

Currently, the hybrid query pushes min_score down to the shard level. There the filter is applied to the scores of the individual sub-queries, before normalization and/or combination. This is confusing in scenarios where users expect the filter to apply to the normalized/combined scores.

Example:

{
  "size": 100,
  "_source": [
    "item_id",
    "item_description"],
  "query": {
    "hybrid": {
      "queries": [
        {
          "knn": {
            "embedding": {
              "vector": [...],
              "k": 100
            }
          }
        },
        {
          "match": {
            "item_description": {
              "query": "Looking for something",
              "fuzziness": 1,
              "operator": "and"
            }
          }
        }
      ]
    }
  },
  "min_score": 1.5
}

Here, min_score cuts off all results from the knn query, because for normalized vectors those scores fall within the [0, 1.0] interval. For the match query, on the other hand, it cuts off mostly irrelevant documents, because the top-matching documents have scores of 20 or higher.
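To make the arithmetic concrete, here is a minimal Python simulation of applying min_score to each sub-query's raw scores before normalization, as described above. The document IDs and scores are made up for illustration, and `apply_min_score_per_subquery` is a hypothetical helper, not the plugin's actual code.

```python
# Hypothetical raw scores, made up for illustration (not taken from a real index).
KNN_SCORES = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}   # normalized-vector similarity, in [0, 1]
MATCH_SCORES = {"doc1": 23.1, "doc4": 21.7, "doc5": 1.2}  # lexical scores, unbounded above

MIN_SCORE = 1.5  # same value as in the example query


def apply_min_score_per_subquery(scores: dict[str, float], min_score: float) -> dict[str, float]:
    """Keep only documents whose raw sub-query score is >= min_score.

    This mimics filtering each sub-query before normalization/combination.
    """
    return {doc: s for doc, s in scores.items() if s >= min_score}


knn_kept = apply_min_score_per_subquery(KNN_SCORES, MIN_SCORE)
match_kept = apply_min_score_per_subquery(MATCH_SCORES, MIN_SCORE)
print(knn_kept)    # {}
print(match_kept)  # {'doc1': 23.1, 'doc4': 21.7}
```

Every knn hit is dropped because no score in [0, 1] can reach 1.5, while the match query loses only its weakest hit. Documents found only by the knn query (`doc2`, `doc3`) never reach normalization at all.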

What solution would you like?

Either min_score itself or an equivalent new custom parameter that defines the cutoff for scores after normalization/combination.
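A minimal Python sketch of the requested behavior: normalize each sub-query with min-max, combine with an arithmetic mean, and only then apply the cutoff. The scores, the helper names, the equal weights, and the choice to treat a document missing from a sub-query as scoring 0.0 are all assumptions of this sketch. It is not the plugin's implementation, whose edge-case handling may differ.

```python
# Hypothetical raw scores, made up for illustration.
KNN_SCORES = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}
MATCH_SCORES = {"doc1": 23.1, "doc4": 21.7, "doc5": 1.2}


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one sub-query's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption of this sketch: identical scores all map to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def arithmetic_mean(per_query: list[dict[str, float]]) -> dict[str, float]:
    """Equal-weight mean across sub-queries; a missing score counts as 0.0 (assumption)."""
    docs = set().union(*per_query)
    return {doc: sum(q.get(doc, 0.0) for q in per_query) / len(per_query) for doc in docs}


def filter_combined(combined: dict[str, float], min_score: float) -> dict[str, float]:
    """Hypothetical post-combination cutoff: keep documents whose final score is >= min_score."""
    return {doc: s for doc, s in combined.items() if s >= min_score}


combined = arithmetic_mean([min_max_normalize(KNN_SCORES), min_max_normalize(MATCH_SCORES)])
kept = filter_combined(combined, 0.4)
print(sorted(kept))  # ['doc1', 'doc2', 'doc4']
```

Because the cutoff now sees scores on one shared scale, the knn-only `doc2` survives alongside the strong lexical match `doc4`. Note that in this sketch combined scores lie in [0, 1], so a threshold such as 1.5 would remove everything. A post-combination cutoff therefore needs values chosen for the normalized scale.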

What alternatives have you considered?

Some workarounds are possible with a custom scoring function.

Do you have any additional context?

This should not be confused with the lower-bound feature of the min/max normalization technique (#299).
