[Feature Request] /msearch API fails if any query is malformed #16862

Open
colinking opened this issue Dec 16, 2024 · 2 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Comments

@colinking

Is your feature request related to a problem? Please describe

In the example below, we run two queries on the ecommerce dataset via a single /msearch request. The first query is valid, but the second is malformed (its term value is null). The entire /msearch request fails with a 400 error, and we get no response for the first query.

curl -s "http://localhost:3955/_msearch" -H 'Content-Type: application/json' -d'
{ "index": "ecommerce"}
{ "query": { "term": {"customer_first_name.keyword": "Jason"} }, "from": 0, "size": 1 }
{ "index": "ecommerce"}
{ "query": { "term": {"customer_first_name.keyword": null } }, "from": 0, "size": 1 }
' | jq .
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field name is null or empty"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "field name is null or empty"
  },
  "status": 400
}

Contrast this with what happens if the second query is malformed in a different way, such as referencing an index that doesn't exist. In this case, the /msearch API returns a response for the first query, and an error for the second query.

curl -s "http://localhost:3955/_msearch" -H 'Content-Type: application/json' -d'
{ "index": "ecommerce"}
{ "query": { "term": {"customer_first_name.keyword": "Jason"} }, "from": 0, "size": 1 }
{ "index": "ecommerce2"}
{ "query": { "term": {"customer_first_name.keyword": "Selena" } }, "from": 0, "size": 1 }
' | jq .
{
  "took": 4,
  "responses": [
    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 65,
          "relation": "eq"
        },
        "max_score": 4.268148,
        "hits": [
          {
            "_index": "ecommerce",
            "_id": "89",
            "_score": 4.268148,
            "_source": {
              "category": [
                "Men's Clothing",
                "Men's Accessories"
              ],
              "currency": "EUR",
              "customer_first_name": "Jason",
              "customer_full_name": "Jason Jacobs",
              "customer_gender": "MALE",
              "customer_id": 16,
              "customer_last_name": "Jacobs",
              "customer_phone": "",
              "day_of_week": "Monday",
              "day_of_week_i": 0,
              "email": "[email protected]",
              "manufacturer": [
                "Elitelligence"
              ],
              "order_date": "2016-12-19T20:09:36+00:00",
              "order_id": 575797,
              "products": [
                {
                  "base_price": 14.99,
                  "discount_percentage": 0,
                  "quantity": 1,
                  "manufacturer": "Elitelligence",
                  "tax_amount": 0,
                  "product_id": 18217,
                  "category": "Men's Clothing",
                  "sku": "ZO0555205552",
                  "taxless_price": 14.99,
                  "unit_discount_amount": 0,
                  "min_price": 7.2,
                  "_id": "sold_product_575797_18217",
                  "discount_amount": 0,
                  "created_on": "2016-12-19T20:09:36+00:00",
                  "product_name": "Long sleeved top - dark blue/pink",
                  "price": 14.99,
                  "taxful_price": 14.99,
                  "base_unit_price": 14.99
                },
                {
                  "base_price": 10.99,
                  "discount_percentage": 0,
                  "quantity": 1,
                  "manufacturer": "Elitelligence",
                  "tax_amount": 0,
                  "product_id": 21624,
                  "category": "Men's Accessories",
                  "sku": "ZO0602206022",
                  "taxless_price": 10.99,
                  "unit_discount_amount": 0,
                  "min_price": 5.17,
                  "_id": "sold_product_575797_21624",
                  "discount_amount": 0,
                  "created_on": "2016-12-19T20:09:36+00:00",
                  "product_name": "Wallet - brown",
                  "price": 10.99,
                  "taxful_price": 10.99,
                  "base_unit_price": 10.99
                }
              ],
              "sku": [
                "ZO0555205552",
                "ZO0602206022"
              ],
              "taxful_total_price": 25.98,
              "taxless_total_price": 25.98,
              "total_quantity": 2,
              "total_unique_products": 2,
              "type": "order",
              "user": "jason",
              "geoip": {
                "country_iso_code": "US",
                "location": {
                  "lon": -74,
                  "lat": 40.8
                },
                "region_name": "New York",
                "continent_name": "North America",
                "city_name": "New York"
              },
              "event": {
                "dataset": "sample_ecommerce"
              }
            }
          }
        ]
      },
      "status": 200
    },
    {
      "error": {
        "root_cause": [
          {
            "type": "index_not_found_exception",
            "reason": "no such index [ecommerce2]",
            "index": "ecommerce2",
            "resource.id": "ecommerce2",
            "resource.type": "index_or_alias",
            "index_uuid": "_na_"
          }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index [ecommerce2]",
        "index": "ecommerce2",
        "resource.id": "ecommerce2",
        "resource.type": "index_or_alias",
        "index_uuid": "_na_"
      },
      "status": 404
    }
  ]
}

In both cases, only one of the two queries is malformed, yet only the second scenario returns a partial response. This contradicts the docs, which state: "OpenSearch executes each search independently, so the failure of one doesn’t affect the others."
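
From a client's point of view, the two scenarios produce differently shaped bodies: a single top-level `error` object in the first, and a `responses` array with per-item `error` entries in the second. Below is a minimal Python sketch of handling both shapes, assuming the response body has already been decoded into a dict. The function name `split_msearch_response` is hypothetical and not part of any OpenSearch client.

```python
def split_msearch_response(body):
    """Separate successful and failed items of a decoded _msearch response.

    Raises RuntimeError when the whole request was rejected, i.e. the body
    has a top-level "error" and no "responses" array (the first scenario).
    """
    if "error" in body and "responses" not in body:
        err = body["error"]
        raise RuntimeError(
            f"whole _msearch request failed ({body.get('status')}): "
            f"{err.get('type')}: {err.get('reason')}"
        )

    ok, failed = [], []
    for position, item in enumerate(body.get("responses", [])):
        # Per-item failures carry their own "error" object and "status".
        target = failed if "error" in item else ok
        # Keep the position so results can be matched back to the request.
        target.append({"position": position, **item})
    return ok, failed
```

With the second scenario's response, `ok` would hold the item at position 0 and `failed` the 404 item at position 1. With the first scenario's response, the function raises, because there is nothing to salvage for the valid query.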

Describe the solution you'd like

I would like the /msearch API to return the following response for the first scenario. This would give us a partial response, and would prevent the failure of the second query from impacting the first query.

curl -s "http://localhost:3955/_msearch" -H 'Content-Type: application/json' -d'
{ "index": "ecommerce"}
{ "query": { "term": {"customer_first_name.keyword": "Jason"} }, "from": 0, "size": 1 }
{ "index": "ecommerce"}
{ "query": { "term": {"customer_first_name.keyword": null } }, "from": 0, "size": 1 }
' | jq .
{
  "took": 4,
  "responses": [
    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 65,
          "relation": "eq"
        },
        "max_score": 4.268148,
        "hits": [
          {
            "_index": "ecommerce",
            "_id": "89",
            "_score": 4.268148,
            "_source": {
              "category": [
                "Men's Clothing",
                "Men's Accessories"
              ],
              "currency": "EUR",
              "customer_first_name": "Jason",
              "customer_full_name": "Jason Jacobs",
              "customer_gender": "MALE",
              "customer_id": 16,
              "customer_last_name": "Jacobs",
              "customer_phone": "",
              "day_of_week": "Monday",
              "day_of_week_i": 0,
              "email": "[email protected]",
              "manufacturer": [
                "Elitelligence"
              ],
              "order_date": "2016-12-19T20:09:36+00:00",
              "order_id": 575797,
              "products": [
                {
                  "base_price": 14.99,
                  "discount_percentage": 0,
                  "quantity": 1,
                  "manufacturer": "Elitelligence",
                  "tax_amount": 0,
                  "product_id": 18217,
                  "category": "Men's Clothing",
                  "sku": "ZO0555205552",
                  "taxless_price": 14.99,
                  "unit_discount_amount": 0,
                  "min_price": 7.2,
                  "_id": "sold_product_575797_18217",
                  "discount_amount": 0,
                  "created_on": "2016-12-19T20:09:36+00:00",
                  "product_name": "Long sleeved top - dark blue/pink",
                  "price": 14.99,
                  "taxful_price": 14.99,
                  "base_unit_price": 14.99
                },
                {
                  "base_price": 10.99,
                  "discount_percentage": 0,
                  "quantity": 1,
                  "manufacturer": "Elitelligence",
                  "tax_amount": 0,
                  "product_id": 21624,
                  "category": "Men's Accessories",
                  "sku": "ZO0602206022",
                  "taxless_price": 10.99,
                  "unit_discount_amount": 0,
                  "min_price": 5.17,
                  "_id": "sold_product_575797_21624",
                  "discount_amount": 0,
                  "created_on": "2016-12-19T20:09:36+00:00",
                  "product_name": "Wallet - brown",
                  "price": 10.99,
                  "taxful_price": 10.99,
                  "base_unit_price": 10.99
                }
              ],
              "sku": [
                "ZO0555205552",
                "ZO0602206022"
              ],
              "taxful_total_price": 25.98,
              "taxless_total_price": 25.98,
              "total_quantity": 2,
              "total_unique_products": 2,
              "type": "order",
              "user": "jason",
              "geoip": {
                "country_iso_code": "US",
                "location": {
                  "lon": -74,
                  "lat": 40.8
                },
                "region_name": "New York",
                "continent_name": "North America",
                "city_name": "New York"
              },
              "event": {
                "dataset": "sample_ecommerce"
              }
            }
          }
        ]
      },
      "status": 200
    },
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "field name is null or empty"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "field name is null or empty"
      },
      "status": 400
    }
  ]
}

If this is considered a breaking change, I'd propose making this behavior opt-in via a boolean request argument.

Related component

Search

Describe alternatives you've considered

We could call the Validate Query API on each query before executing it, but that adds a round trip, and therefore latency, to every search. A bulk version of that API would help, but we'd still prefer that msearch handle this automatically.
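
A rough Python sketch of this validate-first alternative follows. The validity check is passed in as a callable (`is_valid`, a hypothetical hook), since in practice it would issue one Validate Query request per search. The helper names are also illustrative, and whether Validate Query reports the null-value case as invalid is an assumption that has not been verified here.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs into the NDJSON payload for _msearch."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The NDJSON payload must end with a newline character.
    return "\n".join(lines) + "\n"


def partition_by_validity(searches, is_valid):
    """Split searches into (valid, rejected) lists of (position, header, body).

    `is_valid(header, body)` is a caller-supplied check. Calling a remote
    validation endpoint inside it costs one extra request per search, which
    is the latency concern described above.
    """
    valid, rejected = [], []
    for position, (header, body) in enumerate(searches):
        bucket = valid if is_valid(header, body) else rejected
        bucket.append((position, header, body))
    return valid, rejected
```

Only the `valid` entries would then be passed to `build_msearch_body`, and the recorded positions let the caller map results back to the original request order.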

Additional context

The examples above were tested on OpenSearch 2.18 using the dataset from here.

@colinking colinking added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 16, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Dec 16, 2024
@sandeshkr419
Contributor

[Search Triage]
The first query throws an error in the query parsing stage itself, and therefore the search does not even reach the index. This is more like a JSON parsing error based on parsing rules.
However, in the second query, we are able to parse the request successfully, and it then returns an index-not-found error.

This behavior is consistent with the _bulk API as well. I think it will be tough to split the JSON body and then parse each part individually. Either way, we'd need to keep the behavior consistent between both _bulk and _msearch. Let's see if other people have different opinions.

@colinking
Author

Thanks for the explanation, @sandeshkr419.

> The first query throws an error in the query parsing stage itself, and therefore the search does not even reach the index. This is more like a JSON parsing error based on parsing rules.

I'd differentiate between a syntactic error (e.g. invalid JSON) and a semantic error (e.g. the null value example above). A syntactic error isn't necessarily isolated to one query (e.g. if the body was pretty-printed JSON, one of the lines may happen to be valid). However, with a semantic error we can isolate that error to a single query.
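
To make the distinction concrete, here is a small client-side Python illustration. It labels each NDJSON line as `ok`, `syntactic` (the line is not valid JSON), or `semantic` (valid JSON, but a term query with a null value). This is only a sketch of the idea with a single hard-coded semantic rule; it does not reflect how OpenSearch parses requests, and the function names are made up.

```python
import json


def _has_null_term(doc):
    """Return True if doc looks like {"query": {"term": {<field>: null}}}."""
    query = doc.get("query") if isinstance(doc, dict) else None
    term = query.get("term") if isinstance(query, dict) else None
    return isinstance(term, dict) and any(v is None for v in term.values())


def classify_msearch_lines(ndjson):
    """Label every non-empty NDJSON line as 'ok', 'syntactic' or 'semantic'."""
    labels = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            # Not parseable at all: the error cannot be pinned to one query
            # with confidence, because line boundaries may be wrong too.
            labels.append("syntactic")
            continue
        # Parseable: any problem found here belongs to this line alone.
        labels.append("semantic" if _has_null_term(doc) else "ok")
    return labels
```

Applied to the four lines of the failing request above, only the last line would be labeled `semantic`, which is what makes it possible to attribute the failure to a single query.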

I'm not familiar with the implementation here, but it seems that an implementation detail of how queries are parsed is being exposed to users. I understand, though, that changing this may be non-trivial.

> Either way, we'd need to keep the behavior consistent between both _bulk and _msearch.

Agreed, good point.
