This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[BUG] Roll up target index metric schema is not proper for max and min when values of type float/double #450

Open
narayananaidup opened this issue Jun 3, 2021 · 0 comments
Labels
bug Something isn't working

Comments


narayananaidup commented Jun 3, 2021

Describe the bug

I created a rollup policy successfully and the data is being indexed into the target index, but the schema of some of the metric fields is configured incorrectly.

For example, the max and min metrics are mapped to keyword. More details below.

Rollup policy we tried:

curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/latest_stats_roll_up" -H 'Content-Type: application/json' -d'
{
  "rollup": {
    "enabled": true,
    "schedule": { "interval": { "period": 1, "unit": "Minutes" } },
    "description": "An example policy that rolls up the sample ecommerce data",
    "source_index": "fmstats_2021-06-03*",
    "target_index": "temp_stats_roll_1",
    "page_size": 1000,
    "delay": 0,
    "continuous": false,
    "dimensions": [
      { "date_histogram": { "source_field": "timestamp", "fixed_interval": "60m" } },
      { "terms": { "source_field": "portIdToClusterId" } },
      { "terms": { "source_field": "alias" } }
    ],
    "metrics": [
      {
        "source_field": "port.rx.packets",
        "metrics": [ { "avg": {} }, { "sum": {} }, { "max": {} }, { "min": {} }, { "value_count": {} } ]
      }
    ]
  }
}'

Data fetching and indexing are done properly by Index Management. A sample rolled-up document:

{
  "rollup._id": "latest_stats_roll_up",
  "rollup._doc_count": 12,
  "rollup._schema_version": 9,
  "timestamp.date_histogram": 1620342000000,
  "portIdToClusterId.terms": "4_1_x18;c2c20049",
  "alias.terms": "c20049-4-1-x18",
  "port.rx.packets.sum": 1.0810224507E10,
  "port.rx.packets.value_count": 12,
  "port.rx.packets.max": 9.01208458E8,
  "port.rx.packets.min": 9.00494681E8,
  "port.rx.packets.avg.sum": 1.0810224507E10,
  "port.rx.packets.avg.value_count": 12
}

The source data has 5-minute granularity and we are rolling it up to 60m granularity.
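As a quick sanity check on the rolled-up document above: with 5-minute source data and a 60m rollup interval, each rollup bucket should cover 12 source documents, which matches the rollup._doc_count of 12 in the sample (trivial arithmetic, sketched here):

```python
# Sanity check: how many 5-minute source documents fall into one 60-minute
# rollup bucket. This should match rollup._doc_count in the sample document.
source_granularity_min = 5
rollup_interval_min = 60
docs_per_bucket = rollup_interval_min // source_granularity_min
print(docs_per_bucket)  # 12
```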

The problem is that the schema generated on the target index is wrong for the max and min metrics: their type is keyword instead of long/float.

Here is the schema of the target index:

{"temp_stats_roll_1":{"mappings":{"_meta":{"rollups":{"latest_stats_roll_up":{"enabled_time":1622716301620,"target_index":"temp_stats_roll_1","roles":[],"description":"An example policy that rolls up the sample ecommerce data","source_index":"fmstats_2021-06-03*","enabled":true,"rollup_id":"latest_stats_roll_up","schema_version":8,"schedule":{"interval":{"start_time":1622716301620,"period":1,"unit":"Minutes"}},"delay":0,"last_updated_time":1622716301620,"continuous":false,"metadata_id":"IFpu0XkBCA59Kcdjqyqa","metrics":[{"source_field":"port.rx.packets","metrics":[{"avg":{}},{"sum":{}},{"max":{}},{"min":{}},{"value_count":{}}]}],"page_size":1000,"dimensions":[{"date_histogram":{"fixed_interval":"60m","source_field":"timestamp","target_field":"timestamp","timezone":"UTC"}},{"terms":{"source_field":"portIdToClusterId","target_field":"portIdToClusterId"}},{"terms":{"source_field":"alias","target_field":"alias"}}]}}},"dynamic_templates":[{"strings":{"match_mapping_type":"string","mapping":{"type":"keyword"}}},{"date_histograms":{"path_match":"*.date_histogram","mapping":{"type":"date"}}}],"properties":{"alias":{"properties":{"terms":{"type":"keyword"}}},"port":{"properties":{"rx":{"properties":{"packets":{"properties":{"avg":{"properties":{"sum":{"type":"float"},"value_count":{"type":"long"}}},"max":{"type":"keyword"},"min":{"type":"keyword"},"sum":{"type":"float"},"value_count":{"type":"long"}}}}}}},"portIdToClusterId":{"properties":{"terms":{"type":"keyword"}}},"rollup":{"properties":{"_doc_count":{"type":"long"},"_id":{"type":"keyword"},"_schema_version":{"type":"long"}}},"timestamp":{"properties":{"date_histogram":{"type":"date"}}}}}}}

The relevant snippet is:

"port": {
  "properties": {
    "rx": {
      "properties": {
        "packets": {
          "properties": {
            "avg": {
              "properties": {
                "sum": { "type": "float" },
                "value_count": { "type": "long" }
              }
            },
            "max": { "type": "keyword" },
            "min": { "type": "keyword" },
            "sum": { "type": "float" },
            "value_count": { "type": "long" }
          }
        }
      }
    }
  }
}

For sum the type is correct, but max/min are mapped as keyword.

Further analysis suggests that the float/double max/min values are being serialized as strings, so the dynamic template on the target rollup index maps those fields to keyword.
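To illustrate the suspected mechanism (a sketch, not the plugin's actual code): the target index's dynamic_templates shown above map any string-valued field to keyword, so a numeric value that arrives as a JSON string gets a keyword mapping, while a value that arrives as a JSON number gets a numeric mapping:

```python
# Sketch of the dynamic-mapping behavior described above (illustration only,
# not the plugin's code). The target index's "strings" dynamic template maps
# string-valued fields to keyword; JSON numbers get numeric mappings.

def dynamic_mapping(value):
    """Approximate the type the target index would assign to a leaf value."""
    if isinstance(value, str):
        return "keyword"   # matches the "strings" dynamic template
    if isinstance(value, bool):
        return "boolean"   # bool checked before int (bool is an int in Python)
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    raise TypeError(f"unsupported value: {value!r}")

# If max/min are emitted as strings, they are mapped as keyword and numeric
# aggregations on them fail; emitted as numbers, they map to a numeric type.
print(dynamic_mapping("9.01208458E8"))  # keyword
print(dynamic_mapping(9.01208458e8))    # float
```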

I confirmed this by making the following change to the indexer, after which it started working:

git diff src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/RollupIndexer.kt
diff --git a/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/RollupIndexer.kt b/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/RollupIndexer.kt
index 0f0d717..f73b80c 100644
--- a/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/RollupIndexer.kt
+++ b/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/RollupIndexer.kt
@@ -109,17 +109,16 @@ class RollupIndexer(
             val hash = MurmurHash3.hash128(docByteArray, 0, docByteArray.size, DOCUMENT_ID_SEED, MurmurHash3.Hash128())
             val byteArray = ByteBuffer.allocate(BYTE_ARRAY_SIZE).putLong(hash.h1).putLong(hash.h2).array()
             val documentId = Base64.getUrlEncoder().withoutPadding().encodeToString(byteArray)
-
             val mapOfKeyValues = job.getInitialDocValues(it.docCount)
             val aggResults = mutableMapOf<String, Any?>()
             it.key.entries.forEach { aggResults[it.key] = it.value }
             it.aggregations.forEach {
                 when (it) {
                     is InternalSum -> aggResults[it.name] = it.value
-                    is InternalMax -> aggResults[it.name] = it.value
-                    is InternalMin -> aggResults[it.name] = it.value
+                    is InternalMax -> aggResults[it.name] = it.value.toLong()
+                    is InternalMin -> aggResults[it.name] = it.value.toLong()

For example, a query with sum works:

curl -XPOST "localhost:9200/temp*/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"sumAggporttxpackets":{"sum":{"field":"port.rx.packets"}}}}}}}}'
output:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "daily_numbers" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 107990,
      "buckets" : [
        {
          "key" : "9_1_x1;s1.0000.proxy-s1-1",
          "doc_count" : 2,
          "Sub_dateHistogramAgg" : {
            "buckets" : [
              {
                "key_as_string" : "2021-06-03T10:00:00.000Z",
                "key" : 1622714400000,
                "doc_count" : 2,
                "maxAggporttxoctets" : {
                  "value" : 6554289.0
                }
              }
            ]
          }
.......

If we run with max/min, it fails:

[root@fmha1 opendistro-index-management]# curl -XPOST "localhost:9200/temp*/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"maxAggportRxPackets":{"max":{"field":"port.rx.packets"}}}}}}}}'
output:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Field [port.rx.packets.max] of type [keyword] is not supported for aggregation [max]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "temp_stats_roll_1",
        "node" : "jkpujITXSKCVYfYAqb6fAw",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Field [port.rx.packets.max] of type [keyword] is not supported for aggregation [max]"
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Field [port.rx.packets.max] of type [keyword] is not supported for aggregation [max]",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Field [port.rx.packets.max] of type [keyword] is not supported for aggregation [max]"
      }
    }
  },
  "status" : 400
}

After adding the above fix, search queries work. Can you please validate the fix and address it accordingly?

Expected behavior
All metric fields on the target index should be mapped as numeric types, not keyword.
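For reference, the mapping one would expect for the max/min sub-fields, assuming they follow the same numeric typing as sum (a sketch, not actual plugin output):

```json
"packets": {
  "properties": {
    "max": { "type": "float" },
    "min": { "type": "float" },
    "sum": { "type": "float" },
    "value_count": { "type": "long" }
  }
}
```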

OpenDistro version : 1.13.2.0

@narayananaidup narayananaidup added the bug Something isn't working label Jun 3, 2021
@narayananaidup narayananaidup changed the title [BUG] Roll up target index metric schema is not proper for max,min and avg [BUG] Roll up target index metric schema is not proper for max,min Jun 4, 2021
@narayananaidup narayananaidup changed the title [BUG] Roll up target index metric schema is not proper for max,min [BUG] Roll up target index metric schema is not proper for max and min when values of type float/double Jun 4, 2021