[BUG] Do not use the 'search' queue for everything #875

mvanderlee · 2024-03-01T14:47:00Z

v 2.11.1

Our cluster was stable on an r5.2xlarge instance, hovering at ~10% CPU usage. Then we enabled Windows detectors and even an r5.8xlarge isn't enough.

We were experimenting with detectors. But they essentially brought down our entire instance.
The main issue boils down to the fact that it's all running in the same 'search' queue: the detector UI is backed by 'search', the detectors themselves are backed by 'search', and so on.

Why is this the worst idea ever?
Because the detectors fill up the queue and cause literally millions of searches to be rejected; we observed ~48 million rejections per hour overnight.
While this is a tuning and scaling issue, it also completely killed ingestion (our Spark pipeline kept failing to write to OS and dropped the events into our DLQ), and all dashboards stopped working since the UI also uses the 'search' queue.
So it wasn't just detectors that were failing. Everything started to fail. We couldn't even stop the detector because that request kept failing as well.

We have tried tuning the queues, but even a queue size of 100K is still filling up and we're still running into memory issues.
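
For anyone trying to reproduce the numbers, here is a minimal sketch of how rejections per thread pool can be tallied. It assumes the rows come from `GET _cat/thread_pool?format=json&h=node_name,name,active,queue,rejected`; the function name `rejections_by_pool` and the sample rows are made up for illustration and are not our real measurements.

```python
from collections import defaultdict


def rejections_by_pool(rows):
    """Sum the `rejected` counter per thread pool across all nodes.

    `rows` is the parsed JSON list from the _cat/thread_pool call above.
    The cat API returns every value as a string, so counters are cast to int.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["name"]] += int(row["rejected"])
    return dict(totals)


# Illustrative rows only (hypothetical node names and counts).
sample_rows = [
    {"node_name": "node-1", "name": "search", "active": "13", "queue": "1000", "rejected": "120"},
    {"node_name": "node-2", "name": "search", "active": "13", "queue": "998", "rejected": "80"},
    {"node_name": "node-1", "name": "write", "active": "2", "queue": "0", "rejected": "0"},
]

print(rejections_by_pool(sample_rows))
```

As far as I know the `rejected` counter is cumulative per node, so an hourly rate needs two samples taken an hour apart and subtracted.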

Management wanted us to try Detectors as they were hoping we'd no longer have to maintain our own rules engine with Sigma rules. But our own engine does the job with far fewer resources on the exact same data set, and it doesn't affect anything else if it falls behind.

We are no longer moving forward with OS security analytics.

sbcd90 · 2024-03-01T17:23:21Z

Hi @mvanderlee, we have a bunch of performance fixes we're planning to release in 2.13. We're aware of the high CPU and high JVM memory pressure (jvmmp) issues caused by running security-analytics detectors.
These issues should go away once the 2.13 release is out.

sbcd90 · 2024-03-01T17:34:48Z

Also, one optimization you can already try is using an index alias to configure a detector instead of an index pattern. Here are the steps to do it.

1. ISM Changes

Define Component Template with mappings

PUT /_component_template/test-alias-template458
{
  "template": {
    "mappings": {
      "properties": {
        "hello": {
          "type": "text"
        }
      }
    }
  }
}


Define Index template with the component template

POST /_index_template/test-index-template458
{
  "index_patterns": [
    "test-index458-*"
  ],
  "composed_of": [
    "test-alias-template458"
  ]
}


Create Initial Index

PUT /test-index458-1
{
  "aliases": {
    "test-alias458": {
      "is_write_index": true
    }
  }
}


Index data via the alias

POST /test-alias458/_doc
{
  "hello": "world"
}

Now use the alias test-alias458 to create the detector.
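
If you want to script the steps above for your own naming scheme, a rough sketch could look like the following. It only builds the four requests as data and prints them; the helper name `alias_setup_requests` and the `suffix` parameter are hypothetical, and sending the requests with your HTTP client of choice is left out.

```python
import json


def alias_setup_requests(suffix):
    """Return the four setup calls as (method, path, body) tuples, in order."""
    component = f"test-alias-template{suffix}"
    alias = f"test-alias{suffix}"
    return [
        # 1. Component template holding the mappings.
        ("PUT", f"/_component_template/{component}",
         {"template": {"mappings": {"properties": {"hello": {"type": "text"}}}}}),
        # 2. Index template composed of that component template.
        ("POST", f"/_index_template/test-index-template{suffix}",
         {"index_patterns": [f"test-index{suffix}-*"], "composed_of": [component]}),
        # 3. Initial index, registered as the alias's write index.
        ("PUT", f"/test-index{suffix}-1",
         {"aliases": {alias: {"is_write_index": True}}}),
        # 4. A sample document indexed through the alias.
        ("POST", f"/{alias}/_doc", {"hello": "world"}),
    ]


for method, path, body in alias_setup_requests("458"):
    print(method, path)
    print(json.dumps(body, indent=2))
```

Calling it with "458" reproduces the same names used in the steps above.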

mvanderlee · 2024-03-01T17:34:50Z

@sbcd90 glad to hear it.
Until then, can you confirm whether rejected tasks mean that events are not being analyzed by the detector, and thus will not be alerted on?

mvanderlee · 2024-03-01T17:41:10Z

And we already have aliases, but they don't show up as options in the Data source dropdown. We'll try just entering it manually.
It'd be great if it could show aliases in the UI and preferably prioritize them.

amsiglan · 2024-03-01T19:00:58Z

@mvanderlee we're already working on showing aliases in the dropdown; it should be available in 2.13.

mvanderlee added bug Something isn't working untriaged labels Mar 1, 2024

sbcd90 removed the untriaged label Mar 1, 2024

github-project-automation bot added this to Security Analytics Roadmap Aug 30, 2024

github-project-automation bot moved this to Bugs in Security Analytics Roadmap Aug 30, 2024
