Adds population job how-to guide.

elastic · Sep 5, 2024 · 3991bf4 · 3991bf4
1 parent 437db55
commit 3991bf4

Showing 5 changed files with 96 additions and 10 deletions.
diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-anomalies.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomalies.png
diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-anomaly.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-anomaly.png
diff --git a/docs/en/stack/ml/anomaly-detection/images/ml-population-wizard.png b/docs/en/stack/ml/anomaly-detection/images/ml-population-wizard.png
diff --git a/docs/en/stack/ml/anomaly-detection/ml-population-analysis.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-population-analysis.asciidoc
@@ -0,0 +1,96 @@
+[[ml-configuring-populations]]
+= Performing population analysis
+
+Population analysis is a method of detecting anomalies by comparing the behavior of entities or events within a specified population.
+In this approach, {ml} analytics create a profile of what is considered "typical" behavior for users, machines, or other entities over a specified time period.
+An entity is considered anomalous when its behavior deviates from that of the rest of the population.
+
+This type of analysis is most effective when the behavior within a group is generally homogeneous, allowing for the identification of unusual patterns.
+However, it is less useful when members of the population show vastly different behaviors.
+In such cases, you can segment your data into groups with similar behaviors and run separate jobs for each.
+This can be done by using a query filter in the datafeed or by applying the `partition_field_name` to split the analysis across different groups.
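+
+As a minimal sketch of the `partition_field_name` option, the following job models each partition as its own population.
+The job ID `population-by-region` and the `geo.src` field are assumptions made for illustration; substitute a field from your own data that separates entities with similar behavior.
+
+[source,console]
+----------------------------------
+PUT _ml/anomaly_detectors/population-by-region
+{
+  "description" : "Population analysis split by region (illustrative)",
+  "analysis_config" : {
+    "bucket_span":"15m",
+    "detectors": [
+      {
+        "function": "mean",
+        "field_name": "bytes",
+        "over_field_name": "clientip",
+        "partition_field_name": "geo.src" <1>
+      }
+    ]
+  },
+  "data_description" : {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
+}
+----------------------------------
+// TEST[skip:needs-licence]
+
+<1> Hypothetical partitioning field. Each distinct value is analyzed as a separate population, so clients are compared only with other clients in the same partition.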
+
+Population analysis is resource-efficient and scales well, enabling the analysis of populations consisting of hundreds of thousands or even millions of entities with a lower resource footprint than analyzing each series individually.
+
+
+
+[discrete]
+[[population-recommendations]]
+== Recommendations
+
+* Use population analysis when the behavior within a group is mostly homogeneous, as it helps identify anomalous patterns effectively.
+* Leverage population analysis when dealing with large-scale datasets.
+* Avoid using population analysis when members of the population exhibit vastly different behaviors, as it may not be effective.
+
+
+[discrete]
+[[creating-population-jobs]]
+== Creating population jobs
+
+. In {kib}, navigate to **{ml-app} > Anomaly Detection > Jobs**.
+. Click **Create {anomaly-jobs}**, then select the {data-view} you want to analyze.
+. Select the **Population** wizard from the list.
+. Choose a population field (`clientip` in this example) and the metric you want to use for the analysis (`Mean(bytes)` in this example).
++
+--
+[role="screenshot"]
+image::images/ml-population-wizard.png[Creating a population job in Kibana]
+--
+. Click **Next**.
+. Provide a job ID and click **Next**.
+. If the validation is successful, click **Next** to review the summary of the job creation.
+. Click **Create job**.
+
+[%collapsible]
+.API example
+====
+To specify the population, use the `over_field_name` property. For example:
+
+[source,console]
+----------------------------------
+PUT _ml/anomaly_detectors/population
+{
+  "description" : "Population analysis",
+  "analysis_config" : {
+    "bucket_span":"15m",
+    "influencers": [
+      "clientip"
+    ],
+    "detectors": [
+      {
+        "function": "mean",
+        "field_name": "bytes",
+        "over_field_name": "clientip" <1>
+      }
+    ]
+  },
+  "data_description" : {
+    "time_field":"timestamp",
+    "time_format": "epoch_ms"
+  }
+}
+----------------------------------
+// TEST[skip:needs-licence]
+
+<1> This `over_field_name` property indicates that the metrics for each client (as identified by their IP address) are analyzed relative to other clients in each bucket.
+====
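+
+A job created through the API also needs a source of data.
+As a sketch, the following datafeed supplies data to the `population` job defined above.
+The datafeed ID, the index name `kibana_sample_data_logs`, and the `geo.src` filter are assumptions made for illustration; the optional `query` shows how a filter can restrict the analysis to a more homogeneous group.
+
+[source,console]
+----------------------------------
+PUT _ml/datafeeds/datafeed-population
+{
+  "job_id": "population",
+  "indices": [
+    "kibana_sample_data_logs"
+  ],
+  "query": {
+    "bool": {
+      "filter": [
+        { "term": { "geo.src": "US" } }
+      ]
+    }
+  }
+}
+----------------------------------
+// TEST[skip:needs-licence]
+
+The job must be opened and the datafeed started before any results are produced.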
+
+[discrete]
+[[population-job-results]]
+=== Viewing the job results
+
+Use the **Anomaly Explorer** in {kib} to view the analysis results:
+
+[role="screenshot"]
+image::images/ml-population-anomalies.png["Population results in the Anomaly Explorer"]
+
+The results are often quite sparse.
+There might be just a few data points for the selected time period.
+Population analysis is particularly useful when you have many entities and the data for specific entities is sporadic or sparse.
+If you click on a section in the timeline or swim lanes, you can see more details about the anomalies:
+
+[role="screenshot"]
+image::images/ml-population-anomaly.png["Anomaly details for a specific user"]
+
+In this example, the client IP address `167.145.234.154` received a high volume of bytes on the date and time shown.
+This event is anomalous because the mean is four times higher than the expected behavior of the population.
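+
+The same results can also be retrieved through the API.
+As a sketch, assuming the job ID `population` from the API example, the following request returns anomaly records with a score of 75 or higher, highest first.
+The threshold of 75 is an arbitrary choice for illustration.
+
+[source,console]
+----------------------------------
+GET _ml/anomaly_detectors/population/results/records
+{
+  "sort": "record_score",
+  "desc": true,
+  "record_score": 75
+}
+----------------------------------
+// TEST[skip:needs-licence]
+
+For a population job, each record is expected to identify the anomalous entity through its `over_field_value`, which in this example would be a client IP address.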
diff --git a/docs/en/stack/ml/redirects.asciidoc b/docs/en/stack/ml/redirects.asciidoc
@@ -13,16 +13,6 @@ This page has moved. See <<ml-ad-datafeeds>>.
 
 This page has moved. See <<ml-ad-datafeeds>>.
 
-[role="exclude",id="ml-configuring-pop"]
-=== Performing population analysis
-
-This page has been removed. Refer to <<population-jobs>>.
-
-[role="exclude",id="ml-configuring-populations"]
-=== Configuring population analysis
-
-This page has been removed. Refer to <<population-jobs>>.
-
 [role="exclude",id="ml-inference-models"]
 === Trained {ml} models as functions