From 7eceb2baf0c72baea69f0af8c4cb4d57a042299f Mon Sep 17 00:00:00 2001
From: Jonah Calvo
Date: Thu, 21 Sep 2023 15:07:38 -0500
Subject: [PATCH] Update documentation for new AD settings (#4835)

* Update documentation for new AD settings

Signed-off-by: Jonah Calvo

* update wording for verbose

Signed-off-by: Jonah Calvo

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi
Signed-off-by: Jonah Calvo

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi
Signed-off-by: Jonah Calvo

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi
Signed-off-by: Jonah Calvo

* Remove 'few' from description

Signed-off-by: Jonah Calvo

---------

Signed-off-by: Jonah Calvo
Signed-off-by: Jonah Calvo
Co-authored-by: Melissa Vagi
---
 .../pipelines/configuration/processors/anomaly-detector.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md
index 2010c53856..9628bb6caf 100644
--- a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md
+++ b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md
@@ -18,6 +18,10 @@ You can configure the anomaly detector processor by specifying a key and the opt
 | :--- | :--- | :--- |
 | `keys` | Yes | A non-ordered `List` that is used as input to the ML algorithm to detect anomalies in the values of the keys in the list. At least one key is required.
 | `mode` | Yes | The ML algorithm (or model) used to detect anomalies. You must provide a mode. See [random_cut_forest mode](#random_cut_forest-mode).
+| `identification_keys` | No | If provided, anomalies will be detected within each unique instance of this key. For example, if you provide the `ip` field, anomalies will be detected separately for each unique IP address.
+| `cardinality_limit` | No | If using the `identification_keys` settings, a new ML model will be created for every degree of cardinality. This can cause a large amount of memory usage, so it is helpful to set a limit on the number of models. Default limit is 5000.
+| `verbose` | No | RCF will try to automatically learn and reduce the number of anomalies detected. For example, if latency is consistently between 50 and 100, and then suddenly jumps to around 1000, only the first one or two data points after the transition will be detected (unless there are other spikes/anomalies). Similarly, for repeated spikes to the same level, RCF will likely eliminate many of the spikes after a few initial ones. This is because the default setting is to minimize the number of alerts detected. Setting the `verbose` setting to `true` will cause RCF to consistently detect these repeated cases, which may be useful for detecting anomalous behavior that lasts an extended period of time.
+
 
 ### Keys
 
@@ -69,4 +73,4 @@ ad-pipeline:
 
 When you run the anomaly detector processor, the processor extracts the value for the `latency` key, and then passes the value through the RCF ML algorithm. You can configure any key that comprises integers or real numbers as values. In the following example, you can configure `bytes` or `latency` as the key for an anomaly detector.
 
-`{"ip":"1.2.3.4", "bytes":234234, "latency":0.2}`
\ No newline at end of file
+`{"ip":"1.2.3.4", "bytes":234234, "latency":0.2}`
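
For reference, the three settings added by this patch might be combined as in the following minimal sketch. It is an illustration only, not part of the patch. The `ad-pipeline` name, the `latency` and `ip` fields, and the `random_cut_forest` mode come from the patched document. The `http` source, the `stdout` sink, and the list form of `identification_keys` are assumptions.

```yaml
ad-pipeline:
  source:
    http:                            # assumed placeholder source
  processor:
    - anomaly_detector:
        keys: ["latency"]            # numeric field passed to the ML algorithm
        mode:
          random_cut_forest:
        identification_keys: ["ip"]  # assumed list form: anomalies detected separately per unique IP address
        cardinality_limit: 5000      # limit on the number of models (the documented default)
        verbose: true                # keep detecting repeated or sustained anomalies
  sink:
    - stdout:                        # assumed placeholder sink
```

With a sample event such as `{"ip":"1.2.3.4", "bytes":234234, "latency":0.2}`, this sketch would evaluate `latency` against a model specific to `1.2.3.4`, assuming the options behave as the table rows above describe.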