Anomaly Detection for Amazon Elasticsearch Service - Amazon Elasticsearch Service

Anomaly Detection for Amazon Elasticsearch Service

Anomaly detection in Amazon Elasticsearch Service (Amazon ES) automatically detects anomalies in your Elasticsearch data in near-real time by using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream. The algorithm computes an anomaly grade and a confidence score for each incoming data point. Anomaly detection uses these values to differentiate an anomaly from normal variations in your data.
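As a rough illustration of how those two values work together, the sketch below flags a data point only when its anomaly grade is positive (a grade of 0 means normal) and the confidence score is high. The field names and thresholds are illustrative assumptions, not the service's internal logic.

```python
# Hypothetical sketch of consuming anomaly detection results.
# "anomaly_grade" and "confidence" mirror the values the detector
# reports; the thresholds are illustrative assumptions.

def is_anomaly(result, grade_threshold=0.0, confidence_threshold=0.8):
    """Treat a point as anomalous when its grade is positive
    (0 means normal) and the detector is sufficiently confident."""
    return (result["anomaly_grade"] > grade_threshold
            and result["confidence"] >= confidence_threshold)

results = [
    {"anomaly_grade": 0.0, "confidence": 0.99},  # normal point
    {"anomaly_grade": 0.7, "confidence": 0.92},  # likely anomaly
    {"anomaly_grade": 0.4, "confidence": 0.35},  # low confidence, ignored
]
flagged = [r for r in results if is_anomaly(r)]
```

Here only the second point is flagged: the first has a zero grade, and the third fails the confidence check.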

You can pair the anomaly detection plugin with the Alerting for Amazon Elasticsearch Service plugin to notify you as soon as an anomaly is detected.

Anomaly detection requires Elasticsearch 7.4 or later. All instance types support anomaly detection except for t2.micro and t2.small. Full documentation for anomaly detection, including detailed steps and API descriptions, is available in the Open Distro for Elasticsearch documentation.

Getting Started with Anomaly Detection

To get started, choose Anomaly Detection in Kibana.

Step 1: Create a Detector

A detector is an individual anomaly detection task. You can create multiple detectors, and all of them can run simultaneously, each analyzing data from a different source.
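A detector definition is a JSON document. The sketch below builds one in the general shape used by the Open Distro anomaly detection REST API (POST _opendistro/_anomaly_detection/detectors); the index, field names, and interval are illustrative assumptions, so check the Open Distro documentation for the exact schema.

```python
import json

# Hedged sketch of a detector definition. The index pattern, time
# field, and interval are assumptions for illustration only.
detector = {
    "name": "http-error-detector",
    "description": "Detects anomalies in HTTP 5xx error counts",
    "time_field": "timestamp",          # field used to order the data stream
    "indices": ["web-logs-*"],          # indices the detector reads from
    "detection_interval": {
        "period": {"interval": 10, "unit": "Minutes"}
    },
}

body = json.dumps(detector)  # serialized request body
```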

Step 2: Add Features to Your Detector

A feature is the field in your index that you check for anomalies. A detector can discover anomalies across one or more features. You must choose one of the following aggregations for each feature: average(), sum(), count(), min(), or max().

Note

The count() aggregation method is only available in Elasticsearch version 7.7 and later. For version 7.4, use a custom expression like the following:

{ "aggregation_name": { "value_count": { "field": "field_name" } } }

The aggregation method determines what constitutes an anomaly. For example, if you choose min(), the detector focuses on finding anomalies based on the minimum values of your feature. If you choose average(), the detector finds anomalies based on the average values of your feature. You can add a maximum of five features per detector.
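The sketch below shows how two features with different aggregation methods might be expressed in a detector configuration, including the value_count form needed for count() on version 7.4. The field names ("cpu_usage", "request_id") and the exact key layout are illustrative assumptions based on the Open Distro feature format.

```python
# Hedged sketch of feature definitions with aggregation methods.
# Field names are illustrative; verify the schema against the
# Open Distro anomaly detection documentation.
feature_attributes = [
    {
        "feature_name": "avg_cpu",
        "feature_enabled": True,
        # average() aggregation: anomalies are judged on average values
        "aggregation_query": {"avg_cpu": {"avg": {"field": "cpu_usage"}}},
    },
    {
        "feature_name": "request_count",
        "feature_enabled": True,
        # On Elasticsearch 7.4, count() is written as a custom
        # value_count expression rather than a built-in aggregation
        "aggregation_query": {
            "request_count": {"value_count": {"field": "request_id"}}
        },
    },
]

# A detector supports a maximum of five features.
assert len(feature_attributes) <= 5
```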

You can configure the following optional settings (available in 7.7 and later):

  • Category field - Categorize or slice your data with a dimension like IP address, product ID, country code, and so on.

  • Window size - Set the number of aggregation intervals from your data stream to consider in a detection window.
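In the Open Distro detector configuration, these optional settings are commonly expressed with the "category_field" and "shingle_size" keys; treat those key names as assumptions and the field name as illustrative.

```python
# Hedged sketch of the optional settings (Elasticsearch 7.7 and later).
# Exact key names should be verified against the Open Distro docs.
optional_settings = {
    # Category field: slice results by a dimension such as IP address
    "category_field": ["client_ip"],
    # Window size: number of aggregation intervals in a detection window
    "shingle_size": 8,
}
```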

After you set up your features, preview sample anomalies and adjust the feature settings if necessary.

Step 3: Observe the Results

The following visualizations are available on the anomaly detection dashboard:
  • Live anomalies - displays the live anomaly results for the last 60 intervals. For example, if the interval is set to 10 minutes, it shows the results for the last 600 minutes. This chart refreshes every 30 seconds.

  • Anomaly history - plots the anomaly grade with the corresponding measure of confidence.

  • Feature breakdown - plots the features based on the aggregation method. You can vary the date-time range of the detector.

  • Anomaly occurrence - shows the Start time, End time, Data confidence, and Anomaly grade for each anomaly detected.

    If you set the category field, you see an additional Heat map chart that correlates results for anomalous entities. Choose a filled rectangle to see a more detailed view of the anomaly.

Step 4: Set Up Alerts

Choose Set up alerts to create a monitor that sends you notifications when anomalies are detected. The plugin redirects you to the Add monitor page, where you can configure an alert.
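As a rough sketch, a monitor's trigger might fire on the anomaly grade and confidence values the detector reports. The structure below follows the general shape of Open Distro alerting triggers, but the exact keys and the script context are assumptions; check the Alerting plugin documentation before using them.

```python
# Hedged sketch of an alerting trigger condition keyed on anomaly
# detection results. Keys and the ctx.results layout are assumptions.
trigger = {
    "name": "anomaly-grade-trigger",
    "severity": "1",
    "condition": {
        # Fire when the anomaly grade is high and confidence is strong
        "script": {
            "source": ("return ctx.results[0].anomaly_grade > 0.7 "
                       "&& ctx.results[0].confidence > 0.9"),
            "lang": "painless",
        }
    },
}
```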