Example: Detecting Data Anomalies and Getting an Explanation (RANDOM_CUT_FOREST_WITH_EXPLANATION Function) - Amazon Kinesis Data Analytics for SQL Applications Developer Guide

For new projects, we recommend that you use the new Managed Service for Apache Flink Studio over Kinesis Data Analytics for SQL Applications. Managed Service for Apache Flink Studio combines ease of use with advanced analytical capabilities, enabling you to build sophisticated stream processing applications in minutes.

Example: Detecting Data Anomalies and Getting an Explanation (RANDOM_CUT_FOREST_WITH_EXPLANATION Function)

Amazon Kinesis Data Analytics provides the RANDOM_CUT_FOREST_WITH_EXPLANATION function, which assigns an anomaly score to each record based on values in the numeric columns. The function also provides an explanation of the anomaly. For more information, see RANDOM_CUT_FOREST_WITH_EXPLANATION in the Amazon Managed Service for Apache Flink SQL Reference.

In this exercise, you write application code to obtain anomaly scores for records in your application's streaming source. You also obtain an explanation for each anomaly.

First Step

Step 1: Prepare the Data