Scenario 4: Device sensors real-time anomaly detection and notifications - Streaming Data Solutions on AWS


Company ABC4Logistics transports highly flammable petroleum products such as gasoline, liquid propane (LPG), and naphtha from the port to various cities. Its fleet consists of hundreds of vehicles, each fitted with multiple sensors that monitor location, engine temperature, temperature inside the container, driving speed, parking location, road conditions, and so on. One of the requirements ABC4Logistics has is to monitor the temperatures of the engine and the container in real time and to alert the driver and the fleet monitoring team if there is any anomaly. To detect such conditions and generate alerts in real time, ABC4Logistics implemented the following architecture on AWS.


ABC4Logistics’s device sensors real-time anomaly detection and notifications architecture

Data from device sensors is ingested by AWS IoT Gateway, and the AWS IoT rules engine makes the streaming data available in Amazon Kinesis Data Streams. Using Managed Service for Apache Flink, ABC4Logistics can perform real-time analytics on the streaming data in Kinesis Data Streams.
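The rules-engine step can be sketched as follows. This is a minimal illustration that only builds the payload shape accepted by the AWS IoT CreateTopicRule API and does not call AWS. The topic filter fleet/+/telemetry, the field names, the stream name, and the role ARN are all hypothetical and do not come from the ABC4Logistics architecture.

```python
import json


def build_sensor_rule(stream_name: str, role_arn: str) -> dict:
    """Build a topic rule payload that forwards sensor messages to Kinesis.

    The topic filter and the selected field names are assumptions.
    """
    return {
        # AWS IoT SQL: pick the temperature fields and take the vehicle ID
        # from the second segment of the MQTT topic.
        "sql": (
            "SELECT engine_temp, container_temp, timestamp() AS ts, "
            "topic(2) AS vehicle_id FROM 'fleet/+/telemetry'"
        ),
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "kinesis": {
                    "roleArn": role_arn,
                    "streamName": stream_name,
                    # Partition by vehicle so that one vehicle's readings
                    # land in the same shard and stay ordered.
                    "partitionKey": "${topic(2)}",
                }
            }
        ],
    }


payload = build_sensor_rule(
    "sensor-readings",  # hypothetical stream name
    "arn:aws:iam::123456789012:role/iot-to-kinesis",  # hypothetical role
)
print(json.dumps(payload, indent=2))
```

With boto3, a payload of this shape would be passed as the topicRulePayload argument of the IoT client's create_topic_rule call, together with a ruleName.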

Using Managed Service for Apache Flink, ABC4Logistics can detect whether temperature readings from the sensors deviate from normal readings over a period of ten seconds, and write the anomalous records to a second Kinesis data stream. That stream then invokes AWS Lambda functions, which send alerts to the driver and the fleet monitoring team through Amazon SNS.
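The alerting step might look like the following sketch of a Lambda handler. It assumes each anomalous record is a JSON object with vehicle_id, sensor, and temperature fields, and that the SNS topic ARN is supplied in an ALERT_TOPIC_ARN environment variable; both are assumptions for illustration. The publish parameter is a hypothetical hook that allows a local dry run without AWS access.

```python
import base64
import json
import os


def decode_records(event: dict) -> list:
    """Kinesis delivers each record's payload base64-encoded under kinesis.data."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]


def format_alert(reading: dict) -> str:
    """Turn one anomalous reading into a human-readable message."""
    return (
        f"Temperature anomaly on vehicle {reading['vehicle_id']}: "
        f"{reading['sensor']} read {reading['temperature']} C"
    )


def handler(event, context, publish=None):
    """Lambda entry point; `publish` is injectable for local testing only."""
    if publish is None:
        import boto3  # imported lazily so the sketch runs without AWS access

        sns = boto3.client("sns")
        topic_arn = os.environ["ALERT_TOPIC_ARN"]  # hypothetical variable name

        def publish(message):
            sns.publish(
                TopicArn=topic_arn,
                Subject="Temperature anomaly",
                Message=message,
            )

    alerts = [format_alert(reading) for reading in decode_records(event)]
    for alert in alerts:
        publish(alert)
    return {"alerts_sent": len(alerts)}


# Local dry run with a hand-built Kinesis event and an in-memory publisher.
sample = {"vehicle_id": "truck-42", "sensor": "engine", "temperature": 131.5}
event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(json.dumps(sample).encode()).decode()}}
    ]
}
sent = []
result = handler(event, None, publish=sent.append)
print(result, sent)
```

Subscribing both the driver's device endpoint and the fleet monitoring team to the same SNS topic would let one publish call reach both audiences.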

Data in Kinesis Data Streams is also delivered to Amazon Data Firehose, which persists it in Amazon S3 so that ABC4Logistics can perform batch or near-real-time analytics on sensor data. ABC4Logistics uses Amazon Athena to query data in Amazon S3, and Amazon QuickSight for visualizations. For long-term data retention, an S3 Lifecycle policy archives data to Amazon S3 Glacier.
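The archival rule can be sketched as an S3 Lifecycle configuration. The key prefix and the 90-day threshold below are assumptions; the source does not state how long data stays in S3 before it is archived. The sketch only builds the configuration and does not call AWS.

```python
def build_archive_lifecycle(prefix: str, days_before_archive: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the S3 Glacier storage class after the given number of days."""
    return {
        "Rules": [
            {
                "ID": "archive-sensor-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical values: Firehose output under sensor-data/, archived after 90 days.
lifecycle = build_archive_lifecycle("sensor-data/", 90)
print(lifecycle)
```

With boto3, a dictionary of this shape would be passed as the LifecycleConfiguration argument of the S3 client's put_bucket_lifecycle_configuration call.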

Important components of this architecture are detailed next.

Amazon Managed Service for Apache Flink

Amazon Managed Service for Apache Flink enables you to transform and analyze streaming data and respond to anomalies in real time. It is a serverless service on AWS, which means that Managed Service for Apache Flink provisions the infrastructure and scales it elastically to handle any data throughput. This removes the undifferentiated heavy lifting of setting up and managing the streaming infrastructure, and lets you spend more time writing streaming applications.

With Amazon Managed Service for Apache Flink, you have several options for analyzing data streams: you can interactively query streaming data using standard SQL, build Apache Flink applications in Java, Python, and Scala, and build Apache Beam applications using Java.

These options give you the flexibility to choose an approach based on the complexity of the streaming application and the sources and targets it must support. The following section discusses Managed Service for Apache Flink for Flink applications.

Apache Flink is a popular open-source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Apache Flink is designed to perform computations at in-memory speed and at scale, with support for exactly-once semantics. Apache Flink-based applications help achieve low latency with high throughput in a fault-tolerant manner.

With Amazon Managed Service for Apache Flink, you can author and run code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics without managing the complex distributed Apache Flink environment. You can use the high-level Flink programming features in the same way that you use them when hosting the Flink infrastructure yourself.

Managed Service for Apache Flink enables you to create applications in Java, Scala, Python, or SQL to process and analyze streaming data. A typical Flink application reads data from a source (an input stream or data location), transforms, filters, or joins the data using operators or functions, and writes the results to a sink (an output stream or data location).
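For the temperature use case in this scenario, the source, operator, and sink shape can be sketched in plain Python. This is a conceptual illustration only and does not use the Flink API. The record layout, the normal reading of 90 degrees, and the tolerance of 15 degrees are assumptions; only the ten-second window comes from the scenario.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 10  # the scenario evaluates readings over ten seconds


def source():
    """Stand-in for the input stream: (vehicle_id, epoch_seconds, engine_temp_c)."""
    yield from [
        ("truck-1", 0, 90.0), ("truck-1", 4, 91.0), ("truck-1", 8, 92.0),
        ("truck-1", 10, 118.0), ("truck-1", 15, 121.0), ("truck-1", 19, 124.0),
        ("truck-2", 3, 88.0), ("truck-2", 12, 89.0),
    ]


def window_averages(readings):
    """Operator: group by vehicle and ten-second tumbling window, then average."""
    windows = defaultdict(list)
    for vehicle_id, ts, temp in readings:
        window_start = ts - ts % WINDOW_SECONDS
        windows[(vehicle_id, window_start)].append(temp)
    for (vehicle_id, window_start), temps in sorted(windows.items()):
        yield {
            "vehicle_id": vehicle_id,
            "window_start": window_start,
            "avg_temp": mean(temps),
        }


def anomalies(averages, normal=90.0, tolerance=15.0):
    """Operator: keep windows whose average deviates from the normal reading.

    `normal` and `tolerance` are hypothetical thresholds.
    """
    return (w for w in averages if abs(w["avg_temp"] - normal) > tolerance)


# Sink: a list standing in for the output stream of anomalous records.
sink = list(anomalies(window_averages(source())))
print(sink)
```

Unlike this sketch, which collects all input before emitting results, a stream processor such as Flink emits each window's result as the window closes and keeps the intermediate values as managed state.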

The following architecture diagram shows some of the supported sources and sinks for the Apache Flink application. In addition to the pre-bundled source and sink connectors, you can also bring custom connectors to a variety of other sources and sinks for Flink applications on Managed Service for Apache Flink.


Apache Flink application on Managed Service for Apache Flink for real-time stream processing

Developers can use their preferred IDE to develop Flink applications and deploy them on Managed Service for Apache Flink from the AWS Management Console or DevOps tools.

Amazon Managed Service for Apache Flink Studio

Managed Service for Apache Flink includes Managed Service for Apache Flink Studio, which customers can use to interactively query data streams in real time, and to build and run stream processing applications using SQL, Python, and Scala. Studio notebooks are powered by Apache Zeppelin.

With a Studio notebook, you can develop your Flink application code in a notebook environment, view the results of your code in real time, and visualize them within the notebook. You can create a Studio notebook powered by Apache Zeppelin and Apache Flink with a single click from the Kinesis Data Streams and Amazon MSK consoles, or launch it from the Managed Service for Apache Flink console.

After you develop the code iteratively in Managed Service for Apache Flink Studio, you can deploy a notebook as an Apache Flink application that runs continuously in streaming mode, reading data from your sources, writing to your destinations, maintaining long-running application state, and scaling automatically based on the throughput of your source streams. Previously, customers used Managed Service for Apache Flink for SQL Applications for such interactive analytics of real-time streaming data on AWS.

Managed Service for Apache Flink for SQL applications is still available, but for new projects, AWS recommends that you use the new Managed Service for Apache Flink Studio. Managed Service for Apache Flink Studio combines ease of use with advanced analytical capabilities, which makes it possible to build sophisticated stream processing applications in minutes.

To make the Apache Flink application fault-tolerant, you can use checkpointing and snapshots, as described in Implementing Fault Tolerance in Managed Service for Apache Flink.
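The idea behind checkpointing can be illustrated with a small plain-Python sketch. It is not Flink's API or its actual mechanism; it only shows the principle that operator state and the source position are saved together, so that after a failure the job resumes from the last checkpoint and each record affects the state exactly once. The CountingJob class, its parameters, and the sample records are hypothetical.

```python
import copy


class CountingJob:
    """Counts readings per vehicle and checkpoints state with the source offset."""

    def __init__(self, checkpoint_every=2):
        self.offset = 0    # position in the replayable source
        self.counts = {}   # operator state
        self.checkpoint_every = checkpoint_every
        self.last_checkpoint = {"offset": 0, "counts": {}}

    def run(self, records, fail_at=None):
        while self.offset < len(records):
            if self.offset == fail_at:
                raise RuntimeError("simulated task failure")
            vehicle_id = records[self.offset]
            self.counts[vehicle_id] = self.counts.get(vehicle_id, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                # Save state and offset together as one consistent snapshot.
                self.last_checkpoint = {
                    "offset": self.offset,
                    "counts": copy.deepcopy(self.counts),
                }

    def restore(self):
        """Roll back to the last checkpoint; later records will be replayed."""
        self.offset = self.last_checkpoint["offset"]
        self.counts = copy.deepcopy(self.last_checkpoint["counts"])


records = ["truck-1", "truck-2", "truck-1", "truck-1", "truck-2"]
job = CountingJob()
try:
    job.run(records, fail_at=3)  # fails after processing the third record
except RuntimeError:
    job.restore()                # discards the uncheckpointed third record
job.run(records)                 # replays from offset 2 and finishes
print(job.counts)
```

The third record is processed twice, once before the failure and once after the restore, but it is counted only once in the final state because the restore discards the effect of the first attempt.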

Apache Flink applications are well suited to complex streaming analytics, such as applications that need exactly-once processing semantics or checkpointing, and applications that process data from sources such as Kinesis Data Streams, Firehose, Amazon MSK, RabbitMQ, and Apache Cassandra, including through custom connectors.

After processing streaming data in the Flink application, you can persist data to various sinks or destinations such as Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon DynamoDB, Amazon OpenSearch Service, Amazon Timestream, and Amazon S3. Apache Flink applications can also deliver sub-second processing latency.

Apache Beam applications for Managed Service for Apache Flink

Apache Beam is a programming model for processing streaming data. It provides a portable API layer for building sophisticated data-parallel processing pipelines that can run across a variety of engines, or runners, such as Flink, Spark Streaming, and Apache Samza.

You can use the Apache Beam framework with your Apache Flink application to process streaming data. Flink applications that use Apache Beam use the Apache Flink runner to run Beam pipelines.

Summary

By making use of the AWS streaming services Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, and Amazon Data Firehose, ABC4Logistics can detect anomalous patterns in temperature readings and notify the driver and the fleet management team in real time, preventing major accidents such as complete vehicle breakdown or fire.