AWS Streaming Data Solution for Amazon Kinesis

Deployment framework for capturing, storing, processing, and delivering real-time streaming data

AWS Implementation Guide

AWS Solutions Builder Team

August 2020 (last update: July 2021)

The AWS Streaming Data Solution for Amazon Kinesis enables you to capture, store, process, and deliver real-time streaming data. By automatically configuring the included AWS services, this solution helps you address real-time streaming use cases, for example:

  • Capture high volume application log files

  • Analyze website clickstreams

  • Process database event streams

  • Track financial transactions

  • Aggregate social media feeds

  • Collect IT log files

  • Continuously deliver to a data lake

This solution helps accelerate your development lifecycle by minimizing or eliminating the need to model and provision resources using AWS CloudFormation, to set up preconfigured Amazon CloudWatch alarms with recommended thresholds, dashboards, and logging, and to manually implement streaming data best practices. This solution is data and logic agnostic, meaning that you can start with boilerplate code and then customize it to your needs.

The solution uses templates where data flows through producers, streaming storage, consumers, and destinations. Producers continuously generate data and send it to streaming storage where it is durably captured and made available for processing by a data consumer. Data consumers process the data and then send it to a destination.
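
To make this flow concrete, the following minimal sketch shows a producer sending one event to streaming storage using the AWS SDK for Python (Boto3). The stream name, event fields, and partition key are illustrative assumptions, not values created by the solution.

    # Minimal producer sketch (Boto3): send one event to a Kinesis data stream.
    # The stream name and event fields are hypothetical.
    import json
    import boto3

    kinesis = boto3.client("kinesis")

    def send_event(event, stream_name="my-data-stream"):
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(event).encode("utf-8"),
            # The partition key determines which shard stores the record.
            PartitionKey=str(event.get("user_id", "default")),
        )

    send_event({"user_id": 42, "action": "page_view", "page": "/home"})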

To support multiple use cases and business needs, this solution offers four AWS CloudFormation templates. You can use this solution to test new service combinations as the basis for your production environment, and to improve existing applications.

  1. Option 1 captures data from non-AWS environments (for example, mobile clients). This option uses Amazon API Gateway as a layer of abstraction, which enables you to implement custom authentication approaches for data producers, control quotas for specific producers, and change the target Kinesis stream. This template uses AWS Lambda as the data consumer, which is best suited for use cases that don’t require internal state, such as filtering, business event processing, and data cleansing. AWS Lambda offers a small surface area for error scenarios and is simple to scale and operate (see the Lambda consumer sketch after this list).

  2. Option 2 is intended for use cases such as streaming extract-transform-load (ETL), real-time analytics, predictive analytics, and machine learning. It uses Apache Flink on Amazon Kinesis Data Analytics, a fully managed service that automatically handles backups as snapshots (the Kinesis Data Analytics implementation of Apache Flink savepoints). This option also supports the Amazon Kinesis Producer Library (KPL), which is best suited when you control the code that writes to the Kinesis data stream. This control enables you to implement cost optimizations through buffering on the data producer (see the buffered producer sketch after this list), and it leaves fewer resources to manage compared to alternatives. Kinesis Data Analytics for Apache Flink is used as the data consumer, which is best suited when you require capabilities such as durable application state and exactly-once processing, and when you must process high volume data streams with low latency and high availability.

  3. Option 3 uses Amazon Kinesis Data Firehose. Use this option when you want a simple way to back up incoming streaming data with minimal administration for the processing layer, and the ability to send data to Amazon Simple Storage Service (among other destinations) in near real time (see the Firehose producer sketch after this list). Kinesis Data Firehose takes care of compression and encryption, minimizing the amount of storage used at the destination and increasing security.

  4. Option 4 uses Apache Flink and showcases how to asynchronously invoke an external endpoint in a streaming application, for example, when you want to enrich or filter incoming events (see the async invocation sketch after this list). The external API can be any integration supported by API Gateway, such as a Lambda function or an Amazon SageMaker endpoint.
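
The following sketch illustrates the Option 1 consumer pattern: a stateless AWS Lambda function triggered by a Kinesis event source mapping. The filtering logic and field names are assumptions; the solution provides boilerplate code that you customize for your own events.

    # Option 1 consumer sketch: a Lambda handler for Kinesis events.
    # Kinesis record payloads arrive base64-encoded in the event.
    import base64
    import json

    def handler(event, context):
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # Stateless work such as filtering or data cleansing goes here.
            if payload.get("action") == "page_view":
                print("page view from user", payload.get("user_id"))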
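
The KPL itself is a separate library; as a rough illustration of the buffering idea behind its Option 2 cost optimizations, the following sketch collects records in memory and writes them with a single batched call. This uses plain Boto3 rather than the KPL, and the stream name and batch size are assumptions.

    # Option 2 producer idea: buffer records and write them in one batch,
    # similar in spirit to the KPL's aggregation (plain Boto3, not the KPL).
    import json
    import boto3

    kinesis = boto3.client("kinesis")
    buffer = []

    def buffered_put(event, flush_at=100, stream_name="my-data-stream"):
        buffer.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get("user_id", "default")),
        })
        if len(buffer) >= flush_at:
            # One API call for the whole batch reduces per-record overhead.
            kinesis.put_records(StreamName=stream_name, Records=buffer)
            buffer.clear()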
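
For Option 3, producers write to the delivery stream and Kinesis Data Firehose handles buffering, compression, encryption, and delivery to the destination. A minimal sketch, assuming a hypothetical delivery stream name:

    # Option 3 sketch: send a record to a Kinesis Data Firehose delivery
    # stream; Firehose delivers it to the configured destination (for
    # example, Amazon S3). "my-delivery-stream" is hypothetical.
    import json
    import boto3

    firehose = boto3.client("firehose")
    record = json.dumps({"metric": "latency_ms", "value": 87}) + "\n"
    firehose.put_record(
        DeliveryStreamName="my-delivery-stream",
        Record={"Data": record.encode("utf-8")},
    )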
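
In the Option 4 template, the asynchronous invocation is implemented with Apache Flink’s Async I/O operator inside the streaming application. As an illustration of the pattern only (not the template’s Flink code), the following sketch overlaps calls to a hypothetical API Gateway endpoint so that processing is not blocked on each request.

    # Option 4 pattern sketch: enrich events by invoking an external endpoint
    # asynchronously. This illustrates the idea, not the solution's Flink code.
    # The endpoint URL is a placeholder.
    import asyncio
    import json
    import urllib.request

    API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/enrich"

    def call_endpoint(event):
        req = urllib.request.Request(
            API_URL,
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    async def enrich_all(events):
        loop = asyncio.get_running_loop()
        # Overlap the blocking HTTP calls instead of waiting on each in turn.
        tasks = [loop.run_in_executor(None, call_endpoint, e) for e in events]
        return await asyncio.gather(*tasks)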

All templates are configured to apply best practices to monitor functionality using dashboards and alarms, and to secure data.

This implementation guide discusses architectural considerations and configuration steps for deploying the AWS Streaming Data Solution for Amazon Kinesis in the Amazon Web Services (AWS) Cloud. It includes links to AWS CloudFormation templates that launch and configure the AWS services required to deploy this solution using AWS best practices for security and availability.
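
In addition to launching a template from the AWS Management Console, you can deploy a stack programmatically. A minimal sketch, assuming a placeholder stack name and template URL (use the actual template link from this guide):

    # Deployment sketch: launch one of the solution's CloudFormation templates.
    # StackName and TemplateURL are placeholders.
    import boto3

    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(
        StackName="streaming-data-solution",
        TemplateURL="https://<template-bucket>/streaming-data-solution.template",
        Capabilities=["CAPABILITY_IAM"],  # the templates create IAM roles
    )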

The guide is intended for IT architects, developers, and DevOps professionals who want to get started quickly with the core streaming services available in the AWS Cloud.