Streaming Analytics Pipeline

Design Considerations

Regional Deployment

The Streaming Analytics Pipeline uses AWS Lambda and Amazon Kinesis Data Analytics. Therefore, you must deploy this solution in an AWS Region that supports both Lambda and Amazon Kinesis Data Analytics. As of the date of publication, this includes the US East (N. Virginia) Region, the US West (Oregon) Region, and the EU (Ireland) Region.

Streaming Data Format

Amazon Kinesis Data Analytics allows you to specify a schema to classify your streaming data before it executes SQL queries against your input Kinesis stream. If you specify a strict schema for all records, the analysis could fail if some records do not match the expected format specified in the schema. For this solution, consider applying a flexible schema to your streaming data to ensure all data is collected. Then, refine the schema using standard SQL.
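One way to read "flexible schema" is to map every field to a generous VARCHAR column, so that a record with an unexpected value is still ingested, and then cast to stricter types in the application SQL. The sketch below builds such a schema as a plain Python dictionary. The dictionary layout follows the InputSchema structure of the Kinesis Data Analytics API for JSON records, but the helper name, the field names, and the 256-character column width are illustrative assumptions, not part of this solution.

```python
def flexible_input_schema(field_names, max_length=256):
    """Build a permissive input schema: every field is mapped to VARCHAR.

    Hypothetical helper for illustration only. Typed values can be
    derived later in the application SQL, for example with CAST.
    """
    columns = [
        {
            "Name": name,
            "Mapping": f"$.{name}",  # JSONPath to the field in each record
            "SqlType": f"VARCHAR({max_length})",
        }
        for name in field_names
    ]
    return {
        "RecordFormat": {
            "RecordFormatType": "JSON",
            "MappingParameters": {
                "JSONMappingParameters": {"RecordRowPath": "$"}
            },
        },
        "RecordEncoding": "UTF-8",
        "RecordColumns": columns,
    }


# Example field names are assumptions, not fields defined by the solution.
schema = flexible_input_schema(["device_id", "temperature", "event_time"])
for column in schema["RecordColumns"]:
    print(column["Name"], column["SqlType"])
```

Once the data is flowing, the schema can be tightened in SQL, for instance by casting temperature to a numeric type in the query rather than rejecting records at ingestion.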

Shard Count

The number of shards you need for a new Kinesis stream depends on the amount of streaming data you plan to produce. Each shard can support up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). For example, an application that produces 100 records per second with a size of 35 kilobytes per record for a total data input rate of 3.4 megabytes per second needs 4 shards.
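The arithmetic in the example above can be sketched as a small helper. It takes the larger of the two per-shard write limits (records per second and data volume per second) and rounds up. The function name is hypothetical, and the sketch assumes that 1 MB means 1,024 KB and that the record size already includes the partition key.

```python
import math

# Per-shard write limits stated in this section.
MAX_RECORDS_PER_SHARD_PER_SEC = 1000
MAX_BYTES_PER_SHARD_PER_SEC = 1024 * 1024  # assumption: 1 MB = 1,024 KB


def required_shards(records_per_second, record_size_kb):
    """Estimate the shard count needed to absorb a given write load.

    Hypothetical helper for illustration; record_size_kb is assumed to
    include the partition key.
    """
    bytes_per_second = records_per_second * record_size_kb * 1024
    shards_for_volume = math.ceil(bytes_per_second / MAX_BYTES_PER_SHARD_PER_SEC)
    shards_for_records = math.ceil(records_per_second / MAX_RECORDS_PER_SHARD_PER_SEC)
    # A stream always has at least one shard.
    return max(1, shards_for_volume, shards_for_records)


# The example from the text: 100 records/s at 35 KB each (about 3.4 MB/s).
print(required_shards(100, 35))  # prints 4
```

For workloads with many small records, the record-rate limit rather than the data volume usually determines the result.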

The Streaming Analytics Pipeline AWS Lambda function processes data at a default rate of 1,000 records per second. However, you can adjust the function's timeout and batch size to accommodate faster processing and delivery of raw data.

There is no upper limit on the number of shards in a stream or account, but each Region has a default shard limit. For information on shard limits, see Amazon Kinesis Data Streams Limits. To request an increase in your shard limit, use the Stream Limits form.

Multiple External Destinations

Amazon Kinesis Data Analytics allows you to specify up to three external destinations for analyzed data. By default, the Streaming Analytics Pipeline allows you to specify a single external destination. If you want to send analyzed data to multiple external destinations, this solution includes a template (add-output) that lets you specify additional destinations without using the AWS Command Line Interface or a custom script.