Amazon Kinesis Streams
Developer Guide

Step 7: Finishing Up

Because you pay for an Amazon Kinesis stream for as long as it exists, delete the stream and the corresponding DynamoDB table when you are done with them. An active stream incurs nominal charges even when you aren't sending or getting records, because it uses resources by continuously "listening" for incoming records and requests to get records.

To delete the stream and table

  1. Shut down any producers and consumers that you may still have running.

  2. Open the Amazon Kinesis console at https://console.aws.amazon.com/kinesis.

  3. Choose the stream that you created for this application (StockTradeStream).

  4. Choose Delete Stream.

  5. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.

  6. Delete the StockTradesProcessor table.

Summary

Processing a large amount of data in near real time doesn’t require writing any magical code or developing a huge infrastructure. You write logic to process one record at a time (as in processRecord(Record)) and rely on Streams to scale that logic to a large amount of streaming data. You don’t have to worry about how your processing scales because Streams handles it for you. All you have to do is send your streaming records to Streams and write the logic to process each new record received.
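The per-record logic described above can be sketched as follows. This is a minimal illustration, not the tutorial's actual processor: the class name PerRecordSketch, the countFor helper, and the "TICKER,TYPE" text payload are all assumptions made for the example, and a plain ByteBuffer stands in for the data carried by a stream record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a record processor (hypothetical, not the tutorial's class).
class PerRecordSketch {
    // Number of records seen so far for each ticker symbol.
    private final Map<String, Integer> countsByTicker = new HashMap<>();

    // Handles exactly one record.
    // ASSUMPTION: the payload is UTF-8 text of the form "TICKER,TYPE", such as "AMZN,BUY".
    void processRecord(ByteBuffer data) {
        // Decode through a read-only view so the caller's buffer position is untouched.
        String payload = StandardCharsets.UTF_8.decode(data.asReadOnlyBuffer()).toString();
        String[] fields = payload.split(",");
        if (fields.length < 2) {
            return; // skip a malformed record instead of failing the whole batch
        }
        countsByTicker.merge(fields[0].trim(), 1, Integer::sum);
    }

    // Returns how many well-formed records were seen for the given ticker.
    int countFor(String ticker) {
        return countsByTicker.getOrDefault(ticker, 0);
    }
}
```

Nothing in this logic depends on how many records arrive or how many shards the stream has. It only ever looks at one record, which is what lets the same code run unchanged as the stream grows.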

Here are some potential enhancements for this application.

Aggregate across all shards

Currently, the statistics you get come from aggregating the data records that a single worker receives from a single shard. (Within a single application, a shard cannot be processed by more than one worker at the same time.) When you scale and have more than one shard, you may want to aggregate across all shards. You can do this with a pipeline architecture: the output of each worker is fed into a second stream that has a single shard, and a worker on that stream aggregates the outputs of the first stage. Because the data from the first stage is limited (one sample per minute per shard), one shard can easily handle it.
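The second-stage worker in such a pipeline can be sketched as a simple merge of per-shard samples. This is an illustration under stated assumptions: the class name CrossShardAggregator, its methods, and the representation of a first-stage sample as a map from ticker symbol to count are hypothetical and do not come from the tutorial code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical second-stage aggregator: merges the per-shard samples
// emitted by first-stage workers into totals for the whole stream.
class CrossShardAggregator {
    // Combined count per ticker symbol across every shard merged so far.
    private final Map<String, Long> totals = new HashMap<>();

    // Merges one first-stage sample.
    // ASSUMPTION: a sample is a map of ticker symbol to count for one shard.
    void merge(Map<String, Long> shardCounts) {
        for (Map.Entry<String, Long> entry : shardCounts.entrySet()) {
            totals.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
    }

    // Returns the combined count for a ticker, or 0 if it was never seen.
    long countFor(String ticker) {
        return totals.getOrDefault(ticker, 0L);
    }

    // Returns the ticker with the highest combined count, or null if nothing
    // has been merged yet. Ties are broken arbitrarily.
    String mostPopular() {
        String best = null;
        long bestCount = -1;
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

For example, if one shard reports AMZN=5 and MSFT=7 and another reports AMZN=4, the merged totals are AMZN=9 and MSFT=7. The stream-wide leader can therefore differ from the leader on any single shard, which is why the second stage is needed.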

Scale processing

When the stream scales up to have many shards (because many producers are sending data), you scale the processing by adding more workers. You can run the workers on Amazon EC2 instances and use Auto Scaling groups to manage them.

Leverage connectors to S3/DynamoDB/Redshift/Storm

As a stream is continuously processed, its output can be sent to other destinations. AWS provides connectors to integrate Streams with other AWS services and third-party tools.