You can develop a consumer application for Amazon Kinesis, known as an Amazon Kinesis application, using the Amazon Kinesis Client Library (KCL). Although you can use the Amazon Kinesis API to get data from an Amazon Kinesis stream, we recommend using the design patterns and code for Amazon Kinesis applications that the KCL provides.
The Amazon Kinesis Client Library (KCL) helps you consume and process data from an Amazon Kinesis stream. The KCL takes care of many of the complex tasks associated with distributed computing, such as load balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to resharding. This lets you focus on writing your record-processing logic.
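Checkpointing is what makes recovery from instance failures possible: a processor records the last sequence number it finished, and whichever instance takes over the shard resumes from there. The sketch below simulates that idea in memory. `CheckpointStore` and `process_shard` are hypothetical names, sequence numbers are simplified to integers, and the store stands in for durable storage (the KCL itself keeps this state in an Amazon DynamoDB table).

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for durable checkpoint storage."""

    def __init__(self) -> None:
        self._positions: dict[str, int] = {}

    def save(self, shard_id: str, sequence_number: int) -> None:
        self._positions[shard_id] = sequence_number

    def load(self, shard_id: str) -> int:
        # -1 means "nothing checkpointed yet": start from the first record.
        return self._positions.get(shard_id, -1)


def process_shard(shard_id, records, store, processed, fail_after=None):
    """Process (sequence_number, data) pairs that are not yet checkpointed.

    If fail_after is set, simulate a crash after that many new records.
    """
    last = store.load(shard_id)
    handled = 0
    for seq, data in records:
        if seq <= last:
            continue  # already processed before the failure
        if fail_after is not None and handled == fail_after:
            raise RuntimeError("simulated instance failure")
        processed.append(data)
        store.save(shard_id, seq)  # checkpoint after every record
        handled += 1


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
store = CheckpointStore()
processed = []
try:
    process_shard("shard-0", records, store, processed, fail_after=2)
except RuntimeError:
    pass  # the first instance "dies"; another one takes over the shard
process_shard("shard-0", records, store, processed)
print(processed)  # ['a', 'b', 'c', 'd']
```

This sketch checkpoints after every record for clarity; real applications often checkpoint less frequently. A crash between processing a record and checkpointing it means the record is processed again after recovery, so record-processing logic should tolerate duplicates.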
Note that the KCL is different from the Amazon Kinesis API that is available in the AWS SDKs. The Amazon Kinesis API helps you manage many aspects of Amazon Kinesis (including creating streams, resharding, and putting and getting records), while the KCL provides a layer of abstraction specifically for processing data. For information about the Amazon Kinesis API, see the Amazon Kinesis API Reference.
You can download the KCL from GitHub as follows:
For more information about the KCL Ruby support library, see KCL Ruby Gems Documentation.
An Amazon Kinesis application is an application that uses the KCL. It instantiates a worker with configuration information, and then uses a record processor to process the data received from an Amazon Kinesis stream.
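The division of labor between the worker and the record processor can be illustrated with a small interface. The sketch below is modeled loosely on the initialize / process / shutdown lifecycle that KCL record processors follow, but the class and method names (`RecordProcessor`, `WordCounter`, `process_records`) are hypothetical and are not the exact KCL signatures. Your code supplies only the processing logic; the worker drives the lifecycle, one processor instance per shard.

```python
from abc import ABC, abstractmethod


class RecordProcessor(ABC):
    """Hypothetical interface: one instance handles exactly one shard."""

    @abstractmethod
    def initialize(self, shard_id: str) -> None: ...

    @abstractmethod
    def process_records(self, records: list[bytes]) -> None: ...

    @abstractmethod
    def shutdown(self, reason: str) -> None: ...


class WordCounter(RecordProcessor):
    """Example processing logic: count whitespace-separated words per shard."""

    def initialize(self, shard_id: str) -> None:
        self.shard_id = shard_id
        self.words = 0

    def process_records(self, records: list[bytes]) -> None:
        for data in records:
            self.words += len(data.decode("utf-8").split())

    def shutdown(self, reason: str) -> None:
        print(f"{self.shard_id}: {self.words} words ({reason})")


# In a real application the worker makes these calls, not your code.
processor = WordCounter()
processor.initialize("shard-000")
processor.process_records([b"hello kinesis", b"one two three"])
processor.shutdown("end of shard")  # prints "shard-000: 5 words (end of shard)"
```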
You can run an Amazon Kinesis application on any number of instances. Multiple instances of the same application coordinate to handle failures and balance load dynamically. You can also have multiple Amazon Kinesis applications working on the same stream, subject to the stream's throughput limits.
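To make dynamic load balancing concrete, the sketch below spreads shards across however many workers are alive so that shard counts differ by at most one. The `balance` function is hypothetical and uses plain round-robin recomputed from scratch; the KCL instead coordinates through per-shard leases that workers take over incrementally, which moves fewer shards when membership changes.

```python
def balance(shards: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Assign shards to workers round-robin (a simplified stand-in for leases)."""
    if not workers:
        raise ValueError("at least one worker is required")
    ordered_workers = sorted(workers)
    assignment: dict[str, list[str]] = {w: [] for w in ordered_workers}
    for i, shard in enumerate(sorted(shards)):
        owner = ordered_workers[i % len(ordered_workers)]
        assignment[owner].append(shard)
    return assignment


shards = ["shard-0", "shard-1", "shard-2", "shard-3"]

# Two instances share the work.
print(balance(shards, ["worker-a", "worker-b"]))
# {'worker-a': ['shard-0', 'shard-2'], 'worker-b': ['shard-1', 'shard-3']}

# worker-b fails: the survivor picks up every shard.
print(balance(shards, ["worker-a"]))
# {'worker-a': ['shard-0', 'shard-1', 'shard-2', 'shard-3']}

# Simplified resharding: assume shard-1 was split into shard-4 and shard-5
# and the closed parent no longer needs a processor.
resharded = ["shard-0", "shard-2", "shard-3", "shard-4", "shard-5"]
print(balance(resharded, ["worker-a", "worker-b"]))
# {'worker-a': ['shard-0', 'shard-3', 'shard-5'], 'worker-b': ['shard-2', 'shard-4']}
```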
The KCL acts as an intermediary between your record processing logic and Amazon Kinesis.
When you start an Amazon Kinesis application, it calls the KCL to instantiate a worker. This call provides the KCL with configuration information for the application, such as the stream name and AWS credentials.
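The configuration handed to the worker can be thought of as a small immutable record. The sketch below is hypothetical: `WorkerConfig` and its field names are illustrative, loosely echoing the application name, stream name, credentials provider, and worker ID that the KCL 1.x Java library takes. Credentials are passed as a provider callable rather than raw keys, so that they can be refreshed without rebuilding the configuration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical provider type: returns (access_key_id, secret_access_key).
CredentialsProvider = Callable[[], Tuple[str, str]]


@dataclass(frozen=True)
class WorkerConfig:
    application_name: str
    stream_name: str
    credentials_provider: CredentialsProvider
    worker_id: str

    def __post_init__(self) -> None:
        # Fail fast on missing settings instead of at the first stream call.
        for name in ("application_name", "stream_name", "worker_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        if not callable(self.credentials_provider):
            raise TypeError("credentials_provider must be callable")


config = WorkerConfig(
    application_name="word-count-app",
    stream_name="my-stream",
    credentials_provider=lambda: ("EXAMPLE_KEY_ID", "EXAMPLE_SECRET"),
    worker_id="host-1",
)
```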
The KCL performs the following tasks:
Connects to the stream
Enumerates the shards
Coordinates shard associations with other workers (if any)
Instantiates a record processor for every shard it manages
Pulls data records from the stream
Pushes the records to the corresponding record processor
Checkpoints processed records
Balances shard-worker associations when the worker instance count changes
Balances shard-worker associations when shards are split or merged
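Several of the tasks above (enumerating shards, instantiating one record processor per shard, pulling records, pushing them to the matching processor, and checkpointing) can be sketched as a single-process loop. `ToyWorker` and `CountingProcessor` are hypothetical; a Python dict stands in for the stream, sequence numbers are simplified to integers, and the sketch deliberately omits coordination with other workers and rebalancing.

```python
from typing import Callable


class CountingProcessor:
    """Hypothetical record processor that just collects what it receives."""

    def __init__(self) -> None:
        self.seen: list[bytes] = []

    def process_records(self, records: list[bytes]) -> None:
        self.seen.extend(records)


class ToyWorker:
    """Single-process worker; `stream` maps shard IDs to (seq, data) lists."""

    def __init__(self, stream: dict[str, list[tuple[int, bytes]]],
                 processor_factory: Callable[[], CountingProcessor]) -> None:
        self.stream = stream
        self.processor_factory = processor_factory
        self.checkpoints: dict[str, int] = {}
        self.processors: dict[str, CountingProcessor] = {}

    def run_once(self, batch_size: int = 2) -> None:
        for shard_id in sorted(self.stream):  # enumerate the shards
            if shard_id not in self.processors:  # one processor per shard
                self.processors[shard_id] = self.processor_factory()
            # Pull the next batch after the last checkpoint.
            last = self.checkpoints.get(shard_id, -1)
            batch = [r for r in self.stream[shard_id] if r[0] > last][:batch_size]
            if not batch:
                continue
            # Push the batch to that shard's processor, then checkpoint.
            self.processors[shard_id].process_records([d for _, d in batch])
            self.checkpoints[shard_id] = batch[-1][0]


stream = {
    "shard-0": [(1, b"a"), (2, b"b"), (3, b"c")],
    "shard-1": [(1, b"x")],
}
worker = ToyWorker(stream, CountingProcessor)
worker.run_once()
worker.run_once()
print(worker.checkpoints)  # {'shard-0': 3, 'shard-1': 1}
```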