Use Amazon Kinesis to collect and process large streams of data records in real time.
You'll create data-processing applications, known as Amazon Kinesis applications. A typical Amazon Kinesis application reads data from an Amazon Kinesis stream as data records; the records are put into the stream by data generators called producers. These applications can use the Amazon Kinesis Client Library, and they can run on Amazon EC2 instances. The processed records can be sent to dashboards, used to generate alerts, used to dynamically change pricing and advertising strategies, or sent to a variety of other AWS services. For information about Amazon Kinesis features and pricing, see Amazon Kinesis.
For more information about AWS big data solutions, see Big Data.
You can use Amazon Kinesis for rapid and continuous data intake and aggregation. The types of data used include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. Because data intake and processing happen in real time, the processing is typically lightweight.
The following are typical scenarios for using Amazon Kinesis:
You can have producers push data directly into a stream. For example, push system and application logs and they'll be available for processing in seconds. This prevents the log data from being lost if the front end or application server fails. Amazon Kinesis provides accelerated data feed intake because you don't batch the data on the servers before you submit it for intake.
You can use data collected into Amazon Kinesis for simple data analysis and reporting in real time. For example, your data-processing application can work on metrics and reporting for system and application logs as the data is streaming in, rather than wait to receive batches of data.
Real-time data analytics combines the power of parallel processing with the value of real-time data. For example, process website clickstreams in real time, and then analyze site usability engagement using multiple Amazon Kinesis applications running in parallel.
You can create Directed Acyclic Graphs (DAGs) of Amazon Kinesis applications and data streams. This typically involves putting data from multiple Amazon Kinesis applications into another stream for downstream processing by a different Amazon Kinesis application.
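As a rough illustration of the first scenario above (a producer pushing log lines straight into a stream without batching them on the server), the following Python sketch sends each line as its own data record. It assumes the AWS SDK for Python (boto3) and a stream that already exists. The helper name `put_log_line`, the stream name `app-logs`, the JSON payload layout, and the choice of the host name as partition key are illustrative assumptions, not something this guide prescribes.

```python
import json
import time


def put_log_line(client, stream_name, host, line):
    """Send one log line to the stream as a single data record.

    `client` is expected to behave like boto3.client("kinesis");
    only its put_record method is used here.
    """
    payload = json.dumps({"host": host, "ts": time.time(), "line": line})
    # Records that share a partition key are routed to the same shard.
    # Keying by host is an assumption made for this example; choose a
    # key that suits how your data should be grouped.
    return client.put_record(
        StreamName=stream_name,
        Data=payload.encode("utf-8"),
        PartitionKey=host,
    )


# Example use against a real stream (needs AWS credentials and a region):
#
#   import boto3
#   kinesis = boto3.client("kinesis")
#   resp = put_log_line(kinesis, "app-logs", "web-01", "GET /index.html 200")
#   print(resp["ShardId"], resp["SequenceNumber"])
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object before pointing it at a real stream.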
While you can use Amazon Kinesis to solve a variety of streaming data problems, a common use is the real-time aggregation of data followed by loading the aggregate data into a data warehouse or map-reduce cluster.
Data is put into Amazon Kinesis streams, which ensures durability and elasticity. The delay between the time a record is put into the stream and the time it can be retrieved (put-to-get delay) is typically less than 1 second. In other words, an Amazon Kinesis application can start consuming the data from the stream almost immediately after the data is added. The managed service aspect of Amazon Kinesis relieves you of the operational burden of creating and running a data intake pipeline. You can create streaming map-reduce type applications, and the elasticity of Amazon Kinesis enables you to scale the stream up or down, so that you never lose data records prior to their expiration.
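To make the put-to-get path more concrete, here is a minimal sketch of a consumer that polls a single shard for newly added records. It assumes a boto3-style Kinesis client and uses only its `get_shard_iterator` and `get_records` calls. The function name `read_shard`, the `handle` callback, and the polling parameters are hypothetical. A production application would more likely rely on the Amazon Kinesis Client Library than on a hand-written loop like this one.

```python
import time


def read_shard(client, stream_name, shard_id, handle,
               max_polls=5, poll_seconds=0.25):
    """Poll one shard from its newest position; pass each payload to `handle`.

    Returns the number of records handled.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",  # only records added from now on
    )["ShardIterator"]

    handled = 0
    for _ in range(max_polls):
        if iterator is None:  # no further iterator: the shard was closed
            break
        result = client.get_records(ShardIterator=iterator, Limit=100)
        for record in result["Records"]:
            handle(record["Data"])
            handled += 1
        iterator = result.get("NextShardIterator")
        # Pause between polls so the loop does not hammer the shard.
        time.sleep(poll_seconds)
    return handled
```

Starting from `LATEST` means the loop only sees records put after the iterator was created, which matches the idea of consuming data shortly after it is added.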
Multiple Amazon Kinesis applications can consume data from a stream, so that multiple actions, like archiving and processing, can take place concurrently and independently. For example, two applications can read data from the same stream. The first application calculates running aggregates and updates a DynamoDB table, and the second application compresses and archives data to a data store like Amazon S3. The DynamoDB table with running aggregates is then read by a dashboard for up-to-the-minute reports.
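The two-application example above can be sketched as a small, self-contained simulation. Nothing here talks to AWS: `InMemoryStream` is a toy stand-in for a stream, a dictionary stands in for the DynamoDB table, and a list of gzip blobs stands in for objects archived to Amazon S3. All class and field names are made up for illustration. The point it tries to show is that each application tracks its own read position, so one can lag behind the other without either losing records.

```python
import gzip
import json
from collections import defaultdict


class InMemoryStream:
    """Toy stand-in for a stream: an append-only list of records."""

    def __init__(self):
        self._records = []

    def put(self, record):
        self._records.append(record)

    def read_from(self, position):
        """Return the records at or after `position` and the next position."""
        batch = self._records[position:]
        return batch, position + len(batch)


class AggregatingApp:
    """Keeps running page-view counts (stand-in for a DynamoDB table)."""

    def __init__(self):
        self.position = 0
        self.table = defaultdict(int)

    def poll(self, stream):
        batch, self.position = stream.read_from(self.position)
        for record in batch:
            self.table[record["page"]] += 1


class ArchivingApp:
    """Compresses each batch (stand-in for objects written to Amazon S3)."""

    def __init__(self):
        self.position = 0
        self.archive = []

    def poll(self, stream):
        batch, self.position = stream.read_from(self.position)
        if batch:
            blob = gzip.compress(json.dumps(batch).encode("utf-8"))
            self.archive.append(blob)


stream = InMemoryStream()
aggregator, archiver = AggregatingApp(), ArchivingApp()

for page in ["/home", "/cart", "/home"]:
    stream.put({"page": page})

aggregator.poll(stream)        # reads the first three records
stream.put({"page": "/cart"})
archiver.poll(stream)          # started later, still reads all four records
aggregator.poll(stream)        # reads only the one new record

print(dict(aggregator.table))  # {'/home': 2, '/cart': 2}
print(len(archiver.archive))   # 1
```

A dashboard in this picture would simply read `aggregator.table`, much as the dashboard in the example reads the DynamoDB table of running aggregates.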
The Amazon Kinesis Client Library enables fault-tolerant consumption of data from streams and provides scaling support for Amazon Kinesis applications.
For examples of how to use Amazon EMR clusters to read and process Amazon Kinesis streams directly, see Analyze Amazon Kinesis Data in the Amazon Elastic MapReduce Developer Guide.