Amazon Kinesis is a managed service that scales elastically for real-time processing of streaming big data.
Amazon Kinesis takes in large streams of data records that can then be consumed in real time by multiple data-processing applications that can be run on Amazon Elastic Compute Cloud (Amazon EC2) instances. The data-processing applications use the Amazon Kinesis Client Library and are called Amazon Kinesis applications. For more information about Amazon EC2, see the Amazon Elastic Compute Cloud User Guide.
Amazon Kinesis applications read data from the Amazon Kinesis stream and process it in real time. The processed records can be emitted to dashboards, used to generate alerts, used to dynamically change pricing and advertising strategies, or sent to a variety of other Amazon big data services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic MapReduce (Amazon EMR), or Amazon Redshift. Amazon Kinesis applications can also emit data into another Amazon Kinesis stream, enabling more complex data processing. For information about Amazon Kinesis service highlights and pricing, see Amazon Kinesis.
Amazon Kinesis takes in large streams of data for processing in real time. The most common Amazon Kinesis use case is rapid and continuous data intake and aggregation. The types of data used include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. Because data intake and processing happen in real time, the processing is typically lightweight.
Amazon Kinesis enables sophisticated streaming data processing, because one Amazon Kinesis application may emit Amazon Kinesis stream data into another Amazon Kinesis stream. Near-real-time aggregation of data enables processing logic that can extract complex key performance indicators and metrics from that data. For example, complex data-processing graphs can be generated by emitting data from multiple Amazon Kinesis applications to another Amazon Kinesis stream for downstream processing by a different Amazon Kinesis application.
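As a rough illustration of one application emitting data into another stream, the following Python sketch reads one batch of records, counts them per partition key, and writes the counts to a downstream stream. It is a minimal sketch under stated assumptions, not the Amazon Kinesis Client Library: the function name forward_counts, the per-key counting logic, and the downstream stream name are hypothetical, and the client argument is assumed to expose get_records and put_record calls shaped like those of the boto3 Kinesis client.

```python
import json
from collections import Counter


def forward_counts(client, shard_iterator, out_stream, limit=100):
    """Read one batch, count records per partition key, emit counts downstream.

    `client` is assumed to behave like a boto3 Kinesis client (hypothetical
    wiring); `out_stream` is the name of an existing downstream stream.
    """
    response = client.get_records(ShardIterator=shard_iterator, Limit=limit)

    # Aggregate: number of records seen for each partition key in this batch.
    counts = Counter(record["PartitionKey"] for record in response["Records"])

    # Emit one summary record per key into the downstream stream.
    for key, count in counts.items():
        client.put_record(
            StreamName=out_stream,
            Data=json.dumps({"key": key, "count": count}).encode("utf-8"),
            PartitionKey=key,
        )

    # The caller passes the returned iterator to the next call to keep reading.
    return response.get("NextShardIterator"), dict(counts)
```

With boto3 installed and credentials configured, the client could come from boto3.client("kinesis") and the first iterator from its get_shard_iterator call. A production application would more likely use the Amazon Kinesis Client Library, which handles checkpointing and failover that this sketch omits.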
The following are typical scenarios for using Amazon Kinesis:
Accelerated log and data feed intake and processing: Using Amazon Kinesis, you can have producers push data directly into an Amazon Kinesis stream. For example, system and application logs can be submitted to Amazon Kinesis and be available for processing in seconds. This prevents the log data from being lost if the front end or application server fails. Amazon Kinesis provides accelerated data feed intake because you are not batching the data on the servers before you submit it for intake.
Real-time metrics and reporting: You can use data ingested into Amazon Kinesis for simple data analysis and reporting in real time. For example, metrics and reporting for system and application logs ingested into the Amazon Kinesis stream are available in real time. This enables data-processing application logic to work on data as it is streaming in, rather than wait for data batches to be sent to the data-processing applications.
Real-time data analytics: Amazon Kinesis enables real-time analytics of streaming big data, combining the power of parallel processing with the value of real-time data. For example, website clickstreams can be ingested in real time, and then site usability and engagement can be analyzed by many different Amazon Kinesis applications running in parallel.
Complex stream processing: Lastly, Amazon Kinesis enables you to create Directed Acyclic Graphs (DAGs) of Amazon Kinesis applications and data streams. This scenario typically involves emitting data from multiple Amazon Kinesis applications to another Amazon Kinesis stream for downstream processing by a different Amazon Kinesis application.
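The first scenario above, a producer pushing log data directly into a stream, might look roughly like the following Python sketch. The function name put_log_record, the JSON payload layout, and the choice of the host name as partition key are assumptions made for illustration. The client argument is assumed to expose a put_record call shaped like that of the boto3 Kinesis client.

```python
import json


def put_log_record(client, stream_name, host, message):
    """Send a single log line to a stream as soon as it is produced.

    `client` is assumed to behave like a boto3 Kinesis client. Using the
    host name as the partition key is an illustrative choice, not a rule.
    """
    payload = json.dumps({"host": host, "message": message}).encode("utf-8")
    # One call per log line: nothing is batched on the server beforehand.
    return client.put_record(
        StreamName=stream_name,
        Data=payload,
        PartitionKey=host,
    )
```

With boto3 installed and credentials configured, a call could look like put_log_record(boto3.client("kinesis"), "app-logs", "web-01", "GET /index.html 200"), where "app-logs" is a hypothetical stream that already exists.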
While Amazon Kinesis can be used to solve a variety of streaming data problems, a common use is the real-time aggregation of data followed by loading the aggregate data into a data warehouse or map-reduce cluster.
Data is taken into Amazon Kinesis streams, which provide durability and elasticity. The delay between the time a record is added to the stream and the time it can be retrieved (put-to-get delay) is less than 10 seconds; in other words, Amazon Kinesis applications can start consuming the data from the stream less than 10 seconds after the data is added. Because Amazon Kinesis is a managed service, customers are relieved of the operational burden of creating and running a data intake pipeline. Customers can create streaming map-reduce type applications, and the elasticity of the Amazon Kinesis service enables customers to scale the stream up or down, ensuring they never lose data records prior to their expiration.
Multiple Amazon Kinesis applications can consume data from an Amazon Kinesis stream, so that multiple actions, like archiving and processing, can take place concurrently and independently. For example, two Amazon Kinesis applications can read data from the same Amazon Kinesis stream. The first Amazon Kinesis application calculates running aggregates and updates an Amazon DynamoDB table, and the second Amazon Kinesis application compresses and archives data to a data store like Amazon S3. The Amazon DynamoDB table with running aggregates is then read by a dashboard for up-to-the-minute reports.
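The idea of two applications reading the same stream independently can be pictured with a small in-memory simulation. This is only a toy model under stated assumptions: InMemoryStream, AggregatingApp, and ArchivingApp are hypothetical names, a plain dictionary stands in for the Amazon DynamoDB table, and a list of gzip-compressed blobs stands in for Amazon S3. No AWS service or the Amazon Kinesis Client Library is involved.

```python
import gzip
import json


class InMemoryStream:
    """Toy stand-in for a stream: an append-only list of records."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)

    def read_from(self, position, limit=10):
        # Return up to `limit` records starting at `position`, plus the new position.
        batch = self.records[position:position + limit]
        return batch, position + len(batch)


class AggregatingApp:
    """Keeps running page-view counts (dict stands in for a DynamoDB table)."""

    def __init__(self):
        self.position = 0
        self.totals = {}

    def poll(self, stream):
        batch, self.position = stream.read_from(self.position)
        for record in batch:
            page = record["page"]
            self.totals[page] = self.totals.get(page, 0) + 1


class ArchivingApp:
    """Compresses each batch and stores it (list stands in for S3 objects)."""

    def __init__(self):
        self.position = 0
        self.archive = []

    def poll(self, stream):
        batch, self.position = stream.read_from(self.position)
        if batch:
            blob = gzip.compress(json.dumps(batch).encode("utf-8"))
            self.archive.append(blob)
```

Each application tracks its own read position, so one can fall behind or run ahead without affecting the other. That independence is the property the paragraph above describes.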
The Amazon Kinesis Client Library enables fault-tolerant consumption of data from the Amazon Kinesis stream and provides scaling support for Amazon Kinesis applications.
For more information about Amazon Kinesis, see the Amazon Kinesis Key Concepts section.
For examples of how to use Amazon EMR clusters to read and process Amazon Kinesis streams directly, see Analyze Amazon Kinesis Data in the Amazon Elastic MapReduce Developer Guide.