
Amazon Kinesis Firehose Data Delivery

After data is sent to your delivery stream, it is automatically delivered to the destination you choose.

Data Delivery Format

For data delivery to Amazon S3, Kinesis Firehose concatenates multiple incoming records based on the buffering configuration of your delivery stream, and then delivers them to Amazon S3 as a single S3 object. You might want to add a record separator at the end of each record before you send it to Kinesis Firehose so that you can divide a delivered S3 object back into individual records.
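
For example, the following minimal sketch (Python with boto3; the stream name and record contents are placeholders) appends a newline separator to each record before sending it:

import json

import boto3

firehose = boto3.client("firehose")

def send_record(record):
    # Append "\n" as the record separator; Kinesis Firehose stores the
    # bytes as-is, so the delivered S3 object can be split on newlines.
    data = json.dumps(record) + "\n"
    firehose.put_record(
        DeliveryStreamName="my-delivery-stream",  # placeholder
        Record={"Data": data.encode("utf-8")},
    )

send_record({"ticker": "AMZN", "price": 600.5})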

For data delivery to Amazon Redshift, Kinesis Firehose first delivers incoming data to your S3 bucket in the format described earlier. Kinesis Firehose then issues an Amazon Redshift COPY command to load the data from your S3 bucket to your Amazon Redshift cluster. You need to make sure that after Kinesis Firehose concatenates multiple incoming records to an S3 object, the S3 object can be copied to your Amazon Redshift cluster. For more information, see Amazon Redshift COPY Command Data Format Parameters.
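
As an illustration only, the CopyCommand portion of a Redshift destination configuration specifies how the delivered S3 objects are parsed; the table name, column list, and copy options below are placeholder values suited to newline-separated JSON records:

# Passed as the CopyCommand field of a RedshiftDestinationConfiguration
# when creating the delivery stream; all values here are examples.
copy_command = {
    "DataTableName": "firehose_events",   # placeholder target table
    "DataTableColumns": "ticker, price",  # optional column list
    "CopyOptions": "JSON 'auto'",         # parse newline-separated JSON records
}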

For data delivery to Amazon ES, Kinesis Firehose buffers incoming records based on the buffering configuration of your delivery stream, and then generates an Elasticsearch bulk request to index multiple records to your Elasticsearch cluster. You need to make sure that your record is UTF-8 encoded and flattened to a single-line JSON object before you send it to Kinesis Firehose. Also, the rest.action.multi.allow_explicit_index option for your Elasticsearch cluster must be set to true (the default) so that the cluster accepts bulk requests with an explicit index that is set per record. For more information, see Amazon ES Configure Advanced Options in the Amazon Elasticsearch Service Developer Guide.
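
A minimal sketch of preparing such a record follows (Python with boto3; the stream name is a placeholder). json.dumps emits no newlines by default, so the document stays on a single line:

import json

import boto3

firehose = boto3.client("firehose")

record = {"user": "alice", "action": "login", "ts": "2016-02-25T13:00:00Z"}
# Single-line JSON, UTF-8 encoded, as required for the Amazon ES destination.
payload = json.dumps(record).encode("utf-8")

firehose.put_record(
    DeliveryStreamName="my-es-delivery-stream",  # placeholder
    Record={"Data": payload},
)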

Data Delivery Frequency

Each Kinesis Firehose destination has its own data delivery frequency.

Amazon S3

The frequency of data delivery to Amazon S3 is determined by the S3 Buffer size and Buffer interval values that you configured for your delivery stream. Kinesis Firehose buffers incoming data before delivering it to Amazon S3. You can configure the values for S3 Buffer size (1 MB to 128 MB) or Buffer interval (60 to 900 seconds), and the condition that is satisfied first triggers data delivery to Amazon S3. Note that in circumstances where data delivery to the destination falls behind data writing to the delivery stream, Kinesis Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination.
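
For example, the buffering hints in the following sketch (part of a create_delivery_stream call in Python with boto3; the stream name and ARNs are placeholders) trigger delivery after 5 MB is buffered or 300 seconds elapse, whichever comes first:

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",  # placeholder
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-bucket",                      # placeholder
        "BufferingHints": {
            "SizeInMBs": 5,            # S3 Buffer size, 1-128 MB
            "IntervalInSeconds": 300,  # Buffer interval, 60-900 seconds
        },
    },
)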

Amazon Redshift

The frequency of data COPY operations from Amazon S3 to Amazon Redshift is determined by how fast your Amazon Redshift cluster can finish the COPY command. If there is still data to copy, Kinesis Firehose issues a new COPY command as soon as the previous COPY command is successfully finished by Amazon Redshift.

Amazon Elasticsearch Service

The frequency of data delivery to Amazon ES is determined by the Elasticsearch Buffer size and Buffer interval values that you configured for your delivery stream. Kinesis Firehose buffers incoming data before delivering it to Amazon ES. You can configure the values for Elasticsearch Buffer size (1 MB to 100 MB) or Buffer interval (60 to 900 seconds), and the condition that is satisfied first triggers data delivery to Amazon ES. Note that in circumstances where data delivery to the destination falls behind data writing to the delivery stream, Kinesis Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination.

Data Delivery Failure Handling

Each Kinesis Firehose destination has its own data delivery failure handling.

Amazon S3

Data delivery to your S3 bucket might fail for various reasons. For example, the bucket might no longer exist, the IAM role that Kinesis Firehose assumes might not have access to the bucket, or the network might have failed. Under these conditions, Kinesis Firehose keeps retrying for up to 24 hours until the delivery succeeds. The maximum data storage time of Kinesis Firehose is 24 hours, and your data is lost if data delivery fails for more than 24 hours.

Amazon Redshift

For the Amazon Redshift destination, you can specify a retry duration (0-7200 seconds) when creating a delivery stream.
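
For example, the retry duration is set through the RetryOptions field of the Redshift destination configuration; the value in this sketch is a placeholder:

# Part of a RedshiftDestinationConfiguration when creating the delivery
# stream; 3600 is an example value within the allowed 0-7200 seconds.
retry_options = {"DurationInSeconds": 3600}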

Data delivery to your Amazon Redshift cluster might fail for reasons such as an incorrect Amazon Redshift cluster configuration of your delivery stream, an Amazon Redshift cluster under maintenance, a network failure, or similar events. Under these conditions, Kinesis Firehose retries for the specified time duration and then skips that particular batch of S3 objects. Information about the skipped objects is delivered to your S3 bucket as a manifest file in the errors/ folder, which you can use for manual backfill. For information about how to COPY data manually with manifest files, see Using a Manifest to Specify Data Files.

Amazon Elasticsearch Service

For the Amazon ES destination, you can specify a retry duration (0-7200 seconds) when creating a delivery stream.

Data delivery to your Amazon ES cluster might fail for reasons such as an incorrect Amazon ES cluster configuration of your delivery stream, an Amazon ES cluster under maintenance, network failure, or similar events. Under these conditions, Kinesis Firehose retries for the specified time duration and then skips that particular index request. The skipped documents are delivered to your S3 bucket in the elasticsearch_failed/ folder, which you can use for manual backfill. Each document has the following JSON format:

{ "attemptsMade": "(number of index requests attempted)", "arrivalTimestamp": "(the time when the document was received by Firehose)", "errorCode": "(http error code returned by Elasticsearch)", "errorMessage": "(error message returned by Elasticsearch)", "attemptEndingTimestamp": "(the time when Firehose stopped attempting index request)", "esDocumentId": "(intended Elasticsearch document ID)", "esIndexName": "(intended Elasticsearch index name)", "esTypeName": "(intended Elasticsearch type name)", "rawData": "(base64-encoded document data)" }

Amazon S3 Object Name Format

Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to Amazon S3. This prefix creates a logical hierarchy in the bucket, where each forward slash (/) creates a level in the hierarchy. You can modify this structure by adding your own string to the start of the prefix when you create the Kinesis Firehose delivery stream. For example, add myApp/ to use the myApp/YYYY/MM/DD/HH prefix, or myApp to use the myAppYYYY/MM/DD/HH prefix.
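
For example, an object that arrives at 2016-02-25T13:00:00Z in a stream with the custom prefix myApp/ lands under myApp/2016/02/25/13. The following sketch reproduces that prefix in Python:

from datetime import datetime, timezone

arrival = datetime(2016, 2, 25, 13, 0, tzinfo=timezone.utc)
prefix = "myApp/" + arrival.strftime("%Y/%m/%d/%H")
print(prefix)  # myApp/2016/02/25/13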

The S3 object name follows the pattern DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString, where DeliveryStreamVersion begins with 1 and increases by 1 for every configuration change of the Kinesis Firehose delivery stream. You can change Kinesis Firehose delivery stream configurations (for example, the name of the S3 bucket, buffering hints, compression, and encryption) with the Kinesis Firehose console, or by using the UpdateDestination API operation.

Index Rotation for the Amazon ES Destination

For the Amazon ES destination, you can specify one of the following five time-based index rotation options: NoRotation, OneHour, OneDay, OneWeek, or OneMonth.

Depending on the rotation option you choose, Kinesis Firehose appends a portion of the UTC arrival timestamp to your specified index name, and rotates the appended timestamp accordingly. The following example shows the resulting index name in Amazon ES for each index rotation option, where the specified index name is myindex and the arrival timestamp is 2016-02-25T13:00:00Z; a code sketch after the table shows one way to derive these names.

RotationPeriod   IndexName
NoRotation       myindex
OneHour          myindex-2016-02-25-13
OneDay           myindex-2016-02-25
OneWeek          myindex-2016-w08
OneMonth         myindex-2016-02
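
The following sketch reproduces the table above in Python. The strftime formats are inferred from the example output; in particular, the week suffix here uses "%W" (Monday-first, zero-padded), which yields w08 for this timestamp, although this guide does not specify exactly how Firehose computes the week number:

from datetime import datetime, timezone

# Suffix formats inferred from the table above.
FORMATS = {
    "NoRotation": "",
    "OneHour": "-%Y-%m-%d-%H",
    "OneDay": "-%Y-%m-%d",
    "OneWeek": "-%Y-w%W",  # week number format is an assumption, see note above
    "OneMonth": "-%Y-%m",
}

arrival = datetime(2016, 2, 25, 13, 0, tzinfo=timezone.utc)
for period, fmt in FORMATS.items():
    print(period, "myindex" + arrival.strftime(fmt))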