Understand data delivery in Amazon Data Firehose
When you send data to your Firehose stream, it is automatically delivered to the destination you choose. The following table explains data delivery to the different destinations; illustrative configuration sketches for each destination follow the table.
Destination | Details |
---|---|
Amazon S3 | For data delivery to Amazon S3, Firehose concatenates multiple incoming records based on the buffering configuration of your Firehose stream. It then delivers the records to Amazon S3 as a single Amazon S3 object. By default, Firehose concatenates data without any delimiters. If you want newline delimiters between records, enable the feature in the Firehose console configuration or through the corresponding API parameter. |
Amazon Redshift | For data delivery to Amazon Redshift, Firehose first delivers incoming data to your S3 bucket in the format described earlier. Firehose then issues an Amazon Redshift COPY command to load the data from your S3 bucket to your Amazon Redshift provisioned cluster or Amazon Redshift Serverless workgroup. Because Firehose concatenates multiple incoming records into each Amazon S3 object, make sure the resulting object is in a format that the COPY command can load into your cluster or workgroup. For more information, see Amazon Redshift COPY Command Data Format Parameters. |
OpenSearch Service and OpenSearch Serverless | For data delivery to OpenSearch Service and OpenSearch Serverless, Amazon Data Firehose buffers incoming records based on the buffering configuration of your Firehose stream. It then generates an OpenSearch Service or OpenSearch Serverless bulk request to index multiple records to your OpenSearch Service cluster or OpenSearch Serverless collection. Make sure that your record is UTF-8 encoded and flattened to a single-line JSON object before you send it to Amazon Data Firehose. Also, the rest.action.multi.allow_explicit_index option for your OpenSearch Service cluster must be set to true (the default) to accept bulk requests with an explicit index that is set per record. For more information, see OpenSearch Service Configure Advanced Options in the Amazon OpenSearch Service Developer Guide. |
Splunk | For data delivery to Splunk, Amazon Data Firehose concatenates the bytes that you send. If you want delimiters in your data, such as a newline character, you must insert them yourself, and make sure that Splunk is configured to parse them. To redrive data that was delivered to the S3 error bucket (S3 backup) back to Splunk, follow the steps in the Splunk documentation. |
HTTP endpoint | For data delivery to an HTTP endpoint owned by a supported third-party service provider, you can use AWS Lambda to create a function that transforms the incoming records into the format that the service provider's integration expects. Contact the third-party service provider whose HTTP endpoint you've chosen as your destination to learn more about their accepted record format. |
Snowflake | For data delivery to Snowflake, Amazon Data Firehose internally buffers data for one second and uses Snowflake streaming API operations to insert the data into Snowflake. By default, records that you insert are flushed and committed to the Snowflake table every second. After you make the insert call, Firehose emits a CloudWatch metric that measures how long it took for the data to be committed to Snowflake. Firehose currently supports only a single JSON object as the record payload and doesn't support JSON arrays. Make sure that your input payload is a valid, well-formed JSON object without extra quotation marks or escape characters. |
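For the Amazon S3 newline-delimiter option, the sketch below shows one way to enable it when creating a stream with the AWS SDK for Python (boto3). It assumes the AppendDelimiterToRecord processor type in the CreateDeliveryStream API; the stream name, bucket ARN, and role ARN are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="example-stream",  # hypothetical stream name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::example-bucket",  # placeholder
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                # Appends a delimiter ("\n" by default) after each record
                # before the records are concatenated into an S3 object.
                {"Type": "AppendDelimiterToRecord"}
            ],
        },
    },
)
```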
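For Amazon Redshift, the COPY options you configure must match the format of the S3 objects that Firehose writes. Below is a minimal sketch of the CopyCommand portion of a RedshiftDestinationConfiguration, assuming newline-delimited JSON staging objects; the table name is hypothetical.

```python
# The CopyCommand section of a RedshiftDestinationConfiguration, as passed
# to create_delivery_stream. Assumes the S3 staging objects hold
# newline-delimited JSON; the table name is hypothetical.
copy_command = {
    "DataTableName": "firehose_events",  # hypothetical target table
    # COPY options must match the S3 object format; JSON 'auto' parses
    # newline-delimited JSON objects by matching keys to column names.
    "CopyOptions": "JSON 'auto'",
}
```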
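For OpenSearch destinations, the following sketch shows one way to flatten a record to a single-line, UTF-8 encoded JSON object before sending it to Firehose; the stream name and sample data are placeholders.

```python
import json

import boto3

firehose = boto3.client("firehose")

event = {"user": "alice", "action": "login", "latency_ms": 42}  # sample data

# json.dumps with compact separators yields a single-line JSON object;
# encode() produces the UTF-8 bytes that Firehose expects.
payload = json.dumps(event, separators=(",", ":")).encode("utf-8")

firehose.put_record(
    DeliveryStreamName="example-opensearch-stream",  # hypothetical name
    Record={"Data": payload},
)
```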
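For Splunk, because Firehose concatenates the raw bytes of each record, you can append the delimiter yourself before sending, as in this sketch; the stream name and sample events are placeholders.

```python
import json

import boto3

firehose = boto3.client("firehose")

events = [{"event": "login"}, {"event": "logout"}]  # sample data

firehose.put_record_batch(
    DeliveryStreamName="example-splunk-stream",  # hypothetical name
    Records=[
        # Append "\n" to each record so Splunk can split the concatenated bytes.
        {"Data": (json.dumps(e) + "\n").encode("utf-8")}
        for e in events
    ],
)
```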
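For HTTP endpoint destinations, a Lambda transformation function receives base64-encoded records and must return each one with its recordId, a result of Ok, Dropped, or ProcessingFailed, and the transformed data, as sketched below. The uppercase transform is a stand-in for whatever format your service provider expects.

```python
import base64

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose delivers each record's data base64-encoded.
        data = base64.b64decode(record["data"])
        # Stand-in transform; replace with the provider's expected format.
        transformed = data.decode("utf-8").upper().encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```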
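For Snowflake, a sketch of checking that a payload is a single, well-formed JSON object (not an array) before you send it; the helper name is hypothetical.

```python
import json

def is_valid_snowflake_payload(payload: bytes) -> bool:
    """Return True if payload is a UTF-8 encoded, single JSON object."""
    try:
        parsed = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False  # not valid UTF-8 JSON
    # Firehose accepts a single JSON object for Snowflake, not a JSON array.
    return isinstance(parsed, dict)

assert is_valid_snowflake_payload(b'{"id": 1}')
assert not is_valid_snowflake_payload(b'[{"id": 1}]')
```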
Each Firehose destination has its own data delivery frequency. For more information, see Configure buffering hints.
Duplicate records
Amazon Data Firehose uses at-least-once semantics for data delivery. In some circumstances, such as when data delivery times out, delivery retries by Amazon Data Firehose might introduce duplicates if the original data-delivery request eventually goes through. This applies to all destination types that Amazon Data Firehose supports except Apache Iceberg Tables and Snowflake destinations.
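If your downstream application must process each record exactly once, you can deduplicate on the consumer side. Here is a minimal sketch, assuming each record carries a unique ID; the in-memory set is a stand-in for a durable store such as a table with a unique-key constraint.

```python
# Consumer-side deduplication under at-least-once delivery.
seen_ids: set[str] = set()

def handle(record: dict) -> None:
    print("processing", record)  # stand-in for your real processing logic

def process_once(record: dict) -> None:
    record_id = record["id"]  # assumed unique per logical record
    if record_id in seen_ids:
        return  # duplicate delivery; skip
    seen_ids.add(record_id)
    handle(record)
```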