Amazon Data Firehose was previously known as Amazon Kinesis Data Firehose

Backup and Advanced Settings

This topic describes how to configure backup and advanced settings for your Firehose stream.

Backup Settings

Amazon Data Firehose uses Amazon S3 to back up all data, or failed data only, that it attempts to deliver to your chosen destination.

Important

Backup settings are only supported if the source for your Firehose stream is Direct PUT or Kinesis Data Streams. A minimal sketch illustrating this source requirement appears after the following list.

You can specify the S3 backup settings for your Firehose stream if you made one of the following choices:

  • If you set Amazon S3 as the destination for your Firehose stream and you choose to specify an AWS Lambda function to transform data records, or if you choose to convert data record formats for your Firehose stream.

  • If you set Amazon Redshift as the destination for your Firehose stream and you choose to specify an AWS Lambda function to transform data records.

  • If you set any of the following services as the destination for your Firehose stream: Amazon OpenSearch Service, Datadog, Dynatrace, HTTP Endpoint, LogicMonitor, MongoDB Cloud, New Relic, Splunk, or Sumo Logic.
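
The following is a minimal boto3 sketch of the source requirement noted above, assuming a Kinesis data stream source and an Amazon S3 destination. It is an illustration under stated assumptions, not a definitive configuration; all ARNs and names are hypothetical placeholders.

    import boto3

    firehose = boto3.client("firehose")

    # Sketch only: backup settings apply here because the source is a
    # Kinesis data stream (Direct PUT would also qualify). All ARNs
    # below are hypothetical placeholders.
    firehose.create_delivery_stream(
        DeliveryStreamName="example-stream",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/example-source",
            "RoleARN": "arn:aws:iam::111122223333:role/example-source-role",
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
            "BucketARN": "arn:aws:s3:::example-destination-bucket",
        },
    )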

The following are the backup settings for your Firehose stream:

  • Source record backup in Amazon S3 - if Amazon S3 or Amazon Redshift is your selected destination, this setting indicates whether you want to enable source data backup or keep it disabled. If any other supported service (other than Amazon S3 or Amazon Redshift) is your selected destination, this setting indicates whether you want to back up all of your source data or failed data only. (A boto3 sketch of these settings follows this list.)

  • S3 backup bucket - this is the S3 bucket where Amazon Data Firehose backs up your data.

  • S3 backup bucket prefix - this is the prefix where Amazon Data Firehose backs up your data.

  • S3 backup bucket error output prefix - all failed data is backed up under this S3 bucket error output prefix.

  • Buffering hints, compression and encryption for backup - Amazon Data Firehose uses Amazon S3 to back up all data, or failed data only, that it attempts to deliver to your chosen destination. Amazon Data Firehose buffers incoming data before delivering it (backing it up) to Amazon S3. You can choose a buffer size of 1–128 MiB and a buffer interval of 60–900 seconds. The condition that is satisfied first triggers data delivery to Amazon S3. If you enable data transformation, the buffer interval applies from the time Amazon Data Firehose receives the transformed data to the time the data is delivered to Amazon S3. If data delivery to the destination falls behind data writing to the Firehose stream, Amazon Data Firehose raises the buffer size dynamically to catch up. This helps ensure that all data is delivered to the destination.

  • S3 compression - choose GZIP, Snappy, Zip, or Hadoop-Compatible Snappy data compression, or no data compression. Snappy, Zip, and Hadoop-Compatible Snappy compression are not available for Firehose streams with Amazon Redshift as the destination.

  • S3 file extension format (optional) – Specify a file extension format for objects delivered to the Amazon S3 destination bucket. If you enable this feature, the specified file extension overrides the default file extensions appended by the Data Format Conversion or S3 compression features, such as .parquet or .gz. Make sure that you configure the right file extension when you use this feature with Data Format Conversion or S3 compression. The file extension must start with a period (.) and can contain the allowed characters 0-9a-z!-_.*'(). The file extension cannot exceed 128 characters.

  • S3 encryption - Firehose supports Amazon S3 server-side encryption with AWS Key Management Service (SSE-KMS) for encrypting delivered data in Amazon S3. You can choose to use the default encryption type specified in the destination S3 bucket or to encrypt with a key from the list of AWS KMS keys that you own. If you encrypt the data with AWS KMS keys, you can use either the default AWS managed key (aws/s3) or a customer managed key. For more information, see Protecting Data Using Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS).
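
Together, these backup options map onto fields of the CreateDeliveryStream API. The following is a minimal boto3 sketch, not a definitive configuration, assuming an Amazon S3 destination with source record backup enabled. Every ARN, bucket name, and prefix is a hypothetical placeholder, and the FileExtension field is assumed to be available in your SDK version.

    import boto3

    firehose = boto3.client("firehose")

    # Sketch only: all ARNs, bucket names, and prefixes are placeholders.
    firehose.create_delivery_stream(
        DeliveryStreamName="example-stream",
        DeliveryStreamType="DirectPut",  # backup needs Direct PUT or Kinesis Data Streams
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
            "BucketARN": "arn:aws:s3:::example-destination-bucket",
            "FileExtension": ".gz",  # optional; must start with a period (assumes newer SDK)
            # For S3 and Amazon Redshift destinations backup is Disabled or
            # Enabled; other destinations use all-or-failed-only values such
            # as AllData/FailedDataOnly.
            "S3BackupMode": "Enabled",
            "S3BackupConfiguration": {
                "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
                "BucketARN": "arn:aws:s3:::example-backup-bucket",
                "Prefix": "backup/",                    # S3 backup bucket prefix
                "ErrorOutputPrefix": "backup-errors/",  # where failed data lands
                # Whichever buffering condition is met first triggers delivery:
                # 1-128 MiB or 60-900 seconds.
                "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
                "CompressionFormat": "GZIP",  # or UNCOMPRESSED, Snappy, ZIP, HADOOP_SNAPPY
                # SSE-KMS with a customer managed key; omit this block to use
                # the destination bucket's default encryption.
                "EncryptionConfiguration": {
                    "KMSEncryptionConfig": {
                        "AWSKMSKeyARN": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-example"
                    }
                },
            },
        },
    )

The console exposes the same choices as the fields described above; the API simply accepts them in a single call.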

Advanced Settings

The following are the advanced settings for your Firehose stream:

  • Server-side encryption - Amazon Data Firehose supports Amazon S3 server-side encryption with AWS Key Management Service (AWS KMS) for encrypting delivered data in Amazon S3. For more information, see Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS).

  • Error logging - Amazon Data Firehose logs errors related to processing and delivery. Additionally, when data transformation is enabled, it can log Lambda invocations and send data delivery errors to CloudWatch Logs. For more information, see Monitoring Amazon Data Firehose Using CloudWatch Logs.

    Important

    While optional, enabling Amazon Data Firehose error logging during Firehose stream creation is strongly recommended. This practice ensures that you can access error details in case of record processing or delivery failures.

  • Permissions - Amazon Data Firehose uses IAM roles for all the permissions that the Firehose stream needs. You can choose to create a new role where required permissions are assigned automatically, or choose an existing role created for Amazon Data Firehose. The role is used to grant Firehose access to various services, including your S3 bucket, AWS KMS key (if data encryption is enabled), and Lambda function (if data transformation is enabled). The console might create a role with placeholders. For more information, see What is IAM?.

  • Tags - You can add tags to organize your AWS resources, track costs, and control access.

    If you specify tags in the CreateDeliveryStream action, Amazon Data Firehose performs an additional authorization check on the firehose:TagDeliveryStream action to verify whether users have permission to create tags. If you do not provide this permission, requests to create new Firehose streams with IAM resource tags fail with an AccessDeniedException such as the following.

    AccessDeniedException User: arn:aws:sts::x:assumed-role/x/x is not authorized to perform: firehose:TagDeliveryStream on resource: arn:aws:firehose:us-east-1:x:deliverystream/x with an explicit deny in an identity-based policy.

    The following example demonstrates a policy that allows users to create a Firehose stream and apply tags. A boto3 sketch that applies these advanced settings follows this list.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "firehose:CreateDeliveryStream", "Resource": "*", } }, { "Effect": "Allow", "Action": "firehose:TagDeliveryStream", "Resource": "*", } } ] }

Once you've chosen your backup and advanced settings, review your choices, and then choose Create Firehose stream.

The new Firehose stream takes a few moments in the Creating state before it is available. After your Firehose stream is in an Active state, you can start sending data to it from your producer.
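
If you create the stream through the API instead of the console, a minimal sketch for waiting on the Active state and sending a first record could look like the following; the stream name and payload are hypothetical.

    import time

    import boto3

    firehose = boto3.client("firehose")
    stream = "example-stream"  # hypothetical name

    # Poll until the stream leaves the Creating state and becomes Active.
    while True:
        desc = firehose.describe_delivery_stream(DeliveryStreamName=stream)
        if desc["DeliveryStreamDescription"]["DeliveryStreamStatus"] == "ACTIVE":
            break
        time.sleep(10)

    # Send a single record from the producer. A trailing newline is a common
    # convention so records are line-delimited in the delivered S3 objects.
    firehose.put_record(
        DeliveryStreamName=stream,
        Record={"Data": b'{"event": "hello"}\n'},
    )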