Deliver DynamoDB records to Amazon S3 using Kinesis Data Streams and Firehose with AWS CDK - AWS Prescriptive Guidance


Created by Shashank Shrivastava (AWS) and Daniel Matuki da Cunha (AWS)

Code repository: Amazon DynamoDB ingestion into Amazon S3

Environment: PoC or pilot

Technologies: Serverless; Data lakes; Databases; Storage & backup

AWS services: AWS CDK; Amazon DynamoDB; Amazon Data Firehose; Amazon Kinesis Data Streams; AWS Lambda; Amazon S3

Summary

This pattern provides sample code and an application for delivering records from Amazon DynamoDB to Amazon Simple Storage Service (Amazon S3) by using Amazon Kinesis Data Streams and Amazon Data Firehose. The pattern’s approach uses AWS Cloud Development Kit (AWS CDK) L3 constructs and includes an example of how to perform data transformation with AWS Lambda before data is delivered to the target S3 bucket on the Amazon Web Services (AWS) Cloud.

Kinesis Data Streams records item-level modifications in DynamoDB tables and replicates them to the required Kinesis data stream. Your applications can access the Kinesis data stream and view the item-level changes in near-real time. Kinesis Data Streams also provides access to other Amazon Kinesis services, such as Firehose and Amazon Managed Service for Apache Flink. This means that you can build applications that provide real-time dashboards, generate alerts, implement dynamic pricing and advertising, and perform sophisticated data analysis.

You can use this pattern for your data integration use cases. For example, transportation vehicles or industrial equipment can send high volumes of data to a DynamoDB table. This data can then be transformed and stored in a data lake hosted in Amazon S3. You can then query and process the data and predict any potential defects by using serverless services such as Amazon Athena, Amazon Redshift Spectrum, Amazon Rekognition, and AWS Glue.

Prerequisites and limitations

Prerequisites

Architecture

The following diagram shows an example workflow for delivering records from DynamoDB to Amazon S3 by using Kinesis Data Streams and Firehose.


The diagram shows the following workflow:

  1. Data is ingested using Amazon API Gateway as a proxy for DynamoDB. You can also use any other source to ingest data into DynamoDB. 

  2. Item-level changes are generated in near-real time in Kinesis Data Streams for delivery to Amazon S3.

  3. Kinesis Data Streams sends the records to Firehose for transformation and delivery. 

  4. A Lambda function converts the records from a DynamoDB record format to JSON format, which contains only the record item attribute names and values.
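Step 4 can be illustrated with a minimal sketch of a Firehose transformation Lambda. This is not the repository's actual function; the record shapes and the small unmarshaller below are assumptions based on the standard Kinesis-for-DynamoDB record format and the Firehose transformation contract (`recordId`, `result`, `data`):

```typescript
// Illustrative sketch of a Firehose data-transformation Lambda (hypothetical;
// the repository's implementation may differ). Each incoming record carries a
// base64-encoded DynamoDB change event; we extract NewImage and re-encode it
// as plain JSON containing only attribute names and values.

type AttributeValue = {
  S?: string;
  N?: string;
  BOOL?: boolean;
  M?: Record<string, AttributeValue>;
  L?: AttributeValue[];
};

// Minimal unmarshaller for common DynamoDB attribute types.
function unmarshal(image: Record<string, AttributeValue>): Record<string, unknown> {
  const toPlain = (v: AttributeValue): unknown => {
    if (v.S !== undefined) return v.S;
    if (v.N !== undefined) return Number(v.N);
    if (v.BOOL !== undefined) return v.BOOL;
    if (v.M !== undefined) return unmarshal(v.M);
    if (v.L !== undefined) return v.L.map(toPlain);
    return null;
  };
  return Object.fromEntries(Object.entries(image).map(([k, v]) => [k, toPlain(v)]));
}

interface FirehoseRecord { recordId: string; data: string }

function handler(event: { records: FirehoseRecord[] }) {
  return {
    records: event.records.map((r) => {
      // Decode the Kinesis payload and pull out the item's new state.
      const payload = JSON.parse(Buffer.from(r.data, 'base64').toString('utf8'));
      const item = unmarshal(payload.dynamodb.NewImage);
      return {
        recordId: r.recordId,
        result: 'Ok',
        // Newline-delimit records so the delivered S3 objects are parseable line by line.
        data: Buffer.from(JSON.stringify(item) + '\n').toString('base64'),
      };
    }),
  };
}
```

For an item written as `{"SourceDataId": {"S": "123"}, "MessageData": {"S": "Hello World"}}`, a transform of this shape would emit `{"SourceDataId":"123","MessageData":"Hello World"}` to the delivery stream.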

Tools

AWS services

  • AWS Cloud Development Kit (AWS CDK) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.

  • AWS CDK Toolkit is the command line tool for interacting with your AWS CDK apps.

  • AWS Command Line Interface (AWS CLI) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.

  • AWS CloudFormation helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and AWS Regions.

Code repository

The code for this pattern is available in the GitHub aws-dynamodb-kinesisfirehose-s3-ingestion repository.

Epics

Task | Description | Skills required

Install the dependencies.

On your local machine, install the dependencies from the package.json files in the pattern/aws-dynamodb-kinesisstreams-s3 and sample-application directories by running the following commands:

cd <project_root>/pattern/aws-dynamodb-kinesisstreams-s3
npm install && npm run build
cd <project_root>/sample-application/
npm install && npm run build


App developer, General AWS

Generate the CloudFormation template.

  1. Run the cd <project_root>/sample-application/ command.

  2. Run the cdk synth command to generate the CloudFormation template.

  3. The AwsDynamodbKinesisfirehoseS3IngestionStack.template.json output is stored in the cdk.out directory.

  4. Use AWS CDK or the AWS Management Console to process the template in CloudFormation.

App developer, General AWS, AWS DevOps
Task | Description | Skills required

Check and deploy the resources.

  1. Run the cdk diff command to identify the resource types that are created by the AWS CDK construct.

  2. Run the cdk deploy command to deploy the resources.

App developer, General AWS, AWS DevOps
Task | Description | Skills required

Ingest your sample data into the DynamoDB table.

Send a request to your DynamoDB table by running the following command in the AWS CLI:

aws dynamodb put-item --table-name <your_table_name> --item '{"<table_partition_key>": {"S": "<partition_key_ID>"},"MessageData":{"S": "<data>"}}'

Example:

aws dynamodb put-item --table-name SourceData_table --item '{"SourceDataId": {"S": "123"},"MessageData":{"S": "Hello World"}}'

By default, the put-item command returns no output if the operation succeeds. If the operation fails, it returns an error. The data is stored in DynamoDB and then sent to Kinesis Data Streams and Firehose.

Note: You can use different approaches to add data to a DynamoDB table. For more information, see Load data into tables in the DynamoDB documentation.

App developer

Verify that a new object is created in the S3 bucket.

Sign in to the AWS Management Console and monitor the S3 bucket to verify that a new object was created with the data that you sent. 

For more information, see GetObject in the Amazon S3 documentation.
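Firehose buffers incoming data, so a single delivered S3 object can contain several concatenated records. Assuming the transformation Lambda emits newline-delimited JSON (an assumption about this pattern's transform, not confirmed by the repository), a downloaded object body could be parsed with a sketch like this:

```typescript
// Illustrative helper: split a delivered S3 object body into items, assuming
// the transformation step emits one JSON document per line (NDJSON).
function parseDeliveredBody(body: string): Record<string, unknown>[] {
  return body
    .split('\n')
    .filter((line) => line.trim().length > 0) // ignore the trailing blank line
    .map((line) => JSON.parse(line));
}
```

For example, a body of `{"SourceDataId":"123","MessageData":"Hello World"}\n` would parse into a single item whose `MessageData` is `Hello World`.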

App developer, General AWS
Task | Description | Skills required

Clean up resources.

Run the cdk destroy command to delete all the resources used by this pattern.

App developer, General AWS

Related resources