Deliver DynamoDB records to Amazon S3 using Kinesis Data Streams and Firehose with AWS CDK - AWS Prescriptive Guidance


Created by Shashank Shrivastava (AWS) and Daniel Matuki da Cunha (AWS)

Code repository: Amazon DynamoDB ingestion into Amazon S3

Environment: PoC or pilot

Technologies: Serverless; Data lakes; Databases; Storage & backup

AWS services: AWS CDK; Amazon DynamoDB; Amazon Data Firehose; Amazon Kinesis Data Streams; AWS Lambda; Amazon S3

Summary

This pattern provides sample code and an application for delivering records from Amazon DynamoDB to Amazon Simple Storage Service (Amazon S3) by using Amazon Kinesis Data Streams and Amazon Data Firehose. The pattern’s approach uses AWS Cloud Development Kit (AWS CDK) L3 constructs and includes an example of how to perform data transformation with AWS Lambda before data is delivered to the target S3 bucket on the Amazon Web Services (AWS) Cloud.

Kinesis Data Streams records item-level modifications in DynamoDB tables and replicates them to the required Kinesis data stream. Your applications can access the Kinesis data stream and view the item-level changes in near-real time. Kinesis Data Streams also provides access to other Amazon Kinesis services, such as Firehose and Amazon Managed Service for Apache Flink. This means that you can build applications that provide real-time dashboards, generate alerts, implement dynamic pricing and advertising, and perform sophisticated data analysis.

You can use this pattern for your data integration use cases. For example, transportation vehicles or industrial equipment can send high volumes of data to a DynamoDB table. This data can then be transformed and stored in a data lake hosted in Amazon S3. You can then query and process the data and predict any potential defects by using serverless services such as Amazon Athena, Amazon Redshift Spectrum, Amazon Rekognition, and AWS Glue.

Prerequisites and limitations

Prerequisites

Architecture

The following diagram shows an example workflow for delivering records from DynamoDB to Amazon S3 by using Kinesis Data Streams and Firehose.


The diagram shows the following workflow:

  1. Data is ingested using Amazon API Gateway as a proxy for DynamoDB. You can also use any other source to ingest data into DynamoDB. 

  2. Item-level changes are generated in near-real time in Kinesis Data Streams for delivery to Amazon S3.

  3. Kinesis Data Streams sends the records to Firehose for transformation and delivery. 

  4. A Lambda function converts the records from a DynamoDB record format to JSON format, which contains only the record item attribute names and values.
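Step 4 can be illustrated with a minimal sketch of a Firehose transformation Lambda. This is not the repository's actual function; the record shapes and the small unmarshaller below are assumptions based on the standard Kinesis-for-DynamoDB record format and the Firehose transformation contract (`recordId`, `result`, `data`):

```typescript
// Illustrative sketch of a Firehose data-transformation Lambda (hypothetical;
// the repository's implementation may differ). Each incoming record carries a
// base64-encoded DynamoDB change event; we extract NewImage and re-encode it
// as plain JSON containing only attribute names and values.

type AttributeValue = {
  S?: string;
  N?: string;
  BOOL?: boolean;
  M?: Record<string, AttributeValue>;
  L?: AttributeValue[];
};

// Minimal unmarshaller for common DynamoDB attribute types.
function unmarshal(image: Record<string, AttributeValue>): Record<string, unknown> {
  const toPlain = (v: AttributeValue): unknown => {
    if (v.S !== undefined) return v.S;
    if (v.N !== undefined) return Number(v.N);
    if (v.BOOL !== undefined) return v.BOOL;
    if (v.M !== undefined) return unmarshal(v.M);
    if (v.L !== undefined) return v.L.map(toPlain);
    return null;
  };
  return Object.fromEntries(Object.entries(image).map(([k, v]) => [k, toPlain(v)]));
}

interface FirehoseRecord { recordId: string; data: string }

function handler(event: { records: FirehoseRecord[] }) {
  return {
    records: event.records.map((r) => {
      // Decode the Kinesis payload and pull out the item's new state.
      const payload = JSON.parse(Buffer.from(r.data, 'base64').toString('utf8'));
      const item = unmarshal(payload.dynamodb.NewImage);
      return {
        recordId: r.recordId,
        result: 'Ok',
        // Newline-delimit records so the delivered S3 objects are parseable line by line.
        data: Buffer.from(JSON.stringify(item) + '\n').toString('base64'),
      };
    }),
  };
}
```

For an item written as `{"SourceDataId": {"S": "123"}, "MessageData": {"S": "Hello World"}}`, a transform of this shape would emit `{"SourceDataId":"123","MessageData":"Hello World"}` to the delivery stream.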

Tools

AWS services

  • AWS Cloud Development Kit (AWS CDK) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.

  • AWS CDK Toolkit is the command line tool for interacting with your AWS CDK apps.

  • AWS Command Line Interface (AWS CLI) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.

  • AWS CloudFormation helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and AWS Regions.

Code repository

The code for this pattern is available in the GitHub aws-dynamodb-kinesisfirehose-s3-ingestion repository.

Epics

Task | Description | Skills required

Install the dependencies.

On your local machine, install the dependencies from the package.json files in the pattern/aws-dynamodb-kinesisstreams-s3 and sample-application directories by running the following commands:

cd <project_root>/pattern/aws-dynamodb-kinesisstreams-s3
npm install && npm run build
cd <project_root>/sample-application/
npm install && npm run build


App developer, General AWS

Generate the CloudFormation template.

  1. Run the cd <project_root>/sample-application/ command.

  2. Run the cdk synth command to generate the CloudFormation template.

  3. The AwsDynamodbKinesisfirehoseS3IngestionStack.template.json output is stored in the cdk.out directory.

  4. Use AWS CDK or the AWS Management Console to process the template in CloudFormation.

App developer, General AWS, AWS DevOps
Task | Description | Skills required

Check and deploy the resources.

  1. Run the cdk diff command to identify the resource types that are created by the AWS CDK construct.

  2. Run the cdk deploy command to deploy the resources.

App developer, General AWS, AWS DevOps
Task | Description | Skills required

Ingest your sample data into the DynamoDB table.

Send a request to your DynamoDB table by running the following command in the AWS CLI:

aws dynamodb put-item --table-name <your_table_name> --item '{"<table_partition_key>": {"S": "<partition_key_ID>"},"MessageData":{"S": "<data>"}}'

Example:

aws dynamodb put-item --table-name SourceData_table --item '{"SourceDataId": {"S": "123"},"MessageData":{"S": "Hello World"}}'

By default, the put-item command returns no output if the operation succeeds. If the operation fails, it returns an error. The data is stored in DynamoDB and then sent to Kinesis Data Streams and Firehose.

Note: You can use different approaches to add data to a DynamoDB table. For more information, see Load data into tables in the DynamoDB documentation.

App developer

Verify that a new object is created in the S3 bucket.

Sign in to the AWS Management Console and monitor the S3 bucket to verify that a new object was created with the data that you sent. 

For more information, see GetObject in the Amazon S3 documentation.
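Firehose buffers incoming data, so a single delivered S3 object can contain several concatenated records. Assuming the transformation Lambda emits newline-delimited JSON (an assumption about this pattern's transform, not confirmed by the repository), a downloaded object body could be parsed with a sketch like this:

```typescript
// Illustrative helper: split a delivered S3 object body into items, assuming
// the transformation step emits one JSON document per line (NDJSON).
function parseDeliveredBody(body: string): Record<string, unknown>[] {
  return body
    .split('\n')
    .filter((line) => line.trim().length > 0) // ignore the trailing blank line
    .map((line) => JSON.parse(line));
}
```

For example, a body of `{"SourceDataId":"123","MessageData":"Hello World"}\n` would parse into a single item whose `MessageData` is `Hello World`.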

App developer, General AWS
Task | Description | Skills required

Clean up resources.

Run the cdk destroy command to delete all the resources used by this pattern.

App developer, General AWS

Related resources