aws-kinesisstreams-gluejob - AWS Solutions Constructs

aws-kinesisstreams-gluejob

All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

| Language | Package |
|---|---|
| Python | aws_solutions_constructs.aws_kinesis_streams_gluejob |
| TypeScript | @aws-solutions-constructs/aws-kinesisstreams-gluejob |
| Java | software.amazon.awsconstructs.services.kinesisstreamsgluejob |

This AWS Solutions Construct deploys an Amazon Kinesis Data Stream, and configures an AWS Glue Job to perform custom ETL transformation with the appropriate resources/properties for interaction and security. It also creates an Amazon S3 bucket where the Python script for the AWS Glue Job can be uploaded.

Here is a minimal deployable pattern definition in TypeScript:

    import * as glue from '@aws-cdk/aws-glue';
    import * as s3assets from '@aws-cdk/aws-s3-assets';
    import { KinesisstreamsToGluejob } from '@aws-solutions-constructs/aws-kinesisstreams-gluejob';

    const fieldSchema: glue.CfnTable.ColumnProperty[] = [
      {
        name: 'id',
        type: 'int',
        comment: 'Identifier for the record',
      },
      {
        name: 'name',
        type: 'string',
        comment: 'Name for the record',
      },
      {
        name: 'address',
        type: 'string',
        comment: 'Address for the record',
      },
      {
        name: 'value',
        type: 'int',
        comment: 'Value for the record',
      },
    ];

    const customEtlJob = new KinesisstreamsToGluejob(this, 'CustomETL', {
      glueJobProps: {
        command: {
          name: 'gluestreaming',
          pythonVersion: '3',
          scriptLocation: new s3assets.Asset(this, 'ScriptLocation', {
            path: `${__dirname}/../etl/transform.py`,
          }).s3ObjectUrl,
        },
      },
      fieldSchema: fieldSchema,
    });

Initializer

new KinesisstreamsToGluejob(scope: Construct, id: string, props: KinesisstreamsToGluejobProps);

Parameters

Pattern Construct Props

| Name | Type | Description |
|---|---|---|
| kinesisStreamProps? | kinesis.StreamProps | Optional user-provided props to override the default props for the Amazon Kinesis Data Stream. |
| existingStreamObj? | kinesis.Stream | Existing instance of an Amazon Kinesis Data Stream. If this is set, then kinesisStreamProps is ignored. |
| glueJobProps? | cfnJob.CfnJobProps | User-provided props to override the default props for the AWS Glue job. |
| existingGlueJob? | cfnJob.CfnJob | Existing instance of an AWS Glue job. If this is provided, then glueJobProps is ignored. |
| existingDatabase? | CfnDatabase | Existing AWS Glue database to be used with this construct. If not provided, the construct creates a new AWS Glue database. If this is set, then databaseProps is ignored. |
| databaseProps? | CfnDatabaseProps | User-provided props to override the default props used to create the AWS Glue database. |
| existingTable? | CfnTable | Existing instance of an AWS Glue table. If this is set, then tableProps and fieldSchema are ignored. |
| tableProps? | CfnTableProps | User-provided props to override the default props used to create an AWS Glue table. |
| fieldSchema? | CfnTable.ColumnProperty[] | User-provided schema structure to create an AWS Glue table. |
| outputDataStore? | SinkDataStoreProps | User-provided props for an Amazon S3 bucket that stores output from the AWS Glue job. Currently only Amazon S3 is supported as the output datastore type. |
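
The precedence rules in the table above (an `existing*` prop makes the matching `*Props` prop irrelevant) can be checked before instantiating the construct. The following is a minimal sketch, not part of the library: `PropsLike`, `OVERRIDES`, and `findIgnoredProps` are hypothetical names, and the interface mirrors only the prop names listed in the table, with values left untyped.

```typescript
// Hypothetical, simplified mirror of the construct props. Only the prop
// names from the table are modelled; values are deliberately untyped.
interface PropsLike {
  kinesisStreamProps?: unknown;
  existingStreamObj?: unknown;
  glueJobProps?: unknown;
  existingGlueJob?: unknown;
  databaseProps?: unknown;
  existingDatabase?: unknown;
  tableProps?: unknown;
  fieldSchema?: unknown;
  existingTable?: unknown;
}

// Each "existing*" prop paired with the props the table says it overrides.
const OVERRIDES: ReadonlyArray<[keyof PropsLike, ReadonlyArray<keyof PropsLike>]> = [
  ['existingStreamObj', ['kinesisStreamProps']],
  ['existingGlueJob', ['glueJobProps']],
  ['existingDatabase', ['databaseProps']],
  ['existingTable', ['tableProps', 'fieldSchema']],
];

// Returns the names of props that are set but would be ignored
// because the corresponding "existing*" prop is also set.
function findIgnoredProps(props: PropsLike): string[] {
  const ignored: string[] = [];
  for (const [existing, overridden] of OVERRIDES) {
    if (props[existing] === undefined) continue;
    for (const name of overridden) {
      if (props[name] !== undefined) ignored.push(name);
    }
  }
  return ignored;
}
```

A caller might run `findIgnoredProps(props)` in a unit test or a pre-synth check and fail fast when the result is non-empty, since silently ignored props are easy to miss in review.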

SinkDataStoreProps

| Name | Type | Description |
|---|---|---|
| existingS3OutputBucket? | Bucket | Existing instance of an Amazon S3 bucket where the data should be written. If this is provided, then outputBucketProps is ignored. |
| outputBucketProps | BucketProps | User-provided bucket properties to create the Amazon S3 bucket used to store the output from the AWS Glue job. |
| datastoreType | SinkStoreType | Sink data store type. |

SinkStoreType

Enumeration of data store types that could include S3, DynamoDB, DocumentDB, RDS, or Redshift. The current construct implementation supports only S3; other output types may be added in the future.

| Name | Type | Description |
|---|---|---|
| S3 | string | S3 storage type |
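
To make the SinkDataStoreProps rules concrete, the sketch below models how the output bucket choice could be resolved: reuse `existingS3OutputBucket` when present, otherwise create a bucket (optionally from `outputBucketProps`). This is an illustration under assumptions, not the construct's code: `SinkStoreTypeLike`, `SinkDataStoreLike`, `BucketPlan`, and `planOutputBucket` are hypothetical stand-ins, and the enum's string value `'S3'` is assumed.

```typescript
// Local stand-in for the construct's SinkStoreType. The string value is an assumption.
enum SinkStoreTypeLike {
  S3 = 'S3',
}

// Simplified stand-in for SinkDataStoreProps; buckets are reduced to a name.
interface SinkDataStoreLike {
  datastoreType: SinkStoreTypeLike;
  existingS3OutputBucket?: { bucketName: string };
  outputBucketProps?: { bucketName?: string };
}

// Describes what would happen to the output bucket.
type BucketPlan =
  | { action: 'reuse'; bucketName: string }
  | { action: 'create'; bucketName?: string };

// An existing bucket wins over outputBucketProps; with no data store
// configured at all, a new bucket is created with default settings.
function planOutputBucket(store?: SinkDataStoreLike): BucketPlan {
  if (store !== undefined && store.existingS3OutputBucket !== undefined) {
    return { action: 'reuse', bucketName: store.existingS3OutputBucket.bucketName };
  }
  const requestedName =
    store !== undefined && store.outputBucketProps !== undefined
      ? store.outputBucketProps.bucketName
      : undefined;
  return { action: 'create', bucketName: requestedName };
}
```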

Default settings

Out-of-the-box implementation of this pattern without any overrides will set the following defaults:

Amazon Kinesis Data Stream

  • Configure a least-privilege IAM role for the Amazon Kinesis Data Stream.

  • Enable server-side encryption for the Amazon Kinesis Data Stream using an AWS managed KMS key.

  • Deploy best-practice Amazon CloudWatch alarms for the Amazon Kinesis Data Stream.

Glue Job

  • Create an AWS Glue security configuration that configures encryption for CloudWatch, Job Bookmarks, and S3. CloudWatch and Job Bookmarks are encrypted using the AWS managed KMS key created for the AWS Glue service. The S3 bucket is configured with the SSE-S3 encryption mode.

  • Configure service role policies that allow AWS Glue to read from Amazon Kinesis Data Streams.

Glue Database

  • Create an AWS Glue database. An AWS Glue table will be added to the database. This table defines the schema for the records buffered in the Amazon Kinesis Data Stream.

Glue Table

  • Create an AWS Glue table. The table schema definition is based on the JSON structure of the records buffered in the Amazon Kinesis Data Stream.
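
Because the table schema follows the JSON structure of the stream records, a `fieldSchema` array can be derived from one representative record instead of being written by hand. The sketch below assumes flat records with only string, number, and boolean values; `Column`, `columnTypeOf`, and `inferFieldSchema` are hypothetical helpers, and the mapping to the type names `int`, `bigint`, `double`, `string`, and `boolean` is an assumed convention, not something the construct performs.

```typescript
// Shape matching the name/type fields used in the fieldSchema example earlier.
interface Column {
  name: string;
  type: string;
}

// Maps a JavaScript value to a column type name (assumed mapping).
function columnTypeOf(value: unknown): string {
  if (typeof value === 'boolean') return 'boolean';
  if (typeof value === 'string') return 'string';
  if (typeof value === 'number') {
    if (Math.floor(value) !== value) return 'double';
    // Whole numbers outside the 32-bit signed range need a wider type.
    return value >= -2147483648 && value <= 2147483647 ? 'int' : 'bigint';
  }
  throw new Error(`Unsupported value type: ${typeof value}`);
}

// Builds one column per top-level key of a flat sample record,
// in the record's own key order.
function inferFieldSchema(sample: Record<string, unknown>): Column[] {
  return Object.keys(sample).map((name) => ({
    name,
    type: columnTypeOf(sample[name]),
  }));
}

// A sample record shaped like the schema in the TypeScript example earlier.
const sampleRecord = { id: 1, name: 'Jane', address: '1 Main St', value: 42 };
const inferredSchema = inferFieldSchema(sampleRecord);
```

A single sample can mislead (for instance, a price that happens to be a whole number in that record), so the inferred columns are best treated as a starting point to review before passing them as `fieldSchema`.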

IAM Role

  • A job execution role that has privileges to 1) read the ETL script from the Amazon S3 bucket location, 2) read records from the Amazon Kinesis Data Stream, and 3) execute the AWS Glue job.

Output S3 Bucket

  • An Amazon S3 bucket to store the output of the ETL transformation. This bucket is passed as an argument to the created AWS Glue job so that the ETL script can write data to it.
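
AWS Glue job arguments are string key/value pairs whose names start with `--`, so passing the output bucket to the job amounts to adding one such entry. The sketch below shows the idea only: the helper `buildJobArguments`, the argument name `--output_path`, and the `s3a://<bucket>/output/` layout are all assumptions made for illustration; check the construct source for the key it actually sets.

```typescript
// Hypothetical helper: builds a Glue-style arguments map that tells the
// ETL script where to write. The key name and path layout are assumptions.
function buildJobArguments(outputBucketName: string): Record<string, string> {
  if (outputBucketName.length === 0) {
    throw new Error('outputBucketName must not be empty');
  }
  return {
    '--output_path': `s3a://${outputBucketName}/output/`,
  };
}

// Example with a placeholder bucket name.
const exampleArguments = buildJobArguments('my-etl-output-bucket');
```

Inside the ETL script, the matching value would then be read from the job's resolved arguments under the same name, without the leading dashes.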

Architecture

GitHub

To view the code for this pattern, create/view issues and pull requests, and more:
@aws-solutions-constructs/aws-kinesisstreams-gluejob