Storing exported data in Amazon S3
Using a predefined CloudFormation template
Amazon Monitron provides a predefined AWS CloudFormation template to help you quickly set up a Firehose delivery stream that delivers data from a Kinesis data stream to your Amazon S3 bucket. This template enables dynamic partitioning, and the Amazon S3 objects delivered will use the following key format recommended by Amazon Monitron:
/project={projectName}/site={siteName}/time={yyyy-mm-dd 00:00:00}/(unknown)
- Sign in to your AWS account.
- Open a new browser tab with the following URL:
  https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://s3.us-east-1.amazonaws.com/monitron-cloudformation-templates-us-east-1/monitron_kinesis_data_export.yaml&stackName=monitron-kinesis-live-data-export
- On the AWS CloudFormation page that opens, in the upper-right corner, select the Region in which you are using Amazon Monitron.
- By default, the template creates a new Kinesis data stream and S3 bucket, along with the other resources needed to deliver data to Amazon S3. You can change the parameters to use existing resources.
- Check the box that says I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
- On the next page, choose the refresh icon as often as you like until the status of the stack is CREATE_COMPLETE.
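If you prefer to launch the same template from the command line instead of the console URL, the stack can also be created with the AWS CLI. The sketch below only assembles the equivalent command; the stack name and template URL are taken from the link above, while the Region is an assumption you should adjust to where you use Amazon Monitron. Run the printed command yourself once the AWS CLI is configured.

```python
import shlex

REGION = "us-east-1"  # assumption: change to the Region where you use Amazon Monitron
STACK_NAME = "monitron-kinesis-live-data-export"
TEMPLATE_URL = ("https://s3.us-east-1.amazonaws.com/"
                "monitron-cloudformation-templates-us-east-1/"
                "monitron_kinesis_data_export.yaml")

# Equivalent of choosing Create stack and acknowledging that
# AWS CloudFormation might create IAM resources.
cmd = [
    "aws", "cloudformation", "create-stack",
    "--region", REGION,
    "--stack-name", STACK_NAME,
    "--template-url", TEMPLATE_URL,
    "--capabilities", "CAPABILITY_IAM",
]
print(shlex.join(cmd))  # not executed here; run it with the AWS CLI configured
```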
Configuring Kinesis manually in the console
- Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.
- Choose Delivery streams in the navigation pane.
- Choose Create delivery stream.
- For Source, select Amazon Kinesis Data Streams.
- For Destination, select Amazon S3.
- Under Source settings, Kinesis data stream, enter the ARN of your Kinesis data stream.
- Under Delivery stream name, enter a name for your delivery stream.
- Under Destination settings, choose an Amazon S3 bucket or enter a bucket URI.
- (Optional) Enable dynamic partitioning using inline parsing for JSON. This option is appropriate if you want to partition streaming measurement data based on source information and timestamp. For example:
  - Choose Enabled for Dynamic partitioning.
  - Choose Enabled for New line delimiter.
  - Choose Enabled for Inline parsing for JSON.
  - Under Dynamic partitioning keys, add the following keys:

    Key name: project
    JQ expression: .projectDisplayName | "project=\(.)"

    Key name: site
    JQ expression: .siteDisplayName | "site=\(.)"

    Key name: time
    JQ expression: .timestamp | sub("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "00:00:00") | "time=\(.)"
  - Choose Apply dynamic partitioning keys and confirm that the generated Amazon S3 bucket prefix is:
    !{partitionKeyFromQuery:project}/!{partitionKeyFromQuery:site}/!{partitionKeyFromQuery:time}/
  - In Amazon S3, objects will use the following key format:
    /project={projectName}/site={siteName}/time={yyyy-mm-dd 00:00:00}/(unknown)
- Choose Create delivery stream.
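You can preview the S3 prefix that a given record will produce by emulating the three JQ partitioning expressions locally. The following is a minimal Python approximation (it mimics jq's `sub` with `re.sub`); the field names come from the JQ expressions above, but the sample values are hypothetical.

```python
import re

def partition_prefix(record):
    """Approximate the project/site/time JQ partitioning keys for one record."""
    project = f"project={record['projectDisplayName']}"
    site = f"site={record['siteDisplayName']}"
    # jq: .timestamp | sub("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "00:00:00")
    # truncates HH:MM:SS.mmm to midnight so all of a day's data shares a prefix
    time = "time=" + re.sub(r"[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$", "00:00:00",
                            record["timestamp"])
    return f"{project}/{site}/{time}/"

sample = {  # hypothetical values; field names follow the JQ expressions above
    "projectDisplayName": "myproject",
    "siteDisplayName": "mysite",
    "timestamp": "2023-05-01 12:34:56.789",
}
print(partition_prefix(sample))
# project=myproject/site=mysite/time=2023-05-01 00:00:00/
```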
- (Optional) Use a more granular Amazon S3 key path.
  If you chose dynamic partitioning, use the preceding Amazon S3 key format if you plan to use AWS Glue and Athena to query the data. You can also choose a finer key format, but Amazon Athena queries against it will not be efficient. Here is an example of setting up a finer Amazon S3 key path.
  Under Dynamic partitioning keys, add the following keys:

    Key name: project
    JQ expression: .projectDisplayName | "project=\(.)"

    Key name: site
    JQ expression: .siteDisplayName | "site=\(.)"

    Key name: asset
    JQ expression: .assetDisplayName | "asset=\(.)"

    Key name: position
    JQ expression: .sensorPositionDisplayName | "position=\(.)"

    Key name: sensor
    JQ expression: .sensor.physicalId | "sensor=\(.)"

    Key name: date
    JQ expression: .timestamp | sub(" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "") | "date=\(.)"

  In Amazon S3, objects will use the following key format:
  /project={projectName}/site={siteName}/asset={assetName}/position={positionName}/sensor={sensorId}/date={yyyy-mm-dd}/time={HH:MM:SS}/(unknown)
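As with the coarser layout, the finer JQ keys can be previewed locally before you create the delivery stream. This Python sketch assumes the record shape implied by the JQ expressions above; the sample values are hypothetical.

```python
import re

def fine_prefix(record):
    """Approximate the finer-grained JQ partitioning keys for one record."""
    parts = [
        f"project={record['projectDisplayName']}",
        f"site={record['siteDisplayName']}",
        f"asset={record['assetDisplayName']}",
        f"position={record['sensorPositionDisplayName']}",
        f"sensor={record['sensor']['physicalId']}",
        # jq: .timestamp | sub(" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "")
        # strips the time portion, leaving the yyyy-mm-dd date
        "date=" + re.sub(r" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$", "",
                         record["timestamp"]),
    ]
    return "/".join(parts) + "/"

sample = {  # hypothetical values; field names follow the JQ expressions above
    "projectDisplayName": "myproject",
    "siteDisplayName": "mysite",
    "assetDisplayName": "pump-1",
    "sensorPositionDisplayName": "drive-end",
    "sensor": {"physicalId": "a1b2c3"},
    "timestamp": "2023-05-01 12:34:56.789",
}
print(fine_prefix(sample))
# project=myproject/site=mysite/asset=pump-1/position=drive-end/sensor=a1b2c3/date=2023-05-01/
```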