Amazon Kinesis Data Analytics for Apache Flink
You can use Apache Flink to transfer your time series data from Amazon Kinesis Data
Analytics, Amazon MSK, Apache Kafka,
and other streaming technologies directly into Amazon Timestream.
We've created an Apache Flink sample data connector for Timestream.
We’ve also created a sample application for sending data to
Amazon Kinesis so that the data can flow from Kinesis to Kinesis Data Analytics, and
finally on to Amazon Timestream.
All of these artifacts are available to you in GitHub.
Note
Java 1.8 is required to build and run the Kinesis Data Analytics for Apache Flink application. If you have multiple Java versions installed, make sure that your JAVA_HOME environment variable points to your Java 1.8 installation.
Topics
- Sample Application
- Video Tutorial

Sample Application
To get started, follow the procedure below:

- Create a database in Timestream with the name kdaflink, following the instructions described in Create a database.
- Create a table in Timestream with the name kinesisdata1, following the instructions described in Create a table.
- Create an Amazon Kinesis data stream with the name TimestreamTestStream, following the instructions described in Creating a Stream.
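If you prefer the AWS CLI to the console, the three resources above can be created with commands like the following sketch. The region, shard count, and retention values are assumptions; adjust them for your setup.

```shell
# Create the Timestream database and table (retention values are examples)
aws timestream-write create-database --database-name kdaflink --region us-east-1
aws timestream-write create-table --database-name kdaflink --table-name kinesisdata1 \
    --retention-properties "MemoryStoreRetentionPeriodInHours=24,MagneticStoreRetentionPeriodInDays=7" \
    --region us-east-1

# Create the Kinesis data stream with a single shard
aws kinesis create-stream --stream-name TimestreamTestStream --shard-count 1 --region us-east-1
```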
- Clone the GitHub repository for the Apache Flink data connector for Timestream, following the instructions from GitHub.
- Compile the Apache Flink data connector for Timestream, following the instructions below:
  - Ensure that you have Apache Maven installed. You can test your Apache Maven installation with the following command:
    mvn -version
  - The latest version of Apache Flink that Kinesis Data Analytics supports is 1.8.2. To download and install Apache Flink version 1.8.2, follow these steps:
    - Download the Apache Flink version 1.8.2 source code:
      wget https://archive.apache.org/dist/flink/flink-1.8.2/flink-1.8.2-src.tgz
    - Uncompress the Apache Flink source code and install Apache Flink:
      tar -xvf flink-1.8.2-src.tgz
      cd flink-1.8.2
      mvn clean install -Pinclude-kinesis -DskipTests
  - Go to the flink_connector directory to compile and run the Apache Flink data connector for Timestream:
    mvn clean compile
    mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesisanalytics.StreamingJob" -Dexec.args="--InputStreamName TimestreamTestStream --Region us-east-1 --TimestreamDbName kdaflink --TimestreamTableName kinesisdata1"
  - By default, the Timestream data connector for Apache Flink batches records in batches of 50. This can be adjusted using the --TimestreamIngestBatchSize option:
    mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesisanalytics.StreamingJob" -Dexec.args="--InputStreamName TimestreamTestStream --Region us-east-1 --TimestreamDbName kdaflink --TimestreamTableName kinesisdata1 --TimestreamIngestBatchSize 75"
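The commands above pass settings to the job as "--Key value" pairs. The following is a minimal, hypothetical sketch of how such arguments can be parsed into a lookup map; it is not the connector's actual code (Flink applications typically use ParameterTool for this), and the class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: turn "--Key value" command-line pairs into a map,
// e.g. --InputStreamName TimestreamTestStream --Region us-east-1.
public class ArgParser {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                // Strip the leading "--" and pair the key with the next token.
                params.put(args[i].substring(2), args[i + 1]);
            }
        }
        return params;
    }
}
```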
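The batching behavior described above (a default batch size of 50, overridable via --TimestreamIngestBatchSize) can be sketched as follows. This is an illustrative model, not the connector's actual implementation; in the real connector, each flush would invoke Timestream's WriteRecords API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batched ingestion: records are buffered and
// flushed as a batch once the configured batch size is reached.
public class BatchBuffer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    public BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Add a record; flush automatically when the buffer is full.
    public void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // In the real connector this would call Timestream's WriteRecords API;
    // here we just collect the batches for inspection.
    public void flush() {
        if (!buffer.isEmpty()) {
            flushed.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> getFlushedBatches() {
        return flushed;
    }
}
```

With the default batch size of 50, ingesting 120 records produces two full batches of 50 and, after a final flush, one partial batch of 20.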
- Compile the Kinesis Data Analytics application, following the instructions for Compiling the Application Code.
- Upload the Kinesis Data Analytics application binary, following the instructions to Upload the Apache Flink Streaming Code.
- After choosing Create application, choose the link to the IAM role for the application.
- Attach the IAM policies AmazonKinesisReadOnlyAccess and AmazonTimestreamFullAccess.
  Note
  The above IAM policies are not restricted to specific resources and are unsuitable for production use. For a production system, consider using policies that restrict access to specific resources.
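For production, a scoped-down policy might look like the following sketch. The account ID and region in the ARNs are placeholders; the timestream:DescribeEndpoints permission must remain on all resources because the Timestream SDK uses it for endpoint discovery.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/TimestreamTestStream"
    },
    {
      "Effect": "Allow",
      "Action": ["timestream:WriteRecords"],
      "Resource": "arn:aws:timestream:us-east-1:123456789012:database/kdaflink/table/kinesisdata1"
    },
    {
      "Effect": "Allow",
      "Action": ["timestream:DescribeEndpoints"],
      "Resource": "*"
    }
  ]
}
```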
- Clone the GitHub repository for the sample application that writes data to Kinesis, following the instructions from GitHub.
- Follow the instructions in the README to run the sample application for writing data to Kinesis.
- Run one or more queries in Timestream to ensure that data is being sent from Kinesis through Kinesis Data Analytics to Timestream, following the instructions described in Create a table.
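For example, queries along the following lines (using the database and table names from the steps above) can confirm that rows are arriving; the 15-minute window is an arbitrary choice.

```sql
-- Count rows ingested in the last 15 minutes
SELECT COUNT(*) FROM "kdaflink"."kinesisdata1" WHERE time > ago(15m)

-- Inspect the most recent records
SELECT * FROM "kdaflink"."kinesisdata1" ORDER BY time DESC LIMIT 10
```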
Video Tutorial
This video