Amazon Kinesis Data Analytics for Apache Flink

You can use Apache Flink to transfer your time series data from Amazon Kinesis, Amazon MSK, Apache Kafka, and other streaming technologies directly into Amazon Timestream. We've created an Apache Flink sample data connector for Timestream. We've also created a sample application for sending data to Amazon Kinesis so that the data can flow from Amazon Kinesis to Amazon Kinesis Data Analytics, and finally on to Amazon Timestream. All of these artifacts are available on GitHub. This video tutorial describes the setup.
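
At the core of this pipeline is an Apache Flink streaming job that reads records from the Kinesis stream and writes them to Timestream. The outline below is a minimal sketch of such a job, using the Flink Kinesis consumer with the stream, database, and table names from the procedure that follows. The TimestreamSink class stands in for the sink shipped in the sample connector repository; its actual name and constructor may differ.

  import java.util.Properties;

  import org.apache.flink.api.common.serialization.SimpleStringSchema;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
  import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
  import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

  public class StreamingJobSketch {

      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          // Configure the Kinesis consumer for the TimestreamTestStream stream.
          Properties consumerConfig = new Properties();
          consumerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
          consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

          DataStream<String> input = env.addSource(
                  new FlinkKinesisConsumer<>("TimestreamTestStream", new SimpleStringSchema(), consumerConfig));

          // Hand each record to a Timestream sink. TimestreamSink is a placeholder for the
          // sink class provided by the sample connector repository.
          input.addSink(new TimestreamSink("kdaflink", "kinesisdata1", 50));

          env.execute("Kinesis to Timestream sketch");
      }
  }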

Note

Java 1.8 is required to use Kinesis Data Analytics for Apache Flink applications. If you have multiple Java versions installed, make sure that Java 1.8 is exported to your JAVA_HOME environment variable.

Sample Application

To get started, follow the procedure below:

  1. Create a database in Timestream with the name kdaflink following the instructions described in Create a database

  2. Create a table in Timestream with the name kinesisdata1 following the instructions described in Create a table

  3. Create an Amazon Kinesis Data Stream with the name TimestreamTestStream following the instructions described in Creating a Stream

  4. Clone the GitHub repository for the Apache Flink data connector for Timestream following the instructions from GitHub

  5. Compile the Apache Flink data connector for Timestream following the instructions below

    1. Ensure that you have Apache Maven installed. You can test your Apache Maven install with the following command

      mvn -version
    2. The latest version of Apache Flink that Kinesis Data Analytics supports is 1.8.2. To download and install Apache Flink version 1.8.2, follow these steps

      1. Download the Apache Flink version 1.8.2 source code

        wget https://archive.apache.org/dist/flink/flink-1.8.2/flink-1.8.2-src.tgz
      2. Uncompress the Apache Flink source code and install Apache Flink

        tar -xvf flink-1.8.2-src.tgz
        cd flink-1.8.2
        mvn clean install -Pinclude-kinesis -DskipTests
      3. Go to the flink_connector directory to compile and run the Apache Flink data connector for Timestream

        mvn clean compile
        mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesisanalytics.StreamingJob" -Dexec.args="--InputStreamName TimestreamTestStream --Region us-east-1 --TimestreamDbName kdaflink --TimestreamTableName kinesisdata1"
      4. By default, the Timestream data connector for Apache Flink batches records in batches of 50. You can adjust this with the `--TimestreamIngestBatchSize` option

        mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesisanalytics.StreamingJob" -Dexec.args="--InputStreamName TimestreamTestStream --Region us-east-1 --TimestreamDbName kdaflink --TimestreamTableName kinesisdata1 --TimestreamIngestBatchSize 75"
  6. Compile the Kinesis Data Analytics application following the instructions for Compiling the Application Code

  7. Upload the Kinesis Data Analytics application binary following the instructions to Upload the Apache Flink Streaming Code

    1. After clicking Create Application, click the link for the application's IAM role

    2. Attach the IAM policies for AmazonKinesisReadOnlyAccess and AmazonTimestreamFullAccess.

      Note

      The above IAM policies are not restricted to specific resources and are unsuitable for production use. For a production system, consider using policies that restrict access to specific resources.

  8. Clone the GitHub repository for the sample application writing data to Kinesis following the instructions from GitHub

  9. Follow the instructions in the README to run the sample application for writing data to Kinesis

  10. Run one or more queries in Timestream, following the instructions described in Run a query, to verify that data is flowing from Kinesis through Kinesis Data Analytics into Timestream. A sample verification query appears after this procedure.
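
If you prefer to verify ingestion programmatically rather than from the console, the following is a minimal sketch that uses the AWS SDK for Java Timestream Query client to count the rows in the kdaflink.kinesisdata1 table. The Region, default credential chain, and query string are assumptions for illustration.

  import com.amazonaws.services.timestreamquery.AmazonTimestreamQuery;
  import com.amazonaws.services.timestreamquery.AmazonTimestreamQueryClientBuilder;
  import com.amazonaws.services.timestreamquery.model.QueryRequest;
  import com.amazonaws.services.timestreamquery.model.QueryResult;
  import com.amazonaws.services.timestreamquery.model.Row;

  public class VerifyIngestion {

      public static void main(String[] args) {
          // Build a Timestream Query client in the same Region used by the connector.
          AmazonTimestreamQuery queryClient = AmazonTimestreamQueryClientBuilder.standard()
                  .withRegion("us-east-1")
                  .build();

          // Count the records written so far to the kdaflink.kinesisdata1 table.
          QueryRequest request = new QueryRequest()
                  .withQueryString("SELECT COUNT(*) FROM kdaflink.kinesisdata1");
          QueryResult result = queryClient.query(request);

          for (Row row : result.getRows()) {
              System.out.println(row);
          }
      }
  }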

Video Tutorial

This video explains how to use Timestream with Kinesis Data Analytics for Apache Flink.