Kinesis Data Analytics for Apache Flink: How It Works

Kinesis Data Analytics for Apache Flink is a fully managed AWS service that enables you to use an Apache Flink application to process streaming data.

Programming Your Apache Flink Application

An Apache Flink application is a Java or Scala application that is created with the Apache Flink framework. You author and build your Apache Flink application locally.

Most applications primarily use the DataStream API. The other Apache Flink APIs are also available, but they are less commonly used for building streaming applications.

The Apache Flink programming model is based on two components:

  • Data stream: The structured representation of a continuous flow of data records.

  • Transformation operator: Takes one or more data streams as input, and produces one or more data streams as output.
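
As a minimal sketch, the following Java DataStream API program wires these two components together. The record values and job name are illustrative only:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ProgrammingModelExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Data stream: a continuous flow of data records (bounded here for brevity).
            DataStream<String> records = env.fromElements("record-1", "record-2", "record-3");

            // Transformation operator: takes one data stream as input and
            // produces one data stream as output.
            DataStream<String> transformed = records.map(String::toUpperCase);

            transformed.print();
            env.execute("ProgrammingModelExample");
        }
    }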

Your application reads and writes data by using connectors, and processes it by using operators. An Apache Flink application uses the following types of components:

  • Source: A connector used to read external data.

  • Sink: A connector used to write to external locations.

  • Operator: Used to process data within the application.

A typical application consists of at least one data stream with a source, a data stream with one or more operators, and at least one data sink.
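
The following sketch shows this minimal shape, using the Apache Flink Kinesis connector classes as the source and sink. The stream names and Region are placeholders; substitute the connector types that match your own sources and destinations:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
    import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
    import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

    public class BasicStreamingJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties config = new Properties();
            config.setProperty(AWSConfigConstants.AWS_REGION, "us-west-2");

            // Source: a connector that reads external data.
            DataStream<String> input = env.addSource(
                new FlinkKinesisConsumer<>("ExampleInputStream", new SimpleStringSchema(), config));

            // Operator: processes data within the application.
            DataStream<String> output = input.map(String::toUpperCase);

            // Sink: a connector that writes to an external location.
            FlinkKinesisProducer<String> sink =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), config);
            sink.setDefaultStream("ExampleOutputStream");
            sink.setDefaultPartition("0");
            output.addSink(sink);

            env.execute("BasicStreamingJob");
        }
    }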

Creating Your Kinesis Data Analytics Application

A Kinesis Data Analytics application is an AWS resource that is hosted by the Kinesis Data Analytics service. Your Kinesis Data Analytics application hosts your Apache Flink application and provides it with the following settings:

  • Runtime Properties: Parameters that you can provide to your application. You can change these parameters without recompiling your application code; see the example following this list.

  • Fault Tolerance: How your application recovers from interruptions and restarts.

  • Logging and Monitoring: How your application logs events to CloudWatch Logs.

  • Scaling: How your application provisions computing resources.
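
For example, runtime properties can be read in application code with the Kinesis Data Analytics Java runtime library (aws-kinesisanalytics-runtime). This is a sketch only; the property group name "ProducerConfigProperties" and the key "InputStreamName" are placeholders for whatever you define in your application configuration:

    import java.io.IOException;
    import java.util.Map;
    import java.util.Properties;
    import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

    public class RuntimePropertiesExample {
        public static void main(String[] args) throws IOException {
            // Retrieve all property groups defined for this application.
            Map<String, Properties> applicationProperties =
                KinesisAnalyticsRuntime.getApplicationProperties();

            // Placeholder group and key names; use the ones you configure.
            Properties producerConfig = applicationProperties.get("ProducerConfigProperties");
            String inputStreamName =
                producerConfig.getProperty("InputStreamName", "ExampleInputStream");
            System.out.println("Reading from: " + inputStreamName);
        }
    }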

You create your Kinesis Data Analytics application using either the console or the AWS CLI. To get started creating a Kinesis Data Analytics application, see Getting Started.
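
With the AWS CLI, creating the application is a single call to the kinesisanalyticsv2 API. This is a sketch; the request file name is a placeholder, and the JSON it contains must define settings such as the runtime environment, service execution role, and application code location:

    # create_request.json holds the application configuration
    # (runtime environment, service execution role, code location).
    aws kinesisanalyticsv2 create-application --cli-input-json file://create_request.json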

Running Your Kinesis Data Analytics Application

When you run your Kinesis Data Analytics application, the Kinesis Data Analytics service creates an Apache Flink job. An Apache Flink job is the execution lifecycle of your Kinesis Data Analytics application. The execution of the job, and the resources it uses, are managed by the Job Manager. The Job Manager separates the execution of the application into tasks. Each task is managed by a Task Manager. When you monitor your application's performance, you can examine the performance of each Task Manager, or of the Job Manager as a whole.

For information about Apache Flink jobs, see Jobs and Scheduling in the Apache Flink Documentation.
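
As a sketch of what monitoring looks like in practice, you can list the CloudWatch metrics that your application emits. The application name is a placeholder, and the dimensions available depend on your application's monitoring configuration:

    # List the CloudWatch metrics emitted for a specific application.
    aws cloudwatch list-metrics \
        --namespace "AWS/KinesisAnalytics" \
        --dimensions Name=Application,Value=MyApplication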

Application and Job Status

Both your application and the application's job have a current execution status:

  • Application status: Your application has a status that describes its current execution state. Application statuses include the following:

    • Steady application statuses: Your application typically stays in these statuses until you make a status change:

      • READY: A new or stopped application is in the READY state until you run it.

      • RUNNING: An application that has successfully started is in the RUNNING state.

      • FAILED: An application that has encountered an unrecoverable error or issue is in the FAILED state.

    • Transient application statuses: An application in one of these statuses is typically transitioning to another status. These statuses include STARTING, STOPPING, DELETING, and UPDATING.

    You can check your application's status using the console, or by using the DescribeApplication action; see the example following this list.

  • Job status: When your application is in the RUNNING state, your job has a status that describes its current state. A job starts in the CREATED state, and then proceeds to the RUNNING state when it has started. If an error condition occurs, the job enters the FAILING state, and then proceeds to either the RESTARTING or FAILED state, depending on whether it can be restarted.

    You can check the job's status by examining your application's CloudWatch log for state changes; see the example following this list.
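
For example, the following AWS CLI call returns the application status in the ApplicationDetail.ApplicationStatus field of the response; the application name is a placeholder:

    aws kinesisanalyticsv2 describe-application --application-name MyApplication

To check the job's status, you can filter the application's log for Apache Flink state-change messages, which have the form "switched from state RUNNING to FAILING". This sketch assumes a log group name; use the log group configured for your application:

    aws logs filter-log-events \
        --log-group-name "/aws/kinesis-analytics/MyApplication" \
        --filter-pattern '"switched from"'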