Amazon Kinesis Data Analytics
Developer Guide

Troubleshooting Kinesis Data Analytics for Java Applications

The following information can help you troubleshoot problems that you might encounter with Amazon Kinesis Data Analytics for Java Applications.

General Troubleshooting: Analyze Logs

You can investigate issues with your application by querying your application's CloudWatch logs.

For information about setting up and analyzing CloudWatch logs, see Logging and Monitoring.

Compile error: "Could not resolve dependencies for project"

To compile the Kinesis Data Analytics for Java Applications sample applications, you must first download and compile the Apache Flink Kinesis connector and add it to your local Maven repository. If the connector is not in your local repository, a compile error similar to the following appears:

Could not resolve dependencies for project your project name: Failure to find org.apache.flink:flink-connector-kinesis_2.11:jar:1.6.2 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced

To resolve this error, download the Apache Flink source code (version 1.6.2) from https://flink.apache.org/downloads.html, build the Kinesis connector as described in the Apache Flink documentation, and add it to your local Maven repository.

Invalid Choice: 'kinesisanalyticsv2'

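This error appears when your installed version of the AWS Command Line Interface (AWS CLI) does not recognize the kinesisanalyticsv2 command. To use v2 of the Kinesis Data Analytics API, you need the latest version of the AWS CLI.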

For information about upgrading the AWS CLI, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.

Application Is in RUNNING State but Not Processing Data

You can check your application state using either the ListApplications or the DescribeApplication action. If your application enters the RUNNING state but is not writing data to your sink, you can troubleshoot the issue by adding an Amazon CloudWatch log stream to your application. The log stream contains messages that you can use to troubleshoot application issues. For more information, see Working with Application CloudWatch Logging Options.

Snapshot Fails to Be Created

Kinesis Data Analytics takes a snapshot of the application during an UpdateApplication or StopApplication request. The service then uses this snapshot state and restores the application using the updated application configuration to provide exactly-once processing semantics.

The service can't take a snapshot of the application under the following circumstances:

  • The application exceeded the snapshot limit. The limit for snapshots is 1,000. For more information, see Snapshots.

  • The application is not in a healthy state.

  • The application does not have permissions to access its source or sink.

  • The application code is not functioning properly.

  • The application is experiencing other configuration issues.

If you get an exception while taking a snapshot during an application update or while stopping the application, check the application's CloudWatch logs for errors, and then retry the request. Alternatively, set the SnapshotsEnabled property of your application's ApplicationSnapshotConfiguration to false, and then retry the request so that it proceeds without taking a snapshot.

After the application returns to a healthy state, we recommend that you set the SnapshotsEnabled property to true.

You can set the SnapshotsEnabled property using the UpdateApplication action. The following UpdateApplication example sets the SnapshotsEnabled property to true:

aws kinesisanalyticsv2 update-application \
    --application-name MyApplication \
    --current-application-version-id 10 \
    --application-configuration-update '{"ApplicationSnapshotConfigurationUpdate":{"SnapshotsEnabledUpdate":true}}'

You can also update the SnapshotsEnabled property using the console.

Update the SnapshotsEnabled Property Using the Console

  1. Open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

  2. In the Kinesis Data Analytics console, choose your application.

  3. On your application's page, choose Configure.

  4. In the Snapshots section, choose Enable.

    
     [Screenshot showing the Snapshots section.]

Throughput Is Too Slow or MillisBehindLatest Is Increasing

If the application metrics are showing that throughput is too slow or the MillisBehindLatest metric is steadily increasing, do the following:

  • Enable auto scaling if it is disabled, or increase application parallelism. For more information, see Scaling.

  • Check if the application is logging every record being processed. Logging each record during times when the application has high throughput will cause severe bottlenecks in data processing. To check for this condition, query your logs for log entries that your application writes with every record it processes. For more information, see Analyzing Logs with CloudWatch Logs Insights.

  • Verify that your application uses Apache Flink version 1.6.2. Using later versions of Apache Flink may reduce throughput.
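As an illustration of the per-record logging point above, the following is a minimal Java sketch that writes one summary line for every N records instead of one line per record. It is not part of Kinesis Data Analytics or Apache Flink: the class name SampledRecordLogger, its parameters, and the message format are hypothetical, and the Consumer<String> stands in for whatever logger your application uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper: logs every Nth record rather than every record. */
class SampledRecordLogger {
    private final long sampleEvery;          // log once per this many records
    private final Consumer<String> sink;     // stand-in for the application's logger
    private final AtomicLong seen = new AtomicLong();

    SampledRecordLogger(long sampleEvery, Consumer<String> sink) {
        if (sampleEvery <= 0) {
            throw new IllegalArgumentException("sampleEvery must be positive");
        }
        this.sampleEvery = sampleEvery;
        this.sink = sink;
    }

    /** Counts the record and logs only on every Nth call. Returns true if it logged. */
    boolean maybeLog(String record) {
        long n = seen.incrementAndGet();
        if (n % sampleEvery != 0) {
            return false;
        }
        sink.accept("processed " + n + " records; latest: " + record);
        return true;
    }

    public static void main(String[] args) {
        SampledRecordLogger logger = new SampledRecordLogger(1000, System.out::println);
        for (int i = 1; i <= 2500; i++) {
            logger.maybeLog("record-" + i);   // emits 2 lines instead of 2500
        }
    }
}
```

In this sketch the sampling interval is fixed. Choosing a larger interval reduces logging overhead further, at the cost of less detail in the logs.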

Downtime Is Not Zero

If the Downtime metric is not zero, the application is not healthy. Common causes of this condition include the following:

  • The sources or sinks that your application uses are under-provisioned. Check that any sources or sinks used in the application are well-provisioned and are not experiencing read or write throttling.

    If the source or sink is a Kinesis data stream, check the metrics for the stream for ReadProvisionedThroughputExceeded or WriteProvisionedThroughputExceeded errors.

    You can investigate the causes of this condition by querying your application logs for changes in your application's state from RUNNING to FAILED. For more information, see Analyze Errors: Application Task-Related Failures.

  • An operator in your application throws an unhandled exception. When an exception in an operator is not handled, the application fails over, because the failure is interpreted as one that the operator cannot handle, and the application restarts from the latest checkpoint to maintain exactly-once processing semantics. As a result, Downtime is not zero during these restart periods. To prevent this, we recommend that you handle any retryable exceptions in the application code.
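The recommendation to handle retryable exceptions in application code could look something like the following minimal Java sketch. It assumes that your code can tell retryable failures (for example, a throttled call to a sink) apart from other failures. The RetryableException type, the RetryingCall helper, and the attempt and backoff values are hypothetical and are not part of Apache Flink or the Kinesis Data Analytics API.

```java
import java.util.concurrent.Callable;

/** Hypothetical marker for failures that are worth retrying, such as throttling. */
class RetryableException extends RuntimeException {
    RetryableException(String message) {
        super(message);
    }
}

/** Hypothetical helper that retries a call inside an operator with exponential backoff. */
class RetryingCall {
    static <T> T withRetries(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (RetryableException e) {
                if (attempt >= maxAttempts) {
                    throw e;              // give up: the exception propagates out of the operator
                }
                Thread.sleep(backoff);    // wait before the next attempt
                backoff *= 2;             // double the wait each time
            }
            // Any other exception type is not caught here and propagates immediately.
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("throttled");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Only exceptions marked as retryable are retried in this sketch. Any other exception, or a retryable one that persists past the last attempt, still propagates so that the application can restart from the latest checkpoint.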