Analyzing logs with CloudWatch Logs Insights - Managed Service for Apache Flink

Amazon Managed Service for Apache Flink was previously known as Amazon Kinesis Data Analytics for Apache Flink.

Analyzing logs with CloudWatch Logs Insights

After you've added a CloudWatch logging option to your application as described in the previous section, you can use CloudWatch Logs Insights to query your log streams for specific events or errors.

CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs.

For information on getting started with CloudWatch Logs Insights, see Analyze Log Data with CloudWatch Logs Insights.

Run a sample query

This section describes how to run a sample CloudWatch Logs Insights query.

Prerequisites

  • Existing log groups and log streams set up in CloudWatch Logs.

  • Existing logs stored in CloudWatch Logs.

If you use services such as AWS CloudTrail, Amazon RouteĀ 53, or Amazon VPC, you've probably already set up logs from those services to go to CloudWatch Logs. For more information about sending logs to CloudWatch Logs, see Getting Started with CloudWatch Logs.

Queries in CloudWatch Logs Insights return either a set of fields from log events, or the result of a mathematical aggregation or other operation performed on log events. This section demonstrates a query that returns a list of log events.

To run a CloudWatch Logs Insights sample query
  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. In the navigation pane, choose Insights.

  3. The query editor near the top of the screen contains a default query that returns the 20 most recent log events. Above the query editor, select a log group to query.

    When you select a log group, CloudWatch Logs Insights automatically detects fields in the data in the log group and displays them in Discovered fields in the right pane. It also displays a bar graph of log events in this log group over time. This bar graph shows the distribution of events in the log group that matches your query and time range, not just the events displayed in the table.

  4. Choose Run query.

    The results of the query appear. In this example, the results are the most recent 20 log events of any type.

  5. To see all of the fields for one of the returned log events, choose the arrow to the left of that log event.

For more information about how to run and modify CloudWatch Logs Insights queries, see Run and Modify a Sample Query.

Example queries

This section contains CloudWatch Logs Insights example queries for analyzing Managed Service for Apache Flink application logs. These queries search for several example error conditions, and serve as templates for writing queries that find other error conditions.

Note

Replace the Region (us-west-2), Account ID (012345678901) and application name (YourApplication) in the following query examples with your application's Region and your Account ID.

Analyze operations: Distribution of tasks

The following CloudWatch Logs Insights query returns the number of tasks the Apache Flink Job Manager distributes between Task Managers. You need to set the query's time frame to match one job run so that the query doesn't return tasks from previous jobs. For more information about Parallelism, see Scaling.

fields @timestamp, message | filter message like /Deploying/ | parse message " to flink-taskmanager-*" as @tmid | stats count(*) by @tmid | sort @timestamp desc | limit 2000

The following CloudWatch Logs Insights query returns the subtasks assigned to each Task Manager. The total number of subtasks is the sum of every task's parallelism. Task parallelism is derived from operator parallelism, and is the same as the application's parallelism by default, unless you change it in code by specifying setParallelism. For more information about setting operator parallelism, see Setting the Parallelism: Operator Level in the Apache Flink documentation.

fields @timestamp, @tmid, @subtask | filter message like /Deploying/ | parse message "Deploying * to flink-taskmanager-*" as @subtask, @tmid | sort @timestamp desc | limit 2000

For more information about task scheduling, see Jobs and Scheduling in the Apache Flink documentation.

Analyze operations: Change in parallelism

The following CloudWatch Logs Insights query returns changes to an application's parallelism (for example, due to automatic scaling). This query also returns manual changes to the application's parallelism. For more information about automatic scaling, see Automatic scaling.

fields @timestamp, @parallelism | filter message like /property: parallelism.default, / | parse message "default, *" as @parallelism | sort @timestamp asc

Analyze errors: Access denied

The following CloudWatch Logs Insights query returns Access Denied logs.

fields @timestamp, @message, @messageType | filter applicationARN like /arn:aws:kinesisanalyticsus-west-2:012345678901:application\/YourApplication/ | filter @message like /AccessDenied/ | sort @timestamp desc

Analyze errors: Source or sink not found

The following CloudWatch Logs Insights query returns ResourceNotFound logs. ResourceNotFound logs result if a Kinesis source or sink is not found.

fields @timestamp,@message | filter applicationARN like /arn:aws:kinesisanalyticsus-west-2:012345678901:application\/YourApplication/ | filter @message like /ResourceNotFoundException/ | sort @timestamp desc

Analyze errors: Application task-related failures

The following CloudWatch Logs Insights query returns an application's task-related failure logs. These logs result if an application's status switches from RUNNING to RESTARTING.

fields @timestamp,@message | filter applicationARN like /arn:aws:kinesisanalyticsus-west-2:012345678901:application\/YourApplication/ | filter @message like /switched from RUNNING to RESTARTING/ | sort @timestamp desc

For applications using Apache Flink version 1.8.2 and prior, task-related failures will result in the application status switching from RUNNING to FAILED instead. When using Apache Flink 1.8.2 and prior, use the following query to search for application task-related failures:

fields @timestamp,@message | filter applicationARN like /arn:aws:kinesisanalyticsus-west-2:012345678901:application\/YourApplication/ | filter @message like /switched from RUNNING to FAILED/ | sort @timestamp desc