DevOps Guru concepts

The following concepts are important for understanding how Amazon DevOps Guru works.

Anomaly

An anomaly represents one or more related metrics detected by DevOps Guru that are unexpected or unusual. DevOps Guru generates anomalies by using machine learning to analyze metrics and operational data that are related to your AWS resources. You specify the AWS resources that you want analyzed when you set up Amazon DevOps Guru. For more information, see Setting up Amazon DevOps Guru.

Insight

An insight is a collection of anomalies that are created during the analysis of the AWS resources you specify when you set up DevOps Guru. Each insight contains observations, recommendations, and analytical data you can use to improve your operational performance. There are two types of insights:

  • Reactive: A reactive insight identifies anomalous behavior as it occurs. It contains anomalies with recommendations, related metrics, and events to help you understand and address the issues now.

  • Proactive: A proactive insight warns you about anomalous behavior that is predicted to occur. It contains anomalies with recommendations to help you address the issues before they happen.
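
As an illustration of the two insight types, the following is a minimal sketch that lists ongoing insights of one type. It assumes `client` behaves like the boto3 DevOps Guru client (`boto3.client("devops-guru")`) and that `list_insights` returns summaries under `ReactiveInsights` and `ProactiveInsights` with an optional `NextToken`. The helper name `iter_ongoing_insights` is hypothetical and not part of any AWS SDK.

```python
from typing import Any, Dict, Iterator


def iter_ongoing_insights(client: Any, insight_type: str) -> Iterator[Dict[str, Any]]:
    """Yield summaries of ongoing insights of one type ("REACTIVE" or "PROACTIVE").

    `client` is assumed to behave like boto3.client("devops-guru").
    """
    if insight_type not in ("REACTIVE", "PROACTIVE"):
        raise ValueError("insight_type must be 'REACTIVE' or 'PROACTIVE'")

    # Assumption: reactive and proactive summaries come back under separate keys.
    result_key = "ReactiveInsights" if insight_type == "REACTIVE" else "ProactiveInsights"
    kwargs: Dict[str, Any] = {"StatusFilter": {"Ongoing": {"Type": insight_type}}}

    while True:
        page = client.list_insights(**kwargs)
        yield from page.get(result_key, [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # ask for the next page of results


# Example use (requires boto3 and AWS credentials):
#   import boto3
#   for insight in iter_ongoing_insights(boto3.client("devops-guru"), "PROACTIVE"):
#       print(insight.get("Id"), insight.get("Name"))
```

Taking the client as a parameter keeps the helper independent of how credentials and Regions are configured.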

Metrics and operational events

The anomalies that make up an insight are generated by analyzing the metrics returned by Amazon CloudWatch and operational events emitted by your AWS resources. You can view the metrics and the operational events that create an insight to help you better understand issues in your application.
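
To make the relationship between an insight, its metrics, and its events concrete, here is a rough sketch that collects the CloudWatch metrics and event names behind one insight. It assumes a boto3-style DevOps Guru client whose `list_anomalies_for_insight` and `list_events` calls return the fields used below. The helper names are hypothetical, and only the first page of each response is read.

```python
from typing import Any, List, Tuple


def metrics_for_insight(client: Any, insight_id: str) -> List[Tuple[str, str]]:
    """Return unique (namespace, metric name) pairs behind an insight's anomalies."""
    response = client.list_anomalies_for_insight(InsightId=insight_id)
    anomalies = response.get("ReactiveAnomalies", []) + response.get("ProactiveAnomalies", [])

    pairs: List[Tuple[str, str]] = []
    for anomaly in anomalies:
        # Assumption: each anomaly lists its source metrics under SourceDetails.
        for metric in anomaly.get("SourceDetails", {}).get("CloudWatchMetrics", []):
            pair = (metric.get("Namespace", ""), metric.get("MetricName", ""))
            if pair not in pairs:  # keep first-seen order, drop duplicates
                pairs.append(pair)
    return pairs


def event_names_for_insight(client: Any, insight_id: str) -> List[str]:
    """Return the names of operational events related to an insight."""
    response = client.list_events(Filters={"InsightId": insight_id})
    return [event.get("Name", "") for event in response.get("Events", [])]
```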

Log groups and log anomalies

When you enable log anomaly detection, relevant log groups are displayed on DevOps Guru insight pages in the DevOps Guru console. A log group provides critical diagnostic information about how a resource is performing and being accessed.

A log anomaly represents a cluster of similar anomalous log events found within a log group. Examples of anomalous log events that may be displayed in DevOps Guru include keyword anomalies, format anomalies, HTTP code anomalies, and more.

You can use log anomalies to diagnose the root cause of an operational issue. DevOps Guru also references log lines in insight recommendations to provide more context for recommended solutions.
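
As a sketch of how log anomalies might be summarized during diagnosis, the helper below counts anomaly types per log group for one insight. It assumes a boto3-style DevOps Guru client whose `list_anomalous_log_groups` response nests `LogAnomalyShowcases` and `LogAnomalyClasses` under each entry in `AnomalousLogGroups`. The function name is hypothetical, and pagination is omitted for brevity.

```python
from collections import Counter
from typing import Any, Dict


def log_anomaly_types_by_group(client: Any, insight_id: str) -> Dict[str, Counter]:
    """Count log anomaly types (for example KEYWORD or HTTP_CODE) per log group."""
    response = client.list_anomalous_log_groups(InsightId=insight_id)

    summary: Dict[str, Counter] = {}
    for group in response.get("AnomalousLogGroups", []):
        counts = summary.setdefault(group.get("LogGroupName", "<unknown>"), Counter())
        # Assumption: each showcase groups one or more anomaly classes.
        for showcase in group.get("LogAnomalyShowcases", []):
            for anomaly_class in showcase.get("LogAnomalyClasses", []):
                counts[anomaly_class.get("LogAnomalyType", "UNKNOWN")] += 1
    return summary
```

A log group with a high count of one anomaly type is a reasonable place to start reading log lines.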

Note

DevOps Guru works with Amazon CloudWatch to enable log anomaly detection. When you enable log anomaly detection, DevOps Guru adds tags to your CloudWatch log groups. When you turn off log anomaly detection, DevOps Guru removes tags from your CloudWatch log groups.

In addition, administrators should ensure that only users with permissions to view CloudWatch logs have permissions to view anomalous CloudWatch logs. We recommend that you use IAM policies to allow or deny access to the ListAnomalousLogs operation. For more information, see Identity and Access Management for DevOps Guru.
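
The following sketch shows one way such a policy document could be assembled. The action name `devops-guru:ListAnomalousLogGroups` is an assumption based on the name of the API operation that returns anomalous log groups. Confirm the exact action name in the Service Authorization Reference before using it. The helper name and the `"*"` resource are also illustrative.

```python
import json
from typing import Any, Dict

# Assumption: verify this action name against the Service Authorization Reference.
ASSUMED_LOG_ANOMALY_ACTION = "devops-guru:ListAnomalousLogGroups"


def log_anomaly_access_policy(
    effect: str, action: str = ASSUMED_LOG_ANOMALY_ACTION
) -> Dict[str, Any]:
    """Build an IAM policy document that allows or denies viewing anomalous logs."""
    if effect not in ("Allow", "Deny"):
        raise ValueError("effect must be 'Allow' or 'Deny'")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": effect,
                "Action": action,
                "Resource": "*",  # illustrative; narrow this if your setup allows
            }
        ],
    }


def to_policy_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy for use in the IAM console, CLI, or an SDK call."""
    return json.dumps(policy, indent=2)
```

For example, `to_policy_json(log_anomaly_access_policy("Deny"))` produces a document you could attach to users who are not permitted to view CloudWatch logs.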

Recommendations

Each insight provides recommendations with suggestions to help you improve the performance of your application. Each recommendation includes the following:

  • A description of the recommended actions to address the anomalies that make up the insight.

  • A list of the analyzed metrics in which DevOps Guru found anomalous behavior. Each metric includes the name of the AWS CloudFormation stack that generated the resource associated with the metrics, the resource's name, and the name of the AWS service associated with the resource.

  • A list of the events that are related to the anomalous metrics associated with the insight. Each related event contains the name of the AWS CloudFormation stack that generated the resource associated with the event, the name of the resource that generated the event, and the name of the AWS service associated with the event.

  • A list of log groups that are related to the anomalous behavior associated with the insight. Each log group contains a sample log message, information about the kinds of log anomalies reported, the times the log anomalies occurred, and a link to view the log lines on CloudWatch.
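
The parts of a recommendation listed above can be read programmatically. The sketch below flattens the recommendations for one insight into printable lines. It assumes a boto3-style DevOps Guru client whose `list_recommendations` response carries `Name`, `Description`, `RelatedEvents`, and `RelatedAnomalies` as used here. The function name is hypothetical, and only the first page of results is read.

```python
from typing import Any, List


def describe_recommendations(client: Any, insight_id: str) -> List[str]:
    """Return one text line per recommendation, related event, and related metric."""
    response = client.list_recommendations(InsightId=insight_id)

    lines: List[str] = []
    for rec in response.get("Recommendations", []):
        lines.append(f"{rec.get('Name', '<unnamed>')}: {rec.get('Description', '')}")

        # Events related to the anomalous metrics, with the resources involved.
        for event in rec.get("RelatedEvents", []):
            resources = ", ".join(r.get("Name", "") for r in event.get("Resources", []))
            lines.append(f"  event {event.get('Name', '')} [{resources}]")

        # Metrics in which anomalous behavior was found.
        for anomaly in rec.get("RelatedAnomalies", []):
            for detail in anomaly.get("SourceDetails", []):
                for metric in detail.get("CloudWatchMetrics", []):
                    namespace = metric.get("Namespace", "")
                    name = metric.get("MetricName", "")
                    lines.append(f"  metric {namespace}/{name}")
    return lines
```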