Key concepts for DevOps Guru for RDS - Amazon DevOps Guru

Key concepts for DevOps Guru for RDS

An insight is generated by DevOps Guru when it detects anomalous or problematic behavior in your operational applications. An insight contains anomalies for one or more resources. An anomaly represents one or more related metrics detected by DevOps Guru that are unexpected or unusual.

An insight has a severity of high, medium, or low. The insight severity is determined by the most severe anomaly that contributed to creating the insight. For example, if the insight AWS-ECS_MemoryUtilization_and_others includes one anomaly with low severity and another with high severity, the overall severity of the insight is high.
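
The "most severe anomaly wins" rule can be sketched in a few lines of Python. This is only an illustration: the `SEVERITY_RANK` mapping and the `insight_severity` helper are hypothetical names, not part of any DevOps Guru API.

```python
# Hypothetical ranking used only for this illustration.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def insight_severity(anomaly_severities):
    """Return the most severe level among an insight's anomalies."""
    if not anomaly_severities:
        raise ValueError("an insight contains at least one anomaly")
    # max() with a rank lookup picks the highest-ranked severity string.
    return max(anomaly_severities, key=SEVERITY_RANK.__getitem__)


# One low-severity and one high-severity anomaly, as in the example above.
print(insight_severity(["low", "high"]))
```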

If Amazon RDS DB instances have Performance Insights turned on, DevOps Guru for RDS provides detailed analysis and recommendations in the anomalies for these instances. To identify an anomaly, DevOps Guru for RDS develops a baseline for database metric values. DevOps Guru for RDS then compares current metric values to the historical baseline.
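
The source does not describe how DevOps Guru for RDS builds or applies its baseline. As a rough mental model only, the sketch below compares a current value to a historical mean and standard deviation. The z-score technique, the `is_anomalous` helper, the `threshold` default, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation counts as unusual.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Made-up DB load samples standing in for a historical baseline.
db_load_history = [2.0, 2.2, 1.9, 2.1, 2.0, 2.3]
print(is_anomalous(db_load_history, 2.1))  # close to the baseline
print(is_anomalous(db_load_history, 9.5))  # far above the baseline
```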

Proactive insights

A proactive insight lets you know about problematic behavior before it occurs. It contains anomalies with recommendations and related metrics to help you address the issues before they become bigger problems.

Each proactive insight page provides details about one anomaly.

Reactive insights

A reactive insight identifies anomalous behavior as it occurs. It contains anomalies with recommendations, related metrics, and events to help you understand and address the issues now.
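
Insights can also be retrieved programmatically. The sketch below assumes a client object that exposes the DevOps Guru `ListInsights` operation the way the boto3 `devops-guru` client does (`list_insights` with a `StatusFilter`); the helper name `ongoing_reactive_insights` is hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
def ongoing_reactive_insights(client):
    """Collect summaries of ongoing reactive insights, following pagination."""
    insights = []
    kwargs = {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}
    while True:
        page = client.list_insights(**kwargs)
        insights.extend(page.get("ReactiveInsights", []))
        token = page.get("NextToken")
        if not token:
            return insights
        # Request the next page of results.
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, this could be called as `ongoing_reactive_insights(boto3.client("devops-guru"))`. For proactive insights, the same pattern should apply with `"Type": "PROACTIVE"` and the `ProactiveInsights` key of the response.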

Causal anomalies

A causal anomaly is a top-level anomaly within a reactive insight. It is shown as the Primary metric on the anomaly details page in the DevOps Guru console. Database load (DB load) is the causal anomaly for DevOps Guru for RDS. For example, the insight AWS-ECS_MemoryUtilization_and_others could have several metric anomalies, one of which is Database load (DB load) for the resource AWS/RDS.

Within an insight, the anomaly Database load (DB load) can occur for multiple Amazon RDS DB instances. The severity of the anomaly might be different for each DB instance. For example, the severity for one DB instance might be high while the severity for the others is low. The console defaults to the anomaly with the highest severity.

Contextual anomalies

A contextual anomaly is a finding within Database load (DB load) that is related to a reactive insight. It is displayed in the Related metrics section of the anomaly details page in the DevOps Guru console. Each contextual anomaly describes a specific Amazon RDS performance issue that requires investigation. For example, a causal anomaly can include the following contextual anomalies:

  • CPU capacity exceeded – The CPU run queue or CPU utilization is above normal.

  • Database memory low – Processes don't have enough memory.

  • Database connections spiked – The number of database connections is above normal.
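
The relationship between causal and contextual anomalies can be pictured as a small data model. The sketch below is an illustration only: the class names, fields, and the `default_anomaly` helper are hypothetical and do not mirror the DevOps Guru API. It shows one causal anomaly per DB instance, each carrying its contextual anomalies, and selects the one with the highest severity, which is the anomaly the console shows by default.

```python
from dataclasses import dataclass, field

# Hypothetical ranking used only for this illustration.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class ContextualAnomaly:
    title: str
    description: str


@dataclass
class CausalAnomaly:
    db_instance: str
    severity: str
    metric: str = "Database load (DB load)"
    contextual: list = field(default_factory=list)


def default_anomaly(anomalies):
    """Return the anomaly with the highest severity."""
    return max(anomalies, key=lambda a: SEVERITY_RANK[a.severity])


# Made-up DB instance names for illustration.
anomalies = [
    CausalAnomaly("db-1", "low"),
    CausalAnomaly("db-2", "high", contextual=[
        ContextualAnomaly("CPU capacity exceeded",
                          "The CPU run queue or CPU utilization is above normal."),
        ContextualAnomaly("Database connections spiked",
                          "The number of database connections is above normal."),
    ]),
]
shown = default_anomaly(anomalies)
print(shown.db_instance, [c.title for c in shown.contextual])
```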


Recommendations

Each insight has at least one suggested action. The following examples are recommendations generated by DevOps Guru for RDS:

  • Tune SQL IDs list_of_IDs to reduce CPU usage, or upgrade the instance type to increase CPU capacity.

  • Review the associated spike of current database connections. Consider tuning the application pool settings to avoid frequent dynamic allocation of new database connections.

  • Look for SQL statements that perform excessive memory operations, such as in-memory sorting or large joins.

  • Investigate the heavy I/O usage for the following SQL IDs: list_of_IDs.

  • Check for statements that create large amounts of temporary data, for example those that perform large sorts or use large temporary tables.

  • Check applications to see what is causing the increase in database workload.

  • Consider enabling the MySQL Performance Schema.

  • Check for long-running transactions and end them with a commit or rollback.

  • Configure the idle_in_transaction_session_timeout parameter to end any session that has been in the 'idle in transaction' state for longer than the specified time.
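
For the last two recommendations, which apply to PostgreSQL, it can help to see which sessions would be affected before setting a timeout. The sketch below assumes rows shaped like the PostgreSQL `pg_stat_activity` view (`pid`, `state`, `state_change`) that have already been fetched into dictionaries. The helper name `idle_in_transaction` and the sample rows are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def idle_in_transaction(sessions, now, max_idle):
    """Return pids of sessions that have been in the
    'idle in transaction' state for longer than max_idle."""
    return [
        s["pid"]
        for s in sessions
        if s["state"] == "idle in transaction"
        and now - s["state_change"] > max_idle
    ]


# Made-up sample rows standing in for a pg_stat_activity query result.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    {"pid": 101, "state": "active",
     "state_change": now - timedelta(minutes=30)},
    {"pid": 102, "state": "idle in transaction",
     "state_change": now - timedelta(minutes=20)},
    {"pid": 103, "state": "idle in transaction",
     "state_change": now - timedelta(seconds=5)},
]
print(idle_in_transaction(sessions, now, timedelta(minutes=5)))
```

Only the session that has been idle in a transaction beyond the chosen limit is reported; an active session or a briefly idle one is left alone.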