Key concepts for DevOps Guru for RDS - Amazon DevOps Guru

An insight is generated by DevOps Guru when it detects anomalous behavior in your operational applications. An insight contains anomalies for one or more resources. An anomaly represents one or more related metrics detected by DevOps Guru that are unexpected or unusual.

An insight has a severity of high, medium, or low. The insight severity is determined by the most severe anomaly that contributed to creating the insight. For example, if the insight AWS-ECS_MemoryUtilization_and_others includes one anomaly with low severity and another with high severity, the overall severity of the insight is high.
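The roll-up rule above can be sketched in a few lines. This is a minimal illustration, not the service's implementation: the `Severity` enum and the `insight_severity` helper are hypothetical names introduced here, and the numeric ranks exist only so the levels can be compared.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering of the three severity levels."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def insight_severity(anomaly_severities):
    """Return the most severe level among the contributing anomalies."""
    if not anomaly_severities:
        raise ValueError("an insight needs at least one anomaly")
    return max(anomaly_severities)


# One low and one high anomaly roll up to a high-severity insight.
print(insight_severity([Severity.LOW, Severity.HIGH]).name)  # HIGH
```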

If Amazon Aurora DB instances have Performance Insights turned on, DevOps Guru for RDS provides detailed analysis and recommendations in the anomalies for these instances. To identify an anomaly, DevOps Guru for RDS develops a baseline for database metric values and then compares current metric values to that historical baseline.
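To make the idea of a baseline concrete, the following sketch compares a current metric value against a percentile of historical samples. It is only an illustration under stated assumptions: the nearest-rank percentile method, the function names `percentile_baseline` and `is_anomalous`, and the sample values are all hypothetical, and the sketch does not claim to reproduce how DevOps Guru for RDS actually computes or applies its baseline.

```python
import math


def percentile_baseline(history, percentile=95.0):
    """Nearest-rank percentile of historical metric samples."""
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(history)
    rank = math.ceil(percentile / 100.0 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def is_anomalous(current, history, percentile=95.0):
    """Flag a current value that exceeds the historical baseline."""
    return current > percentile_baseline(history, percentile)


# Hypothetical DB load samples (average active sessions).
history = [1.0, 1.2, 0.9, 1.5, 2.0, 1.1, 1.3, 1.4, 1.6, 1.8]
print(percentile_baseline(history))   # 2.0
print(is_anomalous(3.5, history))     # True
print(is_anomalous(1.7, history))     # False
```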

Causal anomalies

A causal anomaly is a top-level anomaly within an insight. Database load (DB load) is the causal anomaly for DevOps Guru for RDS. For example, the insight AWS-ECS_MemoryUtilization_and_others could have several metric anomalies, one of which is Database load (DB load) for the resource AWS/RDS.

Within an insight, the anomaly Database load (DB load) can occur for multiple Amazon Aurora DB instances. The severity of the anomaly might be different for each DB instance. For example, the severity for one DB instance might be high while the severity for the others is low. The console defaults to the anomaly with the highest severity.

Contextual anomalies

A contextual anomaly is a finding within Database load (DB load). Each contextual anomaly describes a specific Amazon Aurora performance issue that requires investigation. For example, a causal anomaly can include the following contextual anomalies:

  • CPU capacity exceeded – The CPU run queue or CPU utilization is above normal.

  • Database memory low – Processes don't have enough memory.

  • Database connections spiked – The number of database connections is above normal.

Recommendations

A contextual anomaly has at least one recommendation, that is, a suggested action. The following are examples of recommendations generated by DevOps Guru for RDS:

  • Tune SQL IDs list_of_IDs to reduce CPU usage, or upgrade the instance type to increase CPU capacity.

  • Review the associated spike of current database connections. Consider tuning the application pool settings to avoid frequent dynamic allocation of new database connections.

  • Look for SQL statements that perform excessive memory operations, such as in-memory sorting or large joins.

  • Investigate the heavy I/O usage for the following SQL IDs: list_of_IDs.

  • Check for statements that create large amounts of temporary data, for example those that perform large sorts or use large temporary tables.

  • Consider tuning application pool settings to avoid frequent dynamic allocation of new database connections.

  • Check applications to see what is causing the increase in database workload.

  • Consider enabling the MySQL Performance Schema.
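
The relationship between these concepts can be sketched as a small data model: a DB load (causal) anomaly per DB instance, each holding contextual anomalies, each of which carries at least one recommendation. This is a hypothetical model for illustration only. The class names, the field names, the instance identifiers, and the pairing of a particular recommendation with a particular contextual anomaly are assumptions, not the DevOps Guru API or its response format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical severity ranking used only for comparisons in this sketch.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class ContextualAnomaly:
    name: str
    recommendations: List[str]

    def __post_init__(self):
        # A contextual anomaly carries at least one suggested action.
        if not self.recommendations:
            raise ValueError(f"{self.name!r} needs at least one recommendation")


@dataclass
class CausalAnomaly:
    db_instance: str   # hypothetical instance identifier
    severity: str      # "low", "medium", or "high"
    contextual: List[ContextualAnomaly] = field(default_factory=list)


def default_anomaly(anomalies):
    """Pick the DB load anomaly with the highest severity."""
    return max(anomalies, key=lambda a: SEVERITY_RANK[a.severity])


db_load = [
    CausalAnomaly("instance-a", "low"),
    CausalAnomaly("instance-b", "high", [
        ContextualAnomaly(
            "Database connections spiked",
            ["Review the associated spike of current database connections."],
        ),
    ]),
]

shown = default_anomaly(db_load)
print(shown.db_instance)  # instance-b
for finding in shown.contextual:
    print(finding.name, "->", finding.recommendations[0])
```

Selecting with `max` over a severity rank mirrors the console behavior described earlier, where the anomaly with the highest severity is shown by default.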