OPS09-BP03 Collect and analyze operations metrics
Perform regular, proactive reviews of metrics to identify trends and determine where appropriate responses are needed.
You should aggregate log data from your operations activities and operations API calls into a service such as Amazon CloudWatch Logs. Generate metrics from the relevant log content to gain insight into the performance of your operations activities.
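As a minimal local sketch of turning log content into operations metrics, the code below assumes a hypothetical log format: one JSON object per line with an ISO 8601 `timestamp` and an `event` name such as `deployment`, `deployment_rollback`, `patch`, or `patch_rollback`. The function names `daily_event_counts` and `rollback_rate` are illustrative, not part of any AWS API. The code counts events per day and type, which is similar in spirit to what a CloudWatch Logs metric filter does, and derives a rollback rate from those counts.

```python
import json
from collections import Counter
from datetime import datetime


def daily_event_counts(log_lines):
    """Count operations events per (date, event type).

    Assumes a hypothetical format: one JSON object per line with an
    ISO 8601 "timestamp" and an "event" name. Malformed lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
            day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
            event = record["event"]
        except (ValueError, KeyError, TypeError):
            # Not valid JSON, or missing/invalid fields: ignore the line.
            continue
        counts[(day, event)] += 1
    return counts


def rollback_rate(counts, action="deployment"):
    """Ratio of "<action>_rollback" events to "<action>" events (hypothetical naming)."""
    total = sum(n for (_, event), n in counts.items() if event == action)
    rolled_back = sum(
        n for (_, event), n in counts.items() if event == f"{action}_rollback"
    )
    return rolled_back / total if total else 0.0


if __name__ == "__main__":
    sample = [
        '{"timestamp": "2024-05-01T09:15:00+00:00", "event": "deployment"}',
        '{"timestamp": "2024-05-01T11:40:00+00:00", "event": "deployment"}',
        '{"timestamp": "2024-05-01T12:05:00+00:00", "event": "deployment_rollback"}',
        '{"timestamp": "2024-05-02T08:00:00+00:00", "event": "patch"}',
        "not json",
    ]
    counts = daily_event_counts(sample)
    for (day, event), n in sorted(counts.items()):
        print(day, event, n)
    print("deployment rollback rate:", rollback_rate(counts))
```

Skipping malformed lines keeps the metric pipeline running, but in practice you may also want to count skipped lines, since a rise in unparseable entries is itself worth reviewing.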
On AWS, you can export your log data to Amazon S3 or send logs directly to Amazon S3.
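One way to export an existing CloudWatch Logs log group to Amazon S3 is the CloudWatch Logs `CreateExportTask` API, whose `fromTime` and `to` parameters are epoch milliseconds. The sketch below only builds the request parameters so it can run without AWS access; the log group name, bucket name, and prefix are hypothetical placeholders. Actually submitting the task also requires credentials and an S3 bucket policy that allows CloudWatch Logs to write to the bucket.

```python
from datetime import datetime, timezone


def _epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return round(dt.timestamp() * 1000)


def export_task_params(log_group, bucket, start, end, prefix="operations-logs"):
    """Build keyword arguments for the CloudWatch Logs CreateExportTask API.

    fromTime and to are epoch milliseconds. The bucket and prefix values
    passed in are placeholders chosen for illustration.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "logGroupName": log_group,
        "fromTime": _epoch_millis(start),
        "to": _epoch_millis(end),
        "destination": bucket,
        "destinationPrefix": prefix,
    }


if __name__ == "__main__":
    params = export_task_params(
        "/ops/deployments",            # hypothetical log group
        "example-ops-log-archive",     # hypothetical bucket
        datetime(2024, 5, 1, tzinfo=timezone.utc),
        datetime(2024, 5, 2, tzinfo=timezone.utc),
    )
    print(params)
```

With boto3 installed and configured, the result could be submitted as `boto3.client("logs").create_export_task(**params)`. Rejecting naive datetimes avoids silently exporting the wrong window when the local time zone is not UTC.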
Common anti-patterns:
- Consistent delivery of new features is considered a key performance indicator. You have no method to measure how frequently deployments occur.
- You log deployments, rolled-back deployments, patches, and rolled-back patches to track your operations activities, but no one reviews the metrics.
- When the system was deployed and had no users, you defined a recovery time objective of fifteen minutes to restore a lost database. You now have ten thousand users and have been operating for two years. A recent restore took over two hours. This was not recorded, and no one is aware of it.
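The last anti-pattern above is the kind of gap a simple recorded metric would expose. As a minimal sketch, assuming restore durations are recorded somewhere (the `RestoreRecord` shape and the function names are hypothetical), the code flags restores that exceeded the recovery time objective and reports how many met it.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RestoreRecord:
    """One recorded restore operation (hypothetical record shape)."""
    label: str
    duration: timedelta


def rto_breaches(records, rto):
    """Return the restores whose duration exceeded the recovery time objective."""
    return [r for r in records if r.duration > rto]


def rto_compliance(records, rto):
    """Fraction of restores that finished within the RTO.

    Returns None when nothing was recorded, because missing data is itself
    a finding rather than evidence of compliance.
    """
    if not records:
        return None
    within = sum(1 for r in records if r.duration <= rto)
    return within / len(records)


if __name__ == "__main__":
    rto = timedelta(minutes=15)
    history = [
        RestoreRecord("launch drill", timedelta(minutes=11)),
        RestoreRecord("recent restore", timedelta(hours=2, minutes=5)),
    ]
    for r in rto_breaches(history, rto):
        print(f"RTO breach: {r.label} took {r.duration}")
    print("compliance:", rto_compliance(history, rto))
```

Returning `None` instead of `1.0` for an empty history is a deliberate choice: an objective with no recorded restores has never been verified.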
Benefits of establishing this best practice: By collecting and analyzing your operations metrics, you gain an understanding of the health of your operations and insight into trends that may have an impact on your operations or on the achievement of your business outcomes.
Level of risk exposed if this best practice is not established: High
Implementation guidance
- Collect and analyze operations metrics: Perform regular, proactive reviews of metrics to identify trends and determine where appropriate responses are needed.
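A review can look for trends rather than only single values. As one simple technique, the sketch below fits an ordinary least-squares line to a metric sampled at regular intervals and flags it when the slope per period exceeds a threshold. The threshold policy, the function names, and the sample restore durations are assumptions for illustration; real reviews may prefer more robust methods.

```python
def linear_trend(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def needs_review(values, max_slope):
    """Flag a metric whose per-period slope exceeds max_slope (hypothetical policy)."""
    return linear_trend(values) > max_slope


if __name__ == "__main__":
    # Hypothetical monthly restore durations, in minutes.
    restore_minutes = [12, 18, 30, 55, 90, 130]
    print("slope (minutes/month):", round(linear_trend(restore_minutes), 2))
    print("needs review:", needs_review(restore_minutes, max_slope=5))
```

A slope only summarizes direction and rate, so it is best read together with the raw values and with absolute limits such as the recovery time objective.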
Resources
Related documents: