Stage 3: Inspect, adapt, and iterate
After you implement your observability system, we recommend that you continually review, assess, learn, adapt, and improve your implementation. You can use the AWS Observability Maturity Model to assess your current observability posture and identify areas for improvement.
Implement regular reviews
Observability is an iterative process. It requires regular audits and assessments of existing components, and changes and enhancements to drive continual improvement. We recommend that you perform regular reviews to reevaluate SLOs, alert thresholds, dashboards, metric granularity, retention policies, sampling strategies, and so on, to ensure that they continue to drive value for your teams and business. By connecting observability costs to specific teams and services, you can make data-driven decisions about coverage and resource allocation.
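Parts of such a review can be automated. The following is a minimal sketch, assuming Python with boto3 and read-only CloudWatch permissions; it flags log groups with unbounded retention and alarms stuck in INSUFFICIENT_DATA, both common cleanup candidates:

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

def find_log_groups_without_retention():
    """Flag log groups that retain data forever (no retentionInDays set),
    a common driver of unnecessary storage cost."""
    unbounded = []
    for page in logs.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            if "retentionInDays" not in group:
                unbounded.append(group["logGroupName"])
    return unbounded

def find_insufficient_data_alarms():
    """Flag alarms stuck in INSUFFICIENT_DATA, which often indicates stale
    instrumentation or a metric that no longer exists."""
    stale = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate(StateValue="INSUFFICIENT_DATA"):
        for alarm in page["MetricAlarms"]:
            stale.append(alarm["AlarmName"])
    return stale

if __name__ == "__main__":
    print("Log groups with unbounded retention:", find_log_groups_without_retention())
    print("Alarms in INSUFFICIENT_DATA:", find_insufficient_data_alarms())
```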
At Amazon, we conduct weekly Operational Readiness Reviews (ORRs) to audit teams' processes and observability postures against best practices. This is a non-blocking exercise that scales with the number of services and the frequency of releases at Amazon.
Depending on the size of your organization, you can also have a business as usual (BAU) roster, where one member of each team is responsible for reporting on anomalies and trends, uncovering unknown-unknowns, removing unwanted instrumentation and alerts, improving dashboards, and ensuring that the observability solution continues to work for the team and is aligned with the team's objectives and success metrics. This is also an opportunity to reassess the alerting strategy to be more responsive, proactive, and closer to the user (see the sketch that follows this paragraph). The goal of these reviews is to create a virtuous cycle, as shown in the following illustration, and to improve your observability posture maturity, as described in the AWS Observability Maturity Model.
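Alerting closer to the user often means alarming on user-facing latency percentiles rather than on host-level metrics. The following is a minimal sketch, assuming Python with boto3; the namespace, metric, dimensions, threshold, and SNS topic ARN are illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on p99 user-facing request latency instead of a host-level metric.
# Namespace, metric, dimensions, threshold, and topic ARN are illustrative.
cloudwatch.put_metric_alarm(
    AlarmName="checkout-p99-latency-high",
    Namespace="MyApp/Frontend",
    MetricName="RequestLatency",
    Dimensions=[{"Name": "Service", "Value": "checkout"}],
    ExtendedStatistic="p99",      # percentile statistic reflects user experience
    Period=60,
    EvaluationPeriods=5,
    DatapointsToAlarm=3,          # require 3 of 5 breaching datapoints to cut noise
    Threshold=500.0,              # milliseconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:oncall-topic"],
)
```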

Identify the playbooks that are accessed most frequently and consider improving your application or adding more instrumentation. Identify the runbooks that are executed most frequently and consider automating those runbooks.
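If your runbooks are implemented as AWS Systems Manager Automation documents, you can measure execution frequency directly. The following is a minimal sketch, assuming Python with boto3; the 90-day window is illustrative:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

import boto3

ssm = boto3.client("ssm")

def runbook_execution_counts(days=90):
    """Count Automation executions per document over a trailing window
    to surface the most frequently run runbooks."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    counts = Counter()
    paginator = ssm.get_paginator("describe_automation_executions")
    pages = paginator.paginate(
        Filters=[{"Key": "StartTimeAfter",
                  "Values": [cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")]}]
    )
    for page in pages:
        for execution in page["AutomationExecutionMetadataList"]:
            counts[execution["DocumentName"]] += 1
    return counts.most_common()

if __name__ == "__main__":
    for document, count in runbook_execution_counts():
        print(f"{count:5d}  {document}")
```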
The learnings from these reviews are also shared with the observability squad and specialists to highlight needed improvements in central programs and the observability platform. For example, depending on the frequency of deployment-triggered events, you might decide to prioritize improvements to the deployment pipeline over other components. If MTTR is elevated because of monitoring gaps, you can prioritize improving the observability platform and its configuration.
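Here, MTTR (mean time to recovery) is the average of resolution time minus detection time across incidents. The following is a minimal sketch with illustrative incident records; in practice, you would pull them from your incident tracker:

```python
from datetime import datetime

# Illustrative incident records; in practice, pull them from your incident tracker.
incidents = [
    {"detected": datetime(2024, 5, 1, 10, 0),  "resolved": datetime(2024, 5, 1, 10, 45)},
    {"detected": datetime(2024, 5, 9, 14, 30), "resolved": datetime(2024, 5, 9, 16, 0)},
    {"detected": datetime(2024, 5, 20, 8, 15), "resolved": datetime(2024, 5, 20, 8, 35)},
]

def mttr_minutes(records):
    """Mean time to recovery: average of (resolved - detected), in minutes."""
    total_seconds = sum((r["resolved"] - r["detected"]).total_seconds() for r in records)
    return total_seconds / len(records) / 60

print(f"MTTR: {mttr_minutes(incidents):.1f} minutes")  # (45 + 90 + 20) / 3 = 51.7
```

Tracking this figure across successive reviews shows whether monitoring investments are paying off.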
Celebrate wins
Share success stories from teams that use observability tools. For example, highlight the success of a team that used observability metrics to implement an alternative solution that is more efficient and leads to lower latency or cost. Communicating this success underscores the importance of observability and motivates other teams to improve their observability posture and strive for similar success.
Learn from incidents
Conduct blameless post-incident exercises, similar to the correction of errors (COE) process at Amazon, to analyze incidents, identify root causes and contributing factors, and track actions that prevent recurrence.