[O.SI.1] Center observability strategies around business and technical outcomes
Category: FOUNDATIONAL
To maximize the impact of observability, it should be closely aligned with both business and technical goals. This means not only monitoring system performance, uptime, or error rates but also understanding how these factors directly or indirectly influence business outcomes such as revenue, customer satisfaction, and market growth.
A successful observability strategy adopts the ethos that "Everything fails, all the time", famously stated by Werner Vogels, Amazon Chief Technology Officer. It acknowledges this reality and continuously iterates, adapting to changes in business environments, technical architecture, user behaviors, and customer needs. It is the shared responsibility of teams, leadership, and stakeholders to decide which performance-related metrics to collect in order to measure the agreed key performance indicators (KPIs) and desired business outcomes. Effective KPIs must be based on the desired business and technical outcomes and be relevant to the system being monitored.
An observability strategy must also identify the metrics, logs, traces, and events that need to be collected and analyzed, and prescribe appropriate tools and processes for gathering this data. To enhance operational efficiency, the strategy should propose guidelines for generating actionable alerts and define escalation procedures. This way, teams can augment these guidelines to suit their unique needs and contexts.
Use technical KPIs, such as the four golden signals (latency, traffic, errors, and saturation), alongside business-related KPIs.
For example, one of the most important business-related KPIs for Amazon's e-commerce segment is orders per minute. A dip below the expected value for this metric could signify issues affecting customer experience or transactions, which could affect revenue and customer satisfaction. Within Amazon, teams and leaders meet regularly during weekly business reviews (WBRs) to assess the validity and quality of these metrics against organizational goals. By continuously assessing metrics against business and technical strategies, teams can proactively address potential issues before they affect the bottom line.
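The orders-per-minute check described above can be sketched in a few lines of Python. This is a minimal illustration, not Amazon's actual implementation: the function name `orders_per_minute_dip`, the 20% tolerance, the expected value of 100, and the sample data are all assumptions made for the example. A real system would normally derive the expected value from historical baselines.

```python
from statistics import mean


def orders_per_minute_dip(recent_counts, expected_per_minute, tolerance=0.2):
    """Return True when the average of the recent per-minute order counts
    falls more than `tolerance` (a fraction) below the expected value.

    All names and the default tolerance are hypothetical, for illustration.
    """
    if not recent_counts:
        raise ValueError("recent_counts must not be empty")
    observed = mean(recent_counts)
    # Anything below this threshold is treated as a dip worth alerting on.
    threshold = expected_per_minute * (1 - tolerance)
    return observed < threshold


# Hypothetical per-minute order counts for the last five minutes.
healthy = [98, 102, 101, 97, 100]
degraded = [70, 65, 72, 68, 60]

print(orders_per_minute_dip(healthy, expected_per_minute=100))
print(orders_per_minute_dip(degraded, expected_per_minute=100))
```

Averaging over a short window instead of alerting on a single low minute is one way to keep such an alert actionable, because it reduces noise from momentary fluctuations.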
Related information:
- AWS Well-Architected Sustainability Pillar: SUS02-BP02 Align SLAs with sustainability goals
- Instrumenting distributed systems for operational visibility
- The Importance of Key Performance Indicators (KPIs) for Large-Scale Cloud Migrations
- Amazon's approach to high-availability deployment: Standard metrics