Operational Readiness Reviews (ORR)
Publication date: June 30, 2022 (Document history)
Amazon Web Services (AWS) created the Operational Readiness Review (ORR) to distill the learnings from AWS operational incidents into curated questions with best practice guidance. This document is intended to help you understand how the AWS ORR program was built and guide you in creating your own ORR program as part of the AWS Well-Architected Framework. Creating an ORR program can help supplement Well-Architected reviews by including lessons learned that are specific to your business, culture, tools, and governance rules.
Introduction
At AWS, we strive to build and operate highly resilient services, keeping in mind that everything fails all the time.
However, we also want to stop preventable, known risks that we’ve identified in the Correction of Error (COE) process from occurring in other workloads. Like so many of our customers, we can’t afford to slow the pace of innovation; developer speed and agility are critical to our business. Given AWS’s decentralized operational culture, we needed a scalable, self-service mechanism to share and enforce the best practices learned from our COE analysis without slowing builders down.
To do that, we created the Operational Readiness Review (ORR). The ORR program distills the learnings from AWS operational incidents into curated questions with best practice guidance. This enables builders to create highly available, resilient systems without sacrificing agility and speed. ORR questions uncover risks and educate service teams on how to implement best practices, preventing incidents from recurring by removing common causes of impact. We generate different checklist templates from these questions based on the workload being reviewed and the outcome we want to achieve. Teams self-assess their operational risks by reviewing the appropriate ORR checklist throughout the complete lifecycle of their service, from inception to post-release operations. ORRs help us achieve shorter, fewer, and smaller incidents. The program takes a data-driven approach to reducing risk and improving the availability and resilience of our systems.
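As a rough illustration of generating checklist templates from a shared question bank, the sketch below filters questions by workload type and lifecycle stage and then lists the items a team has not yet satisfied. It is not the AWS ORR tooling: the `Question` fields, the tag values, the helper names, and the sample questions are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One curated review question plus its best practice guidance."""
    prompt: str
    guidance: str
    workload_types: frozenset    # hypothetical tags, e.g. {"api", "batch"}
    lifecycle_stages: frozenset  # hypothetical tags, e.g. {"design", "launch"}


def build_checklist(bank, workload_type, lifecycle_stage):
    """Select the questions that apply to this workload type and stage."""
    return [
        q for q in bank
        if workload_type in q.workload_types
        and lifecycle_stage in q.lifecycle_stages
    ]


def open_risks(checklist, answers):
    """Return checklist questions the team has not answered True to."""
    return [q for q in checklist if answers.get(q.prompt) is not True]


# Illustrative question bank; these are invented examples, not AWS ORR questions.
BANK = [
    Question(
        prompt="Can a deployment be rolled back automatically?",
        guidance="Automate rollback and trigger it from alarms.",
        workload_types=frozenset({"api", "batch"}),
        lifecycle_stages=frozenset({"launch"}),
    ),
    Question(
        prompt="Are client retries bounded with backoff?",
        guidance="Cap retry counts and add jittered backoff.",
        workload_types=frozenset({"api"}),
        lifecycle_stages=frozenset({"design", "launch"}),
    ),
    Question(
        prompt="Can a failed batch job be rerun safely?",
        guidance="Make job steps idempotent.",
        workload_types=frozenset({"batch"}),
        lifecycle_stages=frozenset({"design"}),
    ),
]

checklist = build_checklist(BANK, "api", "launch")
answers = {"Can a deployment be rolled back automatically?": True}
remaining = open_risks(checklist, answers)
```

Keeping the questions in one bank and deriving each checklist by filtering means a lesson from a new incident is added once and reaches every template it applies to.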
The focus of this paper is to help you understand how to build an ORR program and develop your own checklist questions to support both the Operational Excellence and Reliability pillars of the AWS Well-Architected Framework.
Are you Well-Architected?
The AWS Well-Architected Framework