OPS02-BP03 Operations activities have identified owners responsible for their performance - AWS Well-Architected Framework

Understand who has responsibility to perform specific activities on defined workloads and why that responsibility exists. Understanding who has responsibility to perform activities informs who will conduct the activity, validate the result, and provide feedback to the owner of the activity.

Desired outcome:

Your organization clearly defines responsibilities to perform specific activities on defined workloads and to respond to events generated by the workload. The organization documents ownership of processes and fulfillment and makes this information discoverable. You review and update responsibilities when organizational changes take place, and teams track and measure the performance of defect and inefficiency identification activities. You implement feedback mechanisms to track defects and improvements and support iterative improvement.

Common anti-patterns:

  • You do not document responsibilities.

  • Fragmented scripts exist on isolated operator workstations. Only a few individuals know how to use them, or they informally refer to them as team knowledge.

  • A legacy process is due for update, but no one knows who owns the process, and the original author is no longer part of the organization.

  • Processes and scripts can't be discovered, and they are not readily available when required (for example, in response to an incident).

Benefits of establishing this best practice:

  • You understand who is responsible for performing an activity, who to notify when action is needed, and who performs the action, validates the result, and provides feedback to the owner of the activity.

  • Processes and procedures make operating your workloads more efficient.

  • New team members become effective more quickly.

  • You reduce the time it takes to mitigate incidents.

  • Different teams use the same processes and procedures to perform tasks in a consistent manner.

  • Repeatable processes help teams scale their operations.

  • Standardized processes and procedures help mitigate the impact of transferring workload responsibilities between teams.

Level of risk exposed if this best practice is not established: High

Implementation guidance

To begin defining responsibilities, start with existing documentation, such as responsibility matrices, processes and procedures, roles and responsibilities, and tools and automation. Review the responsibilities for documented processes and host discussions about them. Review with teams to identify misalignments between documented responsibilities and processes. Discuss the services each team offers with that team's internal customers to identify gaps in expectations between teams.

Analyze and address the discrepancies. Identify opportunities for improvement, and look for frequently requested, resource-intensive activities, which are typically strong candidates for improvement. Explore best practices, patterns, and prescriptive guidance to simplify and standardize improvements. Record improvement opportunities, and track the improvements to completion.
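One way to find frequently requested, resource-intensive activities is to rank them by the total effort recorded in your ticketing system. The following Python sketch assumes a hypothetical export of fulfilled requests, each with an activity name and the hours spent on it; the `Request` class and `rank_improvement_candidates` function are illustrative and not part of any AWS service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """A fulfilled operations request (hypothetical schema for a ticket export)."""
    activity: str
    hours_spent: float


def rank_improvement_candidates(requests, top_n=3):
    """Rank activities by total hours spent, which grows with both frequency and effort."""
    totals = defaultdict(lambda: [0, 0.0])  # activity -> [request count, total hours]
    for req in requests:
        totals[req.activity][0] += 1
        totals[req.activity][1] += req.hours_spent
    ranked = sorted(
        ((activity, count, hours) for activity, (count, hours) in totals.items()),
        key=lambda row: row[2],  # sort by total hours
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical sample data; replace with an export from your own queue.
sample_requests = [
    Request("reset-db-credentials", 0.75),
    Request("provision-test-env", 4.0),
    Request("reset-db-credentials", 0.75),
    Request("rotate-tls-cert", 1.5),
    Request("reset-db-credentials", 0.75),
    Request("reset-db-credentials", 0.75),
]
candidates = rank_improvement_candidates(sample_requests)
for activity, count, hours in candidates:
    print(f"{activity}: {count} requests, {hours:.2f} hours")
```

Total hours is only one possible score. A frequent but cheap request (here, `reset-db-credentials`) can still rank highly, which is often a good sign that it is worth automating.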

Over time, evolve these procedures to run as code, reducing the need for human intervention. For example, procedures can be initiated as AWS Lambda functions, AWS CloudFormation templates, or AWS Systems Manager Automation documents. Verify that these procedures are version-controlled in appropriate repositories and include suitable resource tags so that teams can readily identify owners and documentation. Document who is responsible for carrying out the activities, and then monitor the automations for successful initiation and operation, as well as for how well they deliver the desired outcomes.
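Tags only help teams identify owners and documentation if they are applied consistently. The following Python sketch audits a list of resources for required ownership tags. It assumes a hypothetical convention of `Owner` and `Runbook` tag keys, and its input is modeled on the `ResourceARN` and `Tags` shape returned by the Resource Groups Tagging API `GetResources` operation; fetching that data from AWS is omitted so that the example runs offline.

```python
# Hypothetical tagging convention; adjust to your organization's tag keys.
REQUIRED_TAGS = ("Owner", "Runbook")


def find_missing_tags(resources, required=REQUIRED_TAGS):
    """Return {resource ARN: [missing tag keys]} for resources lacking required tags.

    A tag that is present but has an empty value is treated as missing.
    """
    gaps = {}
    for resource in resources:
        present = {tag["Key"] for tag in resource.get("Tags", []) if tag.get("Value")}
        missing = [key for key in required if key not in present]
        if missing:
            gaps[resource["ResourceARN"]] = missing
    return gaps


# Sample input with placeholder account ID, function names, and wiki URL.
resources = [
    {
        "ResourceARN": "arn:aws:lambda:us-east-1:111122223333:function:rotate-logs",
        "Tags": [
            {"Key": "Owner", "Value": "platform-team"},
            {"Key": "Runbook", "Value": "https://wiki.example.com/runbooks/rotate-logs"},
        ],
    },
    {
        "ResourceARN": "arn:aws:lambda:us-east-1:111122223333:function:purge-cache",
        "Tags": [{"Key": "Owner", "Value": ""}],
    },
]

gaps = find_missing_tags(resources)
for arn, missing in gaps.items():
    print(f"{arn} is missing: {', '.join(missing)}")
```

Treating empty tag values as missing catches resources that were tagged to satisfy a policy without recording a real owner.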

Customer example

AnyCompany Retail defines ownership as the team or individual that owns processes for an application or for groups of applications that share common architectural practices and technologies. Initially, the company documents the processes and procedures as step-by-step guides in its document management system. They make the procedures discoverable using tags on the AWS account that hosts the application and on specific groups of resources within the account, and they use AWS Organizations to manage their AWS accounts. Over time, AnyCompany Retail converts these processes to code and defines resources using infrastructure as code (through services like CloudFormation or AWS Cloud Development Kit (AWS CDK) templates). The operational processes become Automation documents in AWS Systems Manager or AWS Lambda functions, which can be initiated as scheduled tasks, in response to events such as Amazon CloudWatch alarms or Amazon EventBridge events, or by requests within an IT service management (ITSM) platform. All processes have tags that identify who owns them. Teams manage documentation for the automation and process within the wiki pages generated by the code repository for the process.
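Event-driven routing like this might look like the following Python sketch of a Lambda-style handler that maps an alarm to its owner and runbook. It assumes a hypothetical `OWNERSHIP` registry keyed by alarm name and placeholder wiki URLs. The event fields it reads (`detail-type`, `detail.alarmName`, `detail.state.value`) follow the shape of the EventBridge event for a CloudWatch alarm state change, but verify them against events in your own account.

```python
import json

# Hypothetical ownership registry; team names and URLs are placeholders.
OWNERSHIP = {
    "checkout-api-5xx-rate": {
        "owner": "checkout-team",
        "runbook": "https://wiki.example.com/runbooks/checkout-5xx",
    },
}
# Fallback so that an alarm with no registered owner still reaches someone.
DEFAULT_OWNER = {
    "owner": "operations-on-call",
    "runbook": "https://wiki.example.com/runbooks/triage",
}


def handler(event, context):
    """Resolve the owner and runbook for a CloudWatch alarm state change event.

    Only returns the routing decision; paging or opening an ITSM ticket is out of scope.
    """
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return {"routed": False, "reason": "unsupported event type"}
    detail = event["detail"]
    alarm_name = detail["alarmName"]
    state = detail["state"]["value"]
    if state != "ALARM":
        return {"routed": False, "reason": f"state is {state}"}
    target = OWNERSHIP.get(alarm_name, DEFAULT_OWNER)
    return {"routed": True, "alarm": alarm_name, **target}


# Trimmed sample event containing only the fields the handler reads.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "checkout-api-5xx-rate", "state": {"value": "ALARM"}},
}
print(json.dumps(handler(sample_event, None), indent=2))
```

A registry keyed by alarm name keeps the sketch short; deriving owners from resource tags, as AnyCompany Retail does, avoids maintaining a second copy of ownership data.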

Implementation steps

  1. Document the existing processes and procedures.

    1. Review and verify that they are up-to-date.

    2. Verify that each process or procedure has an owner.

    3. Place the procedures under version control.

    4. Where possible, share processes and procedures across workloads and environments that share architectural designs.

  2. Establish mechanisms for feedback and improvement.

    1. Define policies for how frequently processes should be reviewed.

    2. Define processes for reviewers and approvers.

    3. Implement an issue tracker or ticketing queue to provide and track feedback.

    4. Wherever possible, provide pre-approval and risk classification for processes and procedures from a change approval board (CAB).

  3. Make processes and procedures accessible and discoverable by users who need to run them.

    1. Use tags to indicate where the processes and procedures for the workload can be accessed.

    2. Use meaningful error and event messaging to indicate the appropriate process or procedure to address the issue.

    3. Use wikis or document management to make processes and procedures consistently searchable across the organization.

  4. Automate when it is appropriate to do so.

    1. Where services and technologies provide an API, develop automations.

    2. Verify that processes are well understood, and develop the user stories and requirements to automate those processes.

    3. Measure the successful use of processes and procedures, with issue tracking to support iterative improvement.
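Once procedures are cataloged, parts of steps 1 and 2 can be checked mechanically. The following Python sketch assumes a hypothetical catalog in which each procedure records an owner, a last-reviewed date, and a review interval taken from your review-frequency policy. It reports procedures without an owner and procedures whose review is overdue; the `Procedure` class and `audit` function are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Procedure:
    """One entry in a hypothetical procedure catalog."""
    name: str
    owner: Optional[str]
    last_reviewed: date
    review_interval_days: int  # set by your review-frequency policy


def audit(procedures, today):
    """Return (name, issue) findings for missing owners and overdue reviews."""
    findings = []
    for proc in procedures:
        if not proc.owner:
            findings.append((proc.name, "no owner"))
        due = proc.last_reviewed + timedelta(days=proc.review_interval_days)
        if today > due:
            findings.append((proc.name, f"review overdue since {due.isoformat()}"))
    return findings


# Hypothetical catalog entries.
catalog = [
    Procedure("restore-orders-db", "data-platform", date(2024, 1, 10), 90),
    Procedure("rotate-api-keys", None, date(2024, 3, 1), 180),
    Procedure("scale-out-web-tier", "web-team", date(2023, 6, 1), 180),
]
findings = audit(catalog, today=date(2024, 5, 1))
for name, issue in findings:
    print(f"{name}: {issue}")
```

You could run a check like this on a schedule and file each finding in the issue or ticketing queue from step 2, so that ownership and review gaps are tracked to completion like any other feedback.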

Level of effort for the implementation plan: Medium
