OPS07-BP03 Use runbooks to perform procedures
A runbook is a documented process to achieve a specific outcome. Runbooks consist of a series of steps that someone follows to get something done. Runbooks have been used in operations going back to the early days of aviation. In cloud operations, we use runbooks to reduce risk and achieve desired outcomes. At its simplest, a runbook is a checklist to complete a task.
Runbooks are an essential part of operating your workload. From onboarding a new team member to deploying a major release, runbooks are the codified processes that provide consistent outcomes no matter who uses them. Runbooks should be published in a central location and updated as the process evolves, as updating runbooks is a key component of a change management process. They should also include guidance on error handling, tools, permissions, exceptions, and escalations in case a problem occurs.
As your organization matures, begin automating runbooks. Start with runbooks that are short and frequently used. Use scripting languages to automate steps or make steps easier to perform. Once the first few runbooks are automated, dedicate time to automating more complex ones. Over time, most of your runbooks should be automated in some way.
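As a minimal sketch of what scripting a short runbook can look like, the Python below runs a list of steps in order and stops to escalate on the first failure. The `Step` type, the `run_runbook` function, and the two example steps are hypothetical names chosen for illustration; they are not part of any AWS tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step: a description plus the function that performs it."""
    description: str
    action: Callable[[], None]


def run_runbook(steps: List[Step], escalate: Callable[[str], None]) -> bool:
    """Run steps in order; on the first failure, escalate and stop."""
    for number, step in enumerate(steps, start=1):
        print(f"Step {number}: {step.description}")
        try:
            step.action()
        except Exception as error:
            escalate(f"Step {number} ({step.description}) failed: {error}")
            return False
    return True


# Hypothetical steps, for illustration only.
def check_disk_space() -> None:
    pass  # replace with a real check


def rotate_logs() -> None:
    raise RuntimeError("log directory not found")  # simulated failure


escalations: List[str] = []
completed = run_runbook(
    [
        Step("Check free disk space", check_disk_space),
        Step("Rotate application logs", rotate_logs),
    ],
    escalate=escalations.append,  # stand-in for paging or ticketing
)
```

Here the second step fails on purpose, so `completed` is `False` and one message is recorded in `escalations`. In practice the `escalate` callback would notify the escalation contact named in the runbook.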
Desired outcome: Your team has a collection of step-by-step guides for performing workload tasks. The runbooks contain the desired outcome, necessary tools and permissions, and instructions for error handling. They are stored in a central location, such as a version control system, and updated frequently. For example, your runbooks provide capabilities for your teams to monitor, communicate, and respond to AWS Health events for critical accounts during application alarms, operational issues, and planned lifecycle events.
Common anti-patterns:
- Relying on memory to complete each step of a process.
- Manually deploying changes without a checklist.
- Different team members performing the same process but with different steps or outcomes.
- Letting runbooks drift out of sync with system changes and automation.
Benefits of establishing this best practice:
- Manual tasks have lower error rates.
- Operations are performed in a consistent manner.
- New team members can start performing tasks sooner.
- Runbooks can be automated to reduce toil.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Runbooks can take several forms depending on the maturity level of your organization. At a minimum, they should consist of a step-by-step text document. The desired outcome should be clearly indicated. Clearly document necessary special permissions or tools. Provide detailed guidance on error handling and escalations in case something goes wrong. List the runbook owner and publish it in a central location. Once your runbook is documented, validate it by having someone else on your team run it. As procedures evolve, update your runbooks in accordance with your change management process.
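One way to keep those minimum elements from being forgotten is to check a draft before it is published. The sketch below assumes a hypothetical `Runbook` record whose field names mirror the elements listed above; the structure is an illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Runbook:
    """Minimum elements of a text runbook (field names are illustrative)."""
    title: str = ""
    desired_outcome: str = ""
    tools: str = ""
    permissions: str = ""
    error_handling: str = ""
    escalation_contact: str = ""
    owner: str = ""
    steps: List[str] = field(default_factory=list)


def missing_elements(runbook: Runbook) -> List[str]:
    """Return the names of elements that are still empty."""
    return [f.name for f in fields(runbook) if not getattr(runbook, f.name)]


draft = Runbook(
    title="Deploy schema update",
    desired_outcome="Schema is at the new version",
    owner="Cloud Operations Team",
    steps=["Take a backup", "Apply the migration"],
)
print(missing_elements(draft))
# prints ['tools', 'permissions', 'error_handling', 'escalation_contact']
```

If a runbook genuinely needs no special tools or permissions, record that explicitly (for example, "None") so the check can tell a deliberate answer from an omission.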
Your text runbooks should be automated as your organization matures. With a service such as AWS Systems Manager Automation, you can turn flat text into automations that run against your workload. These automations can run in response to events, reducing the operational burden of maintaining your workload. AWS Systems Manager Automation also provides a low-code visual design experience to create automation runbooks more easily.
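To illustrate how a text step maps to an automation step, the sketch below builds a small Automation runbook document in Python and serializes it to JSON. The two-step "stop, then start an instance" procedure, the step names, and the `InstanceId` parameter are assumptions made for this example; verify the document fields against the Systems Manager Automation reference before using anything like it.

```python
import json

# Sketch of an Automation runbook document (schema version 0.3).
# The restart procedure, step names, and parameter name are illustrative.
automation_document = {
    "schemaVersion": "0.3",
    "description": "Restart an EC2 instance (illustrative example).",
    "parameters": {
        "InstanceId": {"type": "String", "description": "Instance to restart."}
    },
    "mainSteps": [
        {
            # Text runbook step 1: stop the instance.
            "name": "stopInstance",
            "action": "aws:changeInstanceState",
            "inputs": {
                "InstanceIds": ["{{ InstanceId }}"],
                "DesiredState": "stopped",
            },
        },
        {
            # Text runbook step 2: start the instance again.
            "name": "startInstance",
            "action": "aws:changeInstanceState",
            "inputs": {
                "InstanceIds": ["{{ InstanceId }}"],
                "DesiredState": "running",
            },
        },
    ],
}

document_json = json.dumps(automation_document, indent=2)
print(document_json)
```

Each entry in `mainSteps` corresponds to one line of the original checklist, which makes it straightforward to review the automation against the text runbook it replaces.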
Customer example
AnyCompany Retail must perform database schema updates during software deployments. The Cloud Operations Team worked with the Database Administration Team to build a runbook for manually deploying these changes. The runbook listed each step in the process in checklist form. It included a section on error handling in case something went wrong. They published the runbook on their internal wiki along with their other runbooks. The Cloud Operations Team plans to automate the runbook in a future sprint.
Implementation steps
If you don't have an existing document repository, a version control repository is a great place to start building your runbook library. You can build your runbooks using Markdown. We have provided an example runbook template that you can use to start building runbooks.
    # Runbook Title

    ## Runbook Info

    | Runbook ID | Description | Tools Used | Special Permissions | Runbook Author | Last Updated | Escalation POC |
    |-------|-------|-------|-------|-------|-------|-------|
    | RUN001 | What is this runbook for? What is the desired outcome? | Tools | Permissions | Your Name | 2022-09-21 | Escalation Name |

    ## Steps

    1. Step one
    2. Step two
- If you don't have an existing documentation repository or wiki, create a new version control repository in your version control system.
- Identify a process that does not have a runbook. An ideal process is one that is conducted semi-regularly, has few steps, and has low-impact failures.
- In your document repository, create a new draft Markdown document using the template. Fill in Runbook Title and the required fields under Runbook Info.
- Starting with the first step, fill in the Steps portion of the runbook.
- Give the runbook to a team member. Have them use the runbook to validate the steps. If something is missing or needs clarity, update the runbook.
- Publish the runbook to your internal documentation store. Once published, tell your team and other stakeholders.
- Over time, you'll build a library of runbooks. As that library grows, start working to automate runbooks.
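The drafting steps above can be sketched as a small script that fills in the example template and refuses to produce a draft while Runbook Info fields are missing. The function name `render_runbook` and the sample values are hypothetical; the column names come from the template.

```python
from datetime import date
from typing import Dict, List

# Column names taken from the Runbook Info table in the example template.
INFO_COLUMNS = [
    "Runbook ID", "Description", "Tools Used", "Special Permissions",
    "Runbook Author", "Last Updated", "Escalation POC",
]


def render_runbook(title: str, info: Dict[str, str], steps: List[str]) -> str:
    """Return a Markdown runbook; raise if a Runbook Info field is empty."""
    missing = [column for column in INFO_COLUMNS if not info.get(column)]
    if missing:
        raise ValueError(f"Missing Runbook Info fields: {', '.join(missing)}")
    header = "| " + " | ".join(INFO_COLUMNS) + " |"
    divider = "|" + "-------|" * len(INFO_COLUMNS)
    row = "| " + " | ".join(info[column] for column in INFO_COLUMNS) + " |"
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines = [f"# {title}", "", "## Runbook Info", "", header, divider, row,
             "", "## Steps", "", *numbered]
    return "\n".join(lines) + "\n"


# Sample values, for illustration only.
draft = render_runbook(
    "Database schema update",
    {
        "Runbook ID": "RUN001",
        "Description": "Apply schema changes during a deployment.",
        "Tools Used": "Tools",
        "Special Permissions": "Permissions",
        "Runbook Author": "Your Name",
        "Last Updated": date.today().isoformat(),
        "Escalation POC": "Escalation Name",
    },
    ["Step one", "Step two"],
)
print(draft)
```

Writing the returned string to a `.md` file in your version control repository gives a reviewable draft that a team member can then validate.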
Level of effort for the implementation plan: Low. The minimum standard for a runbook is a step-by-step text guide. Automating runbooks can increase the implementation effort.
Resources
Related examples:
- Well-Architected Labs: Automating operations with Playbooks and Runbooks
- AWS Systems Manager: Restore a root volume from the latest snapshot runbook
- Building an AWS incident response runbook using Jupyter notebooks and CloudTrail Lake
- Rubix - A Python library for building runbooks in Jupyter Notebooks