Reliability pillar
The reliability pillar focuses on whether workloads perform their intended functions and how quickly they recover from failure to meet demand. The following recommendations can help you meet the reliability design principles and architectural best practices for AWS Managed Microsoft AD.
Key focus areas
Distributed system design
Recovery planning
Adapting to changing requirements
Automatically recover from failure
Make sure that your IP subnet allocation accounts for expansion and availability.
Activate multi-Region replication. For more information, see Lab 4 – Enable Multi-Region with AWS Managed Microsoft AD in the Active Directory on AWS Immersion Day Workshop.
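The subnet recommendation above can be checked with a little arithmetic. The following is a minimal sketch using only the Python standard library. The function names and the growth figure are hypothetical and chosen for illustration. The one fixed input is that AWS reserves five IP addresses in every VPC subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every VPC subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses in the subnet are assignable to resources."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def has_room_for_growth(cidr: str, addresses_in_use: int, planned_growth: int) -> bool:
    """Check that free addresses cover the planned growth (hypothetical figure)."""
    free = usable_addresses(cidr) - addresses_in_use
    return free >= planned_growth
```

For example, `has_room_for_growth("10.0.1.0/28", addresses_in_use=8, planned_growth=4)` returns `False`, because a /28 subnet leaves only 11 assignable addresses.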
Test recovery procedures
Practice restoring the directory from a snapshot. For more information, see Restoring your directory from a snapshot and Creating a snapshot of your directory in the AWS Directory Service Administration Guide.
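A restore drill like the one described above could be scripted. The sketch below assumes that `ds_client` is a boto3 Directory Service client (`boto3.client("ds")`) and uses its `create_snapshot`, `describe_snapshots`, and `restore_from_snapshot` operations. The function name, polling interval, and retry count are hypothetical. Restoring returns the directory to its state at snapshot time, so run a drill like this only against a non-production directory.

```python
import time


def snapshot_and_restore(ds_client, directory_id: str, name: str,
                         poll_seconds: float = 30.0, max_polls: int = 60) -> str:
    """Create a manual snapshot, wait until it completes, then restore from it."""
    response = ds_client.create_snapshot(DirectoryId=directory_id, Name=name)
    snapshot_id = response["SnapshotId"]

    for _ in range(max_polls):
        snapshot = ds_client.describe_snapshots(
            DirectoryId=directory_id, SnapshotIds=[snapshot_id]
        )["Snapshots"][0]
        if snapshot["Status"] == "Completed":
            # The snapshot is usable; start the restore and report which one was used.
            ds_client.restore_from_snapshot(SnapshotId=snapshot_id)
            return snapshot_id
        if snapshot["Status"] == "Failed":
            raise RuntimeError(f"Snapshot {snapshot_id} failed")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Snapshot {snapshot_id} did not complete in time")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before you point it at a real directory.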
Scale horizontally to increase aggregate workload availability, and don't guess capacity
Automate AWS Managed Microsoft AD scaling based on utilization metrics. For more information, see How to automate AWS Managed Microsoft AD scaling based on utilization metrics on the AWS Blog.
Load test before rolling changes out to production. For more information, see How to use the Active Directory Performance Testing Tool on Windows Server 2012 on the Microsoft Blog.
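As an illustration of metric-driven scaling, the following sketch computes a desired domain controller count from average CPU utilization. It is not the method from the referenced blog post. The function name, the 50 percent target, and the upper bound of 20 are assumptions made for this example. The floor of two reflects that an AWS Managed Microsoft AD directory runs at least two domain controllers.

```python
import math

# An AWS Managed Microsoft AD directory runs with at least two domain controllers.
MIN_DOMAIN_CONTROLLERS = 2


def desired_domain_controllers(current_count: int, avg_cpu_percent: float,
                               target_cpu_percent: float = 50.0,
                               max_count: int = 20) -> int:
    """Scale the count proportionally so average CPU lands near the target.

    target_cpu_percent and max_count are illustrative assumptions, not
    service defaults; set them from your own load tests and quotas.
    """
    if current_count < MIN_DOMAIN_CONTROLLERS:
        raise ValueError("current_count must be at least 2")
    needed = math.ceil(current_count * avg_cpu_percent / target_cpu_percent)
    return min(max(needed, MIN_DOMAIN_CONTROLLERS), max_count)
```

The result could then be passed as `DesiredNumber` to the boto3 Directory Service operation `update_number_of_domain_controllers`, ideally after the load test confirms the target threshold.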
Manage change in automation
Use infrastructure as code (IaC) to deploy AWS Managed Microsoft AD. For more information, see quickstart-microsoft-activedirectory on GitHub.
Automate Microsoft Active Directory operations procedures whenever possible. For example, it’s a best practice to automate the management of user objects, group objects, and Group Policy Objects (GPOs).
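To make the IaC recommendation concrete, the sketch below builds a small AWS CloudFormation template for an `AWS::DirectoryService::MicrosoftAD` resource and serializes it to JSON. It is an independent illustration, not the template from the quickstart repository. The function name, the parameter names, and the logical ID `ManagedAD` are hypothetical.

```python
import json


def managed_ad_template(domain_name: str, edition: str = "Standard") -> dict:
    """Return a minimal CloudFormation template for AWS Managed Microsoft AD."""
    if edition not in ("Standard", "Enterprise"):
        raise ValueError("edition must be 'Standard' or 'Enterprise'")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # NoEcho keeps the admin password out of console and API output.
            "AdminPassword": {"Type": "String", "NoEcho": True},
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            # Supply two subnets in different Availability Zones.
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            "ManagedAD": {
                "Type": "AWS::DirectoryService::MicrosoftAD",
                "Properties": {
                    "Name": domain_name,
                    "Edition": edition,
                    "Password": {"Ref": "AdminPassword"},
                    "VpcSettings": {
                        "VpcId": {"Ref": "VpcId"},
                        "SubnetIds": {"Ref": "SubnetIds"},
                    },
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize the template to JSON text suitable for version control."""
    return json.dumps(template, indent=2)
```

Keeping the template in source control means every directory deployment goes through the same review and change history as application code.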
Manage quotas and constraints
Monitor and manage AWS Managed Microsoft AD quotas. For more information, watch the View and manage quotas for AWS services using Service Quotas video on the AWS YouTube channel.
Make sure that a sufficient gap exists between the current quotas and the maximum usage to accommodate failover.
Accommodate fixed service quotas and constraints through your architecture.
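The quota gap described above reduces to a simple headroom calculation. The following sketch assumes you already have the quota value and peak usage (for example, from Service Quotas and your own monitoring). The function names and the idea of a fixed `failover_reserve` are assumptions made for illustration.

```python
def quota_headroom(quota: int, peak_usage: int, failover_reserve: int) -> int:
    """Return the capacity left after peak usage and a failover reserve."""
    if min(quota, peak_usage, failover_reserve) < 0:
        raise ValueError("quota, peak_usage, and failover_reserve must be non-negative")
    return quota - peak_usage - failover_reserve


def can_absorb_failover(quota: int, peak_usage: int, failover_reserve: int) -> bool:
    """True when the quota covers peak usage plus the failover reserve."""
    return quota_headroom(quota, peak_usage, failover_reserve) >= 0
```

A negative headroom is a signal to request a quota increase or, for fixed quotas, to adjust the architecture before a failover forces the issue.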