Operational excellence - Best Practices for Deploying Microsoft SQL Server on Amazon EC2

Operational excellence

Most of this whitepaper covers best practices for deploying Microsoft SQL Server on AWS. Operating and maintaining these workloads after deployment is equally important.

As a general principle, the best practice is to assume that failures and incidents happen all the time. It’s important to be prepared and equipped to respond to these incidents. Responding to an incident involves three objectives:

  • Observe and detect anomalies

  • Identify the root cause

  • Act to resolve the problem

AWS provides tools and services for each of these purposes.

Observability and root cause analysis

Amazon CloudWatch is a service that enables real-time monitoring of AWS resources and other applications. You can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
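As an illustration of tracking a metric, the following minimal sketch builds the arguments for a CloudWatch alarm on the CPU utilization of an EC2 instance that hosts SQL Server. The function name, alarm name, threshold, and evaluation settings are assumptions chosen for the example, not recommendations from this whitepaper.

```python
def cpu_alarm_params(instance_id: str, threshold_percent: float = 80.0) -> dict:
    """Build keyword arguments for the CloudWatch PutMetricAlarm operation.

    The alarm name and the default threshold are illustrative assumptions.
    """
    return {
        "AlarmName": f"sqlserver-high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",            # built-in EC2 metrics namespace
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                     # seconds per datapoint
        "EvaluationPeriods": 3,            # 3 x 5 minutes above the threshold
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

With the AWS SDK for Python installed and credentials configured, the result could be passed as boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params("i-0123456789abcdef0")), where the instance ID is a placeholder.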

Amazon CloudWatch Application Insights for .NET and SQL Server is a feature of Amazon CloudWatch that is designed to enable operational excellence for Microsoft SQL Server and .NET applications. Once enabled, it identifies and sets up key metrics and logs across your application resources and technology stack. It continuously monitors the metrics and logs to detect anomalies and errors, while using artificial intelligence and machine learning (AI/ML) to correlate detected errors and anomalies.

When errors and anomalies are detected, Application Insights generates CloudWatch Events. To aid with troubleshooting, it creates automated dashboards for the detected problems, which include correlated metric anomalies and log errors, along with additional insights to point you to the potential root cause.

Using AWS Launch Wizard, you can choose to enable Amazon CloudWatch Application Insights with a single click. AWS Launch Wizard handles all the configuration necessary to make your SQL Server instance observable through Amazon CloudWatch Application Insights.
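For instances that were not deployed through AWS Launch Wizard, Application Insights can also be enabled through its API. The sketch below builds a request for the CreateApplication operation. It assumes that the SQL Server resources already belong to an AWS Resource Groups resource group; the function name and the resource group name in the usage note are hypothetical.

```python
def app_insights_params(resource_group_name: str) -> dict:
    """Build keyword arguments for Application Insights CreateApplication.

    Assumes the named resource group already exists and contains the
    EC2 instances that run SQL Server.
    """
    return {
        "ResourceGroupName": resource_group_name,
        "AutoConfigEnabled": True,   # let the service choose metrics and logs
        "CWEMonitorEnabled": True,   # also watch CloudWatch Events for the resources
        "OpsCenterEnabled": True,    # create OpsItems for detected problems
    }
```

With the AWS SDK for Python, this could be sent as boto3.client("application-insights").create_application(**app_insights_params("sql-prod-group")).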

Reducing mean time to resolution (MTTR)

The automated dashboards generated by Amazon CloudWatch Application Insights help you take swift remedial action to keep your applications healthy and to prevent impact to their end users. Application Insights also creates OpsItems so you can resolve problems using AWS Systems Manager OpsCenter.

AWS Systems Manager is a service that enables you to view and control your infrastructure in AWS, on premises, and in other clouds. OpsCenter is a capability of AWS Systems Manager, designed to reduce the mean time to resolution. OpsCenter also provides Systems Manager Automation documents (runbooks) that you can use to fully or partially automate resolution of issues.
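As a rough sketch of runbook-based remediation, the function below builds a request to start the AWS-managed AWS-RestartEC2Instance runbook through the Systems Manager StartAutomationExecution operation. The choice of runbook is an assumption made for illustration. Restarting an instance interrupts SQL Server, so a real response would depend on the root cause and on the high availability design in place.

```python
def restart_runbook_params(instance_id: str) -> dict:
    """Build keyword arguments for Systems Manager StartAutomationExecution.

    AWS-RestartEC2Instance is used only as an example runbook; automation
    parameter values are always given as lists of strings.
    """
    return {
        "DocumentName": "AWS-RestartEC2Instance",
        "Parameters": {"InstanceId": [instance_id]},
    }
```

With the AWS SDK for Python, this could be run as boto3.client("ssm").start_automation_execution(**restart_runbook_params("i-0123456789abcdef0")), where the instance ID is a placeholder.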

Patch management

AWS Systems Manager Patch Manager is a comprehensive patch management solution that is fully integrated with native Windows APIs. It supports Windows Server and Linux operating systems, as well as Microsoft applications, including Microsoft SQL Server.

Systems Manager Patch Manager integrates with AWS Systems Manager Maintenance Windows, so you can apply patches on a predictable schedule and limit disruption to business operations.
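A scheduled patching setup might be sketched as follows: one request creates a weekly maintenance window, and a second registers the AWS-RunPatchBaseline document as a Run Command task in that window. The schedule, duration, concurrency limits, function names, and the window and target IDs are all assumptions for illustration.

```python
def maintenance_window_params(name: str) -> dict:
    """Build keyword arguments for Systems Manager CreateMaintenanceWindow."""
    return {
        "Name": name,
        "Schedule": "cron(0 2 ? * SUN *)",  # assumed: Sundays at 02:00
        "Duration": 3,                      # window length in hours
        "Cutoff": 1,                        # stop starting new tasks 1 hour before the end
        "AllowUnassociatedTargets": False,
    }


def patch_task_params(window_id: str, window_target_id: str) -> dict:
    """Build keyword arguments for RegisterTaskWithMaintenanceWindow."""
    return {
        "WindowId": window_id,
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",
        "TaskType": "RUN_COMMAND",
        "MaxConcurrency": "1",  # assumed: patch one instance at a time
        "MaxErrors": "0",       # assumed: stop after the first failure
        "TaskInvocationParameters": {
            "RunCommand": {"Parameters": {"Operation": ["Install"]}}
        },
    }
```

With the AWS SDK for Python, these would be passed to the create_maintenance_window and register_task_with_maintenance_window methods of boto3.client("ssm"), using the window ID returned by the first call and a target ID returned by register_target_with_maintenance_window. Patching one instance at a time is a conservative choice that can help keep at least one node of a clustered SQL Server deployment online.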

You can also use AWS Systems Manager Configuration Compliance dashboards to quickly see patch compliance state or other configuration inconsistencies across your fleet.
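The same compliance information can be read programmatically. The sketch below takes records shaped like the InstancePatchStates entries returned by the Systems Manager DescribeInstancePatchStates operation and lists the instances that have missing or failed patches. The function name is hypothetical, and treating only these two counters as non-compliant is a simplifying assumption.

```python
def noncompliant_instances(patch_states: list) -> list:
    """Return sorted IDs of instances with missing or failed patches.

    Each record is expected to carry InstanceId, MissingCount, and
    FailedCount; absent counters are treated as zero.
    """
    return sorted(
        state["InstanceId"]
        for state in patch_states
        if state.get("MissingCount", 0) > 0 or state.get("FailedCount", 0) > 0
    )
```

With the AWS SDK for Python, the input could come from boto3.client("ssm").describe_instance_patch_states(InstanceIds=[...])["InstancePatchStates"].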

Conclusion

This whitepaper described a number of best practices for deploying Microsoft SQL Server workloads on AWS. It discussed how AWS services can be used to complement Microsoft SQL Server features to address different requirements.

AWS offers the greatest breadth and depth of services in the cloud, and Amazon EC2 is the most flexible option for deploying Microsoft SQL Server workloads. Each solution and associated trade-offs may be embraced according to particular business requirements.

The whitepaper explored the five pillars of the AWS Well-Architected Framework (reliability, security, performance, cost optimization, and operational excellence) as they apply to SQL Server workloads, and introduced the AWS services that support each requirement.