Migrating Windows failover clusters - AWS Prescriptive Guidance

A Microsoft failover cluster is a group of servers (nodes) that typically share storage. You can use failover clusters to provide high availability for your applications and services. You can also migrate your failover clusters to the AWS Cloud to benefit from its reliability, performance, and lower total cost of ownership (TCO).

Windows failover clusters work differently in the cloud than in on-premises environments. Most importantly, only multi-subnet clusters can be deployed in the cloud. In an on-premises environment, the operating system handles IP address assignment. In the cloud, the cloud provider (AWS) handles IP address assignment, and the IP address of a Windows failover cluster is assigned to an elastic network interface (ENI) rather than at the operating system level. Because failover clustering is an operating system-level feature, it can't take control of IP failover, so the same IP address can't fail over between nodes. To work around this limitation, you can use a multi-subnet cluster, in which the cluster fails over to a secondary IP address. That secondary IP address is assigned to the network interface of a node in another subnet and can come online there. For more information, see Failover Clustering Networking Basics and Fundamentals in the Microsoft documentation.
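The multi-subnet requirement can be checked on paper before any nodes are deployed. The following Python sketch is illustrative only: the plan layout and the key names `subnet`, `primary_ip`, and `cluster_ip` are assumptions made for this example, not part of any AWS or Microsoft API. It verifies that each node is in its own subnet and that each node has a separate secondary address in that subnet for the cluster to bring online.

```python
from ipaddress import ip_address, ip_network


def validate_multi_subnet_plan(nodes):
    """Return a list of problems found in a multi-subnet cluster IP plan.

    `nodes` maps a node name to a dict with three hypothetical keys:
    "subnet" (CIDR), "primary_ip", and "cluster_ip" (the secondary IP
    address that the cluster brings online while this node owns the role).
    """
    problems = []
    owner_by_subnet = {}
    for name, cfg in nodes.items():
        subnet = ip_network(cfg["subnet"])
        # In a multi-subnet cluster, every node sits in a different subnet.
        if subnet in owner_by_subnet:
            problems.append(f"{name} shares {subnet} with {owner_by_subnet[subnet]}")
        else:
            owner_by_subnet[subnet] = name
        # Both addresses must belong to the node's own subnet.
        for key in ("primary_ip", "cluster_ip"):
            if ip_address(cfg[key]) not in subnet:
                problems.append(f"{name}: {key} {cfg[key]} is outside {subnet}")
        # The cluster address is a secondary IP, distinct from the primary.
        if cfg["primary_ip"] == cfg["cluster_ip"]:
            problems.append(f"{name}: cluster_ip must differ from primary_ip")
    return problems


# Example plan with made-up addresses: two nodes, two subnets.
plan = {
    "node-a": {"subnet": "10.0.1.0/24", "primary_ip": "10.0.1.10", "cluster_ip": "10.0.1.11"},
    "node-b": {"subnet": "10.0.2.0/24", "primary_ip": "10.0.2.10", "cluster_ip": "10.0.2.11"},
}
print(validate_multi_subnet_plan(plan))
```

An empty list means the plan satisfies these three checks. It doesn't confirm that the addresses are actually free or assigned in your VPC.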

Migrating a Windows failover cluster to AWS can be a complex process, but with careful planning and implementation, you can complete it with minimal disruption to your business operations. For example, every application is configured differently on a failover cluster, so it's imperative to understand each application's requirements and determine beforehand how they can be met in the cloud. The process involves the following steps:

  • Ensuring that all cluster nodes are running the same version of Windows and have all necessary updates installed

  • Configuring the cluster quorum

  • Ensuring that all applications and data are backed up and can be restored during the migration

Assess

The assess phase is a critical step in the process of migrating a failover cluster to AWS. During this phase, you gather information about your current environment, determine the feasibility of migrating to AWS, and identify any potential challenges or risks. We recommend that you follow these steps during the assess phase:

  • Assess the readiness of your applications – Determine whether your applications can be migrated to AWS without modifications or if they need to be updated or rewritten to take advantage of cloud-native services.

  • Evaluate your networking and security requirements – Determine your network and security requirements, including the configuration of firewalls, load balancers, and VPNs.

  • Assess your data migration requirements – Determine how your data will be migrated to AWS, including the size and location of your data, the time required for the migration, and any data transfer costs. In an on-premises environment, you might be using diverse storage technologies such as JBOD, NAS, and SAN. Each one can present data to your application through different access methods, such as SAN Fibre Channel, iSCSI, SAS, or SMB/NFS shares.

  • Identify potential risks and challenges – Identify any potential risks or challenges that could impact the migration process, such as downtime, compatibility issues, or data loss.

  • Estimate costs – Estimate the cost of migrating to AWS, including the cost of EC2 instances, storage, data transfer, and any other AWS services required.

  • Create a migration plan – Based on the information gathered during the assess phase, create a detailed migration plan that includes timelines, required resources, and the steps involved in migrating to AWS.
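To make the "time required for the migration" item in the preceding list concrete, a rough estimate can be derived from the data size and the available bandwidth. The following Python sketch is a back-of-the-envelope calculation only. The function name, the default 80 percent link utilization, and the example figures are assumptions for illustration, and the result ignores protocol overhead, throttling, and ongoing change rates.

```python
def estimate_transfer_hours(data_gib, link_mbps, utilization=0.8):
    """Estimate the hours needed to copy `data_gib` GiB over a network link.

    `link_mbps` is the nominal link speed in megabits per second, and
    `utilization` is the fraction of that speed assumed to be usable.
    """
    if data_gib < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_gib >= 0, link_mbps > 0, 0 < utilization <= 1 required")
    total_bits = data_gib * 1024**3 * 8              # GiB -> bytes -> bits
    usable_bits_per_second = link_mbps * 1_000_000 * utilization
    return total_bits / usable_bits_per_second / 3600


# Example: 1 TiB (1024 GiB) over a 1 Gbps link at 80% utilization
# works out to roughly three hours.
hours = estimate_transfer_hours(1024, 1000)
print(f"{hours:.1f} hours")
```

Comparing this figure with your acceptable cutover window can help you decide between a one-time copy and continuous replication.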

Evaluate your current environment

Assess your current environment, including the hardware and software configurations, to determine what needs to be migrated to AWS. Identify any dependencies between applications, servers, and databases.
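After dependencies are identified, one possible use for them is to derive an order in which components can be moved so that nothing is migrated before the things it relies on. The following Python sketch uses the standard library's `graphlib` module (Python 3.9 or later). The component names are hypothetical, and treating "dependencies first" as the migration order is an assumption that might not fit every environment.

```python
from graphlib import TopologicalSorter


def migration_order(dependencies):
    """Return components ordered so that each one follows its dependencies.

    `dependencies` maps a component to the set of components it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical inventory: an application that uses a clustered SQL Server
# instance, which in turn needs a file share and a directory service.
inventory = {
    "web-app": {"sql-fci"},
    "sql-fci": {"file-share", "active-directory"},
    "file-share": {"active-directory"},
}
order = migration_order(inventory)
print(order)
```

A `CycleError` from this function is itself useful information: it points to components that probably have to be migrated together.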

Determine your migration strategy

Consider your options for migrating to AWS, including a lift-and-shift approach or re-architecting your environment to take advantage of cloud-native services.

  • Traditional failover cluster migration – If you're manually configuring a Microsoft failover cluster from scratch, you can follow the steps in part 1 of Launch Microsoft SQL Server on Amazon EC2 Windows instance (YouTube). Shared storage is one of the most important considerations for a failover cluster migration. Amazon EBS Multi-Attach doesn't support SCSI-3 Persistent Reservation directly (it relies on NVMe reservations, as described later in this list), but Amazon FSx for Windows File Server and FSx for NetApp ONTAP both work well as shared storage options. One of the most common use cases is an Always On Failover Cluster Instance for a SQL Server cluster that uses Amazon FSx for Windows File Server. For more information, see the Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server post in the AWS Storage Blog. The next step is to bring the nodes to the cloud by using AWS Application Migration Service. For more information, see the Migrating your Microsoft Windows clusters to AWS using CloudEndure Migration post in the AWS Storage Blog. Then, you can configure a clustered role for your application to provide high availability.

  • Migrating with virtually no downtime using a stretch cluster – A stretch cluster could be a good fit if you have a business-critical application to migrate to the cloud and can't afford downtime. With a Microsoft stretch cluster, Site A and Site B must communicate with each other over a network, but each site can have its own shared storage. You can use this to your advantage in a migration scenario. For example, your source (whether it's on premises or in another provider's cloud) can be Site A, which has network connectivity with an Amazon VPC where you deploy Site B. After Site B is up and running, you can cut over to Site B. The data replication mechanism is critical in this approach because your source storage technology might limit which replication methods can work.

  • Migrating a failover cluster deployed on VMware on-premises to VMware in the cloud on AWS – VMware Cloud on AWS has native support for SCSI-3 Persistent Reservation. This makes it possible to host a failover cluster on a virtual machine disk (VMDK) on VMware Cloud on AWS. For more information, see Migrating SQL Server FCI cluster with shared disks to VMware Cloud on AWS in the VMware documentation.

    Notice

    As of April 30, 2024, VMware Cloud on AWS is no longer resold by AWS or its channel partners. The service will continue to be available through Broadcom. We encourage you to reach out to your AWS representative for details.

  • Migrating a SQL Server FCI by using Amazon EBS Multi-Attach volumes – You can use Amazon EBS Multi-Attach and NVMe reservations to create SQL Server Failover Cluster Instances (FCIs) with Amazon EBS io2 volumes as the shared storage on Windows Server failover clusters. These volumes can be attached only to instances that are in the same Availability Zone. Deploying Windows Server failover clusters by using Amazon EBS io2 volumes requires the latest Windows drivers that translate SCSI reservation commands to NVMe reservation commands. For more information about migrating your on-premises SQL Server FCIs to AWS in a single Availability Zone by using this approach, see the AWS blog post How to deploy a SQL Server failover cluster with Amazon EBS Multi-Attach on Windows Server.
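The two placement constraints stated for the Amazon EBS Multi-Attach option (io2 volumes, and all attached instances in the same Availability Zone) are easy to check against a planned layout. The following Python sketch works on plain dictionaries. The function name and the `type` and `az` keys are assumptions for this example, and the sketch doesn't call any AWS API or verify driver versions.

```python
def multi_attach_issues(volume, node_azs):
    """Return a list of reasons why a shared-volume plan can't work.

    `volume` is a dict with hypothetical keys "type" and "az".
    `node_azs` maps each cluster node name to its Availability Zone.
    """
    issues = []
    # The approach described above uses io2 volumes as shared storage.
    if volume["type"] != "io2":
        issues.append(f"volume type is {volume['type']}, expected io2")
    # Every node must be in the same Availability Zone as the volume.
    for node, az in sorted(node_azs.items()):
        if az != volume["az"]:
            issues.append(f"{node} is in {az}, but the volume is in {volume['az']}")
    return issues


# Example with made-up names: the second node is in a different zone.
planned_volume = {"type": "io2", "az": "us-east-1a"}
planned_nodes = {"node-a": "us-east-1a", "node-b": "us-east-1b"}
for issue in multi_attach_issues(planned_volume, planned_nodes):
    print(issue)
```

If a plan fails the Availability Zone check, the other shared storage options in the preceding list might be a better fit.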

The assess phase is critical for ensuring a successful migration of your failover cluster to AWS. If you take the time to gather information and identify potential challenges, you can develop a comprehensive migration plan that minimizes downtime, reduces risk, and ensures a smooth transition to AWS.

Mobilize

During the mobilize phase, you prepare the failover cluster for migration to AWS and test it to ensure that it's functioning properly. The mobilize phase includes the following steps:

  1. Prepare the target environment – In this step, you create the AWS resources needed to host the failover cluster. This involves setting up an Amazon VPC, subnets, security groups, and other necessary resources.

  2. Prepare the source environment – In this step, you prepare the existing failover cluster for migration. This can involve making changes to the network configuration, configuring replication, or installing necessary software.

  3. Validate the cluster – After both the source and target environments are prepared, you can perform a validation test to ensure that the cluster is functioning properly. This involves running a series of tests to ensure that the cluster can fail over to the target environment successfully.

  4. Create a replication link – After the validation test, you can create a replication link between the source and target environments. This ensures that any changes made to the source environment are replicated to the target environment.

  5. Monitor replication – After the replication link is established, monitor the replication process to ensure that all changes are being replicated properly.

  6. Fail over the cluster – After verifying that replication is working correctly, perform the final failover to the target environment. This involves stopping the cluster services on the source environment and starting them on the target environment.

  7. Test the failover – After the failover is complete, perform a test to ensure that the applications and services running on the cluster are functioning properly in the new environment.
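Steps 5 and 6 imply a decision point: the final failover should start only after replication is shown to be keeping up. One way to express that gate is sketched in the following Python example. The lag samples, the 5-second threshold, and the requirement of three consecutive good samples are all assumptions chosen for illustration. How you actually measure replication lag depends on the replication tool that you use.

```python
def ready_for_cutover(lag_samples, max_lag_seconds=5.0, required_consecutive=3):
    """Return True if the most recent lag samples are all within the limit.

    `lag_samples` is a chronological list of replication lag measurements
    in seconds (oldest first). The thresholds are illustrative defaults.
    """
    if required_consecutive < 1:
        raise ValueError("required_consecutive must be at least 1")
    # Not enough data yet to make a decision.
    if len(lag_samples) < required_consecutive:
        return False
    recent = lag_samples[-required_consecutive:]
    return all(lag <= max_lag_seconds for lag in recent)


# Example: lag shrinks as the initial synchronization completes.
samples = [30.0, 12.0, 4.0, 2.0, 1.0]
print(ready_for_cutover(samples))
```

Requiring several consecutive good samples, rather than a single one, avoids starting a cutover during a momentary dip in lag.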

Migrate

Migrating a Microsoft failover cluster can be a complex process that requires careful planning and implementation to ensure a successful outcome. It's essential to thoroughly assess the existing environment, identify potential issues, and develop a comprehensive migration plan that includes testing and validation before making any changes to the production environment. During the migration phase, it's important to closely monitor the process and address any issues or unexpected behavior promptly. Communication and collaboration among all stakeholders, including IT teams, business users, and vendors, are crucial for a smooth migration process.

Additionally, it's important to consider the impact of the migration on any third-party applications or services that are running on the failover cluster. Identify any dependencies and test those applications thoroughly to ensure that they continue to function as expected after the migration. Another key aspect of the migration phase is to establish a rollback plan in case of any unforeseen issues or failures during the migration process. This plan ideally includes steps to revert the migration and restore the original environment, while minimizing any impact on the production environment.
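A rollback plan is easier to reason about when every migration step is paired with the action that reverts it. The following Python sketch shows that pattern in a generic form: steps run in order, and if one fails, the completed steps are undone in reverse order. The step names and the placeholder actions are hypothetical. In practice, each action would call your own tooling.

```python
def run_with_rollback(steps):
    """Run (name, do, undo) steps in order and roll back on failure.

    Returns (succeeded, log). If a step raises an exception, the undo
    action of every completed step runs in reverse order.
    """
    completed = []
    log = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"rolled back: {done_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"done: {name}")
    return True, log


def failing_step():
    # Placeholder for a step that fails during the migration.
    raise RuntimeError("role did not come online")


events = []
example_steps = [
    ("stop source services", lambda: events.append("stop"), lambda: events.append("restart")),
    ("start target services", failing_step, lambda: events.append("stop target")),
]
succeeded, run_log = run_with_rollback(example_steps)
print(succeeded)
print(run_log)
```

In this example, the second step fails, so only the first step's undo action runs and the source services are restarted.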

Finally, after the migration is complete and the failover cluster is successfully running on the new environment, it's important to perform post-migration validation and testing to confirm that everything is working as intended. This includes monitoring performance, validating failover capabilities, and ensuring that all applications and services are functioning properly.