High Availability System Deployment - SAP NetWeaver on AWS

A high availability (HA) system is used for business-critical applications. With this option, all services that are single points of failure are deployed across multiple Availability Zones for fault tolerance.

For SAP NetWeaver, the key single points of failure are:

  • the central services (ASCS/SCS)

  • the global and transport filesystems

To protect against hardware failure of Amazon EC2 within an Availability Zone, you can enable EC2 instance recovery. See Recover Your Instance for more details on this feature. You can use scripts to start the SAP NetWeaver application automatically after instance recovery, and you can configure SAP application work processes to reconnect to your database after recovery. Consult the documentation for further restrictions.

Instance recovery is not application aware and does not protect the application against Availability Zone failure, so it is best suited to non-production systems. It can also be used for production systems, but in that case consider a Multi-AZ solution as well.
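As an illustration of how instance recovery can be enabled, the following Python sketch builds the parameters for a CloudWatch alarm that triggers the EC2 recover action when the system status check fails. The function name `build_recovery_alarm`, the alarm name, and the period and threshold values are assumptions chosen for this example, not recommended settings. The sketch only constructs the request; the commented lines show how it might be submitted with boto3.

```python
def build_recovery_alarm(instance_id: str, region: str) -> dict:
    """Return keyword arguments for CloudWatch PutMetricAlarm.

    The alarm watches the system status check of one instance and
    invokes the EC2 recover action when the check fails.
    """
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds; illustrative value
        "EvaluationPeriods": 2,  # illustrative value
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


if __name__ == "__main__":
    params = build_recovery_alarm("i-0123456789abcdef0", "eu-central-1")
    print(params["AlarmActions"][0])
    # To create the alarm (requires boto3 and AWS credentials):
    #   import boto3
    #   cloudwatch = boto3.client("cloudwatch", region_name="eu-central-1")
    #   cloudwatch.put_metric_alarm(**params)
```

Starting SAP NetWeaver after the recovered instance boots is a separate step that this sketch does not cover.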

For HA solutions, it’s important to be aware of two concepts within a VPC: shared storage and the Overlay IP address.

Shared Storage

EBS volumes are specific to a single Availability Zone and can only be attached to a single EC2 instance at a time. However, in distributed or HA deployments, shared storage is required for the global and transport filesystems. On AWS, this storage can be provided by building an NFS server or by using Amazon FSx. Amazon FSx provides shared file storage with full support for the SMB protocol, Windows NTFS, Active Directory integration, and Distributed File System (DFS).

In a high availability installation, the shared storage solution you choose can itself become a single point of failure unless it is protected. You can protect it by:

  • Clustering the NFS server providing the shared filesystem

  • Clustering the host that is sharing the filesystems

  • Using Amazon FSx. For workloads that require Multi-AZ redundancy to tolerate temporary AZ unavailability, you can create multiple file systems in separate AZs. Amazon FSx supports Microsoft’s Distributed File System (DFS) Replication and Namespaces. DFS Replication allows you to automatically replicate data between two file systems, and DFS Namespaces allows you to configure automatic failover.

Overlay IP Address

A subnet is dedicated to a single Availability Zone. Therefore, when a cluster solution fails over the active node to the second Availability Zone, the IP address will change. Clients will need a single target address to consistently connect to the active node. In AWS, SAP cluster solutions achieve this by using an Overlay IP address.

Each Amazon VPC has a default route table. A route table contains a set of rules, called routes, that are used to determine where network traffic is directed. Each subnet in your VPC must be associated with a route table—the table controls the routing for the subnet. A subnet can be associated with only one route table at a time, but you can associate multiple subnets with the same route table.

Route tables make it possible to direct traffic to any instance in an Amazon VPC, regardless of which subnet or Availability Zone it is in. Changing the routing entry for the subnets in a given VPC redirects traffic when needed. This technique is known as Overlay IP routing in AWS, and a cluster can apply it dynamically by updating the route table to point to the active node.
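To make the routing behavior concrete, the following Python sketch models a route table in memory and shows a failover as a single route replacement. The `RouteTable` class, the instance IDs, and the addresses are hypothetical, and nothing here calls AWS. In a real deployment, the cluster software would make the equivalent change through the EC2 ReplaceRoute API.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    """Toy model of a VPC route table: destination network -> target."""
    routes: dict = field(default_factory=dict)

    def replace_route(self, destination_cidr: str, target: str) -> None:
        """Create or overwrite the route for a destination CIDR block."""
        self.routes[ipaddress.ip_network(destination_cidr)] = target

    def lookup(self, address: str) -> str:
        """Return the target of the most specific route matching address."""
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            raise LookupError(f"no route for {address}")
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


# Hypothetical values: a VPC CIDR and an Overlay IP outside of it.
VPC_CIDR = "10.0.0.0/16"
OVERLAY_IP = "192.168.10.10"

table = RouteTable()
table.replace_route(VPC_CIDR, "local")
table.replace_route(f"{OVERLAY_IP}/32", "i-node-az1")  # active node in AZ 1
before = table.lookup(OVERLAY_IP)

# Failover: point the same Overlay IP at the node in the second AZ.
table.replace_route(f"{OVERLAY_IP}/32", "i-node-az2")
after = table.lookup(OVERLAY_IP)

if __name__ == "__main__":
    print(before, "->", after)
```

Clients keep connecting to the same Overlay IP address; only the route target changes.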

Because the Overlay IP address itself needs to be outside the CIDR block of the VPC, it is, by default, not routable from outside the VPC itself. This means that external access, such as from on-premises systems, is not possible without further configuration. To enable external access, you can use our Network Load Balancer or Route 53 agent.
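The requirement that the Overlay IP address lie outside the VPC CIDR block can be checked before configuring a cluster. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `is_outside_vpc` and the example addresses are assumptions made for illustration.

```python
import ipaddress
from typing import Iterable


def is_outside_vpc(overlay_ip: str, vpc_cidrs: Iterable[str]) -> bool:
    """Return True if overlay_ip is not contained in any VPC CIDR block."""
    ip = ipaddress.ip_address(overlay_ip)
    return all(ip not in ipaddress.ip_network(cidr) for cidr in vpc_cidrs)


if __name__ == "__main__":
    vpc_cidrs = ["10.0.0.0/16"]  # hypothetical VPC CIDR block
    for candidate in ("192.168.10.10", "10.0.1.5"):
        verdict = "usable" if is_outside_vpc(candidate, vpc_cidrs) else "inside VPC CIDR"
        print(candidate, verdict)
```

A VPC can have more than one CIDR block, which is why the function accepts a collection rather than a single value.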

See Figure 3 for an architecture diagram of a high availability solution that outlines these features.

Figure 3: Multi-AZ HA SAP Deployment

You can use a high availability (HA) clustering solution for autonomous failover of the central services across Availability Zones. There are multiple SAP-certified options for this clustering software on Windows listed on the SAP website, and it’s also possible to build and automate your own solution. HA solutions that have been tested and are known to work on AWS include:

  • SIOS

  • Veritas

  • Windows Server Failover Clustering (WSFC)

Support and certification

SAP clustering software is supported by the cluster software vendors themselves, not by SAP. SAP only certifies the solution. Any custom-built solution is not certified and will need to be supported by the solution builder.

In this guide, we focus on the distributed installation type on Windows in AWS. More details on how to deploy and operate SIOS, Veritas, and WSFC clusters are available on the respective vendor websites. Effective use of WSFC requires Windows Server 2016 or later.

The key features to be aware of with the WSFC solution are:

  • ASCS and a separate ERS instance set up within Windows Cluster Manager

  • Scale-Out File Server, a feature designed to provide scale-out file shares that are continuously available for file-based server application storage

  • Storage Spaces Direct, which uses standard servers with locally attached drives to create highly available, highly scalable software-defined storage. It requires Windows Server 2016 or later and NVMe storage (so Nitro-based EC2 instances are required).

  • Amazon FSx for Windows File Server

Also read the High Availability with Microsoft Failover Clustering section of the SAP NetWeaver installation guide.