Best Practice 11.2 – Define an approach to maintain availability - SAP Lens

Best Practice 11.2 – Define an approach to maintain availability

Maintain availability with a resilient architecture that can withstand the failure of a single technical component or AWS service. Implement mechanisms such as redundant capacity, load balancing, and software clusters.

Suggestion 11.2.1 – Avoid failures due to exhausted resources or service deterioration

Investigate over-provisioning of resources, proactive monitoring of growth, and throttling usage by setting limits.

The operational excellence pillar covers the different ways in which you can understand the state of your SAP application and ensure that the appropriate actions are taken. See [Operational Excellence]: 1 - Design SAP workload to allow understanding and reaction to its state.

The performance pillar provides guidance on right-sizing and scaling capacity. See [Performance]: 16 - Understand ongoing performance and optimization options.
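As an illustration of proactive monitoring against a limit, the following minimal sketch checks how full a file system is and flags it once a warning threshold is crossed. The function names and the 80% threshold are assumptions chosen for the example, not values from this lens; in practice you would more likely feed such metrics into your monitoring service than poll them by hand.

```python
import shutil


def usage_fraction(path: str) -> float:
    """Return the used share (0.0 to 1.0) of the file system containing `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def needs_attention(path: str, warn_at: float = 0.80) -> bool:
    """True when the file system is at or above the (assumed) warning threshold."""
    return usage_fraction(path) >= warn_at


if __name__ == "__main__":
    # "." is used so the example runs anywhere; an SAP host would check its own mount points.
    print(f"used: {usage_fraction('.'):.1%}, needs attention: {needs_attention('.')}")
```

Alerting well before the file system is full leaves time to extend storage or clean up before the application is affected.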

Suggestion 11.2.2 – Have a strategy for scheduled maintenance

If your business has a requirement to minimize scheduled outages, you should develop a strategy for maintenance at all levels – SAP application, database, operating system, and AWS. Consider the following:

You should also evaluate the elastic capabilities of AWS services to reduce the overall downtime of scheduled maintenance by temporarily increasing performance. For example, you could scale up the Amazon EC2 instance running your database to provide more CPU and storage throughput for upgrade activities, or switch your EBS volume type from gp2 to io2 to improve storage throughput during a database reorganization.
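The EBS volume type change mentioned above could be scripted roughly as follows. This is a sketch, not a prescribed procedure: the helper name `switch_volume_type`, the volume ID, and the IOPS figure are hypothetical, and the function expects an EC2 client such as the one returned by `boto3.client("ec2")`, whose `modify_volume` call it uses.

```python
from typing import Optional


def switch_volume_type(ec2_client, volume_id: str, volume_type: str,
                       iops: Optional[int] = None) -> str:
    """Request a volume type change and return the reported modification state.

    `ec2_client` is assumed to be a boto3 EC2 client (or any object that
    offers a compatible `modify_volume` method).
    """
    params = {"VolumeId": volume_id, "VolumeType": volume_type}
    if iops is not None:
        # Only pass Iops for volume types that accept it (for example io2, not gp2).
        params["Iops"] = iops
    response = ec2_client.modify_volume(**params)
    return response["VolumeModification"]["ModificationState"]


# Hypothetical usage around a maintenance window (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   switch_volume_type(ec2, "vol-0123456789abcdef0", "io2", iops=16000)  # before the reorganization
#   switch_volume_type(ec2, "vol-0123456789abcdef0", "gp2")              # afterwards
```

Check the current Amazon EBS documentation for the waiting period between consecutive modifications of the same volume before planning the switch back, and confirm that the modification has completed before you start the maintenance activity.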

Suggestion 11.2.3 – Protect SAP single points of failure with software clusters or other mechanisms

You can use a high availability (HA) clustering solution for autonomous failover of SAP single points of failure (SAP Central Services and database) across Availability Zones.

There are multiple SAP-certified clustering solutions listed on the SAP website. SAP clustering solutions are supported by the cluster software vendors themselves, not by SAP. SAP only certifies the solution. Any custom-built solution is not certified and will need to be supported by the solution builder.

If you choose not to use a clustering solution for your single points of failure, consider scripting or runbooks to minimize the errors associated with restoring services.

Suggestion 11.2.4 – Consider redundant capacity or automatic scaling for components that support it

Evaluate static, dynamic, or scheduled capacity changes to match your usage. Examine the minimum capacity requirements and how they would be impacted by failures and maintenance. Overprovision where appropriate to allow time to recover from failure.

If you need to maintain 100% capacity in the event of an AZ failure, then you should consider deploying the application tier across three AZs, each with 50% of the total required capacity, so that the two remaining AZs still provide 100% if one fails.
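The arithmetic behind the three-AZ example can be generalized with a small sketch. The function names are illustrative, and the model assumes capacity is spread evenly across AZs and that whole AZs fail.

```python
from fractions import Fraction


def capacity_per_az(az_count: int, az_failures_tolerated: int = 1) -> Fraction:
    """Share of the required capacity to place in each AZ so that the
    surviving AZs together still provide 100%."""
    surviving = az_count - az_failures_tolerated
    if surviving < 1:
        raise ValueError("at least one AZ must survive the failure scenario")
    return Fraction(1, surviving)


def total_provisioned(az_count: int, az_failures_tolerated: int = 1) -> Fraction:
    """Total capacity deployed across all AZs, as a multiple of the requirement."""
    return az_count * capacity_per_az(az_count, az_failures_tolerated)


if __name__ == "__main__":
    print(capacity_per_az(3))    # 1/2 -> 50% per AZ
    print(total_provisioned(3))  # 3/2 -> 150% deployed in total
    print(total_provisioned(2))  # 2   -> 200% deployed in total
```

Under these assumptions, three AZs need 150% of the required capacity in total, whereas two AZs would need 200% to survive the loss of one.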

In addition to deploying the SAP Application Server Layer across multiple AZs, you could consider scaling solutions such as the one described in the following SAP on AWS Blog post that leverages the capabilities of Amazon EC2 Auto Scaling.

Suggestion 11.2.5 – Ensure the availability of capacity for all identified failure scenarios

The following are examples of failure scenarios that could be used to guide your analysis. Granularity and coverage of the scenarios, classification, and impact will vary depending on your requirements and architecture.

| Failure scenario examples | Comparative risk of occurrence |
| --- | --- |
| Planned / Controlled Maintenance | Planned |
| Resource exhausted or compromised (High CPU utilization / File system full / Out of memory / Storage issues) | Medium |
| Distributed stateless component failure (for example, web dispatchers) | Medium |
| Distributed stateful component failure (for example, application servers) | Medium |
| Single point of failure (Database / SAP Central Services) | Medium |
| AZ / Network failure | Low |
| Core service failure (DNS / Amazon EFS / API calls) | Low / Medium |
| Corruption / Accidental deletion / Malicious activities / Faulty code deployment | Low |
| Region failure | Very Low |

Further guidance on capacity reservations is available in [Reliability]: Suggestion 10.2.5 - Investigate strategies for ensuring capacity and in the AWS whitepaper: Architecture Guidance for Availability and Reliability of SAP on AWS.

You can review which Reserved Instances are available in your AWS account in the Reserved Instances section of the Amazon EC2 console, and which On-Demand Capacity Reservations are available in the Capacity Reservations section.
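The same check can be done programmatically. The sketch below sums the unused instances of active On-Demand Capacity Reservations per instance type and Availability Zone. It assumes an EC2 client such as `boto3.client("ec2")` and uses its `describe_capacity_reservations` call; the helper name `available_reserved_capacity` is made up for this example.

```python
from collections import defaultdict


def available_reserved_capacity(ec2_client) -> dict:
    """Sum unused instances of active On-Demand Capacity Reservations,
    keyed by (instance type, Availability Zone)."""
    totals = defaultdict(int)
    kwargs = {"Filters": [{"Name": "state", "Values": ["active"]}]}
    while True:
        page = ec2_client.describe_capacity_reservations(**kwargs)
        for reservation in page["CapacityReservations"]:
            key = (reservation["InstanceType"], reservation["AvailabilityZone"])
            totals[key] += reservation["AvailableInstanceCount"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # fetch the next page of results
    return dict(totals)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   print(available_reserved_capacity(boto3.client("ec2")))
```

Comparing this result with the capacity each failure scenario would need gives a quick indication of whether your reservations cover a failover.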

Suggestion 11.2.6 – Use AWS services that have inherent availability where applicable

Several AWS services have inherent availability as part of their design and run across multiple Availability Zones for high availability. Some of the relevant services used in an SAP context include:

In addition, stateless components, such as bastion hosts or SAProuter, can use Auto Scaling groups to achieve high availability.

Suggestion 11.2.7 – Follow AWS best practices to ensure network connectivity

Evaluate one or more of the following AWS best practices to ensure the resilience of network connectivity to the AWS Region in use:

If your cluster solution relies on an overlay IP, consider the following to enable access from outside of the VPC: