Set up disaster recovery for SAP on IBM Db2 on AWS
Created by Ambarish Satarkar (AWS) and Debasis Sahoo (AWS)
Environment: Production | Technologies: Databases; Operations | Workload: SAP |
AWS services: Amazon EC2; AWS Elastic Disaster Recovery |
Summary
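This pattern outlines the steps to set up a disaster recovery (DR) system for SAP workloads with IBM Db2 as the database platform, running on the Amazon Web Services (AWS) Cloud. The objective is to provide a low-cost solution that maintains business continuity in the event of an outage.
The pattern uses the pilot light approach: a minimal DR environment, including a standby Db2 database that is kept in sync with production, runs in a second AWS Region and is scaled up only when a disaster occurs.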
This solution is scalable. You can extend it to a full-scale disaster recovery environment as needed.
Prerequisites and limitations
Prerequisites
An SAP instance running on an Amazon Elastic Compute Cloud (Amazon EC2) instance
An IBM Db2 database
An operating system that is supported by the SAP Product Availability Matrix (PAM)
Different physical database hostnames for production and standby database hosts
An Amazon Simple Storage Service (Amazon S3) bucket in each AWS Region with Cross-Region Replication (CRR) enabled
Product versions
IBM Db2 Database version 11.5.7 or later
Architecture
Target technology stack
Amazon EC2
Amazon Simple Storage Service (Amazon S3)
Amazon Virtual Private Cloud (VPC peering)
Amazon Route 53
IBM Db2 High Availability Disaster Recovery (HADR)
Target architecture
This architecture implements a DR solution for SAP workloads with Db2 as the database platform. The production database is deployed in AWS Region 1 and a standby database is deployed in a second Region. The standby database is referred to as the DR system. Db2 supports multiple standby databases (up to three). This architecture uses Db2 HADR to set up the DR database and to automate log shipping between the production and standby databases.
In the event of a disaster that makes Region 1 unavailable, the standby database in the DR Region takes over the production database role. SAP application servers can be built in advance or provisioned by using AWS Elastic Disaster Recovery.
Db2 HADR implements a production-standby setup, where production acts as the primary server, and all users are connected to it. All transactions are written to log files, which are transferred to the standby server by using TCP/IP. The standby server updates its local database by rolling forward the transferred log records, which helps to ensure that it is kept in sync with the production server.
VPC peering is used so that instances in the production Region and DR Region can communicate with each other. Amazon Route 53 routes end users to internet applications.
Create an AMI of the application server in Region 1 and copy the AMI to Region 2. Use the AMI to launch servers in Region 2 in the event of a disaster.
Set up Db2 HADR replication between the production database (in Region 1) and the standby database (in Region 2).
Change the EC2 instance type to match the production instance in the event of a disaster.
In Region 1, LOGARCHMETH1 is set to db2remote: S3 path.
In Region 2, LOGARCHMETH1 is set to db2remote: S3 path.
Cross-Region Replication is performed between the S3 buckets.
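The LOGARCHMETH1 settings in the steps above can be sketched as follows. This is a minimal illustration, assuming that the value uses the Db2 remote storage form DB2REMOTE://<alias>/<container>/<path> and that a storage access alias for Amazon S3 is already cataloged on each database host. The database name, alias, bucket names, and prefix are hypothetical placeholders, not values from this pattern.

```python
def logarchmeth1_command(db_name, alias, bucket, prefix):
    """Build the Db2 CLP command that points LOGARCHMETH1 at an S3 location.

    Assumption: the archive target uses the DB2REMOTE://alias/container/path form.
    """
    target = f"DB2REMOTE://{alias}/{bucket}/{prefix.strip('/')}"
    return f"db2 update db cfg for {db_name} using LOGARCHMETH1 {target}"


# Hypothetical values: one bucket per Region, with Cross-Region Replication
# configured between the two buckets outside of this script.
REGION_SETTINGS = {
    "Region 1": {"alias": "s3alias", "bucket": "example-logs-region1"},
    "Region 2": {"alias": "s3alias", "bucket": "example-logs-region2"},
}

for region, cfg in REGION_SETTINGS.items():
    command = logarchmeth1_command("PRD", cfg["alias"], cfg["bucket"], "db2/logarchive")
    print(f"{region}: {command}")
```

The script only prints the commands so that they can be reviewed before the database instance owner runs them on each host.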
Tools
AWS services
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them up or down.
Amazon Route 53 is a highly available and scalable DNS web service.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
Amazon Virtual Private Cloud (Amazon VPC) helps you launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS. This pattern uses VPC peering.
Best practices
The network plays a key role in deciding the HADR replication mode. For DR across AWS Regions, we recommend that you use Db2 HADR ASYNC or SUPERASYNC mode.
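To make the recommendation concrete, the following sketch lists the four values of the HADR_SYNCMODE parameter together with what a commit on the primary waits for in each mode. The descriptions are summaries, and the class and function names are illustrative only; the cross-Region flag simply encodes the recommendation in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadrSyncMode:
    name: str
    commit_waits_for: str          # what the primary waits for beyond its local log write
    recommended_cross_region: bool  # per the recommendation in this section


HADR_SYNC_MODES = (
    HadrSyncMode("SYNC", "log data written to disk on the standby", False),
    HadrSyncMode("NEARSYNC", "log data received in memory on the standby", False),
    HadrSyncMode("ASYNC", "log data handed to the TCP layer on the primary host", True),
    HadrSyncMode("SUPERASYNC", "nothing beyond the local log write on the primary", True),
)


def cross_region_modes():
    """Return the mode names that this section recommends across AWS Regions."""
    return [mode.name for mode in HADR_SYNC_MODES if mode.recommended_cross_region]


for mode in HADR_SYNC_MODES:
    print(f"{mode.name:<11}commit waits for: {mode.commit_waits_for}")
print("Cross-Region candidates:", ", ".join(cross_region_modes()))
```

The less a commit waits on the standby, the less cross-Region network latency affects production response times, at the cost of a larger potential data loss window.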
For more information about replication modes for Db2 HADR, see the IBM documentation.
You can use the AWS Management Console or the AWS Command Line Interface (AWS CLI) to create a new AMI of your existing SAP system. You can then use the AMI to recover your existing SAP system or to create a clone.
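As an alternative to the console or the AWS CLI, the same AMI workflow can be scripted. The following is a sketch, assuming the AWS SDK for Python (Boto3). The function takes two EC2 clients as arguments, one per Region, so that it can be exercised without AWS credentials. The instance ID, AMI name, and Region names in the comments are hypothetical.

```python
def create_and_copy_ami(source_ec2, dest_ec2, instance_id, name, source_region,
                        no_reboot=False):
    """Create an AMI in the source Region and copy it to the DR Region.

    source_ec2 and dest_ec2 are EC2 clients, for example:
        boto3.client("ec2", region_name="us-east-1")   # hypothetical Region 1
        boto3.client("ec2", region_name="us-west-2")   # hypothetical Region 2
    """
    # With no_reboot=False, Amazon EC2 reboots the instance while it creates
    # the image, so plan this step together with the SAP Basis team.
    image = source_ec2.create_image(
        InstanceId=instance_id, Name=name, NoReboot=no_reboot
    )
    image_id = image["ImageId"]

    # Wait until the source AMI is available before requesting the copy.
    source_ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # copy_image is called on the destination Region's client.
    copied = dest_ec2.copy_image(
        Name=name, SourceImageId=image_id, SourceRegion=source_region
    )
    return copied["ImageId"]
```

Passing the clients in keeps the Region choice explicit at the call site and makes the function straightforward to test with stand-in objects.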
AWS Systems Manager Automation can help with the common maintenance and deployment tasks of EC2 instances and other AWS resources.
AWS provides multiple native services to monitor and manage your infrastructure and applications on AWS. Services such as Amazon CloudWatch and AWS CloudTrail can be used to monitor your underlying infrastructure and API operations, respectively. For more details, see SAP on AWS – IBM Db2 HADR with Pacemaker.
Epics
Task | Description | Skills required |
---|---|---|
Check the system and logs. | | AWS administrator, SAP Basis administrator |
Task | Description | Skills required |
---|---|---|
Create the SAP and database servers. | The rollforward pending state is set by default after the full backup is restored. The rollforward pending state indicates that the database is in the process of being restored and that some changes might need to be applied. For more information, see the IBM documentation. | SAP Basis administrator |
Check the configuration. | | AWS administrator, SAP Basis administrator |
Set up replication from the production DB to the DR DB (using ASYNC mode). | | SAP Basis administrator |
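The replication task in the table above can be illustrated with a small command generator. This is a sketch under stated assumptions, not the exact procedure of this pattern: it uses the standard Db2 HADR database configuration parameters and the start hadr command, and the database name, hostnames, instance name, and port are hypothetical placeholders.

```python
def hadr_setup_commands(db_name, local_host, remote_host, remote_instance,
                        port, role, syncmode="ASYNC"):
    """Return the Db2 CLP commands that configure and start HADR on one host."""
    if role not in ("primary", "standby"):
        raise ValueError("role must be 'primary' or 'standby'")
    cfg = {
        "HADR_LOCAL_HOST": local_host,
        "HADR_LOCAL_SVC": port,
        "HADR_REMOTE_HOST": remote_host,
        "HADR_REMOTE_SVC": port,
        "HADR_REMOTE_INST": remote_instance,
        "HADR_SYNCMODE": syncmode,
    }
    settings = " ".join(f"{key} {value}" for key, value in cfg.items())
    return [
        f"db2 update db cfg for {db_name} using {settings}",
        f"db2 start hadr on db {db_name} as {role}",
    ]


# Hypothetical hosts: host1 is production (Region 1), host2 is DR (Region 2).
# HADR is started on the standby first and then on the primary.
for host, peer, role in (("host2", "host1", "standby"), ("host1", "host2", "primary")):
    print(f"# run on {host}")
    for command in hadr_setup_commands("PRD", host, peer, "db2prd", 5950, role):
        print(command)
```

The local and remote values mirror each other on the two hosts, which is why a single function with swapped arguments covers both sides.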
Task | Description | Skills required |
---|---|---|
Plan the production business downtime for the DR test. | Make sure that you plan the required business downtime on the production environment for testing the DR failover scenario. | SAP Basis administrator |
Create a test user. | Create a test user (or any test changes) that can be validated in the DR host to confirm log replication after DR failover. | SAP Basis administrator |
On the console, stop the production EC2 instances. | Ungraceful shutdown is initiated in this step to mimic a disaster scenario. | AWS systems administrator |
Scale up the DR EC2 instance to match the requirements. | On the EC2 console, change the instance type in the DR Region. | SAP Basis administrator |
Initiate takeover. | From the DR system (
Optionally, you can set the following parameters to adjust database memory allocation automatically based on the instance type. The
Verify the change by using the following commands.
| SAP Basis administrator |
Launch the application server for SAP in the DR Region. | Using the AMI that you made of the production system, launch an additional application server. | SAP Basis administrator |
Perform validation before starting the SAP application. | | AWS administrator, SAP Basis administrator |
Start the SAP application on the DR system. | Start the SAP application on the DR system by using | SAP Basis administrator |
Perform SAP validation. | This is performed as a DR test to provide evidence or to check the data replication success to the DR Region. | Test engineer |
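The scale-up step in the table above uses the EC2 console. A scripted equivalent might look like the following sketch, assuming the AWS SDK for Python (Boto3). The EC2 client is passed in as an argument, and the instance ID and instance type in the comments are hypothetical. An instance type can be changed only while the instance is stopped, so the sketch assumes that the standby database on the host has been shut down cleanly beforehand.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EC2 instance, change its instance type, and start it again.

    ec2 is an EC2 client for the DR Region, for example:
        boto3.client("ec2", region_name="us-west-2")   # hypothetical Region 2
    """
    # Assumption: Db2 on this host was stopped before this function is called.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The instance type is an attribute that can be modified only when stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example call with hypothetical values:
#     resize_instance(ec2_client, "i-0123456789abcdef0", "r5.4xlarge")
```

After the instance is running again, the database is started and the takeover proceeds as described in the next row of the table.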
Task | Description | Skills required |
---|---|---|
Start the production SAP and database servers. | On the console, start the EC2 instances that host SAP and the database in the production system. | SAP Basis administrator |
Start the production database and set up HADR. | Log in to the production system (
Verify that the HADR status is
If the database is not inconsistent and is not at | SAP Basis administrator |
Fail back the database to the production Region. | In a normal business-as-usual scenario, this step is performed in a scheduled downtime. Applications running on the DR system are stopped, and the database is failed back to the production Region (Region 1) to resume operations from the production Region. | SAP Basis administrator |
Perform validation before starting the SAP application. | | AWS administrator, SAP Basis administrator |
Start the SAP application. | | SAP Basis administrator |
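Before failing back, the HADR status check in the table above can be automated. The following sketch parses the KEY = VALUE lines that the db2pd -hadr -db <SID> command prints and tests two fields. It assumes that a connected pair in peer state is the condition to wait for, which applies to ASYNC mode; a SUPERASYNC pair does not enter peer state. The sample text is a trimmed illustration of the output layout, not captured output.

```python
def parse_db2pd_hadr(output):
    """Turn the KEY = VALUE lines of db2pd -hadr output into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def ready_for_failback(fields):
    """Return True when the pair is connected and in peer state."""
    return (
        fields.get("HADR_CONNECT_STATUS") == "CONNECTED"
        and fields.get("HADR_STATE") == "PEER"
    )


# Trimmed, illustrative sample of the fields that this check relies on.
SAMPLE_OUTPUT = """
                            HADR_ROLE = STANDBY
                        HADR_SYNCMODE = ASYNC
                           HADR_STATE = PEER
                  HADR_CONNECT_STATUS = CONNECTED
"""

status = parse_db2pd_hadr(SAMPLE_OUTPUT)
print("Ready for failback:", ready_for_failback(status))
```

In practice the output would come from running db2pd as the database instance owner on the production host after it rejoins as the standby.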
Troubleshooting
Issue | Solution |
---|---|
Key log files and commands to troubleshoot HADR-related issues | |
SAP note for troubleshooting HADR issues on Db2 UDB | Refer to SAP Note 1154013 - DB6: DB problems in HADR environment |
Related resources
Additional information
Using this pattern, you can set up a disaster recovery system for an SAP system running on the Db2 database. In a disaster situation, business should be able to continue within your defined recovery time objective (RTO) and recovery point objective (RPO) requirements:
RTO is the maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.
RPO is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.
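The two definitions above translate into simple time comparisons. The following sketch shows how measurements from a DR test could be checked against the targets. The timestamps and target values are hypothetical examples, not figures from this pattern.

```python
from datetime import datetime, timedelta


def meets_rpo(last_replicated, outage_start, rpo):
    """Data loss window: time between the last replicated change and the outage."""
    return outage_start - last_replicated <= rpo


def meets_rto(outage_start, service_restored, rto):
    """Downtime window: time between the outage and the restoration of service."""
    return service_restored - outage_start <= rto


# Hypothetical DR test measurements.
outage_start = datetime(2024, 1, 1, 12, 0, 0)
last_replicated = outage_start - timedelta(seconds=45)
service_restored = outage_start + timedelta(hours=1, minutes=30)

print("RPO met:", meets_rpo(last_replicated, outage_start, rpo=timedelta(minutes=5)))
print("RTO met:", meets_rto(outage_start, service_restored, rto=timedelta(hours=2)))
```

In this pattern, the replication lag of the chosen HADR mode drives the RPO side, and the time needed to scale up the DR instance, take over the database, and launch application servers drives the RTO side.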
For FAQs related to HADR, see SAP Note 1612105 - DB6: FAQ on Db2 High Availability Disaster Recovery (HADR).