
Set up disaster recovery for SAP on IBM Db2 on AWS

Created by Ambarish Satarkar (AWS) and Debasis Sahoo (AWS)

Summary

This pattern outlines the steps to set up a disaster recovery (DR) system for SAP workloads with IBM Db2 as the database platform, running on the Amazon Web Services (AWS) Cloud. The objective is to provide a low-cost solution for business continuity in the event of an outage.

The pattern uses the pilot light approach. By implementing pilot light DR on AWS, you can reduce downtime and maintain business continuity. The pilot light approach focuses on setting up a minimal DR environment in AWS, including an SAP system and a standby Db2 database, that is synchronized with the production environment.

This solution is scalable. You can extend it to a full-scale disaster recovery environment as needed.

Prerequisites and limitations

Prerequisites

An SAP instance running on an Amazon Elastic Compute Cloud (Amazon EC2) instance
An IBM Db2 database
An operating system that is supported by the SAP Product Availability Matrix (PAM)
Different physical database hostnames for production and standby database hosts
An Amazon Simple Storage Service (Amazon S3) bucket in each AWS Region with Cross-Region Replication (CRR) enabled

Product versions

IBM Db2 Database version 11.5.7 or later

Architecture

Target technology stack

Amazon EC2
Amazon Simple Storage Service (Amazon S3)
Amazon Virtual Private Cloud (Amazon VPC) with VPC peering
Amazon Route 53
IBM Db2 High Availability Disaster Recovery (HADR)

Target architecture

This architecture implements a DR solution for SAP workloads with Db2 as the database platform. The production database is deployed in AWS Region 1, and a standby database is deployed in a second Region. The standby database is referred to as the DR system. Db2 supports multiple standby databases (up to three). This pattern uses Db2 HADR to set up the DR database and to automate log shipping between the production and standby databases.

In the event of a disaster that makes Region 1 unavailable, the standby database in the DR Region takes over the production database role. To meet the recovery time objective (RTO) requirements, SAP application servers can be built in advance or launched when needed by using AWS Elastic Disaster Recovery or an Amazon Machine Image (AMI). This pattern uses an AMI.

Db2 HADR implements a production-standby setup, where production acts as the primary server, and all users are connected to it. All transactions are written to log files, which are transferred to the standby server by using TCP/IP. The standby server updates its local database by rolling forward the transferred log records, which helps to ensure that it is kept in sync with the production server.

VPC peering is used so that instances in the production Region and DR Region can communicate with each other. Amazon Route 53 routes end users to internet applications.

Db2 on AWS with cross-Region replication

Create an AMI of the application server in Region 1 and copy the AMI to Region 2. Use the AMI to launch servers in Region 2 in the event of a disaster.
Set up Db2 HADR replication between the production database (in Region 1) and the standby database (in Region 2).
Change the EC2 instance type to match the production instance in the event of a disaster.
In Region 1, LOGARCHMETH1 is set to db2remote: S3 path.
In Region 2, LOGARCHMETH1 is set to db2remote: S3 path.
Cross-Region Replication is performed between the S3 buckets.

Tools

AWS services

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them up or down.
Amazon Route 53 is a highly available and scalable DNS web service.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
Amazon Virtual Private Cloud (Amazon VPC) helps you launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS. This pattern uses VPC peering.

Best practices

The network plays a key role in deciding the HADR replication mode. For DR across AWS Regions, we recommend that you use Db2 HADR ASYNC or SUPERASYNC mode.
For more information about replication modes for Db2 HADR, see the IBM documentation.
You can use the AWS Management Console or the AWS Command Line Interface (AWS CLI) to create a new AMI of your existing SAP system. You can then use the AMI to recover your existing SAP system or to create a clone.
AWS Systems Manager Automation can help with the common maintenance and deployment tasks of EC2 instances and other AWS resources.
AWS provides multiple native services to monitor and manage your infrastructure and applications on AWS. Services such as Amazon CloudWatch and AWS CloudTrail can be used to monitor your underlying infrastructure and API operations, respectively. For more details, see SAP on AWS – IBM Db2 HADR with Pacemaker.
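
The AMI-based approach mentioned above can be sketched with the AWS CLI. The following is a minimal sketch, not a definitive procedure: the helper functions only print the `aws ec2 create-image` and `aws ec2 copy-image` calls so that you can review them before running them. The function names, instance ID, AMI ID, AMI names, and Regions are hypothetical placeholders.

```shell
# Print (do not run) the AWS CLI calls that create an AMI of an application
# server and copy that AMI to the DR Region. All arguments are placeholders.

# Usage: ami_create_cmd <instance-id> <ami-name>
ami_create_cmd() {
    printf 'aws ec2 create-image --instance-id %s --name %s\n' "$1" "$2"
}

# Usage: ami_copy_cmd <source-ami-id> <source-region> <dr-region> <ami-name>
# The --region option selects the destination Region for the copy.
ami_copy_cmd() {
    printf 'aws ec2 copy-image --source-image-id %s --source-region %s --region %s --name %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example with hypothetical IDs and Regions:
ami_create_cmd i-0123456789abcdef0 sap-aas-pr1
ami_copy_cmd ami-0123456789abcdef0 us-east-1 us-west-2 sap-aas-pr1-dr
```

By default, `create-image` attempts to reboot the instance before it creates the image, so schedule the call accordingly.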

Epics

Task	Description	Skills required
Check the system and logs.	Confirm that the production SAP on Db2 system is set up. Confirm that log backup is turned on and configured to save the logs in the S3 bucket. To check this, review the Db2 parameter `LOGARCHMETH1`. Create an AMI of the additional application server.	AWS administrator, SAP Basis administrator
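
The `LOGARCHMETH1` check can be scripted. The following is a minimal sketch that assumes the `db2 get db cfg` output tags the parameter as `(LOGARCHMETH1) = <value>` on a single line. The function names are hypothetical, and the `DB2REMOTE` prefix test is only a rough indicator that logs are archived to remote storage such as Amazon S3.

```shell
# Usage: db2 get db cfg for <SID> | logarchmeth1_value
# Prints the configured value of LOGARCHMETH1 from the configuration listing.
logarchmeth1_value() {
    # Keep only the line tagged (LOGARCHMETH1) and strip everything up to "= ".
    sed -n 's/^.*(LOGARCHMETH1) = *//p'
}

# Usage: is_remote_archive <value>
# Returns 0 when the value starts with DB2REMOTE (case-insensitive), 1 otherwise.
is_remote_archive() {
    case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
        DB2REMOTE:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, you might run `value=$(db2 get db cfg for PR1 | logarchmeth1_value)` as the Db2 instance owner and then pass `"$value"` to `is_remote_archive`.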

Task	Description	Skills required
Create the SAP and database servers.	To deploy the infrastructure for the DR Region, use an AWS CloudFormation script or use an AMI of the production instance. As a part of the pilot light approach, you can use a smaller EC2 instance in the same family as the production instance. For example, if your production instance type is `r6i.12xlarge`, you can use the `r6i.xlarge` instance type for the DR build. However, make sure that you allocate the same storage capacity on the DR instance so that you can restore the production database backup. Create Amazon Elastic File System (Amazon EFS) mount points for `/sapmnt/<SID>/`, and make sure that the file system is set to be replicated from the primary system. Take a full database backup (online or offline) from the production system. You will use this backup to build the DR database. In the DR system, build the DR SAP system by using the SAP Software Provisioning Manager (SWPM) system copy method with the option for system copy with backup/restore for HA/DR purposes. When prompted by SWPM, restore the database in DR with the backup that you took from production. After the full backup is restored, the DR database is in the rollforward pending state by default. This state indicates that the restore is not yet complete and that log records might still need to be applied. For more information, see the IBM documentation.	SAP Basis administrator
Check the configuration.	To set up log archiving for HADR, both the production and DR databases must be able to retrieve logs automatically from all log archive locations. Verify that the `LOGARCHMETH1` parameter in the DR database is set to the same location as in the production database. If the same location is not accessible because of Regional limitations, ensure that the DR system can automatically fetch logs from the primary system. To enable the TCP/IP ports for database replication, modify `/etc/services` on the production and DR hosts by adding the following two entries. In the entries, `<SID>` refers to the System ID (SID) of the Db2 database (for example, `PR1`). `<SID>_HADR_1 55001/tcp # DB2 HADR Port1` `<SID>_HADR_2 55002/tcp # DB2 HADR Port2` Confirm that both ports allow inbound and outbound traffic between the primary and the standby. Check `/etc/hosts` on the production and DR hosts to confirm that the hostnames for both the production and standby hosts point to the correct IP addresses.	AWS administrator, SAP Basis administrator
Set up replication from the production DB to the DR DB (using ASYNC mode).	In the production database, run the following commands to update the parameters. `db2 UPDATE DB CFG FOR <SID> USING HADR_LOCAL_HOST HOST1` `db2 UPDATE DB CFG FOR <SID> USING HADR_LOCAL_SVC <SID>_HADR_1` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_HOST HOST2` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_SVC <SID>_HADR_2` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_INST db2<sid>` `db2 UPDATE DB CFG FOR <SID> USING HADR_TIMEOUT 120` `db2 UPDATE DB CFG FOR <SID> USING HADR_SYNCMODE ASYNC` `db2 UPDATE DB CFG FOR <SID> USING HADR_SPOOL_LIMIT 1000` `db2 UPDATE DB CFG FOR <SID> USING HADR_PEER_WINDOW 240` `db2 UPDATE DB CFG FOR <SID> USING indexrec RESTART logindexbuild ON` In the DR database, run the following commands to update the parameters. `db2 UPDATE DB CFG FOR <SID> USING HADR_LOCAL_HOST HOST2` `db2 UPDATE DB CFG FOR <SID> USING HADR_LOCAL_SVC <SID>_HADR_2` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_HOST HOST1` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_SVC <SID>_HADR_1` `db2 UPDATE DB CFG FOR <SID> USING HADR_REMOTE_INST db2<sid>` `db2 UPDATE DB CFG FOR <SID> USING HADR_TIMEOUT 120` `db2 UPDATE DB CFG FOR <SID> USING HADR_SYNCMODE ASYNC` `db2 UPDATE DB CFG FOR <SID> USING HADR_SPOOL_LIMIT 1000` `db2 UPDATE DB CFG FOR <SID> USING HADR_PEER_WINDOW 240` `db2 UPDATE DB CFG FOR <SID> USING indexrec RESTART logindexbuild ON` These parameters are required to provide HADR-related information to both databases. In the Db2 database, HADR gets activated based on the values for each of the previously set parameters. For more information about these parameters, see the IBM documentation. Start HADR first on the newly created standby database by using the following commands. `db2 deactivate db <SID>` `db2 start hadr on db <SID> as standby` Start HADR on the production database by using the following commands. `db2 deactivate db <SID>` `db2 start hadr on db <SID> as primary` Check whether the production and standby Db2 databases are in sync and log shipping is ongoing. To monitor HADR replication status, use the following `db2pd` command. `db2pd -d <SID> -hadr` For more information about monitoring HADR, see the IBM documentation.	SAP Basis administrator
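
The two HADR parameter sets above mirror each other: only the local and remote values are swapped. The following is a minimal sketch of a generator that prints the `db2 UPDATE DB CFG` commands for either side, so that both hosts are configured from the same template. The function name `hadr_cfg_cmds` and the example SID `PR1` are hypothetical, and the sketch assumes that the instance name is `db2` followed by the SID in lowercase. The parameter values are the ones used in this pattern.

```shell
# Usage: hadr_cfg_cmds <SID> <local-host> <remote-host> <local-port-no> <remote-port-no>
# Prints the db2 commands for one side of the HADR pair. Nothing is executed.
hadr_cfg_cmds() {
    sid=$1; local_host=$2; remote_host=$3; local_n=$4; remote_n=$5
    # Assumption: the instance name is db2<sid>, with the SID in lowercase.
    inst="db2$(printf '%s' "$sid" | tr '[:upper:]' '[:lower:]')"
    for kv in \
        "HADR_LOCAL_HOST $local_host" \
        "HADR_LOCAL_SVC ${sid}_HADR_${local_n}" \
        "HADR_REMOTE_HOST $remote_host" \
        "HADR_REMOTE_SVC ${sid}_HADR_${remote_n}" \
        "HADR_REMOTE_INST $inst" \
        "HADR_TIMEOUT 120" \
        "HADR_SYNCMODE ASYNC" \
        "HADR_SPOOL_LIMIT 1000" \
        "HADR_PEER_WINDOW 240"
    do
        printf 'db2 UPDATE DB CFG FOR %s USING %s\n' "$sid" "$kv"
    done
    printf 'db2 UPDATE DB CFG FOR %s USING indexrec RESTART logindexbuild ON\n' "$sid"
}

hadr_cfg_cmds PR1 HOST1 HOST2 1 2   # production side
hadr_cfg_cmds PR1 HOST2 HOST1 2 1   # DR side
```

Review the printed commands, and then run them as the Db2 instance owner on the matching host.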

Task	Description	Skills required
Plan the production business downtime for the DR test.	Make sure that you plan the required business downtime on the production environment for testing the DR failover scenario.	SAP Basis administrator
Create a test user.	Create a test user (or any test changes) that can be validated in the DR host to confirm log replication after DR failover.	SAP Basis administrator
On the console, stop the production EC2 instances.	An ungraceful shutdown is initiated in this step to mimic a disaster scenario.	AWS systems administrator
Scale up the DR EC2 instance to match the requirements.	On the EC2 console, change the instance type in the DR Region. Stop the instance: If the instance is running, you must stop it before you can change its instance type. On the EC2 console, select the instance, and choose Stop. Modify the instance type: On the EC2 console, select the instance, and choose Actions, Instance Settings, Change Instance Type. Select the instance type that matches the primary instance, and choose Apply. Start the instance: After the instance type change is complete, start the instance from the EC2 console by selecting the instance and choosing Start. To start the Db2 database, use the following commands. `db2start` `db2 start HADR on db <SID> as standby`	SAP Basis administrator
Initiate takeover.	From the DR system (`host2`), initiate the takeover process and bring up the DR database as the primary. `db2 takeover hadr on database <SID> by force` Optionally, you can set the following parameters so that Db2 adjusts database memory allocation automatically based on the instance type. Choose the `INSTANCE_MEMORY` value based on the portion of memory that you want to dedicate to the Db2 database. Because `INSTANCE_MEMORY` is a database manager parameter, it is set with `update dbm cfg`. `db2 update dbm cfg using INSTANCE_MEMORY <FIXED VALUE> IMMEDIATE` `db2 update db cfg for <SID> using DATABASE_MEMORY AUTOMATIC IMMEDIATE` `db2 update db cfg for <SID> using self_tuning_mem ON IMMEDIATE` Verify the change by using the following commands. `db2 get dbm cfg | grep -i INSTANCE_MEMORY` `db2 get db cfg for <SID> | grep -i MEMORY` `db2 get db cfg for <SID> | grep -i self_tuning_mem`	SAP Basis administrator
Launch the application server for SAP in the DR Region.	Using the AMI that you made of the production system, launch a new additional application server in the DR Region.	SAP Basis administrator
Perform validation before starting the SAP application.	Validate the `/etc/hosts` and `/etc/fstab` entries. Mount `/sapmnt/<SID>/` on the DR system. Validate that the DR file system `/sapmnt/<SID>/` is synced with the production `/sapmnt/<SID>/`. Log in as the `<sid>adm` user, run `R3trans -d`, and verify the output in the `trans.log` file. The `trans.log` file is generated in the same location where you ran the `R3trans -d` command.	AWS administrator, SAP Basis administrator
Start the SAP application on the DR system.	Start the SAP application on the DR system as the `<sid>adm` user. Use the following commands, in which `XX` represents the instance number of your SAP ABAP SAP Central Services (ASCS) server, and `YY` represents the instance number of your SAP application server. `sapcontrol -nr XX -function StartService <SID>` `sapcontrol -nr XX -function StartSystem` `sapcontrol -nr YY -function StartService <SID>` `sapcontrol -nr YY -function StartSystem`	SAP Basis administrator
Perform SAP validation.	Perform this validation as part of the DR test to provide evidence that data was replicated successfully to the DR Region.	Test engineer
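
The console steps for scaling up the DR instance can also be expressed with the AWS CLI. The following is a minimal sketch that prints the stop, modify, and start sequence for review instead of running it. The function name `resize_cmds`, the instance ID, and the Region in the example are hypothetical placeholders.

```shell
# Usage: resize_cmds <instance-id> <target-instance-type> <region>
# Prints the AWS CLI calls that stop an instance, change its type, and start it again.
resize_cmds() {
    id=$1; itype=$2; region=$3
    printf 'aws ec2 stop-instances --region %s --instance-ids %s\n' "$region" "$id"
    # The instance type can be changed only while the instance is stopped.
    printf 'aws ec2 wait instance-stopped --region %s --instance-ids %s\n' "$region" "$id"
    printf 'aws ec2 modify-instance-attribute --region %s --instance-id %s --instance-type Value=%s\n' \
        "$region" "$id" "$itype"
    printf 'aws ec2 start-instances --region %s --instance-ids %s\n' "$region" "$id"
    printf 'aws ec2 wait instance-running --region %s --instance-ids %s\n' "$region" "$id"
}

# Example: scale a pilot light instance up to the production size.
resize_cmds i-0123456789abcdef0 r6i.12xlarge us-west-2
```

After the instance is running again, start Db2 and HADR on it as described in the table.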

Task	Description	Skills required
Start the production SAP and database servers.	On the console, start the EC2 instances that host SAP and the database in the production system.	SAP Basis administrator
Start the production database and set up HADR.	Log in to the production system (`host1`), start the database as a standby, and verify that it is in recovery mode by using the following commands. `db2start` `db2 start HADR on db <SID> as standby` `db2 connect to <SID>` Verify that the HADR connection status is `connected` and that the replication state is `peer`. `db2pd -d <SID> -hadr` If the database is inconsistent and is not at `connected` and `peer` status, a backup and restore might be required to bring the database (on `host1`) in sync with the currently active database (`host2` in the DR Region). In that case, restore the DB backup from the database in the `host2` DR Region to the database in the `host1` production Region.	SAP Basis administrator
Fail back the database to the production Region.	In a normal business-as-usual scenario, this step is performed in a scheduled downtime. Applications running on the DR system are stopped, and the database is failed back to the production Region (Region 1) to resume operations from the production Region. Log in to the SAP application server in the DR Region, and stop the SAP application. Unmount `/sapmnt/<SID>` from the DR system, making sure that the changes are reverse-replicated to `/sapmnt/<SID>` of the production system. Log in to the database server (`host1`) in the production Region, and perform the takeover. `db2 takeover hadr on database <SID>` Check the HADR status: `HADR_ROLE` should be `PRIMARY` on `host1` and `STANDBY` on `host2`. `db2pd -d <SID> -hadr`	SAP Basis administrator
Perform validation before starting the SAP application.	Validate the `/etc/hosts` and `/etc/fstab` entries. Mount `/sapmnt/<SID>/` on the production system. Make sure that it is in sync with the DR system `/sapmnt/<SID>/`. Log in as the `<sid>adm` user, run `R3trans -d`, and verify the output in the `trans.log` file. The `trans.log` file is generated in the same location where you ran the `R3trans -d` command.	AWS administrator, SAP Basis administrator
Start the SAP application.	Start the SAP application on the production system as the `<sid>adm` user. Use the following commands, in which `XX` represents the instance number of your SAP ASCS server, and `YY` represents the instance number of your SAP application server. `sapcontrol -nr XX -function StartService <SID>` `sapcontrol -nr XX -function StartSystem` `sapcontrol -nr YY -function StartService <SID>` `sapcontrol -nr YY -function StartSystem` To confirm that application servers are available, log in to SAP and perform checks by using the SICK and SM51 transactions.	SAP Basis administrator
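
The SAP start sequence used in the failover and failback steps follows the same two-command pattern per instance. The following is a minimal sketch that prints the `sapcontrol` calls for a list of instance numbers in the order given. The function name `sap_start_cmds`, the SID `PR1`, and the instance numbers `00` and `01` are hypothetical placeholders.

```shell
# Usage: sap_start_cmds <SID> <instance-nr> [<instance-nr> ...]
# Prints StartService and StartSystem calls for each instance number, in order.
# List the ASCS instance number first, then the application server instance number.
sap_start_cmds() {
    sid=$1
    shift
    for nr in "$@"; do
        printf 'sapcontrol -nr %s -function StartService %s\n' "$nr" "$sid"
        printf 'sapcontrol -nr %s -function StartSystem\n' "$nr"
    done
}

# Example: ASCS instance 00, application server instance 01.
sap_start_cmds PR1 00 01
```

Run the printed commands as the `<sid>adm` user on the host where each instance is installed.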

Troubleshooting

Issue	Solution
Key log files and commands to troubleshoot HADR-related issues	`db2 get db cfg for <SID> | grep -i hadr` `db2pd -d <SID> -hadr` `db2diag.log` (This file is generally located inside the `db2dump` directory, and the `db2dump` path is defined by the parameter `DIAGPATH`.)
SAP note for troubleshooting HADR issues on Db2 UDB	Refer to SAP Note 1154013 - DB6: DB problems in HADR environment. (You need SAP portal credentials to access this note.)
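
When you troubleshoot, it can help to reduce the `db2pd` output to the few fields that this pattern checks. The following is a minimal sketch that assumes `db2pd -d <SID> -hadr` prints one `NAME = VALUE` pair per line, with field names such as `HADR_ROLE`, `HADR_STATE`, and `HADR_CONNECT_STATUS`. The function names are hypothetical.

```shell
# Usage: db2pd -d <SID> -hadr | hadr_field HADR_STATE
# Prints the value of the first line whose field name matches exactly.
hadr_field() {
    awk -v key="$1" '$1 == key && $2 == "=" { print $3; exit }'
}

# Usage: db2pd -d <SID> -hadr | hadr_healthy
# Prints a one-line summary. Exits 0 only when the pair is connected and in peer state.
hadr_healthy() {
    report=$(cat)
    state=$(printf '%s\n' "$report" | hadr_field HADR_STATE)
    conn=$(printf '%s\n' "$report" | hadr_field HADR_CONNECT_STATUS)
    printf 'state=%s connect=%s\n' "$state" "$conn"
    [ "$state" = "PEER" ] && [ "$conn" = "CONNECTED" ]
}
```

The `PEER` check matches the status that this pattern expects in ASYNC mode. If you use SUPERASYNC mode, the standby does not enter peer state, so adjust the expected state accordingly.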

Related resources

Additional information

Using this pattern, you can set up a disaster recovery system for an SAP system running on the Db2 database. In a disaster situation, business should be able to continue within your defined recovery time objective (RTO) and recovery point objective (RPO) requirements:

RTO is the maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.
RPO is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.
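
As a worked illustration of the RPO definition, the following minimal sketch compares replication lag against an RPO target. It assumes that you can obtain, in epoch seconds, the time of the latest change logged on the primary and the time of the latest change replayed on the standby (for example, from your HADR monitoring output). The function name `rpo_check` and the numbers in the example are hypothetical.

```shell
# Usage: rpo_check <primary-epoch-seconds> <standby-epoch-seconds> <rpo-seconds>
# Prints the lag and the target. Exits 0 when the lag is within the RPO.
rpo_check() {
    lag=$(( $1 - $2 ))
    # Clamp negative values that can result from clock skew between hosts.
    if [ "$lag" -lt 0 ]; then
        lag=0
    fi
    printf 'lag=%ss rpo=%ss\n' "$lag" "$3"
    [ "$lag" -le "$3" ]
}

# Example: the standby is 60 seconds behind, and the RPO target is 300 seconds.
rpo_check 1000 940 300
```

If the primary is lost at that moment, up to 60 seconds of changes would be missing on the standby, which is within the 300-second target.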

For FAQs related to HADR, see SAP Note 1612105 - DB6: FAQ on Db2 High Availability Disaster Recovery (HADR). (You need SAP portal credentials to access this note.)
