Deployment

This section discusses the high-level deployment process. Table 2 lists the steps in the order in which they should be performed, along with the purpose of each step.

Table 2 – Steps to set up AWS resources to deploy IBM Db2 HADR with Pacemaker for SAP NetWeaver on Amazon EC2 instances

Activity – Purpose

Step 1: Db2 Virtual Hostname – Decide on the virtual hostname for your Db2 database (for example, dbhadb2).

Step 2: AWS Overlay IP – Decide on the Overlay IP address for the dbhadb2 name (for example, 192.168.1.90).

Step 3: Provision AWS Resources – Provision AWS resources and configure security.

Step 4: SAP App and Db2 Install – Install the SAP tier and the Db2 primary and standby databases.

Step 5: Db2 HADR Setup – Configure Db2 HADR replication.

Step 6: Pacemaker Cluster Setup – Configure the Pacemaker cluster.

Step 7: Post Setup Configuration – Perform post-setup tasks and manual configuration.

Step 8: Testing and Validation – Perform quality assurance testing.

Step 1: Db2 Virtual Hostname

Decide on the virtual hostname for your Db2 database. For example, if your virtual hostname is dbhadb2, that name is configured in the SAP profile and in the Db2 client configuration. See Step 7: Post Setup Configuration in this document for more information.

Step 2: AWS Overlay IP

Decide on the IP address to use for your Overlay IP. An Overlay IP address is an AWS-specific routing entry that can send network traffic to an instance, regardless of which AZ the instance is located in.

One key requirement for the Overlay IP is that it must not be used elsewhere in your VPC or on premises. It should be part of the private IP address range defined in RFC 1918. For example, if your VPC is configured in the range of 10.0.0.0/8 or 172.16.0.0/12, you can choose the Overlay IP from the range 192.168.0.0/16. Based on the number of HA setups you plan to have in your landscape, you can reserve a block from that private IP address range for Overlay IPs to ensure there is no overlap.
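To confirm that a candidate Overlay IP range does not overlap with any CIDR block already in use, you can list the CIDR blocks of your VPCs and subnets with the AWS CLI. This is only a minimal sketch; it assumes default CLI credentials and that all relevant networks live in the Region you query.

       # List all VPC CIDR blocks in the target Region to check for overlap
       aws ec2 describe-vpcs --query 'Vpcs[].CidrBlockAssociationSet[].CidrBlock' --output text

       # List the subnet CIDR blocks as an additional check
       aws ec2 describe-subnets --query 'Subnets[].CidrBlock' --output text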

AWS created a resource agent, aws-vpc-move-ip, which is delivered with Linux Pacemaker. This agent updates the route table of the VPC in which you have configured the cluster so that it always points to the primary database.

All traffic within the VPC can reach the Overlay IP address through the route table. Traffic from outside the VPC, whether from another VPC or from on-premises, requires AWS Transit Gateway (TGW) or an AWS Network Load Balancer (NLB) to reach the Overlay IP address. For more information on how to direct traffic to an Overlay IP address through AWS TGW or an NLB, see SAP on AWS High Availability with Overlay IP Address Routing.

Step 3: AWS Resources

Deciding on the right storage layout is important to ensure that you can meet the required I/O. EBS gp2 volumes balance price and performance for a wide variety of workloads, while io1 volumes consistently provide the highest performance for mission-critical applications. With these two options, you can choose a storage solution that meets your performance and cost requirements. For more information, see Amazon EBS features.
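As an illustration only, the following AWS CLI calls create a gp2 and an io1 volume; the Availability Zone, sizes, and IOPS are placeholder values that you would replace with your own sizing.

       # General-purpose gp2 volume, for example for SAP executables and logs (example values)
       aws ec2 create-volume --availability-zone us-east-1a --size 100 --volume-type gp2

       # Provisioned-IOPS io1 volume, for example for Db2 data files (example values)
       aws ec2 create-volume --availability-zone us-east-1a --size 500 --volume-type io1 --iops 5000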

Step 4: SAP NetWeaver and IBM Db2 Deployment

See the SAP standard installation guide for your installation release for the technical steps and prerequisites of the SAP installation.

Step 4a: Create EC2 Instances for Deploying SAP NetWeaver ASCS

See SAP NetWeaver Environment Setup for Linux on AWS to learn how to set up an EC2 instance for SAP NetWeaver.

Step 4b: Create EC2 Instances for IBM Db2 Primary and Standby Databases

Deploy two EC2 instances, one in each AZ, for your primary and standby databases.
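A minimal sketch of launching the two database hosts with the AWS CLI; the AMI ID, instance type, key pair, security group, and subnet IDs (one subnet per AZ) are placeholders for this example.

       # Primary database host in AZ1 (subnet-az1 is a placeholder subnet ID)
       aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type r5.2xlarge \
           --key-name my-key-pair --security-group-ids sg-xxxxxxxx --subnet-id subnet-az1 \
           --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dbprim00}]'

       # Standby database host in AZ2 (subnet-az2 is a placeholder subnet ID)
       aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type r5.2xlarge \
           --key-name my-key-pair --security-group-ids sg-xxxxxxxx --subnet-id subnet-az2 \
           --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dbsec00}]'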

Step 4c: Disable Source/destination Check for the EC2 Instance Hosting the IBM Db2 Primary and Standby Databases

You need to disable the source/destination check on the EC2 instances hosting the primary and standby databases. This is required to route traffic via the Overlay IP. See Changing the source or destination checking to learn how to disable the source/destination check for your EC2 instance. You can also use the following AWS Command Line Interface (AWS CLI) command to achieve this.

       aws ec2 modify-instance-attribute --profile cluster --instance-id EC2-instance-id --no-source-dest-check
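To confirm that the attribute was changed on each node, you can query it afterwards; this check assumes the same cluster profile, and the instance ID is a placeholder.

       # The returned SourceDestCheck value should be false on both database instances
       aws ec2 describe-instance-attribute --profile cluster --instance-id EC2-instance-id \
           --attribute sourceDestCheck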

Step 4d: AWS IAM Role

For the Pacemaker setup, create the following two policies and attach them to the IAM role that is attached to the Db2 primary and standby instances. This allows your EC2 instances to call the APIs that Pacemaker runs during the failover process. A CLI sketch for creating and attaching these policies follows the variable list below.

  • STONITH – Allows the EC2 instance to start, stop, and reboot instances.

 {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1424870324000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeTags"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Stmt1424870324001",
            "Effect": "Allow",
            "Action": [
                "ec2:ModifyInstanceAttribute",
                "ec2:RebootInstances",
                "ec2:StartInstances",
                "ec2:StopInstances"
            ],
            "Resource": [
                "arn:aws:ec2:region-name:account-id:instance/i-node1",
                "arn:aws:ec2:region-name:account-id:instance/i-node2"
            ]
        }
    ]
}
  • Overlay IP – Allows the EC2 instance to update the route table in case of a failover.

       {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": "arn:aws:ec2:region-name:account-id:route-table/rtb-XYZ"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        }
    ]
}

Replace the following variables with the appropriate values:

  • region-name: The name of the AWS Region.

  • account-id: The ID of the AWS account in which the policy is used.

  • rtb-XYZ: The identifier of the route table that needs to be updated.

  • i-node1: The instance ID of the Db2 primary instance.

  • i-node2: The instance ID of the Db2 standby instance.
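If you prefer to create and attach these policies from the command line instead of the console, a sketch along these lines can be used. The policy file names, policy names, and the role name db2-cluster-role are assumptions for this example; substitute the names used in your account.

       # Create the two customer-managed policies from the JSON documents above
       aws iam create-policy --policy-name db2-pacemaker-stonith \
           --policy-document file://stonith-policy.json
       aws iam create-policy --policy-name db2-pacemaker-overlay-ip \
           --policy-document file://overlay-ip-policy.json

       # Attach both policies to the IAM role used by the Db2 primary and standby instances
       aws iam attach-role-policy --role-name db2-cluster-role \
           --policy-arn arn:aws:iam::account-id:policy/db2-pacemaker-stonith
       aws iam attach-role-policy --role-name db2-cluster-role \
           --policy-arn arn:aws:iam::account-id:policy/db2-pacemaker-overlay-ip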

Step 4e: SAP Application and Db2 Software Install

Prerequisites:

  • Before starting the installation, update the /etc/hosts file on the database, ASCS, and application servers with the hostnames and IP addresses of all of these servers. This ensures that all your instances can resolve each other's addresses during installation and configuration.

  • You need to install the following packages in your instances: tcsh.x86_64, ksh.x86_64, libaio.x86_64, libstdc++.x86_64.

  • Comment out the 5912 port entry in the /etc/services file (if it exists), because this port is used by the Db2 installation (a scripted sketch of these prerequisites follows this list):

     #fis             5912/tcp          # Flight Information Services
     #fis             5912/udp          # Flight Information Services
     #fis             5912/sctp         # Flight Information Services
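The package installation and the port-entry change can be scripted; the following sketch assumes a RHEL host (use zypper instead of yum on SLES) and simply prefixes the fis entries with a comment character.

      # Install the packages required by the SAP and Db2 installers
      sudo yum install -y tcsh ksh libaio libstdc++

      # Comment out any existing fis entries on port 5912 so that Db2 can use the port
      sudo sed -i '/^fis[[:space:]]\+5912\//s/^/#/' /etc/services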

SAP application and Db2 software installation (high-level instructions):

  1. Install SAP ASCS using Software Provisioning Manager (SWPM) on the Amazon EC2 instance. Choose the installation option that matches your scenario; for example, distributed, or HA if you plan to install ERS for application-layer high availability.

    Figure 3 – Install ASCS

  2. Install the primary database using SWPM on the Amazon EC2 instance hosted in AZ1.

    Figure 4 – Install the primary database

  3. Take a backup of the primary database.

  4. Install the PAS instance. This can be the same EC2 instance used in step 1 if you want to install ASCS and PAS on one host.

    Figure 5 – Install the PAS instance

  5. For the standby DB installation:

  1. Use the SAP homogeneous system copy procedure from SWPM with the option of System copy > Target systems > Distributed > Database instance.

  2. In the SWPM parameter screen, when asked for the system load type, choose Homogeneous System and the backup/restore option.

  3. When prompted by SWPM, restore the backup you took in step 3 onto the standby database. You can then exit the installer, because the subsequent installation steps have already been completed on the primary database server.

You should now have your ASCS, primary Db2 database, and PAS (if running on a different host than ASCS) installed and running in AZ1 of your setup. In AZ2, you should have the standby Db2 database installed and running. You can optionally install an additional application server if required to support the workload. We recommend that you distribute your application servers in both AZs.

Step 5: Db2 HADR Setup

The following steps explain how to set up HADR between the primary and standby databases.

Table 3 lists the example values used in this setup. Update the configuration commands according to your environment.

Table 3 – Db2 HADR setup

System ID (SID) – STJ

Primary Db2 database hostname – dbprim00

Standby Db2 database hostname – dbsec00

Overlay IP – 192.168.1.81

To set up Db2 HADR:

  1. Find the state of the primary database before HADR configuration by executing the following command:

          > db2 get db cfg for STJ | grep HADR
          HADR database role                                      = STANDARD
          HADR local host name                  (HADR_LOCAL_HOST) =
          HADR local service name                (HADR_LOCAL_SVC) =
          HADR remote host name                (HADR_REMOTE_HOST) =
          HADR remote service name              (HADR_REMOTE_SVC) =
          HADR instance name of remote server  (HADR_REMOTE_INST) =
          HADR timeout value                       (HADR_TIMEOUT) = 120
          HADR target list                     (HADR_TARGET_LIST) =
          HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
          HADR spool log data limit (4KB)      (HADR_SPOOL_LIMIT) = AUTOMATIC(0)
          HADR log replay delay (seconds)     (HADR_REPLAY_DELAY) = 0
          HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 0
          HADR SSL certificate label             (HADR_SSL_LABEL) =
  2. The HADR_LOCAL_SVC and HADR_REMOTE_SVC parameters require entries in your /etc/services file. If the entries do not exist, update the /etc/services file to include them. Here is a sample /etc/services entry for the SID STJ.

         #  grep -i hadr /etc/services
         STJ_HADR_1   5950/tcp   # DB2 HADR log shipping
         STJ_HADR_2   5951/tcp   # DB2 HADR log shipping
  3. Run the following commands on your primary database server (in this case, dbprim00) as the Db2 instance owner (in this case, db2stj):

         db2 update db cfg for STJ using HADR_LOCAL_HOST dbprim00
         db2 update db cfg for STJ using HADR_LOCAL_SVC STJ_HADR_1
         db2 update db cfg for STJ  using HADR_REMOTE_HOST dbsec00
         db2 update db cfg for STJ  using HADR_REMOTE_SVC STJ_HADR_2
         db2 update db cfg for STJ  using HADR_REMOTE_INST db2stj
         db2 update db cfg for STJ  using HADR_TIMEOUT 120
         db2 update db cfg for STJ  using HADR_SYNCMODE NEARSYNC
         db2 update db cfg for STJ  using HADR_PEER_WINDOW 300
         db2 update db cfg for STJ  using LOGINDEXBUILD ON

    Here is the state after the configuration was updated:

          > db2 get db cfg for STJ | grep HADR
          HADR database role                                      = STANDARD
          HADR local host name                  (HADR_LOCAL_HOST) = dbprim00
          HADR local service name                (HADR_LOCAL_SVC) = STJ_HADR_1
          HADR remote host name                (HADR_REMOTE_HOST) = dbsec00
          HADR remote service name              (HADR_REMOTE_SVC) = STJ_HADR_2
          HADR instance name of remote server  (HADR_REMOTE_INST) = db2stj
          HADR timeout value                       (HADR_TIMEOUT) = 120
          HADR target list                     (HADR_TARGET_LIST) =
          HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
          HADR spool log data limit (4KB)      (HADR_SPOOL_LIMIT) = AUTOMATIC(0)
          HADR log replay delay (seconds)     (HADR_REPLAY_DELAY) = 0
          HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 300
          HADR SSL certificate label             (HADR_SSL_LABEL) =
  4. Run the following commands on your standby database server (in this case, dbsec00) as the Db2 instance owner (in this case, db2stj).

         db2 update db cfg for STJ using HADR_LOCAL_HOST dbsec00
         db2 update db cfg for STJ using HADR_LOCAL_SVC STJ_HADR_2
         db2 update db cfg for STJ  using HADR_REMOTE_HOST dbprim00
         db2 update db cfg for STJ  using HADR_REMOTE_SVC STJ_HADR_1
         db2 update db cfg for STJ  using HADR_REMOTE_INST db2stj
         db2 update db cfg for STJ  using HADR_TIMEOUT 120
         db2 update db cfg for STJ  using HADR_SYNCMODE NEARSYNC
         db2 update db cfg for STJ  using HADR_PEER_WINDOW 300
         db2 update db cfg for STJ  using LOGINDEXBUILD ON

    Here’s an example configuration:

          > db2 get db cfg for STJ | grep HADR
          HADR database role                                      = STANDBY
          HADR local host name                  (HADR_LOCAL_HOST) = dbsec00
          HADR local service name                (HADR_LOCAL_SVC) = STJ_HADR_2
          HADR remote host name                (HADR_REMOTE_HOST) = dbprim00
          HADR remote service name              (HADR_REMOTE_SVC) = STJ_HADR_1
          HADR instance name of remote server  (HADR_REMOTE_INST) = db2stj
          HADR timeout value                       (HADR_TIMEOUT) = 120
          HADR target list                     (HADR_TARGET_LIST) =
          HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
          HADR spool log data limit (4KB)      (HADR_SPOOL_LIMIT) = AUTOMATIC(1200000)
          HADR log replay delay (seconds)     (HADR_REPLAY_DELAY) = 0
          HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 300
          HADR SSL certificate label             (HADR_SSL_LABEL) =
  5. When using Linux Pacemaker, use the following Db2 HADR parameters:

    • HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 300

    • HADR timeout value (HADR_TIMEOUT) = 60

    We recommend that you tune these parameters after testing the failover and takeover functionality. Because individual configurations vary, these parameters might need adjustment.

  6. After your primary and standby databases have been configured, start HADR on the standby server as the HADR standby.

         db2 start hadr on database STJ as standby
  7. Start HADR on the primary database. A verification sketch follows this procedure.

         db2 start hadr on database STJ as primary
         DB20000I  The START HADR ON DATABASE command completed successfully.
    
         db2pd -hadr -db STJ | head -20
    
         Database Member 0 -- Database STJ -- Active --
    
                                HADR_ROLE = PRIMARY
                              REPLAY_TYPE = PHYSICAL
                            HADR_SYNCMODE = NEARSYNC
                               STANDBY_ID = 1
                            LOG_STREAM_ID = 0
                               HADR_STATE = PEER
                               HADR_FLAGS = TCP_PROTOCOL
                      PRIMARY_MEMBER_HOST = dbprim00
                         PRIMARY_INSTANCE = db2stj
                                      ...
                      HADR_CONNECT_STATUS = CONNECTED
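With HADR started on both nodes, it is worth confirming that the pair has reached PEER state before you configure the Pacemaker cluster. A minimal check, assuming the example SID STJ and instance owner db2stj:

       # Run as the Db2 instance owner (db2stj) on the primary node
       db2pd -hadr -db STJ | grep -E 'HADR_ROLE|HADR_STATE|HADR_SYNCMODE|HADR_CONNECT_STATUS'

       # Expected values: HADR_ROLE = PRIMARY, HADR_STATE = PEER,
       # HADR_SYNCMODE = NEARSYNC, HADR_CONNECT_STATUS = CONNECTED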

Step 6: Pacemaker Cluster Setup

In this section, we discuss the cluster setup using Linux Pacemaker on both RHEL and SLES. Linux Pacemaker works as a failover orchestrator: it monitors both the primary and standby databases and, in the event of a primary database server failure, initiates an automatic HADR takeover by the standby server. The resource agents used in this configuration are as follows:

  • STONITH resource agent for fencing.

  • The db2 database resource, which is configured in a Primary/Standby configuration.

  • The aws-vpc-move-ip resource, which is built by the AWS team to handle the Overlay IP switch from the Db2 primary instance to the standby in the event of a failure.

As mentioned in the Operating System section of this document, you need the correct subscription to download these resource agents.

Important: Change the shell environment for the db2<sid> user to /bin/ksh.

To change the shell environment (a consolidated command sketch follows these steps):

  1. Shut down both databases by running db2stop while logged in as db2<sid>.

  2. Install the Korn shell (ksh) if it is not already installed.

  3. Run sudo usermod -s /bin/ksh db2<sid>.
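A consolidated sketch of these three steps, assuming the example instance owner db2stj and a RHEL host (use zypper instead of yum on SLES):

       # 1. Stop the database as the instance owner on both nodes
       su - db2stj -c "db2stop"

       # 2. Install the Korn shell if it is not already present
       sudo yum install -y ksh

       # 3. Change the login shell of the instance owner to ksh
       sudo usermod -s /bin/ksh db2stj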

Step 6a. Setup on RHEL

This section focuses on setting up the Pacemaker cluster on the RHEL operating system.

To set up the pacemaker cluster on RHEL:

  1. Basic cluster configuration – Install the required cluster packages on both database nodes.

         yum install -y pcs pacemaker fence-agents-aws
         yum install -y resource-agents
  2. Start the cluster services.

          systemctl start pcsd.service
          systemctl enable pcsd.service

     Note: If you have subscribed to the RHEL for SAP with HA and Update Services (US) products from AWS Marketplace, run mkdir -p /var/log/pcsd /var/log/cluster before starting pcsd.service.

  3. Reset the password for user hacluster on both the DB nodes.

         passwd hacluster
  4. Authorize the cluster. Make sure that both nodes are able to communicate with each other using the hostname.

         [root@dbprim00 ~] pcs cluster auth dbprim00 dbsec00
         Username: hacluster
         Password:
         dbprim00: Authorized
         dbsec00: Authorized
         [root@dbprim00 ~]
  5. Create the cluster.

         [root@dbprim00 ~] pcs cluster setup --name db2ha dbprim00 dbsec00
         Destroying cluster on nodes: dbprim00, dbsec00...
         dbsec00: Stopping Cluster (pacemaker)...
         dbprim00: Stopping Cluster (pacemaker)...
         dbprim00: Successfully destroyed cluster
         dbsec00: Successfully destroyed cluster
    
         Sending 'pacemaker_remote authkey' to 'dbprim00', 'dbsec00'
         dbprim00: successful distribution of the file 'pacemaker_remote authkey'
         dbsec00: successful distribution of the file 'pacemaker_remote authkey'
         Sending cluster config files to the nodes...
         dbprim00: Succeeded
         dbsec00: Succeeded
    
         Synchronizing pcsd certificates on nodes dbprim00, dbsec00...
         dbprim00: Success
         dbsec00: Success
         Restarting pcsd on the nodes in order to reload the certificates...
         dbprim00: Success
         dbsec00: Success
         [root@dbprim00 ~] pcs cluster enable --all
         dbprim00: Cluster Enabled
         dbsec00: Cluster Enabled
         [root@dbprim00 ~] pcs cluster start --all
         dbsec00: Starting Cluster...
         dbprim00: Starting Cluster...
         [root@dbprim00 ~]

    Note: Adjust the corosync timeout.

  6. Edit /etc/corosync/corosync.conf and add or modify the token value in the totem section to 30000.

          [root@dbprim00 corosync] more /etc/corosync/corosync.conf
          totem {
               version: 2
               cluster_name: db2ha
               secauth: off
               transport: udpu
               token: 30000
          }
    
          nodelist {
               node {
                    ring0_addr: dbprim00
                    nodeid: 1
               }
    
               node {
                    ring0_addr: dbsec00
                    nodeid: 2
               }
          }
    
          quorum {
               provider: corosync_votequorum
               two_node: 1
          }
    
          logging {
               to_logfile: yes
               logfile: /var/log/cluster/corosync.log
               to_syslog: yes
          }
  7. Run pcs cluster sync to sync the changes on the standby database node.

         [root@dbprim00 corosync] pcs cluster sync
         dbprim00: Succeeded
         dbsec00: Succeeded
  8. Run pcs cluster reload corosync to make the changes effective.

         [root@dbprim00 corosync] pcs cluster reload corosync
         Corosync reloaded
  9. To ensure that the changes are in place, run corosync-cmapctl | grep totem.token.

          [root@dbprim00 corosync] corosync-cmapctl | grep totem.token
          runtime.config.totem.token (u32) = 30000
          runtime.config.totem.token_retransmit (u32) = 7142
          runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
          totem.token (u32) = 30000
  10. Before creating any resource, put the cluster in maintenance mode.

         [root@dbprim00 ~] pcs property set maintenance-mode='true'
  11. Create the STONITH resource. You will need the EC2 instance IDs for this operation. The default pcmk action is reboot. Replace the instance ID for dbprim00 and dbsec00 with the instance IDs of your setup.

     If you want the instance to remain in a stopped state until it has been investigated and then manually started, add pcmk_reboot_action=off. This setting is also required if you are running Db2 on Amazon EC2 Dedicated Hosts.

         [root@dbprim00 ~] pcs stonith create clusterfence fence_aws \
             region=us-east-1 \
             pcmk_host_map="dbprim00:i-09d1b1f105f71e5ed;dbsec00:i-0c0d3444601b1d8c5" \
             power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4 \
             op start timeout=300 \
             op monitor timeout=60
  12. Create the Db2 resource.

         [root@dbprim00 ~] pcs resource create Db2_HADR_STJ db2 \
             instance=db2stj dblist=STJ master meta notify=true resource-stickiness=5000 \
             op demote timeout=240 op promote timeout=240 \
             op start timeout=240 op stop timeout=240 \
             op monitor interval=20s timeout=120s \
             op monitor interval=22s role=Master timeout=120s

     Note: The timeout values shown here are defaults that work for most deployments. We recommend that you test the timeouts extensively in your QA setup, based on the test cases in the Appendix, and then tune them accordingly.

  13. Create the Overlay IP resource agent. First, add the Overlay IP address on the primary node.

         [root@dbprim00 ~] ip address add 192.168.1.81/32 dev eth0
         [root@dbprim00 ~] ip addr show
         1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
             link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
             inet 127.0.0.1/8 scope host lo
                 valid_lft forever preferred_lft forever
             inet6 ::1/128 scope host
                 valid_lft forever preferred_lft forever
         2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
             link/ether 0e:10:e3:7b:6f:4f brd ff:ff:ff:ff:ff:ff
             inet 10.0.1.116/24 brd 10.0.1.255 scope global noprefixroute dynamic eth0
                 valid_lft 2885sec preferred_lft 2885sec
             inet 192.168.1.81/32 scope global eth0
                 valid_lft forever preferred_lft forever
             inet6 fe80::c10:e3ff:fe7b:6f4f/64 scope link
                 valid_lft forever preferred_lft forever
         [root@dbprim00 ~]
  14. Update the route table with the Overlay IP pointing to the Db2 primary instance:

         aws ec2 create-route --route-table-id rtb-xxxxxxxx \
             --destination-cidr-block <Overlay-IP>/32 --instance-id i-xxxxxxxx

         [root@dbprim00 ~] aws ec2 create-route --route-table-id rtb-dbe0eba1 \
             --destination-cidr-block 192.168.1.81/32 --instance-id i-09d1b1f105f71e5ed
         {
             "Return": true
         }

         [root@dbprim00 ~] pcs resource create db2-oip aws-vpc-move-ip \
             ip=192.168.1.81 interface=eth0 routing_table=rtb-dbe0eba1

     Note: If you are using different route tables for the two subnets where you deploy the primary and standby databases, you can specify them as a comma-separated (,) list in the resource creation command:

    pcs resource create db2-oip aws-vpc-move-ip ip=192.168.1.81 interface=eth0 routing_table=rtb-xxxxx1,rtb-xxxxx2

  15. Create a colocation constraint to bind the Overlay IP resource agent with the Db2 primary instance.

         [root@dbprim00 ~] pcs constraint colocation add db2-oip with master Db2_HADR_STJ-master 2000
  16. You can now remove the maintenance mode by using the following command:

          [root@dbprim00 ~] pcs property set maintenance-mode='false'

    This is the final configuration of the cluster:

          [root@dbprim00 ~] pcs config show
          Cluster Name: db2ha
          Corosync Nodes:
            dbprim00 dbsec00
          Pacemaker Nodes:
            dbprim00 dbsec00
    
          Resources:
            Master: Db2_HADR_STJ-master
              Resource: Db2_HADR_STJ (class=ocf provider=heartbeat type=db2)
                Attributes: dblist=STJ instance=db2stj
                Meta Attrs: notify=true resource-stickiness=5000
                Operations: demote interval=0s timeout=120 (Db2_HADR_STJ-demote-interval-0s)
                            monitor interval=20 timeout=60 (Db2_HADR_STJ-monitor-interval-20)
                            monitor interval=22 role=Master timeout=60 (Db2_HADR_STJ-monitor-interval-22)
                            notify interval=0s timeout=10 (Db2_HADR_STJ-notify-interval-0s)
                            promote interval=0s timeout=120 (Db2_HADR_STJ-promote-interval-0s)
                            start interval=0s timeout=120 (Db2_HADR_STJ-start-interval-0s)
                            stop interval=0s timeout=120 (Db2_HADR_STJ-stop-interval-0s)
            Resource: db2-oip (class=ocf provider=heartbeat type=aws-vpc-move-ip)
              Attributes: interface=eth0 ip=192.168.1.81 routing_table=rtb-dbe0eba1
              Operations: monitor interval=60 timeout=30 (db2-oip-monitor-interval-60)
                          start interval=0s timeout=180 (db2-oip-start-interval-0s)
                          stop interval=0s timeout=180 (db2-oip-stop-interval-0s)

          Stonith Devices:
            Resource: clusterfence (class=stonith type=fence_aws)
              Attributes: pcmk_host_map=dbprim00:i-09d1b1f105f71e5ed;dbsec00:i-0c0d3444601b1d8c5 pcmk_reboot_retries=4 pcmk_reboot_timeout=480 power_timeout=240 region=us-east-1
              Operations: monitor interval=60s (clusterfence-monitor-interval-60s)
          Fencing Levels:

          Location Constraints:
          Ordering Constraints:
          Colocation Constraints:
            db2-oip with Db2_HADR_STJ-master (score:2000) (rsc-role:Started) (with-rsc-role:Master)
          Ticket Constraints:
    
          Alerts:
            No alerts defined
    
          Resources Defaults:
            No defaults set
          Operations Defaults:
            No defaults set
    
          Cluster Properties:
            cluster-infrastructure: corosync
            cluster-name: db2ha
            dc-version: 1.1.18-11.el7_5.4-2b07d5c5a9
            have-watchdog: false
    
            Quorum:
            Options:
         [root@dbprim00 ~]
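     With the cluster out of maintenance mode, you can confirm that the Overlay IP route targets the node that currently runs the Db2 primary. The following minimal check assumes the example route table rtb-dbe0eba1 and Overlay IP 192.168.1.81 used above:

          # Show the cluster state, including which node holds the master (primary) role
          [root@dbprim00 ~] pcs status

          # Confirm that the route for the Overlay IP points to the current primary instance
          [root@dbprim00 ~] aws ec2 describe-route-tables --route-table-ids rtb-dbe0eba1 \
              --query 'RouteTables[].Routes[?DestinationCidrBlock==`192.168.1.81/32`]'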

Step 6b. Setup on SLES

This section focuses on setting up the Pacemaker cluster on the SLES operating system.

Prerequisite: You need to complete this on both the Db2 primary and standby instances.

To create an AWS CLI profile:

The resource agents on SLES use the AWS Command Line Interface (AWS CLI). You need to create AWS CLI profiles for the root account on both instances: one with the default profile name, and one with an arbitrary profile name (in this example, cluster) that produces output in text format. The Region of the instance must be configured as well.

  1. Enter your target Region (us-east-1 in the following example) when prompted for the default Region name.

         dbprim00:~  aws configure
         AWS Access Key ID [None]:
         AWS Secret Access Key [None]:
         Default region name [None]: us-east-1
         Default output format [None]:
         dbprim00:~  aws configure --profile cluster
         AWS Access Key ID [None]:
         AWS Secret Access Key [None]:
         Default region name [None]: us-east-1
         Default output format [None]: text

    You don’t need to provide the Access Key and Secret Access key, because access is controlled by the IAM role you created earlier in the setup.

  2. Add a second private IP address for each cluster instance. The second IP address allows the SUSE cluster to implement a two-ring corosync configuration, so that the cluster nodes can still communicate with each other over the secondary IP address if there is an issue with communication over the primary IP address.

     See To assign a secondary private IPv4 address to a network interface.

  3. Add a tag with an arbitrary key (in this case, pacemaker) to each instance. The value of this tag is the hostname of the respective Db2 instance. This tag is required so that the AWS CLI can use filters in the API calls (a command sketch for the tagging follows this list).

  4. Ensure that the source/destination check is disabled, as described in Step 4c.

  5. Avoid deletion of cluster-managed IP addresses on the eth0 interface. Check whether the package cloud-netconfig-ec2 is installed with the following command:

         dbprim00:~  zypper info cloud-netconfig-ec2

  6. If this package is installed, update the file /etc/sysconfig/network/ifcfg-eth0 and change the following line to a "no" setting, or add the line if it is not yet present:

         CLOUD_NETCONFIG_MANAGE='no'

  7. Set up NTP (best with YaST). Use the Amazon Time Sync Service at 169.254.169.123, which is accessible from all EC2 instances. Enable ongoing synchronization.

  8. Activate the public cloud module to get updates for the AWS CLI:

          dbprim00:~  SUSEConnect --list-extensions
          dbprim00:~  SUSEConnect -p sle-module-public-cloud/12/x86_64
          Registering system to registration proxy https://smt-ec2.susecloud.net
          Updating system details on https://smt-ec2.susecloud.net ...
           Activating sle-module-public-cloud 12 x86_64 ...
          -> Adding service to system ...
          -> Installing release package ...
          Successfully registered system

  9. Update your packages on both instances with the following command:

          dbprim00:~  zypper -n update

  10. Install the resource agent pattern ha_sles.

          dbprim00:~  zypper install -t pattern ha_sles
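Two of the prerequisites above can be verified or performed directly from the shell. The following sketch assumes the example profile name cluster, the tag key pacemaker, and placeholder instance IDs; substitute your own values.

       # Confirm that the cluster profile can call the EC2 API through the instance role
       aws ec2 describe-instances --profile cluster --output text > /dev/null && echo "profile cluster OK"

       # Tag each instance with the key "pacemaker" and its own hostname as the value
       aws ec2 create-tags --resources i-xxxxxxxxxxxxxxxxx --tags Key=pacemaker,Value=dbprim00
       aws ec2 create-tags --resources i-yyyyyyyyyyyyyyyyy --tags Key=pacemaker,Value=dbsec00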

To configure Pacemaker, configure the corosync.conf file:

  1. Use the following configuration in the /etc/corosync/corosync.conf file on both the Db2 primary and standby instances:

         #Read the corosync.conf.5 manual page
         totem {
            version: 2
            rrp_mode: passive
            token: 30000
            consensus: 36000
            token_retransmits_before_loss_const: 10
            max_messages: 20
            crypto_cipher: none
            crypto_hash: none
            clear_node_high_bit: yes
            interface {
               ringnumber: 0
               bindnetaddr: <ip-local-node>
               mcastport: 5405
               ttl: 1
           }
           transport: udpu
         }
          logging {
              fileline: off
              to_logfile: yes
              to_syslog: yes
              logfile: /var/log/cluster/corosync.log
              debug: off
              timestamp: on
              logger_subsys {
                 subsys: QUORUM
                 debug: off
            }
          }
          nodelist {
            node {
            ring0_addr: <ip-node-1>
            ring1_addr: <ip2-node-1>
            nodeid: 1
            }
            node {
            ring0_addr: <ip-node-2>
            ring1_addr: <ip2-node-2>
            nodeid: 2
            }
         }
    
         quorum {
            #Enable and configure quorum subsystem (default: off)
            #see also corosync.conf.5 and votequorum.5
            provider: corosync_votequorum
            expected_votes: 2
            two_node: 1
         }
  2. Replace the variables ip-node-1 / ip2-node-1 and ip-node-2 / ip2-node-2 with the IP addresses of your Db2 primary and standby instances, respectively. Replace ip-local-node with the IP address of the instance on which the file is being created.

    The chosen settings for crypto_cipher and crypto_hash are suitable for clusters in AWS. They may be modified according to SUSE’s documentation if you want strong encryption of cluster communication.

  3. Start the cluster services and enable them on both the Db2 primary and standby instances.

          dbprim00:~ # systemctl start pacemaker
          dbprim00:~ # systemctl enable pacemaker
          Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
  4. Check the configuration with the following command:

          dbprim00:~  corosync-cfgtool -s
          Printing ring status.
          Local node ID 1
          RING ID 0
                  id      = 10.0.1.17
                  status  = ring 0 active with no faults
          RING ID 1
                  id      = 10.0.1.62
                  status  = ring 1 active with no faults
          dbprim00:~

     Cluster status:

          dbprim00:~  crm_mon -1
          Stack: corosync
          Current DC: dbsec00 (version 1.1.19+20181105.ccd6b5b10-3.13.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
          Last updated: Fri Apr 17 14:09:56 2020
          Last change: Fri Apr 17 13:38:59 2020 by hacluster via crmd on dbsec00

          2 nodes configured
          0 resources configured

          Online: [ dbprim00 dbsec00 ]

          No active resources

To prepare the cluster for adding resources:

  1. To avoid the cluster starting partially defined resources, set the cluster to maintenance mode. This deactivates all monitoring actions.

          dbprim00:~  crm configure property maintenance-mode="true"
          dbprim00:~  crm status
          Stack: corosync
          Current DC: dbprim00 (version 1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
          Last updated: Fri Apr 17 14:30:51 2020
          Last change: Fri Apr 17 14:30:50 2020 by root via cibadmin on dbprim00

          2 nodes configured
          0 resources configured

                        *** Resource management is DISABLED ***
            The cluster will not attempt to start, stop or recover services

          Online: [ dbprim00 dbsec00 ]

          No resources
  2. Configure the AWS-specific settings:

          dbprim00:/ha-files  vi crm-bs.txt
          dbprim00:/ha-files  more crm-bs.txt
          property cib-bootstrap-options: \
          stonith-enabled="true" \
          stonith-action="off" \
          stonith-timeout="600s"
          rsc_defaults rsc-options: \
          resource-stickiness=1 \
          migration-threshold=3
          op_defaults op-options: \
          timeout=600 \
          record-pending=true

     The off setting for stonith-action forces the agent to shut down the fenced instance. You can change it to reboot if required.

  3. Add the following configuration to the cluster:

         dbprim00:~  crm configure load update crm-bs.txt

To configure the AWS-specific STONITH resource:

  1. Create a file with the following content:

          primitive res_AWS_STONITH stonith:external/ec2 \
          op start interval=0 timeout=180 \
          op stop interval=0 timeout=180 \
          op monitor interval=180 timeout=60 \
          params tag=pacemaker profile=cluster

     The tag=pacemaker entry needs to match the tag key chosen for the EC2 instances, and the profile name needs to match the AWS CLI profile configured in the prerequisites section.

  2. Add the file to the configuration:

          dbprim00:/ha-files  vi aws-stonith.txt
          dbprim00:/ha-files  more aws-stonith.txt
          primitive res_AWS_STONITH stonith:external/ec2 \
          op start interval=0 timeout=180 \
          op stop interval=0 timeout=180 \
          op monitor interval=120 timeout=60 \
          params tag=pacemaker profile=cluster

          dbprim00:/ha-files  crm configure load update aws-stonith.txt
  3. Create the Db2 primary/standby resource. Create a file with the following content, and change the value of the SID as per your configuration.

         primitive rsc_db2_db2stj_STJ db2 \
            params instance="db2stj" dblist="STJ" \
            op start interval="0" timeout="130" \
            op stop interval="0" timeout="120" \
            op promote interval="0" timeout="120" \
            op demote interval="0" timeout="120" \
            op monitor interval="30" timeout="60" \
            op monitor interval="31" role="Master" timeout="60"
         ms msl_db2_db2stj_STJ rsc_db2_db2stj_STJ \
            meta target-role="Started" notify="true"
  4. Add the file to the configuration:

         dbprim00:/ha-files  vi db2res.txt
         dbprim00:/ha-files  more db2res.txt
         primitive rsc_db2_db2stj_STJ db2 \
                 params instance="db2stj" dblist="STJ" \
                 op start interval="0" timeout="130" \
                 op stop interval="0" timeout="120" \
                 op promote interval="0" timeout="120" \
                 op demote interval="0" timeout="120" \
                 op monitor interval="30" timeout="60" \
                 op monitor interval="31" role="Master" timeout="60"
         dbprim00:/ha-files  crm configure load update db2res.txt
  5. Create the Overlay IP resource agent.

  1. First, add the Overlay IP address on the Db2 primary instance.

          dbprim00:~  ip address add 192.168.1.81/32 dev eth0
          dbprim00:~  ip addr show
          1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
              link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
              inet 127.0.0.1/8 scope host lo
                 valid_lft forever preferred_lft forever
              inet6 ::1/128 scope host
                 valid_lft forever preferred_lft forever
          2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
              link/ether 0e:73:7f:b5:b2:95 brd ff:ff:ff:ff:ff:ff
              inet 10.0.1.17/24 brd 10.0.1.255 scope global eth0
                 valid_lft forever preferred_lft forever
              inet 10.0.1.62/32 scope global eth0
                 valid_lft forever preferred_lft forever
              inet 192.168.1.81/32 scope global eth0
                 valid_lft forever preferred_lft forever
              inet6 fe80::c73:7fff:feb5:b295/64 scope link
                 valid_lft forever preferred_lft forever

          dbprim00:~
  2. Update the route table with the Overlay IP pointing to the Db2 primary instance.

         aws ec2 create-route --route-table-id rtb-xxxxxxxx \
             --destination-cidr-block <Overlay-IP>/32 --instance-id i-xxxxxxxx

     Note: If you are using different route tables for the two subnets where you deploy the primary and standby databases, you can specify them as a comma-separated (,) list in the preceding command.

         dbprim00:~  aws ec2 create-route --route-table-id rtb-dbe0eba1 \
             --destination-cidr-block 192.168.1.81/32 --instance-id i-05fc8801284585362
         {
             "Return": true
         }

     The aws-vpc-move-ip resource agent calls the aws command from the location /usr/bin, so ensure that there is a soft link pointing to the location where the AWS CLI is installed (a sketch for creating this link follows the example below).

         dbprim00:/usr/bin  which aws
         /usr/local/bin/aws
         dbprim00:/usr/bin  ls -ltr aws
         lrwxrwxrwx 1 root root 18 Apr 18 17:44 aws -> /usr/local/bin/aws
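     If the soft link does not exist yet, it can be created as follows; this assumes the AWS CLI binary is installed under /usr/local/bin.

          # Create a soft link so that the aws-vpc-move-ip agent finds the AWS CLI under /usr/bin
          dbprim00:~  ln -s /usr/local/bin/aws /usr/bin/aws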
  3. Create the file with the following content, and replace the Overlay IP and the route table ID based on your configuration. If you have multiple route tables associated with the subnet to which your instances belong, you can use a comma-separated list of routing tables.

    Note: Make sure you use the same profile name (which is cluster for this setup) that you used while configuring the AWS CLI.

         dbprim00:/ha-files  vi aws-move-ip.txt
         dbprim00:/ha-files  more aws-move-ip.txt
         primitive res_AWS_IP ocf:suse:aws-vpc-move-ip \
               params ip=192.168.1.81 routing_table=rtb-dbe0eba1 interface=eth0 \
               profile=cluster \
              op start interval=0 timeout=180 \
              op stop interval=0 timeout=180 \
              op monitor interval=60 timeout=60
         dbprim00:/ha-files  crm configure load update aws-move-ip.txt
  4. Create a colocation constraint to bind the Overlay IP resource agent with the Db2 primary instance.

          dbprim00:/ha-files  more crm-cs.txt
          colocation col_db2_db2stj_STJ 2000: res_AWS_IP:Started \
          msl_db2_db2stj_STJ:Master
          dbprim00:/ha-files  crm configure load update crm-cs.txt
          dbprim00:/ha-files
  5. Adjust the resource-stickiness and migration-threshold values.

         dbprim00:~  crm configure rsc_defaults resource-stickiness=1000
         dbprim00:~  crm configure rsc_defaults migration-threshold=5000
  6. You can now remove maintenance-mode.

          dbprim00:~  crm configure property maintenance-mode="false"

    Final configuration of the cluster:

          dbprim00:/ha-files  crm status
          Stack: corosync
          Current DC: dbsec00 (version 1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
          Last updated: Sat Apr 18 18:45:53 2020
          Last change: Sat Apr 18 16:01:26 2020 by root via cibadmin on dbprim00
    
          2 nodes configured
          4 resources configured
    
          Online: [ dbprim00 dbsec00 ]
    
          Full list of resources:
    
          res_AWS_STONITH        (stonith:external/ec2): Started dbprim00
          Master/Slave Set: msl_db2_db2stj_STJ [rsc_db2_db2stj_STJ]
              Masters: [ dbprim00 ]
              Slaves: [ dbsec00 ]
          res_AWS_IP     (ocf::suse:aws-vpc-move-ip):    Started dbprim00
    
          dbprim00:/ha-files  crm configure show
          node 1: dbprim00
          node 2: dbsec00
          primitive res_AWS_IP ocf:suse:aws-vpc-move-ip \
          params ip=192.168.1.81 routing_table=rtb-dbe0eba1 interface=eth0 profile=cluster \
          op start interval=0 timeout=180 \
          op stop interval=0 timeout=180 \
          op monitor interval=60 timeout=60
          primitive res_AWS_STONITH stonith:external/ec2 \
          op start interval=0 timeout=180 \
          op stop interval=0 timeout=180 \
          op monitor interval=120 timeout=60 \
          params tag=pacemaker profile=cluster
          primitive rsc_db2_db2stj_STJ db2 \
          params instance=db2stj dblist=STJ \
          op start interval=0 timeout=130 \
          op stop interval=0 timeout=120 \
          op promote interval=0 timeout=120 \
          op demote interval=0 timeout=120 \
          op monitor interval=30 timeout=60 \
          op monitor interval=31 role=Master timeout=60
          ms msl_db2_db2stj_STJ rsc_db2_db2stj_STJ \
          meta target-role=Started notify=true
          colocation col_db2_db2stj_STJ 2000: res_AWS_IP:Started msl_db2_db2stj_STJ:Master
          property cib-bootstrap-options: \
          have-watchdog=false \
          dc-version="1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10" \
          cluster-infrastructure=corosync \
          maintenance-mode=false \
          stonith-enabled=true \
          stonith-action=off \
          stonith-timeout=600s
          rsc_defaults rsc-options: \
          resource-stickiness=1000 \
          migration-threshold=5000
          op_defaults op-options: \
          timeout=600 \
          record-pending=true

Step 7: Post Setup Configuration

To enable SAP to connect to the Db2 virtual hostname, you must perform the following post-setup configuration tasks.

To perform post-setup configuration tasks:

  1. Edit your SAP profile files:

        > vi DEFAULT.PFL
    
        SAPDBHOST = dbhadb2
        j2ee/dbhost = dbhadb2
    
        rsdb/reco_trials  = 10
        rsdb/reco_sleep_time  = 10
  2. Update the two parameters (SAPDBHOST and j2ee/dbhost) to the virtual hostname you chose for your database server. Also set the rsdb/reco* parameters (the number of reconnect attempts and the sleep time between attempts) so that the total reconnect window is greater than the failover duration; this avoids a database disconnect during a failover. With the values shown above (10 attempts with a 10-second sleep), the work processes keep retrying for roughly 100 seconds. We recommend that you test these values in QA before setting them in production.

  3. Edit your Db2 client file:

         > cd /sapmnt/STJ/global/db6
         sappas01:stjadm 15> more db2cli.ini
         ; Comment lines start with a semi-colon.
         [STJ]
         Database=STJ
         Protocol=tcpip
         Hostname=dbhadb2
         Servicename=5912
         [COMMON]
         Diagpath=/usr/sap/STJ/SYS/global/db6/db2dump

    Make sure the hostname parameter matches your Db2 virtual hostname.

  4. After you change the entries and save your file, test your connection to the database server:

         sappas01:stjadm 17> R3trans -d
         This is R3trans version 6.26 (release 745 - 13.04.18 - 20:18:04).
         unicode enabled version
         R3trans finished (0000).
         sappas01:stjadm 18> startsap
         Checking db Database
         Database is running
         -------------------------------------------
         Starting Startup Agent sapstartsrv
         OK
         Instance Service on host sappas01 started
         -------------------------------------------
         starting SAP Instance D00
         Startup-Log is written to /home/stjadm/startsap_D00.log
         -------------------------------------------
         /usr/sap/STJ/D00/exe/sapcontrol -prot NI_HTTP -nr 00 -function Start
         Instance on host sappas01 started
    Figure 6 – SAP system status information

    You can check the HADR status and details in transaction DB02/dbacockpit under Configuration > Overview.

    Figure 7 – Transaction DB02 database instance information

    Figure 8 – Transaction DB02 HADR information

Step 8: Testing and Validation

We recommend that you define your failure scenarios and test them on your cluster. Unless otherwise specified, all tests are performed with the Db2 primary role running on the primary server (dbprim00) and the standby role running on the standby server (dbsec00).

Prerequisite: Before running any tests, ensure the following (a consolidated command sketch follows this list):

  • There are no errors or failed actions in Pacemaker. This can be checked using pcs status (or crm status on SLES). If there is a failed action, check the cause in /var/log/cluster/corosync.log on the node where it failed, and take corrective action. You can clean up the failed action using pcs/crm resource cleanup.

  • There are no unintended location constraints. Moving the master from the primary to the standby node with pcs/crm resource move sets a location constraint on the primary node that prevents any resource from starting on it. Such a constraint can be identified using pcs/crm constraint show; note the ID of the location constraint, and then run pcs/crm constraint delete <id> to remove it.

  • The Db2 HADR synchronization is working. This can be checked using db2pd -hadr -db <DBSID> and comparing the LOG_FILE, PAGE, and POS for primary and standby.

  • Refer to Appendix 1 for detailed test cases for the RHEL setup.

  • Refer to Appendix 2 for detailed test cases for the SLES setup.
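A consolidated sketch of these prerequisite checks on a RHEL node (substitute the crm equivalents on SLES); the SID STJ is the example used throughout this document.

       # 1. Check for failed actions and clean them up if necessary
       pcs status
       pcs resource cleanup

       # 2. List constraints with their IDs and remove any leftover location constraint
       pcs constraint show --full
       # pcs constraint delete <constraint-id>

       # 3. Confirm that HADR is in PEER state and that the log positions match
       db2pd -hadr -db STJ | grep -E 'HADR_STATE|LOG_FILE|PAGE|POS'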