Crash the primary database on node 2 - SAP HANA on AWS

Description — Simulate a complete breakdown of the primary database system.

Run node — Primary SAP HANA database node (on node 2).

Run steps:

  • Kill the primary database system (on node 2) by running the following command as <sid>adm.

    sechana:~ # su - hdbadm
    hdbadm@sechana:/usr/sap/HDB/HDB00> HDB kill -9
    hdbenv.sh: Hostname sechana defined in $SAP_RETRIEVAL_PATH=/usr/sap/HDB/HDB00/sechana differs from host name defined on command line.
    hdbenv.sh: Error: Instance not found for host -9
    killing HDB processes:
    kill -9 30751 /usr/sap/HDB/HDB00/sechana/trace/hdb.sapHDB_HDB00 -d -nw -f /usr/sap/HDB/HDB00/sechana/daemon.ini pf=/usr/sap/HDB/SYS/profile/HDB_HDB00_sechana
    kill -9 30899 hdbnameserver
    kill -9 31166 hdbcompileserver
    kill -9 31168 hdbpreprocessor
    kill -9 31209 hdbindexserver -port 30003
    kill -9 31211 hdbxsengine -port 30007
    kill -9 31721 hdbwebdispatcher
    kill orphan HDB processes:
    kill -9 30899 [hdbnameserver] <defunct>
    kill -9 31209 [hdbindexserver] <defunct>

Expected result:

  • The cluster detects the stopped primary SAP HANA database (on node 2) and promotes the secondary SAP HANA database (on node 1) to take over as primary.

    sechana:~ # crm status
    Stack: corosync
    Current DC: prihana (version 1.1.18+20180430.b12c320f5-3.24.1-b12c320f5) - partition with quorum
    Last updated: Thu Nov 12 12:04:01 2020
    Last change: Thu Nov 12 12:03:53 2020 by root via crm_attribute on prihana
    2 nodes configured
    6 resources configured
    Online: [ prihana sechana ]
    Full list of resources:
     res_AWS_STONITH (stonith:external/ec2): Started prihana
     res_AWS_IP (ocf::suse:aws-vpc-move-ip): Started prihana
     Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
         Started: [ prihana sechana ]
     Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
         Masters: [ prihana ]
         Slaves: [ sechana ]
    Failed Actions:
    * rsc_SAPHana_HDB_HDB00_monitor_60000 on sechana 'master (failed)' (9): call=66, status=complete, exitreason='',
        last-rc-change='Thu Nov 12 11:58:53 2020', queued=0ms, exec=0ms

  • The overlay IP address is migrated to the new primary (on node 1).

  • With the AUTOMATIC_REGISTER parameter set to "true", the cluster restarts the failed SAP HANA database and automatically registers it against the new primary.
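The takeover can also be confirmed from a script rather than by reading the status output by eye. The following is a minimal sketch, assuming the status text has the same "Masters: [ node ]" layout as the sample output above; the function names hana_master_node and is_master are hypothetical helpers, not part of any cluster tooling.

```shell
#!/bin/sh
# Print the node name listed in the "Masters: [ ... ]" entry of
# `crm status` output read from stdin.
# Assumption: the layout matches the sample output in this test case.
hana_master_node() {
    sed -n 's/.*Masters: \[ \([^]]*\) \].*/\1/p'
}

# Hypothetical helper: succeed (exit 0) if the node given as the first
# argument is the one reported as master in the status text on stdin.
is_master() {
    [ "$(hana_master_node)" = "$1" ]
}

# Usage on a cluster node as root (not executed here):
#   crm status | is_master prihana && echo "takeover complete"
```

Reading the status text from stdin keeps the check independent of the cluster, so it can be tried against saved output before it is used on a live system.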

Recovery procedure:

  • Clean up the cluster “failed actions” on node 2 as root.

    sechana:~ # crm resource cleanup rsc_SAPHana_HDB_HDB00 sechana
    Cleaned up rsc_SAPHana_HDB_HDB00:0 on sechana
    Cleaned up rsc_SAPHana_HDB_HDB00:1 on sechana
    Waiting for 1 replies from the CRMd. OK

  • After the resource cleanup, the “failed actions” entry no longer appears in the cluster status.
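To check that the cleanup took effect, the status output can be scanned for a remaining failed-actions section. This is a minimal sketch; has_failed_actions is a hypothetical helper, and the header wording it matches ("Failed Actions:" as in the sample output of this test case, or "Failed Resource Actions:" as an assumed wording on other Pacemaker versions) should be checked against the output of your own cluster.

```shell
#!/bin/sh
# Succeed (exit 0) if the `crm status` output read from stdin still
# contains a failed-actions section.
# Assumption: the section header reads "Failed Actions:" or
# "Failed Resource Actions:"; adjust the pattern if your output differs.
has_failed_actions() {
    grep -Eq 'Failed (Resource )?Actions:'
}

# Usage on a cluster node as root (not executed here):
#   if crm status | has_failed_actions; then
#       echo "failed actions still present"
#   fi
```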