This module enhances the architecture with an additional SAP HANA system, increasing availability.
The High Availability scenario is based on a Pacemaker cluster that automates the takeover to the secondary SAP HANA system.
Different options for configuring the Cluster IP are presented, each with its own advantages and disadvantages.
- Module: High Availability
This is the most basic option for increasing SAP HANA availability: adding a secondary SAP HANA system in a separate Availability Zone and configuring synchronous SAP HANA System Replication (see Administration Guide: Replication Modes for SAP HANA System Replication for additional information).
The following two Replication Modes are acceptable for availability management:
- Synchronous on disk (`SYNC`)
- Synchronous in-memory (`SYNCMEM`)
The synchronous on disk (`SYNC`) Replication Mode has a higher latency impact because it waits for the disk write operation on the secondary SAP HANA system to complete. The advantage is that the Recovery Point Objective (RPO) is guaranteed to be zero (no data loss is possible as long as the secondary system is connected). This option is recommended in situations where a potential Single Point of Failure (SPOF) is shared between the primary and secondary SAP HANA systems.
The synchronous in-memory (`SYNCMEM`) Replication Mode offers a Recovery Point Objective (RPO) that is only "close to zero" because the information on the secondary SAP HANA database is written to disk asynchronously. The advantage is improved performance, because the commit does not wait for the disk write operation on the secondary system. However, this Replication Mode can lead to data loss if both the primary and secondary SAP HANA systems fail at the same time. It is therefore recommended only in scenarios where no Single Point of Failure (SPOF) is shared between the primary and secondary system, for example in combination with Availability Zones.
Note that the Full Sync Option, as described in Administration Guide: Full Sync Option for SAP HANA System Replication, is not suitable for any High Availability usage. This is because any failure (of either the primary or the secondary SAP HANA System) will cause the remaining SAP HANA System to be blocked.
The asynchronous (`ASYNC`) Replication Mode is not acceptable because it cannot guarantee a Recovery Point Objective (RPO) of zero or "close to zero".
In this scenario the takeover is executed manually by the SAP HANA administrator (see Administration Guide: Performing a Takeover for additional information); the Recovery Time Objective (RTO) therefore depends mainly on the monitoring lag and the reaction time of the support teams.
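The takeover itself is a single command executed as the `<sid>adm` user on the secondary system (a sketch; see the Administration Guide for the complete procedure):

```
# On the secondary system: take over the primary role.
hdbnsutil -sr_takeover
```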
There are two techniques for ensuring that application connectivity to the SAP HANA database is not disrupted by the takeover operation (see Administration Guide: Client Connection Recovery After Takeover for additional information):
- IP redirection (referred to as Cluster IP)
- DNS redirection
IP redirection (repointing the Cluster IP to the new primary SAP HANA system after takeover) or DNS redirection is executed manually after the takeover action.
The implementation details for the Cluster IP are platform specific and are described in the Platform Specific Architecture part of the documentation.
Additional Information:
- How To SAP HANA System Replication Whitepaper
- SAP Note 2407186: How-To Guides & Whitepapers For SAP HANA High Availability
To decrease the Recovery Time Objective (RTO), the takeover process must be automated. This can be done using Pacemaker as the cluster management solution.
Additional Information:
- ClusterLabs: Pacemaker Documentation
- SLES12 SP4: High Availability Extension - Administration Guide
- SLES15 GA: High Availability Extension - Administration Guide
- RHEL7: High Availability Add-On Reference
When dealing with a Pacemaker cluster, there are two topics that impact the final architecture:
- Implementation of Cluster IP
- Fencing Mechanism
The technical implementation of the Cluster IP is specific to the given infrastructure and is therefore described in detail in the Platform Specific Architecture part of the documentation. This section of the Reference Architecture only discusses the generic concepts of how to design the Cluster IP configuration.
Fencing is a critical mechanism that protects data from corruption. What fencing is and how it works is explained here:
The sections below explain two basic options for implementing the fencing mechanism:
- IPMI-like Fencing
- SBD (Storage Based Death) Fencing
The recommendation on which option to use on each platform is described in the Platform Specific Architecture part of the documentation.
The IPMI-like fencing approach is based on direct access to the management interface of the given server, also called IPMI (Intelligent Platform Management Interface), which has the ability to power down the given server.
Here are some example implementations of IPMI-like agents:
- Amazon Web Services (AWS): external/ec2
- Microsoft Azure: fence_azure_arm
- Google Cloud Platform (GCP): external/gcpstonith
- On-premise Bare Metal: external/ipmi
All of these agents have the same purpose: to kill the Virtual Machine or Bare Metal Server as soon as technically possible. The goal is not to perform a graceful shutdown but to terminate the server immediately, ensuring that it is down before the secondary server can take over.
In the cluster configuration, each SAP HANA server needs to have its own IPMI-like fencing agent configured and fully operational. The IPMI-like fencing agent is always called from the remote side (for example, the secondary system fences off the primary system).
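A minimal sketch of such a per-node fencing resource in crmsh syntax, here using the `external/ipmi` agent; host names, addresses and credentials are placeholders, and the exact parameter names depend on the agent and its version:

```
# One fencing resource per cluster node (placeholder values).
crm configure primitive rsc_fence_hana01 stonith:external/ipmi \
    params hostname=hana01 ipaddr=10.0.0.101 userid=admin passwd=secret interface=lanplus \
    op monitor interval=1800 timeout=30

# The fencing resource must never run on the node it is supposed to fence.
crm configure location loc_fence_hana01 rsc_fence_hana01 -inf: hana01
```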
SBD (Storage Based Death) fencing is based on a different approach.
The SBD device is a shared raw disk (connected via Fibre Channel, Fibre Channel over Ethernet, or iSCSI) that is used to send messages to the other nodes.
When the SBD device is initialized, the beginning of the disk device is overwritten with a messaging slot structure. This structure is used by individual nodes to send messages to other nodes. Each cluster node runs an SBD daemon that watches the slot dedicated to that node and performs the associated actions.
In case of a fencing event, the node triggering the fencing writes a "poison pill" message to the slot associated with the target system. The SBD daemon on the target system monitors this slot and, once it reads the "poison pill" message, executes the suicide action (self-fencing from the cluster by instantly powering off).
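The `sbd` command line tool can be used to initialize the device and to inspect or test the messaging mechanism; a sketch with a placeholder device path and node name:

```
# Initialize the messaging slot structure on the shared device (overwrites its beginning).
sbd -d /dev/disk/by-id/scsi-SBD_DEVICE create

# Show the slot allocation and any pending messages.
sbd -d /dev/disk/by-id/scsi-SBD_DEVICE list

# Write a "poison pill" for node hana02 - its SBD daemon will power the node off.
sbd -d /dev/disk/by-id/scsi-SBD_DEVICE message hana02 off
```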
Here is an implementation of the SBD agent:
The recommended number of SBD devices is either:
- Three SBD devices, each in a separate Availability Zone (visualized in the picture above)
- One SBD device in a 3rd Availability Zone (other than those used by the SAP HANA VMs)
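A sketch of how the chosen SBD devices are typically wired into the cluster, assuming the three-device option and crmsh syntax (device paths are placeholders):

```
# /etc/sysconfig/sbd - up to three devices, separated by semicolons.
SBD_DEVICE="/dev/disk/by-id/sbd-az1;/dev/disk/by-id/sbd-az2;/dev/disk/by-id/sbd-az3"

# A single SBD-based fencing resource serves the whole cluster.
crm configure primitive rsc_stonith_sbd stonith:external/sbd \
    op monitor interval=3600 timeout=20
```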
Additional Information:
As mentioned above, the fencing concept is vital to protect the data from corruption. However, in case of communication issues between the cluster nodes, there is a risk that the individual nodes will continuously keep fencing each other while taking over the role of the primary system.
Therefore, there is a second, equally important concept called Quorum, which decides which subgroup of cluster nodes is entitled to become primary.
The simplest implementation of Quorum is to use an odd number of nodes and base the Quorum logic on the majority of nodes in the subgroup.
This method is applicable to SAP HANA Scale-Out configurations, where there is the same number of nodes in each Availability Zone and one additional VM in a 3rd Availability Zone acts as "Majority Maker", helping to decide which side will become primary.
The recommended settings are described in SLES12 SP2: SAP HANA System Replication Scale-Out - Cluster Bootstrap and more.
The concept of a "Majority Maker" is also applicable to SAP HANA Single-Node implementations (two-node clusters). However, such small clusters can also be implemented using the special `two_node` Quorum approach as described in SLES12 SP4: High Availability Extension - Corosync Configuration for Two-Node Clusters:
```
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
```
Note that setting `two_node: 1` will implicitly configure `wait_for_all: 1`.
The configuration is explained in the `votequorum` man pages (Man Pages: VOTEQUORUM(5)) and in the following document: New quorum features in Corosync 2.
Effectively, this configuration adjusts how the Quorum is calculated. When the cluster is started for the first time (both nodes down), the `wait_for_all: 1` parameter ensures that both nodes need to be available to achieve the Quorum. This is critical to protect the consistency of the data, as explained in the following blogs:
- Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 1: Basics
- Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes
However, if the cluster is already active (and Quorum was achieved), the `two_node: 1` parameter ensures that if one node fails, the surviving node still has Quorum even though it does not have a majority.
In case of a split-brain situation (both nodes are active but unable to communicate), both nodes will have Quorum and both will race to fence the other node. The node that wins the race becomes primary.
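Whether the `two_node` and `wait_for_all` flags are active on a running cluster can be verified with the quorum tool; a minimal check:

```
# Display the current quorum state and the active flags (for example 2Node, WaitForAll).
corosync-quorumtool -s
```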
This option is applicable only to the Single-Node scenario with one SAP HANA System in each Availability Zone. In such a case, a "Majority Maker" VM in a 3rd Availability Zone is not required (although possible).
The simplest SAP HANA High Availability scenario needs only one Cluster IP, which follows the Active Nameserver of the primary SAP HANA system (which is where the System Database is available).
Each tenant in an SAP HANA system has its own ports that can be used to connect directly to the given Tenant Database. Although this direct connection is possible, it is recommended to connect indirectly by specifying the port for the System Database (`3xx13` for ODBC/JDBC/SQLDBC access) and the Tenant Database name. The SAP HANA system will ensure that the connection is internally rerouted to the target Tenant Database.
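For example, assuming instance number `00`, a tenant named `TEN1` and placeholder credentials, the recommended indirect connection via the System Database port could look like this:

```
# Connect to tenant TEN1 through the System Database SQL port 30013 (3xx13 with xx = 00).
hdbsql -n hana-cluster-ip:30013 -d TEN1 -u MYUSER -p MyPassword "SELECT * FROM DUMMY"

# Equivalent JDBC URL: jdbc:sap://hana-cluster-ip:30013/?databaseName=TEN1
```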
In case of a takeover event, the Pacemaker cluster will ensure that the Cluster IP is moved to the new primary SAP HANA system.
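A minimal sketch of how such a Cluster IP can be modelled (crmsh syntax; the IP address and the SAP HANA resource name belong to a hypothetical SID `HA1` / instance `00` setup):

```
# Cluster IP for the System Database on the primary site (placeholder address).
crm configure primitive rsc_ip_HA1 ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.50 cidr_netmask=24 \
    op monitor interval=10 timeout=20

# Keep the Cluster IP on the node where the promoted (primary) SAP HANA instance runs.
crm configure colocation col_ip_with_primary 2000: rsc_ip_HA1:Started msl_SAPHana_HA1_HDB00:Master
```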
As explained above, the technical implementation of the Cluster IP is covered in detail in the Platform Specific Architecture part of the documentation.
Additional Information:
- Administration Guide: Server Components of the SAP HANA Database
- Tenant Databases: Connections for Tenant Databases
- Tenant Databases: Scale-Out Architecture of Tenant Databases
- Administration Guide: Connections from Database Clients and Web Clients to SAP HANA
- Administration Guide: Connections for Distributed SAP HANA Systems
- TCP/IP Ports of All SAP Products
The traditional implementation of a Cluster IP is based on ARP cache invalidation. On the primary server, the Pacemaker cluster defines the Cluster IP address using the `ip addr add` command combined with ARP cache invalidation via `arping` (see ClusterLabs / resource-agents / heartbeat / IPaddr2). During the takeover, the Pacemaker cluster removes the Cluster IP address from the old primary server using the `ip addr del` command and recreates it on the new primary server using the commands mentioned above. The key requirement is that the primary and secondary servers are in the same subnet so that the Cluster IP address can be moved between them.
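For illustration, the operations performed by the IPaddr2 resource agent correspond roughly to the following commands (address, netmask and interface are placeholders):

```
# On the new primary: bring up the Cluster IP and announce it via gratuitous ARP.
ip addr add 10.0.0.50/24 dev eth0
arping -U -I eth0 -c 5 10.0.0.50

# On the old primary: remove the Cluster IP.
ip addr del 10.0.0.50/24 dev eth0
```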
Historically, the secondary SAP HANA System was closed and connection attempts were rejected (this is still the case for the Operation Modes `delta_datashipping` and `logreplay`).
Since SAP HANA 2.0, the new operation mode `logreplay_readaccess` is available, which offers the capability to open the secondary SAP HANA System for read-only access.
As explained in SAP HANA Administration Guide: Connection Types and in Administration Guide: Virtual IP Address Handling, a secondary Cluster IP that follows the Active Nameserver of the secondary SAP HANA system is required.
During normal operation the two Cluster IP addresses are anti-collocated to each other: the primary Cluster IP address follows the primary SAP HANA System and the secondary Cluster IP address follows the secondary SAP HANA System.
As part of a takeover event, the Pacemaker cluster switches the location of both IP addresses along with the change of the primary and secondary roles of the SAP HANA Systems.
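A sketch of the secondary (read-only) Cluster IP and its anti-collocation with the primary role, using the same hypothetical names as above:

```
# Read-only Cluster IP for the read-enabled secondary site (placeholder address).
crm configure primitive rsc_ip_HA1_readonly ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.51 cidr_netmask=24 \
    op monitor interval=10 timeout=20

# Keep this IP on the node running the non-promoted (secondary) SAP HANA instance.
crm configure colocation col_ip_with_secondary 2000: rsc_ip_HA1_readonly:Started msl_SAPHana_HA1_HDB00:Slave
```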
Additional Information:
- Administration Guide: Active/Active (Read Enabled)
- Administration Guide: Connection Types
- Administration Guide: Virtual IP Address Handling
SAP HANA offers the option to move a Tenant Database from an existing SAP HANA System to a new SAP HANA System with a different `SID` and `system_number`.
The architecture documented in the previous section has one big limitation related to the Tenant Move operation. The design supports multiple Tenant Databases on one SAP HANA cluster; however, all tenants are accessed over one shared Cluster IP.
In such a configuration, when a Tenant Database is moved, all applications connecting to this Tenant Database must be reconfigured to use the Cluster IP of the target SAP HANA cluster.
To make the Tenant Move operation as seamless as possible, each tenant needs to have its own Cluster IP that is moved to the target SAP HANA cluster along with the given tenant.
All tenant-specific Cluster IPs are implemented in the same way as the System Database Cluster IP: they follow the Active Nameserver of the primary SAP HANA system, which is where the System Database (used to connect to the individual tenants) is available.
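A sketch of how tenant-specific Cluster IPs can be stacked on top of the System Database Cluster IP, using hypothetical tenants `TEN1` and `TEN2` and the same placeholder resource names as above:

```
# One additional Cluster IP per tenant, all following the promoted SAP HANA instance.
crm configure primitive rsc_ip_TEN1 ocf:heartbeat:IPaddr2 params ip=10.0.0.61 cidr_netmask=24
crm configure primitive rsc_ip_TEN2 ocf:heartbeat:IPaddr2 params ip=10.0.0.62 cidr_netmask=24
crm configure colocation col_ip_TEN1_with_primary 2000: rsc_ip_TEN1:Started msl_SAPHana_HA1_HDB00:Master
crm configure colocation col_ip_TEN2_with_primary 2000: rsc_ip_TEN2:Started msl_SAPHana_HA1_HDB00:Master
```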
The second challenge that needs to be addressed is the port used for connecting to the System Database (`3xx13` for ODBC/JDBC/SQLDBC access). This port depends on the `system_number` of the given SAP HANA System and can therefore differ between systems. The solution to this problem is to allocate an additional port (the same across all SAP HANA Systems) on which the System Database Tenant will listen. The procedure is described in Administration Guide: Configure Host-Independent Tenant Addresses.
The procedure for relocating a Tenant Database to a new SAP HANA System is described in SAP HANA Tenant Move.
Additional Information: