Commit
Merge pull request #214 from fmherschel/angi-ScaleOut
Angi scale out 1.2.3
fmherschel authored Nov 6, 2023
2 parents 9744cc2 + 318b731 commit 921a656
Showing 14 changed files with 153 additions and 561 deletions.
2 changes: 1 addition & 1 deletion SAPHanaSR-angi.spec
@@ -21,7 +21,7 @@ License: GPL-2.0
Group: Productivity/Clustering/HA
AutoReqProv: on
Summary: Resource agents to control the HANA database in system replication setup
Version: 1.2.2
Version: 1.2.3
Release: 0
Url: https://www.suse.com/c/fail-safe-operation-of-sap-hana-suse-extends-its-high-availability-solution/

39 changes: 21 additions & 18 deletions man/SAPHanaSR-ScaleOut.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
.TH SAPHanaSR-ScaleOut 7 "18 Sep 2023" "" "SAPHanaSR-angi"
.TH SAPHanaSR-ScaleOut 7 "31 Oct 2023" "" "SAPHanaSR-angi"
.\"
.SH NAME
SAPHanaSR-ScaleOut \- Tools for automating SAP HANA system replication in
@@ -140,7 +140,7 @@ to repair a local failure.
5. \fBRecovering from failure of master nameserver\fR
.br
If the master nameserver of an HANA database system fails, the HANA will start
the nameserver on another node. Therefore usually up to two nodes are
the nameserver on another node. Therefor usually up to two nodes are
configured as additional nameserver candidates. At least one of them should be
a standby node to optimize failover time. The Linux cluster will detect the
change and move the IP address to the new active master nameserver.
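.PP
For illustration only: such an IP address is typically managed as a plain
IPaddr2 resource that the cluster moves along with the active master
nameserver. The resource name and address in this sketch are assumptions,
not part of this package:
.PP
.RS 4
primitive rsc_ip_SLE ocf:heartbeat:IPaddr2 \\
  params ip="192.168.1.100" cidr_netmask="24" \\
  op monitor interval="10s" timeout="20s"
.RE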
@@ -210,14 +210,14 @@ cluster tests.
.PP
\fB*\fR Be patient. For detecting the overall HANA status, the Linux cluster
needs a certain amount of time, depending on the HANA and the configured
intervalls and timeouts.
intervals and timeouts.
.PP
\fB*\fR Before doing anything, always check for the Linux cluster's idle status,
left-over migration constraints, and resource failures as well as the HANA
landscape status, and the HANA SR status.
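A minimal check sequence could look like the following sketch, assuming the
tools from SAPHanaSR-angi and ClusterTools2 are installed; the grep catches
left-over cli- migration constraints:
.PP
.RS 4
# cs_clusterstate -i
.br
# crm_mon -1r
.br
# crm configure show | grep cli-
.br
# SAPHanaSR-showAttr
.RE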
.PP
\fB*\fR Manually activating an HANA primary creates risk of a dual-primary
situation. The user is responsible for data integrity.
situation. The user is responsible for data integrity. See also susTkOver.py(7).
.PP
.\"
.SH REQUIREMENTS
@@ -239,19 +239,21 @@ the Linux system. If users are resolved by remote service, local caching is
necessary. Substitute user (su) to sidadm needs to work reliably and without
customized actions or messages. The supported shell is bash.
.PP
3. Strict time synchronization between the cluster nodes, e.g. NTP.
3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
the Linux cluster have configured the same timezone.
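A quick plausibility check, assuming chrony is the configured time service,
might be:
.PP
.RS 4
# chronyc sources
.br
# timedatectl | grep -i "time zone"
.RE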
.PP
4. For scale-out there is no other SAP HANA system (like QA) on the nodes
which needs to be stopped during takeover. Both HANA database systems are
running memory-preload. Also MCOS is currently not supported for scale-out.
.PP
5. Only one system replication between the two SAP HANA database systems in
the Linux cluster. Maximum one system replication to an HANA database outside
the Linux cluster.
5. Only one system replication between the two SAP HANA databases in the Linux
cluster. Maximum one system replication to an HANA database outside the Linux
cluster.
.PP
6. The replication mode is either sync or syncmem. Replication mode async is
not supported. The operation modes delta_datashipping, logreplay and
logreplay_readaccess are supported.
6. The replication mode is either sync or syncmem for the controlled replication.
Replication mode async is not supported. The operation modes delta_datashipping,
logreplay and logreplay_readaccess are supported. The operation mode logreplay
is default.
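The configured replication and operation mode can be read from the system
replication state; the sidadm user name below assumes SID SLE:
.PP
.RS 4
# su - sleadm -c "hdbnsutil -sr_state"
.RE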
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -275,18 +277,19 @@ that one does not interfere with the Linux cluster. All three clusters
12. The RAs SAPHanaController and SAPHanaTopology need to be installed on all
cluster nodes, even the majority maker.
.PP
13. Colocation constraints between the SAPHanaController or SAPHana RA and
13. Colocation constraints between the SAPHanaController RA and
other resources are allowed only if they do not affect the RA's scoring.
The location scoring finally depends on system replication status and must not
be overruled by additional constraints. Thus it is not allowed to define rules
forcing a SAPHanaController or SAPHana master to follow another resource.
forcing a SAPHanaController promoted instance to follow another resource.
.PP
14. The Linux cluster needs to be up and running to allow HA/DR provider events
being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
exact site name has to be used whenever the site is registered manually.
exact site name has to be used whenever the site is registered manually. At any
time only one site is configured as primary replication source.
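When registering manually, the known site name has to be given exactly. A
sketch, with sidadm user, host, instance number and site name as placeholders:
.PP
.RS 4
# su - sleadm -c "hdbnsutil -sr_register --remoteHost=node1 \\
  --remoteInstance=00 --replicationMode=sync --operationMode=logreplay \\
  --name=SITE_B"
.RE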
.PP
16. In two-node HANA scale-out systems only one master nameserver candidate is
configured.
@@ -301,7 +304,7 @@ Tenant-specific takeover groups are not supported. Sharing standby nodes
across sites is not supported.
.PP
20. In MDC configurations the HANA database is treated as a single system
including all database containers. Therefore, cluster takeover decisions are
including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
21. If a third HANA site is connected by system replication, that HANA is not
@@ -324,7 +327,7 @@ need read/write access and sufficient space in the Linux /tmp filesystem.
26. SAP HANA Native Storage Extension (NSE) is supported.
It is important that this feature does not change the HANA topology or
interfaces. In contrast to Native Storage Extension, the HANA Extension Nodes
change the topology and thus currently are not supported.
Please refer to SAP documentation for details.
.PP
.\"
@@ -374,9 +377,9 @@ A.Briel, F.Herschel, L.Pinne.
.PP
.\"
.SH COPYRIGHT
(c) 2015-2018 SUSE Linux GmbH, Germany.
(c) 2015-2017 SUSE Linux GmbH, Germany.
.br
(c) 2019-2023 SUSE LLC
(c) 2018-2023 SUSE LLC
.br
The package SAPHanaSR-angi comes with ABSOLUTELY NO WARRANTY.
.br
4 changes: 2 additions & 2 deletions man/SAPHanaSR-ScaleOut_basic_cluster.7
@@ -158,7 +158,7 @@ op_defaults $id="op_defaults-options" \\
timeout="300s"
.RE
.PP
The following example is for SLE-HA 12 SP4 and 15 SP1 with disk-based SBD:
The following example is for SLE-HA 15 SP1 with disk-based SBD:
.PP
.RS 2
.br
@@ -196,7 +196,7 @@ op_defaults op-options: \\
record-pending=true
.RE
.PP
.\" TODO example for SLE-HA 15 SP1 and 12 SP5 with disk-based and diskless SBD.
.\" TODO example for SLE-HA 15 SP5 with disk-based and diskless SBD.

\fB* crm SBD stonith configuration\fR
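A minimal disk-based SBD stonith primitive might look like the following
sketch; the resource name and delay value are examples, not recommendations:
.PP
.RS 4
primitive rsc_stonith_sbd stonith:external/sbd \\
  params pcmk_delay_max="15"
.RE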

19 changes: 11 additions & 8 deletions man/SAPHanaSR.7
@@ -196,7 +196,8 @@ Linux system. If users are resolved by remote service, local caching is
necessary. Substitute user (su) to sidadm needs to work reliably and without
customized actions or messages. The supported shell is bash.
.PP
3. Strict time synchronization between the cluster nodes, e.g. NTP.
3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
the Linux cluster have configured the same timezone.
.PP
4. For scale-up the following SAP HANA SR scenarios are possible with the
SAPHanaSR-angi package:
@@ -215,13 +216,14 @@ SAPHanaSR-angi package:
.br
Note: For MCOS, there must be no constraints between HANA SR pairs.
.PP
5. Only one system replication between the two SAP HANA database in the Linux cluster.
Maximum one system replication to an HANA database outside the Linux cluster.
See also item 12 below.
5. Only one system replication between the two SAP HANA databases in the Linux
cluster. Maximum one system replication to an HANA database outside the Linux
cluster. See also item 12 below.
.PP
6. The replication mode is either sync or syncmem for the controlled replication.
Replication mode async is not supported. The operation modes delta_datashipping,
logreplay and logreplay_readaccess are supported.
logreplay and logreplay_readaccess are supported. The operation mode logreplay
is default.
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -237,7 +239,7 @@ However, all nodes in one Linux cluster have to use the same style.
9. Automated start of SAP HANA database systems during system boot must be switched
off.
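For example, assuming SID SLE and instance number 00, the instance profile
should show Autostart disabled (the expected output is the second line):
.PP
.RS 4
# grep -i ^Autostart /usr/sap/SLE/SYS/profile/SLE_HDB00_*
.br
Autostart = 0
.RE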
.PP
10. The RA's monitoring operations have to be active.
10. The RAs' monitoring operations have to be active.
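As an illustrative sketch only, monitor operations on both roles of the
multistate resource might be configured as below; the resource name, SID,
intervals and timeouts are assumptions:
.PP
.RS 4
primitive rsc_SAPHanaCon_SLE_HDB00 ocf:suse:SAPHanaController \\
  op monitor interval="60" role="Promoted" timeout="700" \\
  op monitor interval="61" role="Unpromoted" timeout="700" \\
  params SID="SLE" InstanceNumber="00"
.RE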
.PP
11. Using HA/DR provider hook for srConnectionChanged() by enabling susHanaSR.py
or susHanaSrMultiTarget.py is mandatory.
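In global.ini this is assumed to look like the sketch below; see
susHanaSR.py(7) for the authoritative syntax:
.PP
.RS 4
[ha_dr_provider_sushanasr]
.br
provider = susHanaSR
.br
path = /usr/share/SAPHanaSR-angi
.br
execution_order = 1
.RE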
@@ -256,15 +258,16 @@ being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
exact site name has to be used whenever the site is registered manually.
exact site name has to be used whenever the site is registered manually. At any
time only one site is configured as primary replication source.
.PP
16. Reliable access to the /hana/shared/ filesystem is crucial for HANA and the
Linux cluster.
.PP
17. HANA feature Secondary Time Travel is not supported.
.PP
18. In MDC configurations the HANA database is treated as a single system
including all database containers. Therefore, cluster takeover decisions are
including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
19. If a third HANA site is connected by system replication, that HANA is not
72 changes: 45 additions & 27 deletions man/SAPHanaSR_maintenance_examples.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
.TH SAPHanaSR_maintenance_examples 7 "12 Jun 2023" "" "SAPHanaSR"
.TH SAPHanaSR_maintenance_examples 7 "31 Oct 2023" "" "SAPHanaSR"
.\"
.SH NAME
SAPHanaSR_maintenance_examples \- maintenance examples for SAPHanaController.
@@ -41,12 +41,11 @@ REQUIREMENTS below.
This might be convenient when performing administrative actions or cluster tests. It does not replace the aforementioned checks. See also cs_show_saphanasr_status(8).
.PP
.RS 4
# watch -n 9 "(crm_mon -1r;SAPHanaSR-showAttr;cs_clusterstate -i)|egrep -v'(^$|configured|###)'"
# watch -n9 "crm_mon -1r --include=none,nodes,resources,failures;echo;SAPHanaSR-showAttr;cs_clusterstate -i|grep -v '#'"
.RE
.PP
\fB*\fR Initiate an administrative takeover of the HANA primary from one node to the other by using the Linux cluster.


This procedure does not work for scale-out. On scale-up, it will stop the HANA primary.
This might take a while. If you want to avoid waiting for the stopped primary,
use the below procedure which suspends the primary.
@@ -90,9 +89,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed,
the old primary is registered as new secondary.
Therefore the correct secondary site name has to be used, see later example.
Therefor the correct secondary site name has to be used, see later example.
Finally the SAP HANA resources are given back to the Linux cluster.
See also section REQUIREMENTS below and later example on determining the correct site name.
See also section REQUIREMENTS below and later example on determining the correct
site name.
.PP
.RS 2
1. On either node
@@ -192,9 +192,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed
with suspending the primary, the old primary is registered as new secondary.
Therefore the correct secondary site name has to be used.
Therefor the correct secondary site name has to be used.
Finally the SAP HANA resources are given back to the Linux cluster.
See also section REQUIREMENTS below and later example on determining the correct site name.
See also section REQUIREMENTS below and later example on determining the correct
site name.
.PP
.RS 2
1. Check status of Linux cluster and HANA, show current site names.
@@ -274,48 +275,60 @@ This is an advanced task.
.\"
\fB*\fR Overview on maintenance procedure for Linux, HANA remains running, on pacemaker-2.0.

It is necessary to wait for each step to complete and to check the result. It
is also necessary to test and document the whole procedure before applying it
in production. See also section REQUIREMENTS below and the example on checking
the status of HANA and cluster above.
.\" TODO details
.PP
.RS 2
1. Check status of Linux cluster and HANA, see above.
.br
2. Set the Linux cluster into maintenance mode, on either node.
2. Set HANA multistate resource into maintenance mode.
.RE
.RS 4
# crm maintenance on
# crm resource maintenance mst_... on
.RE
.RS 2
3. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
3. Set the Linux cluster into maintenance mode, on either node.
.RE
.RS 3
# crm cluster stop
.RS 4
# crm maintenance on
.RE
.RS 2
4. Perform Linux maintenance.
.br
5. Start Linux cluster on all nodes. Make sure to do that on all nodes.
4. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
# crm cluster start
# crm cluster run "crm cluster stop"
.RE
.RS 2
6. Let Linux cluster detect status of HANA resource, on either node.
.PP
5. Perform Linux maintenance.
.PP
6. Start Linux cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
# crm resource refresh cln_...
.br
# crm resource refresh mst_...
# crm cluster run "crm cluster start"
.RE
.RS 2
7. Set cluster ready for operations, on either node.
.RE
.RS 4
# crm maintenance off
.\" TODO delete property, optional?
.RE
.RS 2
8. Check status of Linux cluster and HANA, see above.
8. Let Linux cluster detect status of HANA multistate resource, on either node.
.RE
.RS 4
# crm resource refresh mst_...
.RE
.RS 2
9. Set HANA multistate resource ready for operations, on either node.
.RE
.RS 4
# crm resource maintenance mst_... off
.RE
.RS 2
10. Check status of Linux cluster and HANA, see above.
.RE
.PP
\fB*\fR Overview on simple procedure for stopping and temporarily disabling the Linux cluster,
@@ -501,12 +514,17 @@ landscape status, and the HANA SR status.
.PP
\fB*\fR Maintenance attributes for cluster, nodes and resources must not be mixed.
.PP
\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events being written into CIB attributes.
The current HANA SR status might differ from CIB srHook attribute after Linux cluster maintenance.
\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events
being written into CIB attributes. The current HANA SR status might differ from CIB
srHook attribute after Linux cluster maintenance.
.PP
\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover
outside the cluster creates risk of a duplicate-primary situation. The user is
responsible for data integrity, particularly when activating an HANA primary. See
also susTkOver.py(7).
.PP
\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover outside
the cluster creates risk of a duplicate-primary situation. The user is responsible for data
integrity, particularly when activating an HANA primary.
\fB*\fR HANA site names are discovered automatically when the RAs are activated
the very first time. Those exact site names have to be used later for all
manual tasks.
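The site names known to the cluster and to HANA can be displayed on either
node; the sidadm user below is a placeholder for SID SLE:
.PP
.RS 4
# SAPHanaSR-showAttr
.br
# su - sleadm -c "hdbnsutil -sr_state" | grep -i site
.RE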
.PP
.\"
.SH BUGS
7 changes: 0 additions & 7 deletions man/ocf_suse_SAPHana.7
@@ -443,13 +443,6 @@ SAPHana parameter PREFER_SITE_TAKEOVER.
.\" TODO: crm_mon --failcounts
.RE
.PP
* Check for working NTP service on SLE-HA 12:
.PP
.RS 2
# ntpq -p
.\" TODO: ntp output
.RE
.PP
* Check for working NTP service on SLE-HA 15:
.PP
.RS 2
2 changes: 1 addition & 1 deletion man/ocf_suse_SAPHanaFilesystem.7
@@ -362,7 +362,7 @@ Note: The ACCEPT needs to be replaced by appropriate action.
.PP
4. On secondary, unblock /hana/shared/SLE/ filesystem.
.RS 2
# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j ACCEPT
# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j DROP
.RE
.PP
5. Check HANA and Linux cluster for clean idle state.