Commit
Merge pull request #214 from fmherschel/angi-ScaleOut
Angi scale out 1.2.3
fmherschel authored Nov 6, 2023
2 parents 9744cc2 + 318b731 commit 921a656
Showing 14 changed files with 153 additions and 561 deletions.
2 changes: 1 addition & 1 deletion SAPHanaSR-angi.spec
@@ -21,7 +21,7 @@ License: GPL-2.0
Group: Productivity/Clustering/HA
AutoReqProv: on
Summary: Resource agents to control the HANA database in system replication setup
Version: 1.2.2
Version: 1.2.3
Release: 0
Url: https://www.suse.com/c/fail-safe-operation-of-sap-hana-suse-extends-its-high-availability-solution/

39 changes: 21 additions & 18 deletions man/SAPHanaSR-ScaleOut.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
.TH SAPHanaSR-ScaleOut 7 "18 Sep 2023" "" "SAPHanaSR-angi"
.TH SAPHanaSR-ScaleOut 7 "31 Oct 2023" "" "SAPHanaSR-angi"
.\"
.SH NAME
SAPHanaSR-ScaleOut \- Tools for automating SAP HANA system replication in
@@ -140,7 +140,7 @@ to repair a local failure.
5. \fBRecovering from failure of master nameserver\fR
.br
If the master nameserver of an HANA database system fails, the HANA will start
the nameserver on another node. Therefore usually up to two nodes are
the nameserver on another node. Therefor usually up to two nodes are
configured as additional nameserver candidates. At least one of them should be
a standby node to optimize failover time. The Linux cluster will detect the
change and move the IP address to the new active master nameserver.
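.PP
For illustration only: such an IP address is typically managed as a plain
IPaddr2 resource that the cluster moves along with the active master
nameserver. The resource name and address in this sketch are assumptions,
not part of this package:
.PP
.RS 4
primitive rsc_ip_SLE ocf:heartbeat:IPaddr2 \\
  params ip="192.168.1.100" cidr_netmask="24" \\
  op monitor interval="10s" timeout="20s"
.RE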
@@ -210,14 +210,14 @@ cluster tests.
.PP
\fB*\fR Be patient. For detecting the overall HANA status, the Linux cluster
needs a certain amount of time, depending on the HANA and the configured
intervalls and timeouts.
intervals and timeouts.
.PP
\fB*\fR Before doing anything, always check for the Linux cluster's idle status,
left-over migration constraints, and resource failures as well as the HANA
landscape status, and the HANA SR status.
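A minimal check sequence could look like the following sketch, assuming the
tools from SAPHanaSR-angi and ClusterTools2 are installed; the grep catches
left-over cli- migration constraints:
.PP
.RS 4
# cs_clusterstate -i
.br
# crm_mon -1r
.br
# crm configure show | grep cli-
.br
# SAPHanaSR-showAttr
.RE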
.PP
\fB*\fR Manually activating an HANA primary creates risk of a dual-primary
situation. The user is responsible for data integrity.
situation. The user is responsible for data integrity. See also susTkOver.py(7).
.PP
.\"
.SH REQUIREMENTS
@@ -239,19 +239,21 @@ the Linux system. If users are resolved by remote service, local caching is
necessary. Substitute user (su) to sidadm needs to work reliably and without
customized actions or messages. The supported shell is bash.
.PP
3. Strict time synchronization between the cluster nodes, e.g. NTP.
3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
the Linux cluster have configured the same timezone.
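A quick plausibility check, assuming chrony is the configured time service,
might be:
.PP
.RS 4
# chronyc sources
.br
# timedatectl | grep -i "time zone"
.RE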
.PP
4. For scale-out there is no other SAP HANA system (like QA) on the nodes
which needs to be stopped during takeover. Both HANA database systems are
running memory-preload. Also MCOS is currently not supported for scale-out.
.PP
5. Only one system replication between the two SAP HANA database systems in
the Linux cluster. Maximum one system replication to an HANA database outside
the Linux cluster.
5. Only one system replication between the two SAP HANA databases in the Linux
cluster. Maximum one system replication to an HANA database outside the Linux
cluster.
.PP
6. The replication mode is either sync or syncmem. Replication mode async is
not supported. The operation modes delta_datashipping, logreplay and
logreplay_readaccess are supported.
6. The replication mode is either sync or syncmem for the controlled replication.
Replication mode async is not supported. The operation modes delta_datashipping,
logreplay and logreplay_readaccess are supported. The operation mode logreplay
is default.
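The configured replication and operation mode can be read from the system
replication state; the sidadm user name below assumes SID SLE:
.PP
.RS 4
# su - sleadm -c "hdbnsutil -sr_state"
.RE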
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -275,18 +277,19 @@ that one does not interfere with the Linux cluster. All three clusters
12. The RAs SAPHanaController and SAPHanaTopology need to be installed on all
cluster nodes, even the majority maker.
.PP
13. Colocation constraints between the SAPHanaController or SAPHana RA and
13. Colocation constraints between the SAPHanaController RA and
other resources are allowed only if they do not affect the RA's scoring.
The location scoring finally depends on system replication status and must not
be overruled by additional constraints. Thus it is not allowed to define rules
forcing a SAPHanaController or SAPHana master to follow another resource.
forcing a SAPHanaController promoted instance to follow another resource.
.PP
14. The Linux cluster needs to be up and running to allow HA/DR provider events
being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
exact site name has to be used whenever the site is registered manually.
exact site name has to be used whenever the site is registered manually. At any
time only one site is configured as primary replication source.
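When registering manually, the known site name has to be given exactly. A
sketch, with sidadm user, host, instance number and site name as placeholders:
.PP
.RS 4
# su - sleadm -c "hdbnsutil -sr_register --remoteHost=node1 \\
  --remoteInstance=00 --replicationMode=sync --operationMode=logreplay \\
  --name=SITE_B"
.RE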
.PP
16. In two-node HANA scale-out systems only one master nameserver candidate is
configured.
@@ -301,7 +304,7 @@ Tenant-specific takeover groups are not supported. Sharing standby nodes
across sites is not supported.
.PP
20. In MDC configurations the HANA database is treated as a single system
including all database containers. Therefore, cluster takeover decisions are
including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
21. If a third HANA site is connected by system replication, that HANA is not
@@ -324,7 +327,7 @@ need read/write access and sufficient space in the Linux /tmp filesystem.
26. SAP HANA Native Storage Extension (NSE) is supported.
It is important that this feature does not change the HANA topology or
interfaces. In contrast to Native Storage Extension, the HANA Extension Nodes
change the topology and thus currently are not supported.
Please refer to SAP documentation for details.
.PP
.\"
@@ -374,9 +377,9 @@ A.Briel, F.Herschel, L.Pinne.
.PP
.\"
.SH COPYRIGHT
(c) 2015-2018 SUSE Linux GmbH, Germany.
(c) 2015-2017 SUSE Linux GmbH, Germany.
.br
(c) 2019-2023 SUSE LLC
(c) 2018-2023 SUSE LLC
.br
The package SAPHanaSR-angi comes with ABSOLUTELY NO WARRANTY.
.br
4 changes: 2 additions & 2 deletions man/SAPHanaSR-ScaleOut_basic_cluster.7
@@ -158,7 +158,7 @@ op_defaults $id="op_defaults-options" \\
timeout="300s"
.RE
.PP
The following example is for SLE-HA 12 SP4 and 15 SP1 with disk-based SBD:
The following example is for SLE-HA 15 SP1 with disk-based SBD:
.PP
.RS 2
.br
@@ -196,7 +196,7 @@ op_defaults op-options: \\
record-pending=true
.RE
.PP
.\" TODO example for SLE-HA 15 SP1 and 12 SP5 with disk-based and diskless SBD.
.\" TODO example for SLE-HA 15 SP5 with disk-based and diskless SBD.

\fB* crm SBD stonith configuration\fR
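A minimal disk-based SBD stonith primitive might look like the following
sketch; the resource name and delay value are examples, not recommendations:
.PP
.RS 4
primitive rsc_stonith_sbd stonith:external/sbd \\
  params pcmk_delay_max="15"
.RE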

19 changes: 11 additions & 8 deletions man/SAPHanaSR.7
@@ -196,7 +196,8 @@ Linux system. If users are resolved by remote service, local caching is
necessary. Substitute user (su) to sidadm needs to work reliably and without
customized actions or messages. The supported shell is bash.
.PP
3. Strict time synchronization between the cluster nodes, e.g. NTP.
3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
the Linux cluster have configured the same timezone.
.PP
4. For scale-up the following SAP HANA SR scenarios are possible with the
SAPHanaSR-angi package:
@@ -215,13 +216,14 @@ SAPHanaSR-angi package:
.br
Note: For MCOS, there must be no constraints between HANA SR pairs.
.PP
5. Only one system replication between the two SAP HANA database in the Linux cluster.
Maximum one system replication to an HANA database outside the Linux cluster.
See also item 12 below.
5. Only one system replication between the two SAP HANA databases in the Linux
cluster. Maximum one system replication to an HANA database outside the Linux
cluster. See also item 12 below.
.PP
6. The replication mode is either sync or syncmem for the controlled replication.
Replication mode async is not supported. The operation modes delta_datashipping,
logreplay and logreplay_readaccess are supported.
logreplay and logreplay_readaccess are supported. The operation mode logreplay
is default.
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -237,7 +239,7 @@ However, all nodes in one Linux cluster have to use the same style.
9. Automated start of SAP HANA database systems during system boot must be switched
off.
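For example, assuming SID SLE and instance number 00, the instance profile
should show Autostart disabled (the expected output is the second line):
.PP
.RS 4
# grep -i ^Autostart /usr/sap/SLE/SYS/profile/SLE_HDB00_*
.br
Autostart = 0
.RE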
.PP
10. The RA's monitoring operations have to be active.
10. The RAs' monitoring operations have to be active.
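As an illustrative sketch only, monitor operations on both roles of the
multistate resource might be configured as below; the resource name, SID,
intervals and timeouts are assumptions:
.PP
.RS 4
primitive rsc_SAPHanaCon_SLE_HDB00 ocf:suse:SAPHanaController \\
  op monitor interval="60" role="Promoted" timeout="700" \\
  op monitor interval="61" role="Unpromoted" timeout="700" \\
  params SID="SLE" InstanceNumber="00"
.RE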
.PP
11. Using HA/DR provider hook for srConnectionChanged() by enabling susHanaSR.py
or susHanaSrMultiTarget.py is mandatory.
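In global.ini this is assumed to look like the sketch below; see
susHanaSR.py(7) for the authoritative syntax:
.PP
.RS 4
[ha_dr_provider_sushanasr]
.br
provider = susHanaSR
.br
path = /usr/share/SAPHanaSR-angi
.br
execution_order = 1
.RE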
@@ -256,15 +258,16 @@ being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
exact site name has to be used whenever the site is registered manually.
exact site name has to be used whenever the site is registered manually. At any
time only one site is configured as primary replication source.
.PP
16. Reliable access to the /hana/shared/ filesystem is crucial for HANA and the
Linux cluster.
.PP
17. HANA feature Secondary Time Travel is not supported.
.PP
18. In MDC configurations the HANA database is treated as a single system
including all database containers. Therefore, cluster takeover decisions are
including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
19. If a third HANA site is connected by system replication, that HANA is not
72 changes: 45 additions & 27 deletions man/SAPHanaSR_maintenance_examples.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
.TH SAPHanaSR_maintenance_examples 7 "12 Jun 2023" "" "SAPHanaSR"
.TH SAPHanaSR_maintenance_examples 7 "31 Oct 2023" "" "SAPHanaSR"
.\"
.SH NAME
SAPHanaSR_maintenance_examples \- maintenance examples for SAPHanaController.
@@ -41,12 +41,11 @@ REQUIREMENTS below.
This might be convenient when performing administrative actions or cluster tests. It does not replace the aforementioned checks. See also cs_show_saphanasr_status(8).
.PP
.RS 4
# watch -n 9 "(crm_mon -1r;SAPHanaSR-showAttr;cs_clusterstate -i)|egrep -v'(^$|configured|###)'"
# watch -n9 "crm_mon -1r --include=none,nodes,resources,failures;echo;SAPHanaSR-showAttr;cs_clusterstate -i|grep -v '#'"
.RE
.PP
\fB*\fR Initiate an administrative takeover of the HANA primary from one node to the other by using the Linux cluster.


This procedure does not work for scale-out. On scale-up, it will stop the HANA primary.
This might take a while. If you want to avoid waiting for the stopped primary,
use the below procedure which suspends the primary.
@@ -90,9 +89,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed,
the old primary is registered as new secondary.
Therefore the correct secondary site name has to be used, see later example.
Therefor the correct secondary site name has to be used, see later example.
Finally the SAP HANA resources are given back to the Linux cluster.
See also section REQUIREMENTS below and later example on determining the correct site name.
See also section REQUIREMENTS below and later example on determining the correct
site name.
.PP
.RS 2
1. On either node
@@ -192,9 +192,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed
with suspending the primary, the old primary is registered as new secondary.
Therefore the correct secondary site name has to be used.
Therefor the correct secondary site name has to be used.
Finally the SAP HANA resources are given back to the Linux cluster.
See also section REQUIREMENTS below and later example on determining the correct site name.
See also section REQUIREMENTS below and later example on determining the correct
site name.
.PP
.RS 2
1. Check status of Linux cluster and HANA, show current site names.
@@ -274,48 +275,60 @@ This is an advanced task.
.\"
\fB*\fR Overview on maintenance procedure for Linux, HANA remains running, on pacemaker-2.0.

It is necessary to wait for each step to complete and to check the result. It
is also necessary to test and document the whole procedure before applying it
in production. See also section REQUIREMENTS below and the example on checking
the status of HANA and cluster above.
.\" TODO details
.PP
.RS 2
1. Check status of Linux cluster and HANA, see above.
.br
2. Set the Linux cluster into maintenance mode, on either node.
2. Set HANA multistate resource into maintenance mode.
.RE
.RS 4
# crm maintenance on
# crm resource maintenance mst_... on
.RE
.RS 2
3. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
3. Set the Linux cluster into maintenance mode, on either node.
.RE
.RS 3
# crm cluster stop
.RS 4
# crm maintenance on
.RE
.RS 2
4. Perform Linux maintenance.
.br
5. Start Linux cluster on all nodes. Make sure to do that on all nodes.
4. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
# crm cluster start
# crm cluster run "crm cluster stop"
.RE
.RS 2
6. Let Linux cluster detect status of HANA resource, on either node.
.PP
5. Perform Linux maintenance.
.PP
6. Start Linux cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
# crm resource refresh cln_...
.br
# crm resource refresh mst_...
# crm cluster run "crm cluster start"
.RE
.RS 2
7. Set cluster ready for operations, on either node.
.RE
.RS 4
# crm maintenance off
.\" TODO delete property, optional?
.RE
.RS 2
8. Check status of Linux cluster and HANA, see above.
8. Let Linux cluster detect status of HANA multistate resource, on either node.
.RE
.RS 4
# crm resource refresh mst_...
.RE
.RS 2
9. Set HANA multistate resource ready for operations, on either node.
.RE
.RS 4
# crm resource maintenance mst_... off
.RE
.RS 2
10. Check status of Linux cluster and HANA, see above.
.RE
.PP
\fB*\fR Overview on simple procedure for stopping and temporarily disabling the Linux cluster,
@@ -501,12 +514,17 @@ landscape status, and the HANA SR status.
.PP
\fB*\fR Maintenance attributes for cluster, nodes and resources must not be mixed.
.PP
\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events being written into CIB attributes.
The current HANA SR status might differ from CIB srHook attribute after Linux cluster maintenance.
\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events
being written into CIB attributes. The current HANA SR status might differ from CIB
srHook attribute after Linux cluster maintenance.
.PP
\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover
outside the cluster creates risk of a duplicate-primary situation. The user is
responsible for data integrity, particularly when activating an HANA primary. See
also susTkOver.py(7).
.PP
\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover outside
the cluster creates risk of a duplicate-primary situation. The user is responsible for data
integrity, particularly when activating an HANA primary.
\fB*\fR HANA site names are discovered automatically when the RAs are activated
the very first time. Those exact site names have to be used later for all
manual tasks.
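The site names known to the cluster and to HANA can be displayed on either
node; the sidadm user below is a placeholder for SID SLE:
.PP
.RS 4
# SAPHanaSR-showAttr
.br
# su - sleadm -c "hdbnsutil -sr_state" | grep -i site
.RE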
.PP
.\"
.SH BUGS
7 changes: 0 additions & 7 deletions man/ocf_suse_SAPHana.7
@@ -443,13 +443,6 @@ SAPHana parameter PREFER_SITE_TAKEOVER.
.\" TODO: crm_mon --failcounts
.RE
.PP
* Check for working NTP service on SLE-HA 12:
.PP
.RS 2
# ntpq -p
.\" TODO: ntp output
.RE
.PP
* Check for working NTP service on SLE-HA 15:
.PP
.RS 2
2 changes: 1 addition & 1 deletion man/ocf_suse_SAPHanaFilesystem.7
@@ -362,7 +362,7 @@ Note: The ACCEPT needs to be replaced by appropriate action.
.PP
4. On secondary, unblock /hana/shared/SLE/ filesystem.
.RS 2
# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j ACCEPT
# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j DROP
.RE
.PP
5. Check HANA and Linux cluster for clean idle state.