Angi scale out 1.2.3 #214

Merged 30 commits on Nov 6, 2023. Changes shown are from all commits.

Commits
6465a31
saphana-controller-lib: default stop timeout changed from 3600 to 600…
fmherschel Oct 18, 2023
c58228d
saphana-controller-lib: sharpened comment
fmherschel Oct 19, 2023
868367b
ocf_suse_SAPHana.7 SAPHanaSR.7 SAPHanaSR-ScaleOut.7: requirements
lpinne Oct 27, 2023
a398077
Merge branch 'angi-ScaleOut' of github.com:fmherschel/SAPHanaSR into …
lpinne Oct 27, 2023
9fb48de
SAPHanaSR-ScaleOut_basic_cluster.7: examples
lpinne Oct 27, 2023
1740f6b
angi: tools saphana_sr_tools.py - use execution-time as 'timestamp'; …
fmherschel Oct 30, 2023
0489740
angi: tools: SAPHanaSR-showAttr - provide cib-time to method print_di…
fmherschel Oct 30, 2023
aace12a
removed outdated comment
fmherschel Oct 30, 2023
4620dbb
angi: tools saphana_sr_tools.py - hardened, if gets no cib-file not c…
fmherschel Oct 30, 2023
5dd3c64
angi: tools: SAPHanaSR-replay-archive - provide cib-time to method pr…
fmherschel Oct 30, 2023
683fcea
SAPHanaSR-showAttr: remove unneded comments (outdated) and empty lines
fmherschel Oct 30, 2023
3460d14
tools: removing training blanksx
fmherschel Oct 30, 2023
96d9edc
angi: update minor version
fmherschel Oct 30, 2023
79f2f54
ocf_suse_SAPHanaTopology.7: monitor interval
lpinne Oct 31, 2023
0c999da
Merge branch 'angi-ScaleOut' of github.com:fmherschel/SAPHanaSR into …
lpinne Oct 31, 2023
4b6a312
ocf_suse_SAPHanaTopology.7: example monitor interval
lpinne Oct 31, 2023
60d4892
SAPHanaSR.7: requirements
lpinne Nov 1, 2023
3ce375c
SAPHanaSR-ScaleOut.7: requirements
lpinne Nov 1, 2023
916ff96
SAPHanaSR.7: requirements
lpinne Nov 1, 2023
b213259
SAPHanaSR-ScaleOut.7: requirements, typos
lpinne Nov 1, 2023
3132476
vc_SAPHanaSR-showAttr.adoc: deleted
lpinne Nov 1, 2023
34d90c5
SAPHanaSR_maintenance_examples.7: typos
lpinne Nov 1, 2023
22a7098
SAPHanaSR_maintenance_examples.7: requirements, typos
lpinne Nov 1, 2023
d33d61a
SAPHanaSR_maintenance_examples.7: bug 1216671, misc. examples
lpinne Nov 1, 2023
85436ec
SAPHanaSR_maintenance_examples.7: bug 1216671, misc. examples
lpinne Nov 1, 2023
3cec311
ocf_suse_SAPHanaTopology.7: examples, typos
lpinne Nov 1, 2023
790db22
/susHanaSR.py.7: typo
lpinne Nov 3, 2023
447e013
ocf_suse_SAPHanaFilesystem.7: examples
lpinne Nov 3, 2023
ba236cb
Merge branch 'main' into angi-ScaleOut
fmherschel Nov 6, 2023
318b731
Merge branch 'angi-ScaleOut' of github.com:fmherschel/SAPHanaSR into …
fmherschel Nov 6, 2023
2 changes: 1 addition & 1 deletion SAPHanaSR-angi.spec
@@ -21,7 +21,7 @@ License: GPL-2.0
Group: Productivity/Clustering/HA
AutoReqProv: on
Summary: Resource agents to control the HANA database in system replication setup
-Version: 1.2.2
+Version: 1.2.3
Release: 0
Url: https://www.suse.com/c/fail-safe-operation-of-sap-hana-suse-extends-its-high-availability-solution/

39 changes: 21 additions & 18 deletions man/SAPHanaSR-ScaleOut.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
-.TH SAPHanaSR-ScaleOut 7 "18 Sep 2023" "" "SAPHanaSR-angi"
+.TH SAPHanaSR-ScaleOut 7 "31 Oct 2023" "" "SAPHanaSR-angi"
.\"
.SH NAME
SAPHanaSR-ScaleOut \- Tools for automating SAP HANA system replication in
@@ -140,7 +140,7 @@ to repair a local failure.
5. \fBRecovering from failure of master nameserver\fR
.br
If the master nameserver of an HANA database system fails, the HANA will start
-the nameserver on another node. Therefore usually up to two nodes are
+the nameserver on another node. Therefor usually up to two nodes are
configured as additional nameserver candidates. At least one of them should be
a standby node to optimize failover time. The Linux cluster will detect the
change and move the IP address to the new active master nameserver.
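For illustration, the configured nameserver candidates and current host roles can be checked with HANA's own landscape tool, run as the sidadm user (a sketch; output columns depend on the HANA version):

    ~> HDBSettings.sh landscapeHostConfiguration.py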
@@ -210,14 +210,14 @@ cluster tests.
.PP
\fB*\fR Be patient. For detecting the overall HANA status, the Linux cluster
needs a certain amount of time, depending on the HANA and the configured
-intervalls and timeouts.
+intervals and timeouts.
.PP
\fB*\fR Before doing anything, always check for the Linux cluster's idle status,
left-over migration constraints, and resource failures as well as the HANA
landscape status, and the HANA SR status.
.PP
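A combined status check along these lines can cover most of the above points (a sketch; cs_clusterstate comes from the separate ClusterTools2 package):

    # cs_clusterstate -i
    # crm_mon -1r
    # SAPHanaSR-showAttr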
\fB*\fR Manually activating an HANA primary creates risk of a dual-primary
-situation. The user is responsible for data integrity.
+situation. The user is responsible for data integrity. See also susTkOver.py(7).
.PP
.\"
.SH REQUIREMENTS
@@ -239,19 +239,21 @@ the Linux system. If users are resolved by remote service, local caching is
neccessary. Substitute user (su) to sidadm needs to work reliable and without
customized actions or messages. Supported shell is bash.
.PP
-3. Strict time synchronization between the cluster nodes, e.g. NTP.
+3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
+the Linux cluster have configured the same timezone.
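Both time source and timezone can be verified per node, for example like this (assuming chrony and systemd, as on current SLES):

    # chronyc sources
    # timedatectl | grep -i 'time zone'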
.PP
4. For scale-out there is no other SAP HANA system (like QA) on the nodes
which needs to be stopped during takeover. Both HANA database systems are
running memory-preload. Also MCOS is currently not supported for scale-out.
.PP
-5. Only one system replication between the two SAP HANA database systems in
-the Linux cluster. Maximum one system replication to an HANA database outside
-the Linux cluster.
+5. Only one system replication between the two SAP HANA databases in the Linux
+cluster. Maximum one system replication to an HANA database outside the Linux
+cluster.
.PP
-6. The replication mode is either sync or syncmem. Replication mode async is
-not supported. The operation modes delta_datashipping, logreplay and
-logreplay_readaccess are supported.
+6. The replication mode is either sync or syncmem for the controlled replication.
+Replication mode async is not supported. The operation modes delta_datashipping,
+logreplay and logreplay_readaccess are supported. The operation mode logreplay
+is default.
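For illustration, registering a secondary with a supported mode combination might look like this, run as sidadm (remote host, instance number and site name are placeholders):

    ~> hdbnsutil -sr_register --remoteHost=suse01 --remoteInstance=00 \
         --replicationMode=sync --operationMode=logreplay --name=SITE_B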
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -275,18 +277,19 @@ that one does not interfere with the Linux cluster. All three clusters
12. The RAs SAPHanaController and SAPHanaToplogy need to be installed on all
cluster nodes, even the majority maker.
.PP
-13. Colocation constraints between the SAPHanaController or SAPHana RA and
+13. Colocation constraints between the SAPHanaController RA and
other resources are allowed only if they do not affect the RA's scoring.
The location scoring finally depends on system replication status an must not
be over-ruled by additional constraints. Thus it is not allowed to define rules
-forcing a SAPHanaController or SAPHana master to follow another resource.
+forcing a SAPHanaController promoted instance to follow another resource.
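A sketch of an allowed constraint, with hypothetical resource names: it colocates a virtual IP with the promoted instance, but does not force the promoted instance to follow anything:

    colocation col_ip_with_prim 2000: rsc_ip_SLE:Started mst_SAPHanaCon_SLE_HDB00:Promoted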
.PP
14. The Linux cluster needs to be up and running to allow HA/DR provider events
being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
-exact site name has to be used whenever the site is registered manually.
+exact site name has to be used whenever the site is registered manually. At any
+time only one site is configured as primary replication source.
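The currently known site names can be looked up before registering, for example (run the first as sidadm; output formats vary by version):

    ~> hdbnsutil -sr_state
    # SAPHanaSR-showAttr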
.PP
16. In two-node HANA scale-out systems only one master nameserver candidate is
configured.
@@ -301,7 +304,7 @@ Tennant-specific takeover groups are not supported. Sharing standby nodes
across sites is not supported.
.PP
20. In MDC configurations the HANA database is treated as a single system
-including all database containers. Therefore, cluster takeover decisions are
+including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
21. If a third HANA site is connected by system replication, that HANA is not
@@ -324,7 +327,7 @@ need read/write access and sufficient space in the Linux /tmp filesystem.
26. SAP HANA Native Storage Extension (NSE) is supported.
Important is that this feature does not change the HANA topology or interfaces.
In opposite to Native Storage Extension, the HANA Extension Nodes are changing
the topology and thus currently are not supported.
Please refer to SAP documentation for details.
.PP
.\"
@@ -374,9 +377,9 @@ A.Briel, F.Herschel, L.Pinne.
.PP
.\"
.SH COPYRIGHT
-(c) 2015-2018 SUSE Linux GmbH, Germany.
+(c) 2015-2017 SUSE Linux GmbH, Germany.
.br
-(c) 2019-2023 SUSE LLC
+(c) 2018-2023 SUSE LLC
.br
The package SAPHanaSR-angi comes with ABSOLUTELY NO WARRANTY.
.br
4 changes: 2 additions & 2 deletions man/SAPHanaSR-ScaleOut_basic_cluster.7
@@ -158,7 +158,7 @@ op_defaults $id="op_defaults-options" \\
timeout="300s"
.RE
.PP
-The following example is for SLE-HA 12 SP4 and 15 SP1 with disk-based SBD:
+The following example is for SLE-HA 15 SP1 with disk-based SBD:
.PP
.RS 2
.br
@@ -196,7 +196,7 @@ op_defaults op-options: \\
record-pending=true
.RE
.PP
.\" TODO example for SLE-HA 15 SP1 and 12 SP5 with disk-based and diskless SBD.
.\" TODO example for SLE-HA 15 SP5 with disk-based and diskless SBD.

\fB* crm SBD stonith configuration\fR
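On SLE-HA the stonith resource itself usually follows this pattern (a sketch with a hypothetical resource name; the delay parameter value is an assumption):

    primitive rsc_stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=15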

19 changes: 11 additions & 8 deletions man/SAPHanaSR.7
@@ -196,7 +196,8 @@ Linux system. If users are resolved by remote service, local caching is
neccessary. Substitute user (su) to sidadm needs to work reliable and without
customized actions or messages. Supported shell is bash.
.PP
-3. Strict time synchronization between the cluster nodes, e.g. NTP.
+3. Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of
+the Linux cluster have configured the same timezone.
.PP
4. For scale-up the following SAP HANA SR scenarios are possible with the
SAPHanaSR-angi package:
@@ -215,13 +216,14 @@ SAPHanaSR-angi package:
.br
Note: For MCOS, there must be no constraints between HANA SR pairs.
.PP
-5. Only one system replication between the two SAP HANA database in the Linux cluster.
-Maximum one system replication to an HANA database outside the Linux cluster.
-See also item 12 below.
+5. Only one system replication between the two SAP HANA databases in the Linux
+cluster. Maximum one system replication to an HANA database outside the Linux
+cluster. See also item 12 below.
.PP
6. The replication mode is either sync or syncmem for the controlled replication.
Replication mode async is not supported. The operation modes delta_datashipping,
-logreplay and logreplay_readaccess are supported.
+logreplay and logreplay_readaccess are supported. The operation mode logreplay
+is default.
.PP
7. Both SAP HANA database systems have the same SAP Identifier (SID) and
Instance Number (INO).
@@ -237,7 +239,7 @@ However, all nodes in one Linux cluster have to use the same style.
9. Automated start of SAP HANA database systems during system boot must be switched
off.
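Whether autostart is disabled can be checked in the HANA instance profile, e.g. (SID, instance number and hostname in the path are placeholders):

    # grep -i '^Autostart' /usr/sap/SID/SYS/profile/SID_HDB00_hostname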
.PP
-10. The RA's monitoring operations have to be active.
+10. The RAs' monitoring operations have to be active.
.PP
11. Using HA/DR provider hook for srConnectionChanged() by enabling susHanaSR.py
or susHanaSrMultiTarget.py is mandatory.
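The hook is enabled in HANA's global.ini; a minimal sketch for susHanaSR.py (see susHanaSR.py(7) for the authoritative syntax and path):

    [ha_dr_provider_sushanasr]
    provider = susHanaSR
    path = /usr/share/SAPHanaSR-angi/
    execution_order = 1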
@@ -256,15 +258,16 @@ being written into CIB attributes. The current HANA SR status might differ
from CIB srHook attribute after cluster maintenance.
.PP
15. Once an HANA system replication site is known to the Linux cluster, that
-exact site name has to be used whenever the site is registered manually.
+exact site name has to be used whenever the site is registered manually. At any
+time only one site is configured as primary replication source.
.PP
16. Reliable access to the /hana/shared/ filesystem is crucial for HANA and the
Linux cluster.
.PP
17. HANA feature Secondary Time Travel is not supported.
.PP
18. In MDC configurations the HANA database is treated as a single system
-including all database containers. Therefore, cluster takeover decisions are
+including all database containers. Therefor, cluster takeover decisions are
based on the complete status independent of the status of individual containers.
.PP
19. If a third HANA site is connected by system replication, that HANA is not
72 changes: 45 additions & 27 deletions man/SAPHanaSR_maintenance_examples.7
@@ -1,6 +1,6 @@
.\" Version: 1.001
.\"
-.TH SAPHanaSR_maintenance_examples 7 "12 Jun 2023" "" "SAPHanaSR"
+.TH SAPHanaSR_maintenance_examples 7 "31 Oct 2023" "" "SAPHanaSR"
.\"
.SH NAME
SAPHanaSR_maintenance_examples \- maintenance examples for SAPHanaController.
@@ -41,12 +41,11 @@ REQUIREMENTS below.
This might be convenient when performing administrative actions or cluster tests. It does not replace the afore mentioned checks. See also cs_show_saphanasr_status(8).
.PP
.RS 4
-# watch -n 9 "(crm_mon -1r;SAPHanaSR-showAttr;cs_clusterstate -i)|egrep -v'(^$|configured|###)'"
+# watch -n9 "crm_mon -1r --include=none,nodes,resources,failures;echo;SAPHanaSR-showAttr;cs_clusterstate -i|grep -v '#'"
.RE
.PP
\fB*\fR Initiate an administrative takeover of the HANA primary from one node to the other by using the Linux cluster.


This procedure does not work for scale-out. On scale-up, it will stop the HANA primary.
This might take a while. If you want to avoid waiting for the stopped primary,
use the below procedure which suspends the primary.
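At the core of that procedure is HANA's takeover option for suspending the primary, roughly (run as sidadm on the secondary; available on recent HANA 2.0 releases):

    ~> hdbnsutil -sr_takeover --suspendPrimary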
@@ -90,9 +89,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed,
the old primary is registered as new secondary.
-Therefore the correct secondary site name has to be used, see later example.
+Therefor the correct secondary site name has to be used, see later example.
Finally the SAP HANA resources are given back to the Linux cluster.
-See also section REQUIREMENTS below and later example on determining the correct site name.
+See also section REQUIREMENTS below and later example on determining the correct
+site name.
.PP
.RS 2
1. On either node
@@ -192,9 +192,10 @@ The status of HANA databases, system replication and Linux cluster has to be
checked.
The SAP HANA resources are set into maintenance, an sr_takeover is performed
with suspending the primary, the old primary is registered as new secondary.
-Therefore the correct secondary site name has to be used.
+Therefor the correct secondary site name has to be used.
Finally the SAP HANA resources are given back to the Linux cluster.
-See also section REQUIREMENTS below and later example on determining the correct site name.
+See also section REQUIREMENTS below and later example on determining the correct
+site name.
.PP
.RS 2
1. Check status of Linux cluster and HANA, show current site names.
@@ -274,48 +275,60 @@ This is an advanced task.
.\"
\fB*\fR Overview on maintenance procedure for Linux, HANA remains running, on pacemaker-2.0.

It is necessary to wait for each step to complete and to check the result. It
also is necessary to test and document the whole procedure before applying in production.
See also section REQUIREMENTS below and example on checking status of HANA and cluster above.
.\" TODO details
.PP
.RS 2
1. Check status of Linux cluster and HANA, see above.
.br
-2. Set the Linux cluster into maintenance mode, on either node.
+2. Set HANA multistate resource into maintenance mode.
.RE
.RS 4
-# crm maintenance on
+# crm resource maintenance mst_... on
.RE
.RS 2
-3. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
+3. Set the Linux cluster into maintenance mode, on either node.
.RE
-.RS 3
-# crm cluster stop
+.RS 4
+# crm maintenance on
.RE
.RS 2
-4. Perform Linux maintenance.
-.br
-5. Start Linux cluster on all nodes. Make sure to do that on all nodes.
+4. Stop Linux Cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
-# crm cluster start
+# crm cluster run "crm cluster stop"
.RE
.RS 2
-6. Let Linux cluster detect status of HANA resource, on either node.
+.PP
+5. Perform Linux maintenance.
+.PP
+6. Start Linux cluster on all nodes. Make sure to do that on all nodes.
.RE
.RS 4
-# crm resource refresh cln_...
-.br
-# crm resource refresh mst_...
+# crm cluster run "crm cluster start"
.RE
.RS 2
7. Set cluster ready for operations, on either node.
.RE
.RS 4
# crm maintenance off
.\" TODO delete property, optional?
.RE
.RS 2
-8. Check status of Linux cluster and HANA, see above.
+8. Let Linux cluster detect status of HANA multistate resource, on either node.
.RE
.RS 4
# crm resource refresh mst_...
.RE
.RS 2
9. Set HANA multistate resource ready for operations, on either node.
.RE
.RS 4
# crm resource maintenance mst_... off
.RE
.RS 2
10. Check status of Linux cluster and HANA, see above.
.RE
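For example, with a hypothetical multistate resource name, steps 2 and 9 would read:

    # crm resource maintenance mst_SAPHanaCon_HA1_HDB10 on
    # crm resource maintenance mst_SAPHanaCon_HA1_HDB10 off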
.PP
\fB*\fR Overview on simple procedure for stopping and temporarily disabling the Linux cluster,
@@ -501,12 +514,17 @@ landscape status, and the HANA SR status.
.PP
\fB*\fR Maintenance attributes for cluster, nodes and resources must not be mixed.
.PP
-\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events being written into CIB attributes.
-The current HANA SR status might differ from CIB srHook attribute after Linux cluster maintenance.
+\fB*\fR The Linux cluster needs to be up and running to allow HA/DR provider events
+being written into CIB attributes. The current HANA SR status might differ from CIB
+srHook attribute after Linux cluster maintenance.
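Both views can be compared directly after maintenance, roughly like this (the first command, run as sidadm, shows HANA's view; the second shows the cluster's srHook attribute):

    ~> HDBSettings.sh systemReplicationStatus.py
    # SAPHanaSR-showAttr | grep -i srhook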
.PP
+\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover
+outside the cluster creates risk of a duplicate-primary situation. The user is
+responsible for data integrity, particularly when activating an HANA primary. See
+also susTkOver.py(7).
.PP
-\fB*\fR Manually activating an HANA primary, like start of HANA primary or takeover outside
-the cluster creates risk of a duplicate-primary situation. The user is responsible for data
-integrity, particularly when activating an HANA primary.
+\fB*\fR HANA site names are discovered automatically when the RAs are activated the
+very first time. That exact site names have to be used later for all manual tasks.
.PP
.\"
.SH BUGS
7 changes: 0 additions & 7 deletions man/ocf_suse_SAPHana.7
@@ -443,13 +443,6 @@ SAPHana parameter PREFER_SITE_TAKEOVER.
.\" TODO: crm_mon --failcounts
.RE
.PP
-* Check for working NTP service on SLE-HA 12:
-.PP
-.RS 2
-# ntpq -p
-.\" TODO: ntp output
-.RE
-.PP
* Check for working NTP service on SLE-HA 15:
.PP
.RS 2
2 changes: 1 addition & 1 deletion man/ocf_suse_SAPHanaFilesystem.7
@@ -362,7 +362,7 @@ Note: The ACCEPT needs to be replaced by appropriate action.
.PP
4. On secondary, unblock /hana/shared/SLE/ filesystem.
.RS 2
-# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j ACCEPT
+# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j DROP
.RE
.PP
5. Check HANA and Linux cluster for clean idle state.