Skip to content

sap_ha_pacemaker_cluster: Add support for clustered WebDisp systems #929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 79 commits into from

Conversation

rob0d
Copy link
Contributor

@rob0d rob0d commented Jan 10, 2025

Hi all,
@marcelmamula @ja9fuchs
This PR adds option to configure a clustered SAP Web Dispatcher system.

Rationale: SAP WebDisp (WD) needs to be resilient in the same way Central Services (ASCS) do.
In cloud environments resiliency is generally achieved by installing multiple WD instances and putting a load balancer in front of them. However, with on-prem deployments load balancer is usually an overkill for small to medium deployments as it can have impact on supportability and costs. So specifically with on-prem bare-metal and VMware installations in mind this PR adds supports for clustered WDs.

Thanks to the clever way the ha_pacemaker_cluster role is written, it was relatively easy to add another host type (sap_ha_pacemaker_cluster_host_type): sap_webdisp to achieve it.
As far as I can tell (tested on 8 systems) it is completely non-impacting enhancement and can be used stand-alone or together with other cluster host types (mainly nwas_abap_ascs_ers).

It was tested on-prem only and may need a minimal enhancement for different cloud providers. Although I am not sure if there is a usage case for that.

The minimal input for it to work is as the following:

# SID and Instance Numbers for WebDisp.
sap_ha_pacemaker_cluster_wdp_sid: <WebDisp SID>
sap_ha_pacemaker_cluster_wdp_instance_nr:  <xx>
 
# Profile name created by the installer
sap_ha_pacemaker_cluster_wdp_sapinstance_instance_name: "{{ sap_ha_pacemaker_cluster_wdp_sid }}_W{{ sap_ha_pacemaker_cluster_wdp_instance_nr }}_<virtual_wdp_hostname>"
sap_ha_pacemaker_cluster_wdp_sapinstance_start_profile_string: "/sapmnt/{{ sap_ha_pacemaker_cluster_wdp_sid  }}/profile/{{ sap_ha_pacemaker_cluster_wdp_sid }}_W{{ sap_ha_pacemaker_cluster_wdp_instance_nr }}_<virtual_wdp_hostname>"
sap_ha_pacemaker_cluster_vip_wdp_ip_address: <virtual_wdp_ip>
 
# Filesystem definitions for WebDisp
# the rest can remain default
sap_ha_pacemaker_cluster_wdp_filesystem_host_mount_path: "{{ sap_storage_setup_nfs_local }}/usr/sap/{{ sap_ha_pacemaker_cluster_wdp_sid }}/W{{ sap_ha_pacemaker_cluster_wdp_instance_nr }}"

Note: For some reason Github has decided that few files can't be merged automatically and is marking the whole file as different even though only 1 line was changed. I'm not sure if I've done something wrong, but I can't see it. Any advice would be appreciated.

@marcelmamula
Copy link
Contributor

@rob0d I see you created branch only 6 days ago so it should not be problem in regards to being out of sync.
I would either:

  1. (if there was no commit) Save changes from problematic files to separate file, discard changes and reapply.
  2. Try to rebase/sync
  3. Stash changes, recreate new branch and reapply from stash.
  4. Maybe @berndfinger has another idea.

We cannot currently look into code as those re-created files make it impossible to read changes.

@marcelmamula
Copy link
Contributor

@rob0d Webdispatcher is usually such tiny system that it is most likely bolted on some existing system, where cluster can be already.
Can you clarify if this is scenario for pure WD cluster hosts?

@rob0d
Copy link
Contributor Author

rob0d commented Jan 14, 2025

@rob0d I see you created branch only 6 days ago so it should not be problem in regards to being out of sync. I would either:

1. (if there was no commit) Save changes from problematic files to separate file, discard changes and reapply.

2. Try to rebase/sync

3. Stash changes, recreate new branch and reapply from stash.

4. Maybe @berndfinger has another idea.

We cannot currently look into code as those re-created files make it impossible to read changes.

Hi @marcelmamula,
That's what I did. I created a fresh branch, put the code into it and then merged manually, submitted commit. It's just 3 files it marked as completely different. I will try again, but as I don't understand what I'm doing wrong, it's difficult to avoid it.
The code I've just submitted is in a separate private repo that's 6 months old and this code is 3 months old. I have to download and transfer the files from there to my laptop and then merge the code on my laptop, so it's quite painful if there are too many differences.

@rob0d
Copy link
Contributor Author

rob0d commented Jan 14, 2025

@rob0d Webdispatcher is usually such tiny system that it is most likely bolted on some existing system, where cluster can be already. Can you clarify if this is scenario for pure WD cluster hosts?

It can be used as a standalone cluster, but it's rarely worth it. Generally it will be installed on ASCS/ERS cluster via:

sap_ha_pacemaker_cluster_host_type:
 - nwas_abap_ascs_ers
 - sap_webdisp

Or with any other combination of cluster types. I've only tested it with ASCS/ERS.

@marcelmamula
Copy link
Contributor

@rob0d Webdispatcher is usually such tiny system that it is most likely bolted on some existing system, where cluster can be already. Can you clarify if this is scenario for pure WD cluster hosts?

It can be used as a standalone cluster, but it's rarely worth it. Generally it will be installed on ASCS/ERS cluster via:

sap_ha_pacemaker_cluster_host_type:
 - nwas_abap_ascs_ers
 - sap_webdisp

Or with any other combination of cluster types. I've only tested it with ASCS/ERS.

This is actually first case of multi-host_type scenario that we see. We were just discussing with @ja9fuchs that we might need to convert sap_ha_pacemaker_cluster_host_type to string since we cannot justify list.

@rob0d
Copy link
Contributor Author

rob0d commented Jan 14, 2025

I think the way you guys did this is genius as it allows different combinations of cluster deployments.
Installing ASCS/ERS together with HANA DB on one cluster is a valid option (https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/8/html-single/configuring_a_cost-optimized_sap_s4hana_ha_cluster_hana_system_replication_ensa2_using_the_rhel_ha_add-on/index).

@ja9fuchs
Copy link
Contributor

@rob0d I checked your commit. You have accidentally applied redundant carriage return characters ^M at the end of each line in all the files, possibly by file transfer as binaries instead of text files.

It looks like this in the diff on a terminal:

[...]
+# SPDX-License-Identifier: Apache-2.0^M
+---^M
+################################################################################^M
+# Role generic parameters^M
+################################################################################^M
[...]

There is plenty of info in web searches how to get rid of this.
For example using the tool dos2unix, or you can of course use sed or your favourite editor's way of replacing the nasty characters.

When cleaned up, please update your commit with the fixed files and it should look better. :)

@rob0d
Copy link
Contributor Author

rob0d commented Feb 3, 2025

Aaargh. :Facepalm:
Transferred from Windows to MAC and then edited. I had the editor set to keep the same line ends rather than convert to unix format. Changed it now. Will re-edit and do another commit which should fix that.
Thanks @ja9fuchs.

@ja9fuchs ja9fuchs self-requested a review February 12, 2025 09:49
Copy link
Contributor

@ja9fuchs ja9fuchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately those Windows LF characters are still present. The last commit added a few other changes - which also contain the ^M LF at the end of the changed lines.
Edit: It looks like the last commit was aimed at PR #922 in response to Marcel's change request.

@rob0d Please check again and update the PR with a fix of the extra LF chars.

Please also rebase your branch, since I had merged other changes in the meanwhile, some of which concerned same files.

@ja9fuchs
Copy link
Contributor

ja9fuchs commented Feb 12, 2025

@rob0d Would this help you in some way to avoid adding the LF characters when working on Windows?
liximomo/vscode-sftp#40 (comment)

https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings

@marcelmamula
Copy link
Contributor

@rob0d I am also working on Windows laptop, but I do all my development on WSL2 with Tumbleweed underneath, which gives me ability to do ansible linting (which does not work on Windows natively) and it removes all windows characters from equation.

@sean-freeman
Copy link
Member

I use the following instructions for Windows:
https://github.com/sap-linuxlab/ansible.playbooks_for_sap/blob/main/docs/GET_STARTED_WINDOWS.md

If I get a CRLF problem, I use:

# If this script fails to execute with error \r command not found, use below to convert CRLF to LF
#sed -i "s/\r$//" FileName

@rob0d
Copy link
Contributor Author

rob0d commented Feb 13, 2025

Sorry guys & @ja9fuchs,
I know the last commit didn't fix it. I need to make a change, save, commit, undo change, save, commit again.
That should remove windows line endings as I've changed my VS Code to save in Unix format.
Just too many things happening right now. I am aiming to do it tomorrow.

@sean-freeman
Copy link
Member

@rob0d Possibly easier to use git reset --soft HEAD~2 to go back 2 commits while keeping the curent changes in the local directory, then git commit -m 'new commit message here', and then git push --force to overwrite the commit history of that branch.

@rob0d
Copy link
Contributor Author

rob0d commented Feb 14, 2025

@sean-freeman This is git college level. I'm barely at git high-school. Will try.

berndfinger and others added 25 commits February 14, 2025 18:54
... to new file tasks/post_install/sum_push_to_finish.yml

Signed-off-by: Bernd Finger <[email protected]>
Start order when ASCS and ERS are down does not matter.
After an ASCS failover, however, ERS should stay until
ASCS has been started. ASCS needs to read enqueue data
from ERS at start.
- Moved default stonith setup into block to share conditionals.
- New: Default location constraints when fence resources are defined
  per host - prevent per-host-fence resources from running on the
  targeted node.
1. When using the systemd start framework for SAP instances,
  the instance services must be registered on all nodes they are
  going to run on. This creates the systemd units consistently.

2. When using the SAPStartSrv resource, the related systemd units
  'sapping' and 'sappong' must be enabled.

+ For convenience: tags added to enable role execution just for
  the NWAS post-installation tasks.
- Applies the same changes as for ASCS/ERS.
- Moved task for sapservices file comments to after systemd unit
  registration to catch new entries too.
  - fixed a conditional that failed in check-mode
  - added tag to allow running in check-mode without executing ha_cluster
During the previous change of the stonith code a conditional
was removed that did not apply to the GCP default, but is
required for other platform default stonith resources that
use host maps.
This fixes the HANA VIP colocation constraint conditional to always apply
the base score and recalculate only when HANA-VIP resource groups are
present in the definition.

Assigning new values inside blocks/loops does not work (anymore) and
and a namespace object has to be used instead.

Reference: https://jinja.palletsprojects.com/en/latest/templates/#assignments
Same as in commit c6935e4, but for the task about the read-only VIP.
- fixes the hook dict definition for mixed setups
- adds a check to prevent invalid host type combinations
For some reason the LSR collection variable is in the way of the ansible
syntax-check of some tools, when the task that calls the ha_cluster LSR
uses `import_role`.
The same syntax works without errors when the task uses `include_role`
instead.

Fixes galaxy-importer error "internal-error: Unexpected error code 1".
- added .ansible/ to excludes, this is auto-created by pre-commit and
  contains the collection, on which the role excludes do not work
- sap_hostagent: removed `ansible_become` from defaults, it does not belong in the role
  but in the playbook, and it triggers the no-role-prefix rule
- sap_hostagent: changed some truthy values for consistency
- sap_hostagent linting is fine now, it is commented out in the exclude
  list
@rob0d rob0d closed this Feb 14, 2025
@rob0d rob0d deleted the ha_pacemaker_webdisp branch February 14, 2025 19:02
@rob0d rob0d restored the ha_pacemaker_webdisp branch February 14, 2025 19:06
@rob0d
Copy link
Contributor Author

rob0d commented Feb 14, 2025

It all went horribly wrong. Wasted 2 hours. Started from scratch with a new branch and PR as I don't know how to fix it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants