WIP: configure alertmanager HA cluster mode for sts #460

Closed

Collaborator

@nschad nschad commented May 12, 2023

What this PR does:

Adds support for automatically adding peers for alertmanager.
I've disabled alertmanager deployment mode, as I don't see how we could realistically configure cluster mode for it (deployment names are not predictable/consistent). I don't have a strong opinion about it, though; I just think it's unneeded.

CC: @kd7lxl @AlexandreRoux

Note: the CI test values are increased for testing purposes. They will be reverted if we ever merge this PR.

TODO:

  • Revert ci/ changes
  • Remove Alertmanager Deployment Mode if reasonable (clean up values and sts condition)

Which issue(s) this PR fixes:
Fixes #

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

{{- if gt (int .Values.alertmanager.replicas) 1}}
{{- $fullName := include "cortex.alertmanagerFullname" . }}
{{- range $i := until (int .Values.alertmanager.replicas) }}
- "-alertmanager.cluster.peers={{ $fullName }}-{{ $i }}.{{ $fullName }}-headless.{{ $.Release.Namespace }}.svc.cluster.local:{{ $svcClusterPort }}"
Collaborator Author

Not sure if this is even technically correct. According to the documentation, this is supposed to be a comma-separated list, which is different from the alertmanager configuration, where you have to repeat the flag for each peer.
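
To illustrate the concern, here is a minimal sketch using Go's standard `flag` package. The `csvList` type and the `peers` flag name are hypothetical stand-ins, and whether Cortex parses `-alertmanager.cluster.peers` exactly this way is an assumption based on the documentation's description of a comma-separated list. With a value type like this, repeating the flag overwrites earlier values rather than accumulating them:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// csvList is a hypothetical flag type that splits one comma-separated value
// into a slice. It is an illustration, not the type Cortex uses.
type csvList []string

func (c *csvList) String() string { return strings.Join(*c, ",") }

// Set replaces the whole list on every call, so a repeated flag overwrites
// earlier values instead of appending to them.
func (c *csvList) Set(s string) error {
	*c = strings.Split(s, ",")
	return nil
}

// parsePeers parses args that contain a single hypothetical "peers" flag.
func parsePeers(args []string) ([]string, error) {
	var peers csvList
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.Var(&peers, "peers", "comma-separated list of peers")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return peers, nil
}

func main() {
	// Placeholder peer addresses, not real hostnames.
	repeated, _ := parsePeers([]string{"-peers=am-0:9094", "-peers=am-1:9094"})
	joined, _ := parsePeers([]string{"-peers=am-0:9094,am-1:9094"})
	fmt.Println(len(repeated), len(joined)) // prints "1 2"
}
```

Under that assumption, one flag per peer would leave only the last peer configured, which is why a single joined value looks like the safer choice.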

Contributor

@humblebundledore humblebundledore Jun 28, 2023

Not sure if this is even technically correct. According to the documentation, this is supposed to be a comma-separated list, which is different from the alertmanager configuration, where you have to repeat the flag for each peer.

I see you are referencing L88

	AlertmanagerClusterFlags = func(peers string) map[string]string {
		return map[string]string{
			"-alertmanager.cluster.listen-address": "0.0.0.0:9094", // This is the default, but let's be explicit.
			"-alertmanager.cluster.peers":          peers,
			"-alertmanager.cluster.peer-timeout":   "2s",
		}
	}

But @nschad, where do you see that the flag needs to be repeated for each peer?
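
As a rough sketch of why a single string fits here: the function returns a `map[string]string`, so each flag name can carry only one value. The copy of the function below is lowercased for a standalone program, `toArgs` is a hypothetical helper, and the peer hostnames are made up:

```go
package main

import (
	"fmt"
	"sort"
)

// alertmanagerClusterFlags mirrors the function quoted above.
func alertmanagerClusterFlags(peers string) map[string]string {
	return map[string]string{
		"-alertmanager.cluster.listen-address": "0.0.0.0:9094",
		"-alertmanager.cluster.peers":          peers,
		"-alertmanager.cluster.peer-timeout":   "2s",
	}
}

// toArgs is a hypothetical helper that renders a flag map as sorted
// "name=value" arguments. A map holds one value per key, so every flag,
// including the peers flag, appears exactly once.
func toArgs(flags map[string]string) []string {
	args := make([]string, 0, len(flags))
	for name, value := range flags {
		args = append(args, name+"="+value)
	}
	sort.Strings(args)
	return args
}

func main() {
	// Made-up peer addresses, joined into one comma-separated value.
	peers := "am-0.am-headless:9094,am-1.am-headless:9094"
	for _, arg := range toArgs(alertmanagerClusterFlags(peers)) {
		fmt.Println(arg)
	}
}
```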

Collaborator Author

alertmanager.cluster.peers

I don't. That's why I said it should be a single comma-separated string (Doc: https://cortexmetrics.io/docs/configuration/configuration-file/)

Contributor

alertmanager.cluster.peers

I don't. That's why I said it should be a single comma-separated string (Doc: https://cortexmetrics.io/docs/configuration/configuration-file/)

I misunderstood your message, sorry.

Contributor

I have a commit waiting to be pushed to this branch that changes the peers to a comma-separated list:

commit 9c245d488cc916df1e76be1afb66a593dabccdd8 (HEAD -> feature/alertmanager-cluster)
Author: AlexandreRoux <[email protected]>
Date:   Thu Jun 29 17:28:18 2023 +0200

    configure alertmanager cluster peers as comma seperated list

    Signed-off-by: AlexandreRoux <[email protected]>

diff --git a/templates/alertmanager/alertmanager-statefulset.yaml b/templates/alertmanager/alertmanager-statefulset.yaml
index 41b592b..b5d5043 100644
--- a/templates/alertmanager/alertmanager-statefulset.yaml
+++ b/templates/alertmanager/alertmanager-statefulset.yaml
@@ -156,9 +156,12 @@ spec:
             - "-config.file=/etc/cortex/cortex.yaml"
             {{- if gt (int .Values.alertmanager.replicas) 1}}
             {{- $fullName := include "cortex.alertmanagerFullname" . }}
+            {{- $peers := list }}
             {{- range $i := until (int .Values.alertmanager.replicas) }}
-            - "-alertmanager.cluster.peers={{ $fullName }}-{{ $i }}.{{ $fullName }}-headless.{{ $.Release.Namespace }}.svc.cluster.local:{{ $svcClusterPort }}"
+             {{- $peer := printf "%s-%d.%s-headless.%s.svc.cluster.local:%s" $fullName $i $fullName $.Release.Namespace $svcClusterPort }}
+              {{- $peers = append $peers $peer }}
             {{- end }}
+            - "-alertmanager.cluster.peers={{ join "," $peers }}"
             {{- end }}
             {{- range $key, $value := .Values.alertmanager.extraArgs }}
             - "-{{ $key }}={{ $value }}"

But it seems I have no access to push to the branch. Is this expected, @nschad?

Is there a preferred way to get it in here? I will then merge this upstream branch into my code in #435.
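
For reference, the loop in the diff above can be sketched in plain Go. Helm's `printf` is backed by Go's `fmt.Sprintf`, so the sketch also shows one thing that may be worth checking: if `$svcClusterPort` is a number rather than a string (its definition is not shown here, so this is an assumption), the `%s` verb would not render it cleanly, while `%v` handles both cases. The `buildPeers` function and the names passed to it are placeholders:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPeers mirrors the template loop: one address per StatefulSet replica,
// joined with commas. port is an interface value because template values are
// untyped; verb is the format verb used for the port ("%s" or "%v").
func buildPeers(verb, fullName, namespace string, replicas int, port interface{}) string {
	format := "%s-%d.%s-headless.%s.svc.cluster.local:" + verb
	peers := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		peers = append(peers, fmt.Sprintf(format, fullName, i, fullName, namespace, port))
	}
	return strings.Join(peers, ",")
}

func main() {
	// Placeholder release name and namespace.
	fmt.Println(buildPeers("%v", "cortex-alertmanager", "cortex", 2, 9094))

	// With an integer port, %s yields "...:%!s(int=9094)".
	fmt.Println(buildPeers("%s", "cortex-alertmanager", "cortex", 1, 9094))
}
```

If the port is already a string in the chart, `%s` works as written in the diff.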

Collaborator Author

Uh, I don't know if I can enable this after the fact. I can close this PR and you can just edit your PR #435?

Contributor

Uh, I don't know if I can enable this after the fact. I can close this PR and you can just edit your PR #435?

Sounds good!

stale bot commented Jun 18, 2023

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 18, 2023
@stale stale bot removed the stale label Jun 28, 2023
@nschad nschad marked this pull request as ready for review June 30, 2023 09:42
@nschad nschad marked this pull request as draft June 30, 2023 09:44
@nschad nschad closed this Jul 3, 2023