
Replication zone support for store-gateway component #1

Closed
wants to merge 6 commits

Conversation

ryan-dyer-sp (Owner)

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@Logiraptor Logiraptor left a comment

Overall looking really good, left some feedback.

I think the bulk of the work (beyond getting this working) will be making sure we can support live migration of an existing deployment, so it's important that the rendered diff with zone-awareness disabled is either empty or insignificant.

affinity:
{{- toYaml .Values.store_gateway.affinity | nindent 8 }}
{{- $rolloutZone.affinity | toYaml | nindent 8 }}


In order to support migration from single to multi-zone, we'll need to preserve the old behavior when zone aware replication is disabled.
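
For example, something along these lines (a rough sketch; the exact flag name and where the zone data comes from are up to you, I'm just assuming a zone_aware_replication.enabled toggle here) would keep the rendered manifest identical to today when the feature is off:

affinity:
{{- if .Values.store_gateway.zone_aware_replication.enabled }}
{{- $rolloutZone.affinity | toYaml | nindent 8 }}
{{- else }}
{{- toYaml .Values.store_gateway.affinity | nindent 8 }}
{{- end }}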

ryan-dyer-sp (Owner Author)


When Krajo and I last spoke, we weren't planning to support migration from single to multi-zone or vice versa.

@Logiraptor Logiraptor

Gotcha, that was probably true when you spoke to Krajo. The current plan is to support a live migration though. We (Grafana Labs) don't really have an option since this is the helm chart we ask our enterprise customers to use and we will need to provide an upgrade path for them in production.

It's probably more than twice as much work to include migration, so it makes sense to me if you'd rather a maintainer take over at that point.

affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:


I think we could be more opinionated on the pod anti-affinity in order to simplify the configuration. What if we just always use the name and rollout-group labels for this? Do you think that would be too inflexible?

We would still need to allow users to override the nodeSelector, since different environments will have different means of distinguishing availability zones at the node level.
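
Rough sketch of what I mean (the label values and the nodeSelector source are illustrative, not a final proposal):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: mimir
            rollout-group: store-gateway
        topologyKey: kubernetes.io/hostname
nodeSelector:
  {{- toYaml $rolloutZone.nodeSelector | nindent 2 }}

i.e. the anti-affinity is always rendered from the chart's own labels, and only the nodeSelector comes from user configuration.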

operations/helm/charts/mimir-distributed/values.yaml (resolved conversation)
@@ -1879,3 +1934,7 @@ metaMonitoring:
# -- Annotations to add to all monitoring.grafana.com custom resources.
# Does not affect the ServiceMonitors for kubernetes metrics; use serviceMonitor.annotations for that.
annotations: {}

zone_aware_replication:


Is the intention to allow overriding the zone awareness in a single place, or in each component (like store-gateway above)?

@@ -303,3 +342,38 @@ Cluster name that shows up in dashboard metrics
{{- print "policy/v1beta1" -}}
{{- end -}}
{{- end -}}

{{/* Creates dict for zone aware replication configuration */}}
{{- define "mimir.zoneAwareReplicationMap" -}}


This code is taking me a bit to understand. Is there some way we can modify the config format in values.yaml to reduce the need for this code? What I mean is, can we make the values.yaml use the same structure you expect this to return or is that not feasible for some reason?

@ryan-dyer-sp ryan-dyer-sp (Owner Author) Jun 29, 2022


There are two structures of importance here:
.Values.zone_aware_replication
.Values.<component>.zone_aware_replication

This merges the two, with the component-level settings overriding the global ones. There are many facets to what a person may want to configure for zone awareness. In our case a zone is an AWS AZ; I believe for Grafana it's a virtual labelling and grouping of nodes. So for us we want to ensure the pods are scheduled (via anti-affinity) into the appropriate AZs (topology.kubernetes.io/zone), while for Grafana it would be something entirely different. There's also the affinity one may want in order to prevent pods from binpacking onto the same hostname. And then there's node selection: we don't use it, but others may have dedicated nodes for their Mimir deployment entirely, or perhaps they just want the ingesters on different nodes. We're close to that point in our prod env, since our Mimir ingesters are using nearly all of the RAM our default node group can provide, so we'd almost need a larger instance type just for the ingesters.

The .Values.zone_aware_replication block is for those that simply want to set a common nodeSelector/affinity across all the components, with the ability to overwrite that definition via .Values.<component>.zone...

If we want to get rid of .Values.zone_aware_replication entirely and force users to individually define each .Values.<component>.z_a_r, I'm fine with that. It would definitely simplify the code, but at the expense of some users' ability to configure things more easily.

Unfortunately, due to the nature of this feature and the question of what a zone is, I don't think we're going to make everyone happy.
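
To illustrate what I mean (the keys and zone names below are just examples, not the final schema):

# common defaults for every zone-aware component
zone_aware_replication:
  zones:
    - name: zone-a
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a
    - name: zone-b
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1b

# per-component override; takes precedence over the common block
store_gateway:
  zone_aware_replication:
    zones:
      - name: zone-a
        nodeSelector:
          dedicated: store-gateway

mimir.zoneAwareReplicationMap is what merges those two into the single structure the templates consume.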

ryan-dyer-sp (Owner Author)


There is also the interaction between the default values.yaml that the chart provides and any values files that a user supplies. In order for user-supplied values to overwrite the "default" configuration from the helm chart, the "map" has to be a list, not a dict: when combining values files and/or variables set directly, helm merges dicts but overwrites lists.
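
As a concrete example (keys are illustrative), suppose the chart default ships two zones and a user only wants one:

# chart values.yaml (default)
zones:
  zone-a:
    nodeSelector: {topology.kubernetes.io/zone: us-east-1a}
  zone-b:
    nodeSelector: {topology.kubernetes.io/zone: us-east-1b}

# user values file
zones:
  zone-a:
    nodeSelector: {topology.kubernetes.io/zone: eu-west-1a}

With a dict, helm deep-merges the two and zone-b from the chart default is still present in the result; with a list, the user's list replaces the chart's list wholesale, which is the behaviour we need here.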

@Logiraptor Logiraptor

Thanks for the thorough explanation! I agree we can't make everyone happy. I think we should prioritize two things:

  1. Ensure the output for a given values file is relatively predictable. Nothing more annoying than running into template errors again and again when you already know exactly what you want to have in the final output.
  2. Ensure a reasonable amount of flexibility to support 90% of use-cases. We can't get to 100% without adding a field for everything in kubernetes which defeats the purpose of a helm chart.

With that said, I think I'd prefer to eliminate the top-level .Values.zone_aware_replication. In most cases we will need to at least specify a different value for the app.kubernetes.io/component label (store-gateway, ingester, or alertmanager), so I think it's relatively rare that a common zone configuration is possible. The same argument applies to node selectors: since these components have such different resource requirements, I think it's unlikely users would want to move just those three onto a special node type.

Then in the (I assume rare) case that is needed, it's still possible, albeit with a bit of duplication.
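
In other words, something like this (illustrative only), where each component carries its own block and a shared setup is expressed by repeating it:

store_gateway:
  zone_aware_replication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a

ingester:
  zone_aware_replication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a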

WDYT?

@ryan-dyer-sp ryan-dyer-sp force-pushed the rollout-operator-support branch 3 times, most recently from 64b511e to 1e7ba5b on July 15, 2022 at 15:13