From ae86af7aec678096679ac8aba6ed743624d57bc0 Mon Sep 17 00:00:00 2001
From: Damiano Donati
Date: Wed, 25 May 2022 14:52:57 +0200
Subject: [PATCH] set kubelet's --provider-id flag

- What I did

Added an AWS-specific systemd unit (aws-kubelet-providerid.service) and file (/usr/local/bin/aws-kubelet-providerid) that generate the AWS instance provider-id and store it in the KUBELET_PROVIDERID env var, so that it can be passed as the --provider-id argument to the kubelet service binary.

We needed to add this flag, and make it non-empty only on AWS, so that node syncing (specifically backing instance detection) works via provider-id detection. This covers cases where the node hostname doesn't match the expected private-dns-name (e.g. when a custom DHCP Option Set with an empty domain-name is used).

Should fix: https://bugzilla.redhat.com/show_bug.cgi?id=2084450

Reference to an upstream issue with context: kubernetes/cloud-provider-aws#384

- How to verify it

Try the reproduction steps available at: https://bugzilla.redhat.com/show_bug.cgi?id=2084450#c0 while launching a cluster with this MCO PR included. Verify that the issue is no longer reproducible.
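
As a rough illustration of the mechanism described above, the sketch below builds a provider-id and writes it to a systemd drop-in as KUBELET_PROVIDERID. It is not the script from this patch: the function names `build_provider_id` and `write_dropin` are hypothetical, the drop-in contents are an assumption, and the `aws:///<zone>/<instance-id>` layout is assumed to be the provider-id form expected by the AWS cloud provider.

```shell
#!/bin/bash
set -e -o pipefail

# Hypothetical helper (not part of the patch): build a provider-id of the
# assumed form aws:///<availability-zone>/<instance-id>.
build_provider_id() {
  local zone="$1" instance_id="$2"
  if [ -z "${zone}" ] || [ -z "${instance_id}" ]; then
    echo "zone and instance id must be non-empty" >&2
    return 1
  fi
  printf 'aws:///%s/%s\n' "${zone}" "${instance_id}"
}

# Hypothetical helper (not part of the patch): write a systemd drop-in that
# exports KUBELET_PROVIDERID, leaving an existing file untouched.
# The drop-in contents below are an assumption made for illustration.
write_dropin() {
  local conf="$1" provider_id="$2"
  if [ -e "${conf}" ]; then
    echo "Not replacing existing ${conf}"
    return 0
  fi
  cat > "${conf}" <<EOF
[Service]
Environment="KUBELET_PROVIDERID=${provider_id}"
EOF
}
```

On a real node the two inputs would presumably come from the afterburn attributes `AFTERBURN_AWS_AVAILABILITY_ZONE` and `AFTERBURN_AWS_INSTANCE_ID`, for example `write_dropin "${NODECONF}" "$(build_provider_id "${AFTERBURN_AWS_AVAILABILITY_ZONE}" "${AFTERBURN_AWS_INSTANCE_ID}")"`. Skipping the write when the file already exists keeps the step idempotent across reboots.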
---
 .../usr-local-bin-aws-kubelet-providerid.yaml | 26 +++++++++++++++++++
 .../units/aws-kubelet-providerid.service.yaml | 23 ++++++++++++++++
 .../_base/units/kubelet.service.yaml          |  1 +
 .../_base/units/kubelet.service.yaml          |  1 +
 4 files changed, 51 insertions(+)
 create mode 100644 templates/common/aws/files/usr-local-bin-aws-kubelet-providerid.yaml
 create mode 100644 templates/common/aws/units/aws-kubelet-providerid.service.yaml

diff --git a/templates/common/aws/files/usr-local-bin-aws-kubelet-providerid.yaml b/templates/common/aws/files/usr-local-bin-aws-kubelet-providerid.yaml
new file mode 100644
index 0000000000..b3f3d73fc8
--- /dev/null
+++ b/templates/common/aws/files/usr-local-bin-aws-kubelet-providerid.yaml
@@ -0,0 +1,26 @@
+mode: 0755
+path: "/usr/local/bin/aws-kubelet-providerid"
+contents:
+  inline: |
+    #!/bin/bash
+    set -e -o pipefail
+
+    NODECONF=/etc/systemd/system/kubelet.service.d/20-aws-providerid.conf
+
+    if [ -e "${NODECONF}" ]; then
+      echo "Not replacing existing ${NODECONF}"
+      exit 0
+    fi
+
+    # Clusters that use custom DHCP Option Sets can have a mismatch between Hostname and PrivateDNSName,
+    # which can cause issues in cloud controller manager node syncing
+    # (see: https://github.com/kubernetes/cloud-provider-aws/issues/384).
+    # To avoid this, set KUBELET_PROVIDERID to a fully qualified AWS instance provider id.
+    # This variable is later used to populate the kubelet's `provider-id` flag, which is then set on the Node .spec
+    # and used by the cloud controller manager's node controller to retrieve the Node's backing instance.
+    # The value is built from afterburn service variables, which are in turn obtained from metadata retrieval.
+    # See the afterburn doc on metadata attributes for systemd units: https://coreos.github.io/afterburn/usage/attributes/
+    cat > "${NODECONF}" <