Skip to content

Upgrade to calico 3.29 and use windows support #5523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 2, 2025

Conversation

jsturtevant
Copy link
Contributor

@jsturtevant jsturtevant commented Mar 27, 2025

What type of PR is this?

/kind cleanup
What this PR does / why we need it:
Upgrades Calico to 3.29 and updates windows to use the upstream Calico HPC support .

I tried to remove as much of the calico manifests but the Private Cluster test still uses those so they are still needed. We still need a bunch of the CSRs for windows since the calico still needs some additional support to fully remove it (tigera/operator#3113).

I did chat briefly with @Jont828 about expanding the helmProxy to take an additional field that would be Pre/Post yaml to deploy so we could potentially remove some of the extra yaml. But that is low priority and could be a follow up.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5498

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Update Calico to 3.29

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 27, 2025
@jsturtevant jsturtevant mentioned this pull request Mar 27, 2025
4 tasks
Copy link

codecov bot commented Mar 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.19%. Comparing base (d9da21b) to head (333a6ae).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5523      +/-   ##
==========================================
- Coverage   53.26%   53.19%   -0.08%     
==========================================
  Files         272      272              
  Lines       29529    29582      +53     
==========================================
+ Hits        15730    15735       +5     
- Misses      12984    13027      +43     
- Partials      815      820       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 27, 2025
@jsturtevant
Copy link
Contributor Author

leaving my self some notes after investigating the failures:

  • When kubernetes-services-endpoint is included the operator fails to install things, which can be solved with dns changes: If "KUBERNETES_SERVICE_HOST" is domain, initial installation is not possible. (tigera-operator for windows) projectcalico/calico#9536
  • the service cidrs are not being set properly in the helm template:
    • values: "installation:\n cni:\n type: Calico\n calicoNetwork:\n bgp: Disabled\n \ windowsDataplane: HNS\n mtu: 1350\n ipPools:\n ipPools:\n - cidr: 192.168.0.0/16\n encapsulation: VXLAN\n serviceCIDRs: \n registry: mcr.microsoft.com/oss\n# Image and registry configuration for the tigera/operator pod.\ntigeraOperator:\n \ image: tigera/operator\n registry: mcr.microsoft.com/oss\ncalicoctl:\n image: mcr.microsoft.com/oss/calico/ctl\n"

@jsturtevant
Copy link
Contributor Author

not sure what to do about this one as the chart doesn't let me override dnsConfig settings

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 5fbed9d to 78297a9 Compare March 31, 2025 22:04
@jsturtevant
Copy link
Contributor Author

with the latest changes the cluster is coming online:

Cluster capz-j9dp7i created and fully operational
NAME                              STATUS   ROLES           AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
capz-j9dp-55m7p                   Ready    <none>          4m7s    v1.29.10   10.1.0.4      <none>        Windows Server 2022 Datacenter   10.0.20348.2762    containerd://1.7.20
capz-j9dp-dljnd                   Ready    <none>          4m5s    v1.29.10   10.1.0.6      <none>        Windows Server 2022 Datacenter   10.0.20348.2762    containerd://1.7.20
capz-j9dp7i-control-plane-hgrj5   Ready    control-plane   6m40s   v1.29.10   10.0.0.4      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20
capz-j9dp7i-md-0-pb8j6-xnkct      Ready    <none>          5m1s    v1.29.10   10.1.0.5      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20
capz-j9dp7i-md-0-pb8j6-zc2cm      Ready    <none>          4m54s   v1.29.10   10.1.0.7      <none>        Ubuntu 24.04.1 LTS               6.8.0-1016-azure   containerd://1.7.20

@jsturtevant
Copy link
Contributor Author

not sure what to do about this one as the chart doesn't let me override dnsConfig settings

Opened projectcalico/calico#10117

@jsturtevant
Copy link
Contributor Author

The workload upgrade test is failing with cloud manager crashing:

E0401 00:45:09.324689       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 321 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x20b7500, 0x3b02f10})
	/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000783a40?})
	/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x20b7500?, 0x3b02f10?})
	/usr/local/go/src/runtime/panic.go:785 +0x132
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).getRouteTable(0x485d95?, 0x483d17?)
	/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmsets_repo.go:183 +0x4f

Is this a known issue? I don't see anything that makes it look like this is related to changes here.

@jsturtevant
Copy link
Contributor Author

jsturtevant commented Apr 1, 2025

This is now mostly passing, It requires:

  • a release of the upstream Calico helm chart but the change as been checked in.
  • Images for the Windows Calico in the same location as the linux ones (working internally to solve this issue)

@nawazkh
Copy link
Member

nawazkh commented Apr 2, 2025

Is this a known issue? I don't see anything that makes it look like this is related to changes here.

Super strange that it is flaky with a panic..

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2025
@jsturtevant
Copy link
Contributor Author

All the major challenges have been addresses. Just waiting on a Calico release that contains the updated helm chart

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 4285b61 to 1f6f8b4 Compare April 30, 2025 20:36
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 30, 2025
@jsturtevant
Copy link
Contributor Author

Still waiting on a helm chart release. I saw they released the next calico version but don't see the updates in the helm chart so I will ping them to find out if that will be released too

It wasn't scheduled till Calico 3.31 but I requested some backports so should have it in the next patch release! projectcalico/calico#10117 (comment)

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 1f6f8b4 to 06b1720 Compare May 23, 2025 20:28
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2025
@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 47b9852 to 87199fa Compare May 23, 2025 22:37
Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly ready to me! Just some minor comments and questions from my end

@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 87199fa to 66ab897 Compare May 30, 2025 14:27
@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch 2 times, most recently from 01bb7f0 to 636b3d2 Compare May 30, 2025 14:39
Signed-off-by: James Sturtevant <[email protected]>
@jsturtevant jsturtevant force-pushed the upgrade-calico-3.29 branch from 636b3d2 to 333a6ae Compare May 30, 2025 17:49
Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one more small nit + can you squash again?

kubernetesServiceEndpoint:
host: "{{ .Cluster.spec.controlPlaneEndpoint.host }}"
port: "{{ .Cluster.spec.controlPlaneEndpoint.port }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: eof line

@jackfrancis
Copy link
Contributor

/test pull-cluster-api-provider-azure-e2e-optional

/hold for optional test results

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2fa8cfe0cf5ffab3de995e100243113403e07193

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2025
@willie-yao
Copy link
Contributor

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 30, 2025
@willie-yao
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, willie-yao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,willie-yao]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@willie-yao
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2025
@k8s-ci-robot k8s-ci-robot merged commit db8c747 into kubernetes-sigs:main Jun 2, 2025
31 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Jun 2, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in CAPZ Planning Jun 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Update the Calico version for testing
5 participants