Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue where getReadyIngresses was not loading ready ingresses at startup #1079

Conversation

norbjd
Copy link
Contributor

@norbjd norbjd commented Jul 29, 2023

Changes

  • 🐛 fix getReadyIngresses not loading ingresses at startup

/kind bug

I've noticed that with latest kourier release (1.11.0), kourier was not loading at all ready ingresses at startup. This was confirmed by looking at the logs, after restarting the controller on a cluster with a few ready ingresses:

Priming the config with 0 ingresses

This log is generated by https://github.com/knative-sandbox/net-kourier/blob/knative-v1.11.0/pkg/reconciler/ingress/controller.go#L210. I would have expected to see Priming the config with 100 ingresses, as my cluster contained 100 ingresses.

After investigation, it turns out that the PR #1075 introduced a regression on getReadyIngresses method (https://github.com/knative-sandbox/net-kourier/blob/knative-v1.11.0/pkg/reconciler/ingress/controller.go#L313-L327):

ingresses, err := ingressLister.List(labels.SelectorFromSet(map[string]string{
        v1alpha1ingress.ClassAnnotationKey: config.KourierIngressClassName,
}))

This can't work, because we're passing an annotation (networking.knative.dev/ingress.class: kourier.ingress.networking.knative.dev), while List method is actually filtering using labels. As a result, this List operation will always return an empty slice.

So technically, #1075 solves the issue where kourier takes too long to start, but it's due to a bug: since there are always 0 ingress returned, there is no initial reconciliation; thus, the xds management server is started very quickly, and the readiness probe succeeds, marking kourier as ready pretty fast.

This PR aims to solves this bug. But, as I've rollbacked to the previous version of getReadyIngresses, I've "reintroduced" the slow startup issue (because it's loading all ingresses, as it should do I guess). This is why I've also copied here the changes done in #1066 (on kube api QPS and burst params).

With these 2 changes, kourier loads all ready ingresses at startup and is pretty quick to start.

I'm not sure if introducing KUBE_API_{BURST,QPM} on this PR is a good thing or not; but, I've done so because otherwise, I couldn't even restart kourier on a kind cluster with 500 ingresses (got errors, probably because I was rate limited by Kubernetes API).

Tests

With these changes, on a Kind cluster with 500 ingresses, Kourier took around 1s to restart and to be ready (after reloading the 500 ingresses, I can see the correct log Priming the config with 500 ingresses).

It's worth noting that increasing KUBE_API_{BURST,QPM} has a side effect on kourier in general: it allows to speed up the ingresses creation. I've tried creating 500 ingresses on a Kind cluster (in parallel, with a concurrency limit of 20), here are the results:

  • on main branch: it took 8m18s to create the ingresses. Creating 1 ingress took around 20s, as I were creating 20 ingresses at a time in parallel
  • with this PR (KUBE_API_{BURST,QPM}=200): it takes 20s (25x less) to create all 500 ingresses, with the same parallelism

Release Note

Fix an issue where kourier was not reloading ingresses at startup

Indeed, we can't use label selector as kourier class is not a label, but an annotation. As a result, kourier does not sync ingresses at startup
@knative-prow knative-prow bot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 29, 2023
@knative-prow knative-prow bot requested review from kauana and krsna-m July 29, 2023 18:11
@knative-prow knative-prow bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 29, 2023
@norbjd
Copy link
Contributor Author

norbjd commented Jul 29, 2023

Kind ping to @nak3 (:wave:) as you've worked on #1075 and #1066.

@codecov
Copy link

codecov bot commented Jul 29, 2023

Codecov Report

Merging #1079 (8af7a3f) into main (5e4fd61) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1079   +/-   ##
=======================================
  Coverage   80.95%   80.95%           
=======================================
  Files          18       18           
  Lines        1355     1355           
=======================================
  Hits         1097     1097           
  Misses        205      205           
  Partials       53       53           

@@ -59,6 +59,10 @@ spec:
value: "kourier-system"
- name: ENABLE_SECRET_INFORMER_FILTERING_BY_CERT_UID
value: "false"
- name: KUBE_API_BURST
Copy link
Contributor Author

@norbjd norbjd Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: this is based on #1066, but I couldn't find any doc on these environment variables; is there a doc somewhere that I've missed? If so, I think it would be worth adding a small comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should add a small comment. Would you like to add a comment? Or I can send a follow-up PR.

Copy link
Contributor Author

@norbjd norbjd Jul 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✔️ 8af7a3f. Feel free to tell me if comment is not clear enough 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good to me.

@nak3
Copy link
Contributor

nak3 commented Jul 30, 2023

/lgtm
/hold

Thank you so much @norbjd Please feel free to unhold if you will not include the comment about KUBE_API_ values as #1079 (comment)

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2023
@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2023
@nak3 nak3 added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2023
@knative-prow
Copy link

knative-prow bot commented Jul 30, 2023

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: norbjd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nak3
Copy link
Contributor

nak3 commented Jul 30, 2023

/cherry-pick release-1.11

@knative-prow-robot
Copy link

@nak3: once the present PR merges, I will cherry-pick it on top of release-1.11 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow knative-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2023
@nak3
Copy link
Contributor

nak3 commented Jul 31, 2023

/lgtm
/hold cancel

@knative-prow knative-prow bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 31, 2023
@knative-prow knative-prow bot merged commit c8bcc9a into knative-extensions:main Jul 31, 2023
47 checks passed
@knative-prow-robot
Copy link

@nak3: new pull request created: #1081

In response to this:

/cherry-pick release-1.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@norbjd norbjd deleted the fix/getReadyIngress-always-return-0-ingresses branch June 23, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants