Add middleware to log server errors #2196

Merged


ConnorJC3
Contributor

Is this a bug fix or adding new feature?

Both? Kinda?

What is this PR about? / Why do we need it?

Fixes #2154

When the SDK hits a retryable error (for example, most server errors or network issues), it does not log the error, which prevents users and maintainers from discovering the nature of the issue. The SDK has a built-in logger for retries, but using it would be suboptimal because it would flood the logs any time RequestLimitExceeded (RLE) errors occur. Instead, this PR introduces a simple logging middleware that lets us customize the verbosity of the error message based on whether the error is an RLE/throttling error.

What testing is done?

Manually tested. (We don't have a way to test log messages at the moment, and maybe shouldn't be doing so anyway.)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 24, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 24, 2024

github-actions bot commented Oct 24, 2024

Code Coverage Diff

| File | Old coverage | New coverage | Delta |
|---|---|---|---|
| github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/handlers.go | 4.8% | 6.5% | +1.7 |

@ConnorJC3
Contributor Author

/hold

Putting this on hold to test edge cases, but it is ready for review.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2024
@ConnorJC3 ConnorJC3 force-pushed the log-on-server-failure branch from 84977af to 1500fba Compare October 31, 2024 18:55
@ConnorJC3
Contributor Author

/remove-hold

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 31, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2024
@torredil
Member

Broke DNS and can confirm the relevant errors aren't swallowed:

kubectl get pods -n kube-system
NAME                                   READY   STATUS              RESTARTS       AGE
coredns-54d6f577c6-zz27k               0/1     ContainerCreating   0              3m31s
coredns-78f4c5f7c6-js6w6               0/1     ContainerCreating   0              4m3s
coredns-78f4c5f7c6-qpqkt               0/1     ContainerCreating   0              4m3s

kubectl logs ebs-csi-controller-6dcfc9d6bf-p6t6m -n kube-system

Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, liveness-probe
I1031 19:45:35.639359       1 main.go:157] "Initializing metadata"
I1031 19:45:35.643350       1 metadata.go:48] "Retrieved metadata from IMDS"
I1031 19:45:35.645069       1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.36.0"
E1031 19:53:15.909066       1 handlers.go:79] "Unknown error attempting to contact AWS API" err="https response error StatusCode: 0, RequestID: , request send failed, Post \"https://ec2.us-east-1.amazonaws.com/\": dial tcp: lookup ec2.us-east-1.amazonaws.com on 10.100.0.10:53: read udp 192.168.23.128:44050->10.100.0.10:53: i/o timeout"

Thank you!

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2024
@k8s-ci-robot k8s-ci-robot merged commit c547d22 into kubernetes-sigs:master Oct 31, 2024
18 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

insufficient capacity API possibly errors not propagating into logs and events
5 participants