GITHUB-ISSUE-1837: Fix exec in pitr and proxysql pods #1838
base: main
Just FYI: I have been running this image for 24 hours now without a single error, which is a big change from the dozens of errors (detailed in the GitHub issue) over the same period previously.
@dcaputo-harmoni thank you for your contribution. I think it makes sense to have some kind of retry mechanism for syncing users, but I believe it's better to use https://pkg.go.dev/k8s.io/client-go/util/retry#OnError rather than retrying in a for loop.
Looks like all the errors have come back on this last build. Since both the sync users and reconcile errors are occurring, it's probably not related to the retry methodology, as that only applies to sync users. Any ideas? I did notice that it's trying to use SSL to communicate with the Kubernetes API, i.e.
Here is some additional debugging info from the operator container (downloaded the latest release of websocat to it):
As stated earlier, the -k (insecure mode) flag is needed because of the self-signed certs, and websocat doesn't support providing a certificate.
Further update: I've managed to get this working, see the last commit. In short, all of the exec commands could use error retrying, so I built the logic from
I've added conforming retries to PodLogs and IsPodRunning, as those can occasionally encounter the same errors.
commit: e1d4558
Added retries with backoff to the proxysql user sync to address intermittent failures. Forced the pitr filename checks to return a success exit code when the files are not present, so that the operator does not raise an error.
Fixes #1837