Fresh install of tobs fails with promscale db password #562
Comments
Thanks for the report. Can you try the same, but with the latest version of the Helm chart? It's …
So I think what's happening (and I'll verify again later) is that if a tobs install fails for any reason (context deadline exceeded, node networking, etc.), it corrupts the namespace for any future installs. The entire namespace has to be completely blown away before a new install will succeed. I saw this suggested here and there in response to other people's problems, but it was never called out as a standard operating procedure (SOP) step. I think the instructions should state it clearly, or at least there should be a troubleshooting section that describes common troubleshooting steps, especially because not everything gets removed on uninstall (in the rare case the uninstall even succeeds). Specifically, from a Flux standpoint, the Flux HelmRelease needs to be deleted first.
Then the Helm release needs to be deleted.
Then the entire namespace needs to be deleted, and forced, because some of the components often get stuck during uninstall and deletion.
However, oftentimes that isn't enough: you must list the pods in the namespace and delete them individually, and sometimes the secrets too.
Then delete the namespace again.
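The cleanup sequence above could be scripted roughly as follows. This is a sketch, assuming the Helm release and the Flux HelmRelease are both named `tobs` and live in the `monitoring` namespace (override with `RELEASE` and `NAMESPACE`); it deletes all pods and secrets at once with `--all` rather than one by one. By default it only prints the commands; set `DRY_RUN=0` to execute them. If the HelmRelease is still committed in your Git repository, Flux will recreate it, so remove or suspend it there first.

```shell
#!/bin/sh
# Sketch of the manual tobs cleanup sequence described in this thread.
# Assumptions (adjust to your setup): the Helm release and the Flux
# HelmRelease are both named "tobs" and live in namespace "monitoring".
# By default every command is only printed; set DRY_RUN=0 to run them.
set -eu

RELEASE="${RELEASE:-tobs}"
NAMESPACE="${NAMESPACE:-monitoring}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise run it and keep going on
# failure (e.g. when the resource is already gone).
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || echo "warning: '$*' failed, continuing" >&2
  fi
}

# 1. Remove the Flux HelmRelease so Flux stops reconciling the release.
run kubectl delete helmrelease "$RELEASE" -n "$NAMESPACE" --ignore-not-found
# 2. Remove the Helm release itself.
run helm uninstall "$RELEASE" -n "$NAMESPACE"
# 3. Start deleting the namespace without blocking on stuck resources.
run kubectl delete namespace "$NAMESPACE" --wait=false --ignore-not-found
# 4. Force-delete leftover pods, then secrets, that keep the namespace alive.
run kubectl delete pods --all -n "$NAMESPACE" --force --grace-period=0
run kubectl delete secrets --all -n "$NAMESPACE"
# 5. Delete the namespace again, waiting up to 5 minutes for it to go away.
run kubectl delete namespace "$NAMESPACE" --ignore-not-found --timeout=5m
```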
Because of this, I think tobs should also recommend, or default to, being installed in its own namespace. Also, from a Helm and Flux standpoint, timeouts should be set to 15m or more, and some common errors should be tolerated, at least for the timeout period, such as the tobs-promscale Postgres error
This one seems to either resolve itself after a while or randomly succeed during an install; I'm not quite sure yet. If I have time, I'll open a pull request to add some documentation.
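For a plain Helm install, the 15m timeout suggested above (Helm's own default is 5m) and a dedicated namespace might look like the sketch below. The repository URL `https://charts.timescale.com` and chart name `timescale/tobs` are assumptions based on the tobs documentation of that time; release name and namespace default to the ones in this report. In a Flux setup, the corresponding setting is the HelmRelease's `spec.timeout` field (e.g. `timeout: 15m`). As in the cleanup sketch, commands are only printed unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: install tobs into its own namespace with a 15-minute timeout.
# Assumed chart location: repo "timescale" at https://charts.timescale.com,
# chart "timescale/tobs". Set DRY_RUN=0 to actually run the commands.
set -eu

RELEASE="${RELEASE:-tobs}"
NAMESPACE="${NAMESPACE:-monitoring}"
TIMEOUT="${TIMEOUT:-15m}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install or upgrade, creating the namespace if needed, and wait up to
# $TIMEOUT for the resources to become ready before reporting failure.
install_tobs() {
  run helm upgrade --install "$RELEASE" timescale/tobs \
    --namespace "$NAMESPACE" --create-namespace \
    --wait --timeout "$TIMEOUT"
}

run helm repo add timescale https://charts.timescale.com
install_tobs
```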
Ah yes. We are aware of some of these issues already, namely cleaning up the namespace when you helm delete the tobs installation. That is currently being addressed in #365. While you can install tobs in any namespace you like, the artifacts left behind make it a bit difficult to uninstall. I agree better documentation is needed; there are several open issues that we are already working through.
Yes, it is recommended to set a timeout of 15m; we currently do this in our testing suite. Better documentation is needed and it will be addressed. Thanks for the update!
What did you do?
This is a fresh install of tobs into a namespace using Helm and FluxCD.
https://github.com/lenaxia/k3s-ops-dev/blob/main/components/apps/base/monitoring/tobs/helm-release.yaml
https://github.com/lenaxia/k3s-ops-dev/blob/main/components/apps/dev/tobs-values.yaml
pod/tobs-promscale ends up in a crash loop, unable to connect to TimescaleDB.
Did you expect to see something different?
tobs should have installed without issues.
Environment
Kubernetes version information:
kubectl version
K3s installed via Flux:
flux version
flux check
kcl tobs-promscale-788c855fc5-59rqv -n monitoring
kc get secret tobs-credentials -n monitoring -o yaml
echo RXB5VlByYzc2NE15MVQyRg== | base64 -d
kubectl describe deploy tobs-promscale -n monitoring
kc get secret tobs-promscale -n monitoring -o yaml
echo RXB5VlByYzc2NE15MVQyRg== | base64 -d
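The password check done by hand above (reading each secret and decoding the value with `base64 -d`) could be scripted roughly as below. The secret names `tobs-credentials` and `tobs-promscale` and the `monitoring` namespace come from this report; the data keys that hold the password are not shown here, so `CREDENTIALS_KEY` and `PROMSCALE_KEY` are hypothetical variables you fill in after inspecting each secret with `-o yaml`. Nothing is queried unless `RUN_CHECK=1` is set.

```shell
#!/bin/sh
# Sketch: compare the password in tobs-credentials with the one Promscale
# receives via tobs-promscale. The data keys are unknown here and must be
# supplied as CREDENTIALS_KEY / PROMSCALE_KEY (hypothetical variable names).
set -eu
NAMESPACE="${NAMESPACE:-monitoring}"

# Decode a base64 string, as done manually with `echo ... | base64 -d`.
decode() {
  printf '%s' "$1" | base64 -d
}

# Fetch one raw (still base64-encoded) data value from a secret.
# go-template `index` also works for keys containing dots or dashes.
raw_value() {
  kubectl get secret "$1" -n "$NAMESPACE" \
    -o go-template="{{index .data \"$2\"}}"
}

if [ "${RUN_CHECK:-0}" = "1" ]; then
  key_a="${CREDENTIALS_KEY:?set CREDENTIALS_KEY to the password key in tobs-credentials}"
  key_b="${PROMSCALE_KEY:?set PROMSCALE_KEY to the password key in tobs-promscale}"
  raw_a="$(raw_value tobs-credentials "$key_a")"
  raw_b="$(raw_value tobs-promscale "$key_b")"
  if [ "$(decode "$raw_a")" = "$(decode "$raw_b")" ]; then
    echo "passwords match"
  else
    echo "passwords differ"
  fi
fi
```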
Anything else we need to know?:
Installing tobs seems to be really unstable, especially with OpenTelemetry enabled. I've gotten it to install okay once or twice, but shortly thereafter it becomes unhealthy, and now it won't even install anymore.