CosmosFullNode: Rollouts intermittently take down more pods than configured #339

Closed
DavidNix opened this issue Aug 18, 2023 · 1 comment · Fixed by #376
Labels
bug Something isn't working

Comments

@DavidNix
Contributor

It's gotten better, but I still see cases where more than one pod is deleted at a time when only one should be.

I think this happens more often on sentries where we've disabled readiness probes, but I've seen it once on a deployment where readiness probes were active.

I have yet to find a way to reproduce the issue reliably.

@DavidNix added the bug label Aug 18, 2023
@DavidNix
Contributor Author

You know what, sometimes I think it's the ScheduledVolumeSnapshot taking down the pod. If there's been a problem for a while, the ScheduledVolumeSnapshot stays pending. As soon as the minimum number of pods is ready, it quickly deletes one to take the snapshot.
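
To make the hypothesis above concrete, here is a rough Go sketch of how two actors that each check only the ready count could together delete more pods than intended. This is not the operator's actual code: `uncoordinatedDeletions`, `disruptionBudget`, and `coordinatedDeletions` are hypothetical names invented for illustration, and the idea that both actors read the same stale ready count is an assumption, not something confirmed in this issue.

```go
package main

import "fmt"

// uncoordinatedDeletions models several controllers (hypothetically, the
// rollout logic and a snapshot controller) that each decide from the same
// observed ready count without knowing about the others' pending deletions.
func uncoordinatedDeletions(observedReady, minReady, controllers int) int {
	deleted := 0
	for i := 0; i < controllers; i++ {
		// Each controller sees the same stale count, so each check passes.
		if observedReady > minReady {
			deleted++
		}
	}
	return deleted
}

// disruptionBudget is a hypothetical shared counter of in-flight deletions.
type disruptionBudget struct {
	maxUnavailable int
	inFlight       int
}

// tryAcquire reserves one deletion slot, or reports false if the budget is spent.
func (b *disruptionBudget) tryAcquire() bool {
	if b.inFlight >= b.maxUnavailable {
		return false
	}
	b.inFlight++
	return true
}

// coordinatedDeletions models the same controllers asking a shared budget first.
func coordinatedDeletions(maxUnavailable, controllers int) int {
	b := &disruptionBudget{maxUnavailable: maxUnavailable}
	deleted := 0
	for i := 0; i < controllers; i++ {
		if b.tryAcquire() {
			deleted++
		}
	}
	return deleted
}

func main() {
	// 3 pods, all observed ready; at least 2 must stay ready (1 may be down).
	fmt.Println("uncoordinated:", uncoordinatedDeletions(3, 2, 2)) // prints "uncoordinated: 2"
	fmt.Println("coordinated:", coordinatedDeletions(1, 2))        // prints "coordinated: 1"
}
```

If something like this is what is happening, having every pod-deleting path reserve from one shared budget would cap the combined deletions at the configured limit.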
