CosmosFullNode: Rollouts intermittently take down more pods than configured #339

DavidNix · 2023-08-18T15:34:18Z

It's gotten better, but I still see instances where more than 1 pod is deleted when only 1 should be deleted at a time.

I think this happens more on sentries where we've disabled readiness probes. But I've seen it once on a deployment where readiness probes were active.

I have yet to find a way to reproduce the issue reliably.

DavidNix · 2023-08-18T15:49:38Z

You know what, sometimes I think it's the ScheduledVolumeSnapshot taking down the pod. If there's been a problem for a while, the ScheduledVolumeSnapshot stays pending. As soon as the minimum number of pods is ready, it quickly deletes one to take the snapshot.
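
A minimal sketch of that suspected interaction, assuming the rollout and the pending snapshot each decide on their own from the same ready count. This is not the operator's actual code: the function names (`rolloutDeletions`, `snapshotDeletions`) and the numbers are hypothetical, picked only to show how two independent checks could add up to more deletions than the rollout budget allows.

```go
package main

import "fmt"

// rolloutDeletions is a hypothetical model of the rollout check: it returns
// how many pods may be deleted right now without exceeding maxUnavailable.
func rolloutDeletions(total, ready, maxUnavailable int) int {
	unavailable := total - ready
	n := maxUnavailable - unavailable
	if n < 0 {
		return 0
	}
	return n
}

// snapshotDeletions is a hypothetical model of a pending snapshot: it deletes
// one pod as soon as at least minReady pods are ready, without consulting
// the rollout budget.
func snapshotDeletions(pending bool, ready, minReady int) int {
	if pending && ready >= minReady {
		return 1
	}
	return 0
}

func main() {
	// Assumed values for illustration only.
	total, ready, maxUnavailable, minReady := 3, 3, 1, 2

	// Both checks read the same ready count before either deletion is observed.
	r := rolloutDeletions(total, ready, maxUnavailable)
	s := snapshotDeletions(true, ready, minReady)

	fmt.Printf("rollout deletes %d, snapshot deletes %d, total down %d (budget %d)\n",
		r, s, r+s, maxUnavailable)
}
```

With these assumed values, each check allows one deletion, so two pods go down against a budget of one. If this is what is happening, the snapshot's deletion would need to count against the same budget as the rollout.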

DavidNix added the bug label Aug 18, 2023

akc2267 assigned DavidNix and unassigned DavidNix Aug 28, 2023

agouin mentioned this issue Oct 25, 2023

simplify status and perform rollouts in correct order #376

Merged

agouin closed this as completed in #376 Oct 25, 2023
