Consider updating PPND logic not to remain paused for update #299

juliev0 · 2024-09-27T17:23:24Z

Summary

Currently, the PPND logic performs this sequence:

1. Pause pipeline and wait for it to go to Paused
2. Update pipeline keeping it paused and wait for that to be reconciled
3. Run pipeline
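As a rough illustration only, the three steps above could be sketched like this. The `PipelineClient` interface, its method names, and the `recordingClient` fake are hypothetical stand-ins invented for this example; they are not Numaplane's or Numaflow's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical stand-in for the Pipeline's desired lifecycle phase.
type Phase string

const (
	PhaseRunning Phase = "Running"
	PhasePaused  Phase = "Paused"
)

// PipelineClient is an assumed interface used only for this sketch.
type PipelineClient interface {
	SetDesiredPhase(p Phase) error
	WaitForPhase(p Phase) error
	UpdateSpec(spec string) error // assumed to leave desiredPhase untouched
	WaitForReconciled() error
}

// ppndUpdate walks through the three steps in order and stops at the first error.
func ppndUpdate(c PipelineClient, newSpec string) error {
	// Step 1: pause and wait until the pipeline reports Paused.
	if err := c.SetDesiredPhase(PhasePaused); err != nil {
		return fmt.Errorf("pause: %w", err)
	}
	if err := c.WaitForPhase(PhasePaused); err != nil {
		return fmt.Errorf("wait for paused: %w", err)
	}
	// Step 2: update while still paused, then wait for reconciliation.
	if err := c.UpdateSpec(newSpec); err != nil {
		return fmt.Errorf("update: %w", err)
	}
	if err := c.WaitForReconciled(); err != nil {
		return fmt.Errorf("wait for reconcile: %w", err)
	}
	// Step 3: run the pipeline again.
	if err := c.SetDesiredPhase(PhaseRunning); err != nil {
		return fmt.Errorf("run: %w", err)
	}
	return nil
}

// recordingClient is a fake that records calls and can inject one failure.
type recordingClient struct {
	calls  []string
	failOn string
}

func (r *recordingClient) record(call string) error {
	r.calls = append(r.calls, call)
	if call == r.failOn {
		return errors.New("injected failure")
	}
	return nil
}

func (r *recordingClient) SetDesiredPhase(p Phase) error { return r.record("desiredPhase=" + string(p)) }
func (r *recordingClient) WaitForPhase(p Phase) error    { return r.record("wait phase=" + string(p)) }
func (r *recordingClient) UpdateSpec(spec string) error  { return r.record("update spec") }
func (r *recordingClient) WaitForReconciled() error      { return r.record("wait reconciled") }

func main() {
	c := &recordingClient{}
	if err := ppndUpdate(c, "new-topology"); err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, call := range c.calls {
		fmt.Printf("%d. %s\n", i+1, call)
	}
}
```

The change under consideration would amount to setting the desired phase back to Running as part of the update in Step 2, rather than as a separate Step 3.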

For Step 2, I'd decided to be paranoid in keeping it paused because I didn't want the Numaflow Controller to scale up prior to updating the Vertices; however, I can see in Numaflow that it follows this sequence:

1. Update, create, and delete all Vertex specs (if an error occurs, stop and return error)
2. Patch Vertex spec to scale up (setting replicas > 0)

Motivation

When we perform a topology change on the Pipeline, the Pipeline Controller creates a Job which creates new buffers and buckets. The daemon and the vertex Pods then restart with an init container that waits for those buffers and buckets to be created.

When we issue desiredPhase=Paused (even if the Pipeline is already paused), Numaflow tries to connect to the daemon to determine the number of pending messages. If this happens while the Job is being created and the daemon is waiting for the buffers to exist, Numaflow ends up waiting for the daemon. Theoretically, everything should work in due time, but there seem to be some issues.

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.


juliev0 · 2024-09-27T19:42:57Z

Just talked to Derek and Sidhant about this issue. Instead of making this change, and since there is a real use case in which a user may actually want desiredPhase=Paused while updating the topology, Sidhant will update the Numaflow Controller so that it does not try to contact the daemon while the Pipeline is paused, to see if that alleviates some of the issues.

juliev0 added the enhancement New feature or request label Sep 27, 2024

juliev0 added this to the 0.1 - Enhance upgrade with pause and drain pipeline feature milestone Sep 27, 2024

juliev0 self-assigned this Sep 27, 2024

juliev0 changed the title ~~Update PPND logic not to remain paused for update~~ Consider updating PPND logic not to remain paused for update Sep 27, 2024

juliev0 closed this as completed Sep 30, 2024

juliev0 reopened this Oct 2, 2024

juliev0 added the deprioritized label Nov 4, 2024
