Consider updating PPND logic not to remain paused for update #299

Open
juliev0 opened this issue Sep 27, 2024 · 1 comment
Assignees
Labels
deprioritized enhancement New feature or request

Comments

Collaborator

juliev0 commented Sep 27, 2024

Summary

Currently, the PPND logic performs this sequence:

  1. Pause pipeline and wait for it to go to Paused
  2. Update pipeline keeping it paused and wait for that to be reconciled
  3. Run pipeline
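
The three steps above can be sketched as a small in-memory simulation. This is only an illustration of the ordering: `Pipeline`, `reconcile`, and `UpdateWithPPND` are hypothetical names, not the actual Numaplane or Numaflow types, and the real controller waits on Kubernetes resources rather than calling a function synchronously.

```go
package main

import "fmt"

// Phase is a simplified stand-in for a Pipeline lifecycle phase.
type Phase string

const (
	Running Phase = "Running"
	Paused  Phase = "Paused"
)

// Pipeline is a hypothetical in-memory model, not the Numaflow CRD type.
type Pipeline struct {
	DesiredPhase   Phase
	ActualPhase    Phase
	Spec           string
	ReconciledSpec string
}

// reconcile simulates the controller bringing the observed state
// in line with the desired state (assumed to always succeed here).
func (p *Pipeline) reconcile() {
	p.ActualPhase = p.DesiredPhase
	p.ReconciledSpec = p.Spec
}

// UpdateWithPPND applies newSpec using the pause / update-while-paused / run
// sequence and returns the names of the steps it completed.
func UpdateWithPPND(p *Pipeline, newSpec string) ([]string, error) {
	var steps []string

	// Step 1: pause the pipeline and wait for it to report Paused.
	p.DesiredPhase = Paused
	p.reconcile()
	if p.ActualPhase != Paused {
		return steps, fmt.Errorf("pipeline did not pause, phase is %q", p.ActualPhase)
	}
	steps = append(steps, "paused")

	// Step 2: update the spec while keeping desiredPhase=Paused,
	// then wait for the update to be reconciled.
	p.Spec = newSpec
	p.DesiredPhase = Paused
	p.reconcile()
	if p.ReconciledSpec != newSpec {
		return steps, fmt.Errorf("spec %q was not reconciled", newSpec)
	}
	steps = append(steps, "updated-while-paused")

	// Step 3: run the pipeline again.
	p.DesiredPhase = Running
	p.reconcile()
	steps = append(steps, "running")
	return steps, nil
}
```

The proposal in this issue amounts to dropping the `p.DesiredPhase = Paused` assignment in Step 2, so that the update and the resume happen together.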

For Step 2, I had decided to be cautious and keep the pipeline paused because I didn't want the Numaflow Controller to scale up before the Vertices were updated. However, I can see that Numaflow itself already follows this sequence, so the scale-up only happens after all Vertex changes have been applied:

  1. Update, create, and delete all Vertex specs (if an error occurs, stop and return error)
  2. Patch Vertex spec to scale up (setting replicas > 0)
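
A minimal sketch of that ordering, under stated assumptions: `Vertex`, `ReconcileVertices`, and the `failOn` parameter are hypothetical and exist only to model the behavior described above. The actual Numaflow controller works on Kubernetes objects and is not reproduced here.

```go
package main

import "fmt"

// Vertex is a hypothetical in-memory model of a Numaflow Vertex.
type Vertex struct {
	Name     string
	Spec     string
	Replicas int
}

// ReconcileVertices first updates, creates, and deletes Vertex specs,
// returning on the first error, and only afterwards scales Vertices up.
// failOn names a Vertex whose update should fail (for illustration only).
func ReconcileVertices(existing map[string]*Vertex, desired map[string]string, failOn string) error {
	// Phase 1a: update existing Vertices and create missing ones.
	for name, spec := range desired {
		if name == failOn {
			// Stop here: no Vertex has been scaled up yet.
			return fmt.Errorf("failed to apply vertex %q", name)
		}
		if v, ok := existing[name]; ok {
			v.Spec = spec
		} else {
			existing[name] = &Vertex{Name: name, Spec: spec}
		}
	}

	// Phase 1b: delete Vertices that are no longer part of the topology.
	for name := range existing {
		if _, ok := desired[name]; !ok {
			delete(existing, name)
		}
	}

	// Phase 2: scale up only after every spec change has succeeded.
	for _, v := range existing {
		v.Replicas = 1
	}
	return nil
}
```

Because the scale-up is a separate, later pass, a failure during the spec changes leaves every Vertex at zero replicas.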

Motivation

When we perform a topology change on the Pipeline, the Pipeline Controller creates a Job which creates the new buffers and buckets. The daemon and the vertex Pods then restart with an init container that waits for those buffers and buckets to be created.

When we issue desiredPhase=Paused (even if the Pipeline is already paused), Numaflow tries to connect to the daemon to determine the number of pending messages. If this happens while the Job is being created and the daemon is still waiting for the buffers to exist, Numaflow ends up waiting for the daemon. Theoretically, everything should work in due time, but there seem to be some issues.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

juliev0 added the "enhancement" (New feature or request) label Sep 27, 2024
juliev0 self-assigned this Sep 27, 2024
juliev0 changed the title from "Update PPND logic not to remain paused for update" to "Consider updating PPND logic not to remain paused for update" Sep 27, 2024
Collaborator Author

juliev0 commented Sep 27, 2024

Just talked to Derek and Sidhant about this issue. Instead of making this change, and since there is a real use case in which a user may actually want desiredPhase=Paused while updating topology, Sidhant will update the Numaflow Controller not to contact the daemon while the Pipeline is paused, to see whether that alleviates some of the issues.
