
Implement priority mechanism in new scheduler #8139

Open. Wants to merge 12 commits into base: master.

Conversation

@jptrindade (Contributor) commented Sep 30, 2024

Description

Added a priority mechanism to the new scheduler, with priorities for every existing task.

closes #8015

Self Check:

Strike through any lines that are not applicable (~~line~~) then check the box

  • Attached issue to pull request
  • Changelog entry
  • Type annotations are present
  • Code is clear and sufficiently documented
  • No (preventable) type errors (check using make mypy or make mypy-diff)
  • Sufficient test cases (reproduces the bug/tests the requested feature)
  • Correct, in line with design
  • End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
  • If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see test-fixes for more info)

async def test_scheduler_priority(agent: TestAgent, environment, make_resource_minimal):
"""
Ensure that the tasks are placed in the queue in the correct order
And that existing tasks in the queue are replaced if a task that
Contributor:

Suggested change:
- And that existing tasks in the queue are replaced if a task that
+ and that existing tasks in the queue are replaced if a task that

@@ -377,11 +392,12 @@ def extend_blocked_on(resource: ResourceIdStr, new_blockers: set[ResourceIdStr])
# discard rather than remove because task may already be running, in which case we leave it run its course
# and simply add a new one
task: tasks.Deploy = tasks.Deploy(resource=resource)
priority: Optional[int] = self.agent_queues.discard(task)
task_priority: Optional[int] = self.agent_queues.discard(task)
Contributor:

Should we update the typing as well? E.g.:

Suggested change:
- task_priority: Optional[int] = self.agent_queues.discard(task)
+ task_priority: Optional[TaskPriority] = self.agent_queues.discard(task)

And similarly for the method's signature?

Comment on lines +39 to +46
TERMINATED = -1
USER_DEPLOY = 0
NEW_VERSION_DEPLOY = 1
USER_REPAIR = 2
DRYRUN = 3
INTERVAL_DEPLOY = 4
FACT_REFRESH = 5
INTERVAL_REPAIR = 6
Contributor:

Maybe a non-issue, but should we reserve some leeway by using these numbers * 10, in case we need to insert other tasks in between these ones in the future?

Contributor (author):

I don't think it is much of an issue; we can change the numbers however we like, because in the code we just use the task's name. @sanderr what do you think?

Contributor:

Indeed, I care mostly about the relative order. I do think that a comment stating that these can be freely updated would be useful.

If we ever start writing priorities to the database it's different, but afaik we don't plan to.
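
A minimal sketch of what such a comment could look like, assuming TaskPriority is an IntEnum (the enclosing class definition is not visible in this diff):

```python
from enum import IntEnum


class TaskPriority(IntEnum):
    """
    Relative priority of scheduler tasks; lower values are picked up first.

    Only the relative order matters: these values are never persisted (e.g. to the
    database), so they can be renumbered freely whenever a new task type needs to
    be inserted between two existing levels.
    """

    TERMINATED = -1
    USER_DEPLOY = 0
    NEW_VERSION_DEPLOY = 1
    USER_REPAIR = 2
    DRYRUN = 3
    INTERVAL_DEPLOY = 4
    FACT_REFRESH = 5
    INTERVAL_REPAIR = 6
```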

Contributor:

@jptrindade could you add such a comment?

sanderr (Contributor) left a comment:

I still have to go over the tests.

Comment on lines +136 to +140
async def interval_deploy() -> None:
await self.scheduler.deploy(TaskPriority.INTERVAL_DEPLOY)

async def interval_repair() -> None:
await self.scheduler.repair(TaskPriority.INTERVAL_REPAIR)
Contributor:

I have a slight preference for functools.partial over defining custom methods for this. But if you prefer it like this I'm fine with it.
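
For illustration, a sketch of the functools.partial alternative, reusing self.scheduler and TaskPriority from the hunk above (the surrounding timer wiring is assumed):

```python
import functools

# Instead of defining interval_deploy()/interval_repair() wrapper coroutines,
# bind the priority argument up front; awaiting the partial behaves the same.
interval_deploy = functools.partial(self.scheduler.deploy, TaskPriority.INTERVAL_DEPLOY)
interval_repair = functools.partial(self.scheduler.repair, TaskPriority.INTERVAL_REPAIR)
```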

@@ -169,20 +169,20 @@ async def stop(self) -> None:
self._work.agent_queues.send_shutdown()
await asyncio.gather(*self._workers.values())

async def deploy(self) -> None:
async def deploy(self, priority: TaskPriority) -> None:
Contributor:

I think it might make sense to add a default = TaskPriority.USER_DEPLOY and similar for the repair. We'll probably move the timers to the scheduler itself before we release, in which case outside callers should not be concerned with priorities unless they really want to override the default. If we do add the default I would make sure to update callers not to explicitly set the user-triggered priority levels.

Wdyt?
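
For example, a sketch of the suggested defaults (the USER_REPAIR default for repair is an assumption by analogy; method bodies are elided):

```python
async def deploy(self, priority: TaskPriority = TaskPriority.USER_DEPLOY) -> None:
    ...

async def repair(self, priority: TaskPriority = TaskPriority.USER_REPAIR) -> None:
    ...
```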

"""
Trigger a deploy
"""
async with self._scheduler_lock:
self._work.deploy_with_context(self._state.dirty, deploying=self._deploying)
self._work.deploy_with_context(self._state.dirty, priority, deploying=self._deploying_stale)
Contributor:

I believe deploying=self._deploying -> deploying=self._deploying_stale is incorrect. Might this be a merge conflict that was resolved incorrectly?

We'll have another one between our PRs, but it shouldn't be too difficult to resolve.

@@ -305,6 +305,7 @@ async def _new_version(
# ensure deploy for ALL dirty resources, not just the new ones
self._work.deploy_with_context(
self._state.dirty,
TaskPriority.NEW_VERSION_DEPLOY,
Contributor:

Nice, I hadn't considered this to be a separate case.

Comment on lines +388 to +389
# Not sure about this priority. I believe this method is called when a resource has a new state
# and hence, new version.
Contributor:

Reminder to drop this comment.

As to the question: this method is called when a resource finishes a deploy (or at least we only reach this branch of it in that case). The deploy that is triggered here is the propagation of events to its dependents, e.g. a service's resource deploys successfully -> we schedule the lsm state transfer resource for deploy.

I wonder if we might want a separate priority level for event propagation. I would rank it pretty high (urgent) for fast event response, but perhaps given how eagerly we currently send them out (because we'll still default to receive_events=True), that might not be the best idea at the moment.

Overall, perhaps we should simply schedule it with the same priority as the resource that just finished? We could pass in the information to this method, or we could just query the agent queues mapping. The first is more direct, but it feels a bit clunky, given how generic this method is, so I think I prefer the second.

Contributor:

Hmm, we don't even keep that information at the moment. I need to give this some more thought. Preliminary suggestion: make AgentQueues._in_progress a mapping from Task to PrioritizedTask instead of the current set, then query that here.

Contributor:

Or perhaps have the queue_get method return PrioritizedTask instead of Task, and pass on the task context to this method?

I'll get back to this. I have to consider what fits best.
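
A rough sketch of the preliminary suggestion above, using Task, PrioritizedTask and TaskPriority from this PR (the helper method and the rest of the class are hypothetical and elided):

```python
from typing import Optional


class AgentQueues:
    def __init__(self) -> None:
        # was: self._in_progress: set[Task]
        self._in_progress: dict[Task, PrioritizedTask] = {}

    def in_progress_priority(self, task: Task) -> Optional[TaskPriority]:
        # Hypothetical helper: return the priority a running task was scheduled
        # with, so event propagation can reuse it for its dependents' deploys.
        prioritized = self._in_progress.get(task)
        return None if prioritized is None else prioritized.priority
```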


Comment on lines +40 to +41
USER_DEPLOY = 0
NEW_VERSION_DEPLOY = 1
Contributor:

I'm inclined to swap these two (0 and 1). @wouterdb WDYT?

@@ -294,6 +307,7 @@ def reset(self) -> None:
def deploy_with_context(
self,
resources: Set[ResourceIdStr],
priority: TaskPriority,
Contributor:

I'm very much in favor of making this kw-only like the other args.
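
In other words, something along these lines (the remaining keyword-only parameters, e.g. deploying, are left out of the sketch):

```python
def deploy_with_context(
    self,
    resources: Set[ResourceIdStr],
    *,  # priority must now be passed by keyword, like the existing keyword-only args
    priority: TaskPriority,
) -> None:
    ...
```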

queued.remove(resource)
self._waiting[resource] = BlockedDeploy(
# FIXME[#8015]: default priority
task=PrioritizedTask(task=task, priority=priority if priority is not None else 0),
task=PrioritizedTask(task=task, priority=task_priority if task_priority is not None else priority),
Contributor:

Shouldn't we bump the priority here if it's higher?
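
A sketch of that bump, assuming a lower numeric value means more urgent (as in the TaskPriority ordering) and reusing the variable names from this hunk; the other BlockedDeploy arguments are omitted:

```python
# Keep the more urgent of the priority the task already had in the queue
# and the priority of the newly requested deploy.
effective_priority = priority if task_priority is None else min(task_priority, priority)
self._waiting[resource] = BlockedDeploy(
    task=PrioritizedTask(task=task, priority=effective_priority),
    # ... remaining BlockedDeploy arguments unchanged ...
)
```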

# scheduled as well, they will follow the provides relation to ensure this deploy waits its turn.
if prioritized_task.task in self.agent_queues:
self.agent_queues.queue_put_nowait(prioritized_task)
Contributor:

Suggested change:
- self.agent_queues.queue_put_nowait(prioritized_task)
+ # simply add it again, the queue will make sure only the highest priority is kept
+ self.agent_queues.queue_put_nowait(prioritized_task)
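
A sketch of the queue behaviour that the suggested comment alludes to; the internal attributes here (_queued, _heap) are hypothetical and may not match the PR's actual AgentQueues implementation:

```python
def queue_put_nowait(self, prioritized_task: PrioritizedTask) -> None:
    # Hypothetical duplicate handling: if the task is already queued with an
    # equal or more urgent priority, re-adding it is a no-op; otherwise the new
    # entry supersedes the old one, so only the most urgent priority is kept.
    existing = self._queued.get(prioritized_task.task)
    if existing is not None and existing.priority <= prioritized_task.priority:
        return
    self._queued[prioritized_task.task] = prioritized_task
    self._heap.put_nowait(prioritized_task)  # stale entries are skipped on get
```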


Successfully merging this pull request may close these issues:

  • resource scheduler: priorities

3 participants