Add configurations for rootish taskgroup threshold (#8898)
* Increase rootish dependencies

* Use config variables

* Add configurations for rootish taskgroup threshold
phofl authored Oct 17, 2024
1 parent fa9806b commit 42e34e3
Showing 4 changed files with 49 additions and 2 deletions.
26 changes: 26 additions & 0 deletions distributed/distributed-schema.yaml
@@ -133,6 +133,32 @@ properties:
generally leave `worker-saturation` at 1.0, though 1.25-1.5 could slightly improve
performance if ample memory is available.
rootish-taskgroup:
type:
- integer

description: |
Controls when a specific task group is identified as rootish when
worker saturation is set.
A task group is identified as rootish if it has only up to a certain number
of dependencies (5 by default). This can be faulty for very large datasets
where the number of data tasks from xarray can be higher than 5.
Increasing this limit will capture these root tasks successfully but increase
the risk of misidentifying task groups as rootish, which can have
performance implications.
rootish-taskgroup-dependencies:
type:
- integer

description: |
Controls the number of transitive dependencies a task group can have to be considered rootish.
It checks the number of dependencies each dependency of a rootish task group has.
The same caveats as for `rootish-taskgroup` apply.
worker-ttl:
type:
- string
2 changes: 2 additions & 0 deletions distributed/distributed.yaml
@@ -23,6 +23,8 @@ distributed:
work-stealing: True # workers should steal tasks from each other
work-stealing-interval: 100ms # Callback time for work stealing
worker-saturation: 1.1 # Send this fraction of nthreads root tasks to workers
rootish-taskgroup: 5 # number of dependencies of a rootish tg
rootish-taskgroup-dependencies: 5 # number of dependencies of the dependencies of the rootish tg
worker-ttl: "5 minutes" # like '60s'. Time to live for workers. They must heartbeat faster than this
preload: [] # Run custom modules with Scheduler
preload-argv: [] # See https://docs.dask.org/en/latest/how-to/customize-initialization.html
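
As a hedged illustration of overriding these two defaults from Python rather than YAML, the sketch below uses `dask.config.set` with the key names introduced in this commit. The values 10 and 15 are arbitrary examples, and the variable names are hypothetical. Because the scheduler reads both keys in `__init__`, the override presumably has to be in effect before the scheduler is created.

```python
import dask

# Key names come from this commit; the values are arbitrary examples.
overrides = {
    "distributed.scheduler.rootish-taskgroup": 10,
    "distributed.scheduler.rootish-taskgroup-dependencies": 15,
}

with dask.config.set(overrides):
    # A scheduler constructed inside this block would read these values.
    tg_threshold = dask.config.get("distributed.scheduler.rootish-taskgroup")
    deps_threshold = dask.config.get(
        "distributed.scheduler.rootish-taskgroup-dependencies"
    )
```

Outside the `with` block the previous configuration is restored, so this form suits experiments; a YAML entry is the more durable choice for a deployment.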
11 changes: 9 additions & 2 deletions distributed/scheduler.py
@@ -1840,6 +1840,13 @@ def __init__(
+ repr(self.WORKER_SATURATION)
)

self.rootish_tg_threshold = dask.config.get(
"distributed.scheduler.rootish-taskgroup"
)
self.rootish_tg_dependencies_threshold = dask.config.get(
"distributed.scheduler.rootish-taskgroup-dependencies"
)

@abstractmethod
def log_event(self, topic: str | Collection[str], msg: Any) -> None: ...

@@ -3090,8 +3097,8 @@ def is_rootish(self, ts: TaskState) -> bool:
# TODO short-circuit to True if `not ts.dependencies`?
return (
len(tg) > self.total_nthreads * 2
- and len(tg.dependencies) < 5
- and sum(map(len, tg.dependencies)) < 5
+ and len(tg.dependencies) < self.rootish_tg_threshold
+ and sum(map(len, tg.dependencies)) < self.rootish_tg_dependencies_threshold
)

def check_idle_saturated(self, ws: WorkerState, occ: float = -1.0) -> None:
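
To make the effect of the two thresholds concrete, here is a minimal standalone sketch of the predicate above. It does not use the scheduler's real classes: `GroupSketch` and `is_rootish_sketch` are hypothetical stand-ins, and the group sizes are invented to mimic the case described in the schema, where a large task group depends on more than five small data groups.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSketch:
    """Hypothetical stand-in for a task group; len() is its task count."""

    n_tasks: int
    dependencies: list = field(default_factory=list)

    def __len__(self) -> int:
        return self.n_tasks


def is_rootish_sketch(
    tg, total_nthreads, tg_threshold=5, tg_dependencies_threshold=5
):
    """Mirror the three conditions in the hunk above, thresholds as arguments."""
    return (
        len(tg) > total_nthreads * 2
        and len(tg.dependencies) < tg_threshold
        and sum(map(len, tg.dependencies)) < tg_dependencies_threshold
    )


# Eight single-task data groups feeding one large group of 1000 tasks.
data_groups = [GroupSketch(n_tasks=1) for _ in range(8)]
big_group = GroupSketch(n_tasks=1000, dependencies=data_groups)

# Defaults of 5: eight dependency groups is not fewer than 5.
with_defaults = is_rootish_sketch(big_group, total_nthreads=16)

# Raised thresholds (10 and 15): 8 < 10 and 8 tasks in total < 15.
with_raised = is_rootish_sketch(
    big_group, total_nthreads=16, tg_threshold=10, tg_dependencies_threshold=15
)
```

Under these assumed sizes the group is rejected with the defaults and accepted with the raised thresholds, which is the trade-off the schema description warns about: a higher limit captures such groups but can also admit groups that are not really root tasks.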
12 changes: 12 additions & 0 deletions distributed/tests/test_scheduler.py
@@ -5277,3 +5277,15 @@ async def before_close(self):
assert s.plugins["before_close"].call_count == 1
lines = caplog.getvalue().split("\n")
assert sum("Closing scheduler" in line for line in lines) == 1


@gen_cluster(
client=True,
config={
"distributed.scheduler.rootish-taskgroup": 10,
"distributed.scheduler.rootish-taskgroup-dependencies": 15,
},
)
async def test_rootish_taskgroup_configuration(c, s, *workers):
assert s.rootish_tg_threshold == 10
assert s.rootish_tg_dependencies_threshold == 15
