v0.13.0

@github-actions github-actions released this 04 Nov 09:59

HyperQueue 0.13.0

New features

Resource management

  • Almost complete rewrite of resource management.
    CPUs and other resources were unified: the most visible change is that "cpus" can now be defined like any
    other resource, and resources can now be defined in groups (NUMA-like resources).

  • Many scheduler improvements: improved schedules for multi-resource requests,
    better behavior on non-heterogeneous clusters,
    and better interaction between resources and priorities.

Automatic allocation

  • #467 You can now pause (and resume)
    autoalloc queues using hq alloc pause and hq alloc resume.
    Paused queues will not submit new allocations into the selected job manager. They can be resumed later.
    When an autoalloc queue hits too many submission or worker execution errors, it will now be paused
    instead of being removed.
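
A minimal sketch of building these two commands from a script. It assumes that both subcommands take the ID of the autoalloc queue as a positional argument, which these notes do not state; the helper name alloc_queue_command and the queue ID 1 are hypothetical.

```python
from typing import List


def alloc_queue_command(action: str, queue_id: int) -> List[str]:
    """Build the argument list for `hq alloc pause` or `hq alloc resume`.

    Assumption: the queue ID is passed as a positional argument.
    """
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["hq", "alloc", action, str(queue_id)]


# Print the command that would pause queue 1 (a placeholder ID).
print(" ".join(alloc_queue_command("pause", 1)))  # hq alloc pause 1
```

The returned list can be passed to subprocess.run on a machine where hq is installed and a server is running.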

Tasks

  • HQ now lets you limit how many times a task may be in a running state when its worker is lost
    (such a task may be the cause of the worker's crash).
    If the limit is reached, the task is marked as failed.
    The limit can be configured with --crash-limit in submit.

  • Groups of workers are introduced. A multi-node task is now started only on workers from the same group.
    By default, workers are grouped by PBS/Slurm allocations, but the group can also be configured manually.
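
The crash limit described above can be modelled in a few lines. This is a toy sketch of the counting rule as stated in these notes, not HyperQueue's implementation; the class and method names are hypothetical, and returning the task to a waiting state below the limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskCrashTracker:
    """Toy model: counts how often a task was running when its worker was lost."""

    crash_limit: int        # stands in for the value given via --crash-limit
    crash_count: int = 0
    state: str = "waiting"  # "waiting", "running" or "failed"

    def start(self) -> None:
        """Move a waiting task to the running state."""
        if self.state == "waiting":
            self.state = "running"

    def worker_lost(self) -> None:
        """Called when the worker executing this task disappears."""
        if self.state != "running":
            return  # only tasks that were running are counted
        self.crash_count += 1
        if self.crash_count >= self.crash_limit:
            self.state = "failed"   # limit reached: the task is marked as failed
        else:
            self.state = "waiting"  # assumption: the task may be scheduled again


task = TaskCrashTracker(crash_limit=2)
for _ in range(2):
    task.start()
    task.worker_lost()
print(task.state)  # failed
```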

Changes

Resource management

  • --cpus=no-ht has been replaced by the flag --no-hyper-threading.
  • Explicit list definition of a resource was changed from --resource xxx=list(1,2,3) to --resource xxx=[1,2,3]
    (this is a result of the unification of CPUs with other resources).
  • Python API: The attribute generic in ResourceRequest has been renamed to resources.
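
Existing scripts that use the old command-line syntax can be updated mechanically. The following is a minimal sketch of such a rewrite, covering only the two argument forms listed above; the function name migrate_args is hypothetical and not part of HyperQueue.

```python
import re
from typing import List

# Matches the old explicit-list form at the end of an argument,
# e.g. "xxx=list(1,2,3)" or "--resource=xxx=list(1,2,3)".
_OLD_LIST = re.compile(r"=list\(([^()]*)\)$")


def migrate_args(args: List[str]) -> List[str]:
    """Rewrite pre-0.13 arguments to the 0.13 syntax described above."""
    migrated = []
    for arg in args:
        if arg == "--cpus=no-ht":
            # The special value became a standalone flag.
            migrated.append("--no-hyper-threading")
        else:
            # xxx=list(1,2,3)  ->  xxx=[1,2,3]
            migrated.append(_OLD_LIST.sub(r"=[\1]", arg))
    return migrated


old = ["--cpus=no-ht", "--resource", "xxx=list(1,2,3)"]
print(migrate_args(old))
# ['--no-hyper-threading', '--resource', 'xxx=[1,2,3]']
```

Arguments that match neither form are passed through unchanged.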

Tasks

  • #461 When a task is cancelled, times out,
    or its worker is killed, HyperQueue now tries to make sure that both the task and any processes that
    it has spawned are terminated.
  • #480 You can now select multiple tasks in hq task info.

Artifact summary:

  • hq-v0.13.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.13.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.