Releases: It4innovations/hyperqueue

v0.12.0-rc1

29 Aug 14:38
Pre-release

HyperQueue 0.12.0-rc1

New features

Automatic allocation

  • #457 You can now specify the idle timeout
    for workers started by the automatic allocator using the --idle-timeout flag of the hq alloc add command.
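    For example (the 5m duration value and the trailing PBS arguments are illustrative):
    $ hq alloc add pbs --idle-timeout 5m -- -q qprod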

Resiliency

  • #449 Tasks that were running during multiple
    worker crashes will now be canceled.

CLI

  • #463 You can now wait until N workers
    are connected to the cluster with hq worker wait N.
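    For example, to block until at least two workers are connected:
    $ hq worker wait 2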

Python API

  • Improvements to resource requests in the Python API.

Changes

CLI

  • #477 Requested resources are now shown when
    submitting a task array and when viewing information about a task with
    hq task info JOB_ID TASK_ID.

  • #444 The hq task list command will now
    hide some details by default, to conserve space in terminal output. To show all details, use the
    -v flag to enable verbose output.

  • #455 Improve the quality of error messages
    produced when parsing various CLI parameters, like resources.
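    For example (the job and task IDs are illustrative):
    # show details, including requested resources, of task 0 of job 1
    $ hq task info 1 0
    # show the full (verbose) task listing of job 1
    $ hq task list 1 -v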

Automatic allocation

  • #448 The automatic allocator will now start
    workers in multi-node Slurm allocations using srun --overlap. This should, where possible, prevent
    the started workers from consuming Slurm task resources. If you run into any issues with using srun
    inside HyperQueue tasks, please let us know.

Jobs

  • #483 There is no longer a length limit
    for job names.

Fixes

Job submission

  • #450 Attempts to resubmit a job with zero
    tasks will now result in an explicit error, rather than a crash of the client.

Artifact summary:

  • hq-v0.12.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.12.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.11.0-ligate1

22 Jul 12:25
Pre-release

HyperQueue 0.11.0-ligate1

New features

CLI

  • #423 You can now specify the server
    directory using the HQ_SERVER_DIR environment variable.
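    For example (the directory path is illustrative):
    $ export HQ_SERVER_DIR=/scratch/$USER/.hq-server
    $ hq server start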

Resource management

  • #427 A new specifier has been added that
    defines indexed pool resources for workers as a set of individual resource indices.
    $ hq worker start --resource "gpus=list(1,3,8)"
  • #428 Workers will now attempt to automatically
    detect available GPU resources from the CUDA_VISIBLE_DEVICES environment variable.
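    For example, a worker started with the following environment should automatically detect two GPU
    resources (the device indices are illustrative):
    $ CUDA_VISIBLE_DEVICES=0,2 hq worker start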

Stream log

  • Basic export of the stream log to JSON (hq log <log_file> export)

Server

  • Improved scheduling of multi-node tasks.

  • The server now generates a random unique ID (UID) string each time it is started (hq server start).
    The UID can be used via the %{SERVER_ID} placeholder.
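    For example, assuming the submit command accepts a --stdout path option, output files of different
    server instances can be kept apart (the path and program name are illustrative):
    $ hq submit --stdout 'out-%{SERVER_ID}-%{TASK_ID}.txt' -- ./program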

Changes

CLI

  • #433 (Backwards incompatible change)
    The CLI command hq job tasks has been removed and its functionality has been incorporated into the
    hq task list command instead.

  • #420 The shebang line (e.g. #!/bin/bash) will
    now be read from the submitted program based on the provided
    directives mode. If a shebang
    is found, HQ will execute the program located at the shebang path and pass it the rest of the
    submitted arguments.

    By default, directives and shebang will be read from the submitted program only if its filename ends
    with .sh. If you want to explicitly enable reading the shebang, pass --directives=file to
    hq submit.

    Another change is that the shebang is now read by the client (i.e. it will be read on the node that
    submits the job), not on worker nodes as previously. This means that the submitted file has to be
    accessible on the client node.
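    For example, to read directives and the shebang from a script whose name does not end with .sh
    (the filename is illustrative):
    $ hq submit --directives=file ./my-script.py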

Resource management

  • #427 (Backwards incompatible change)
    The environment variable HQ_RESOURCE_INDICES_<resource-name>, which is passed to tasks with
    resource requests,
    has been renamed to HQ_RESOURCE_VALUES_<resource-name>.

  • #427 (Backwards incompatible change)
    The specifier that defines indexed pool resources for workers as a range has been renamed from
    indices to range.

    # before
    $ hq worker start --resource "gpus=indices(1-3)"
    # now
    $ hq worker start --resource "gpus=range(1-3)"
  • #427 The
    generic resource
    documentation has been rewritten and improved.

Artifact summary:

  • hq-v0.11.0-ligate1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.11.0-ligate1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.11.0

21 Jun 14:44

HyperQueue 0.11.0

New features

CLI

  • #423 You can now specify the server
    directory using the HQ_SERVER_DIR environment variable.

Resource management

  • #427 A new specifier has been added that
    defines indexed pool resources for workers as a set of individual resource indices.
    $ hq worker start --resource "gpus=list(1,3,8)"
  • #428 Workers will now attempt to automatically
    detect available GPU resources from the CUDA_VISIBLE_DEVICES environment variable.

Stream log

  • Basic export of the stream log to JSON (hq log <log_file> export)

Server

  • Improved scheduling of multi-node tasks.

  • The server now generates a random unique ID (UID) string each time it is started (hq server start).
    The UID can be used via the %{SERVER_ID} placeholder.

Changes

CLI

  • #433 (Backwards incompatible change)
    The CLI command hq job tasks has been removed and its functionality has been incorporated into the
    hq task list command instead.

  • #420 The shebang line (e.g. #!/bin/bash) will
    now be read from the submitted program based on the provided
    directives mode. If a shebang
    is found, HQ will execute the program located at the shebang path and pass it the rest of the
    submitted arguments.

    By default, directives and shebang will be read from the submitted program only if its filename ends
    with .sh. If you want to explicitly enable reading the shebang, pass --directives=file to
    hq submit.

    Another change is that the shebang is now read by the client (i.e. it will be read on the node that
    submits the job), not on worker nodes as previously. This means that the submitted file has to be
    accessible on the client node.

Resource management

  • #427 (Backwards incompatible change)
    The environment variable HQ_RESOURCE_INDICES_<resource-name>, which is passed to tasks with
    resource requests,
    has been renamed to HQ_RESOURCE_VALUES_<resource-name>.

  • #427 (Backwards incompatible change)
    The specifier that defines indexed pool resources for workers as a range has been renamed from
    indices to range.

    # before
    $ hq worker start --resource "gpus=indices(1-3)"
    # now
    $ hq worker start --resource "gpus=range(1-3)"
  • #427 The
    generic resource
    documentation has been rewritten and improved.

Artifact summary:

  • hq-v0.11.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.11.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.11.0-rc1

16 Jun 11:44
Pre-release

HyperQueue 0.11.0-rc1

New features

CLI

  • #423 You can now specify the server
    directory using the HQ_SERVER_DIR environment variable.

Resource management

  • #427 A new specifier has been added that
    defines indexed pool resources for workers as a set of individual resource indices.
    $ hq worker start --resource "gpus=list(1,3,8)"
  • #428 Workers will now attempt to automatically
    detect available GPU resources from the CUDA_VISIBLE_DEVICES environment variable.

Stream log

  • Basic export of the stream log to JSON (hq log <log_file> export)

Server

  • Improved scheduling of multi-node tasks.

  • The server now generates a random unique ID (UID) string each time it is started (hq server start).
    The UID can be used via the %{SERVER_ID} placeholder.

Changes

CLI

  • #433 (Backwards incompatible change)
    The CLI command hq job tasks has been removed and its functionality has been incorporated into the
    hq task list command instead.

  • #420 The shebang line (e.g. #!/bin/bash) will
    now be read from the submitted program based on the provided
    directives mode. If a shebang
    is found, HQ will execute the program located at the shebang path and pass it the rest of the
    submitted arguments.

    By default, directives and shebang will be read from the submitted program only if its filename ends
    with .sh. If you want to explicitly enable reading the shebang, pass --directives=file to
    hq submit.

    Another change is that the shebang is now read by the client (i.e. it will be read on the node that
    submits the job), not on worker nodes as previously. This means that the submitted file has to be
    accessible on the client node.

Resource management

  • #427 (Backwards incompatible change)
    The environment variable HQ_RESOURCE_INDICES_<resource-name>, which is passed to tasks with
    resource requests,
    has been renamed to HQ_RESOURCE_VALUES_<resource-name>.

  • #427 (Backwards incompatible change)
    The specifier that defines indexed pool resources for workers as a range has been renamed from
    indices to range.

    # before
    $ hq worker start --resource "gpus=indices(1-3)"
    # now
    $ hq worker start --resource "gpus=range(1-3)"
  • #427 The
    generic resource
    documentation has been rewritten and improved.

Artifact summary:

  • hq-v0.11.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.11.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.10.0

20 May 13:21

HyperQueue 0.10.0

New features

Running tasks

  • HQ will now set the OpenMP OMP_NUM_THREADS environment variable for each task. The number of threads
    will be set according to the number of requested cores. For example, this job submission:
$ hq submit --cpus=4 -- <program>

would pass OMP_NUM_THREADS=4 to the executed <program>.

  • New task OpenMP pinning mode was added. You can now use --pin=omp when submitting jobs. This
    CPU pin mode will generate the corresponding OMP_PLACES and OMP_PROC_BIND environment variables
    to make sure that OpenMP pins its threads to the exact cores allocated by HyperQueue.

  • Preview version of multi-node tasks. You may submit a multi-node task with hq submit --nodes=X ...
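    For example (the program names are illustrative):
    # pin OpenMP threads to the cores allocated by HyperQueue
    $ hq submit --cpus=4 --pin=omp -- ./omp-program
    # preview: run a task across 4 nodes
    $ hq submit --nodes=4 -- ./multi-node-program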

CLI

  • Less verbose log output by default. You can use --debug to restore the old behavior.

Changes

Scheduler

  • When there are only a few tasks, the scheduler tries to fit them onto fewer workers.
    The goal is to allow workers to stop earlier due to the idle timeout.

CLI

  • The --pin boolean option for submitting jobs has been changed to take a value. You can get the
    original behaviour by specifying --pin=taskset.

Fixes

Automatic allocation

  • PBS/Slurm allocations using multiple workers will now correctly spawn a HyperQueue worker on all
    allocated nodes.

Artifact summary:

  • hq-v0.10.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.10.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.10.0-rc1

13 May 09:12
Pre-release

HyperQueue 0.10.0-rc1

New features

Running tasks

  • HQ will now set the OpenMP OMP_NUM_THREADS environment variable for each task. The number of threads
    will be set according to the number of requested cores. For example, this job submission:
$ hq submit --cpus=4 -- <program>

would pass OMP_NUM_THREADS=4 to the executed <program>.

  • New task OpenMP pinning mode was added. You can now use --pin=omp when submitting jobs. This
    CPU pin mode will generate the corresponding OMP_PLACES and OMP_PROC_BIND environment variables
    to make sure that OpenMP pins its threads to the exact cores allocated by HyperQueue.

  • Preview version of multi-node tasks. You may submit a multi-node task with hq submit --nodes=X ...

CLI

  • Less verbose log output by default. You can use --debug to restore the old behavior.

Changes

Scheduler

  • When there are only a few tasks, the scheduler tries to fit them onto fewer workers.
    The goal is to allow workers to stop earlier due to the idle timeout.

CLI

  • The --pin boolean option for submitting jobs has been changed to take a value. You can get the
    original behaviour by specifying --pin=taskset.

Fixes

Automatic allocation

  • PBS/Slurm allocations using multiple workers will now correctly spawn a HyperQueue worker on all
    allocated nodes.

Artifact summary:

  • hq-v0.10.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.10.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.9.0

16 Mar 08:47

HyperQueue 0.9.0

New features

Tasks

  • A task may be started with a temporary directory that is automatically deleted when the task
    finishes (flag --task-dir).

  • A task may provide its own error message by creating a file at the path passed via the
    HQ_ERROR_FILENAME environment variable.
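    For example, a task can report a custom error message by writing to the file named by
    HQ_ERROR_FILENAME (the inline script is illustrative):
    $ hq submit -- bash -c 'echo "input data missing" > "$HQ_ERROR_FILENAME"; exit 1'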

CLI

  • You can now use the hq task list <job-selector> command to display a list of tasks across multiple jobs.
  • Add --filter flag to worker list to allow filtering workers by their status.
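    For example (the job selector and status value are illustrative; see hq worker list --help for the
    accepted filters):
    $ hq task list 1-3
    $ hq worker list --filter running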

Changes

Automatic allocation

  • Automatic allocation has been rewritten from scratch. It will no longer query PBS/Slurm allocation
    statuses periodically; instead, it will try to derive allocation state from the workers that connect
    to it from allocations.
  • When adding a new allocation queue, HyperQueue will now try to immediately submit a job into the queue
    to quickly test whether the entered configuration is correct. If you want to avoid this behaviour, you
    can use the --no-dry-run flag for hq alloc add <pbs/slurm>.
  • If too many submissions (10) or running allocations (3) fail in succession, the corresponding
    allocation queue will be automatically removed to avoid error loops.
  • hq alloc events command has been removed.
  • The --max-kept-directories parameter for allocation queues has been removed. HyperQueue will now keep
    the 20 most recent allocation directories across all allocation queues.
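    For example, to skip the initial test submission when creating a Slurm allocation queue (the trailing
    Slurm arguments are illustrative, and additional queue options may be required):
    $ hq alloc add slurm --no-dry-run -- --partition=compute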

Fixes

  • HQ will no longer warn that stdout/stderr path does not contain the %{TASK_ID} placeholder
    when submitting array jobs if the placeholder is contained within the working directory path and
    stdout/stderr contains the %{CWD} placeholder.

v0.9.0-rc3

11 Mar 13:22
Pre-release

HyperQueue 0.9.0-rc3

New features

Tasks

  • A task may be started with a temporary directory that is automatically deleted when the task
    finishes (flag --task-dir).

  • A task may provide its own error message by creating a file at the path passed via the
    HQ_ERROR_FILENAME environment variable.

CLI

  • You can now use the hq task list <job-selector> command to display a list of tasks across multiple jobs.
  • Add --filter flag to worker list to allow filtering workers by their status.

Changes

Automatic allocation

  • Automatic allocation has been rewritten from scratch. It will no longer query PBS/Slurm allocation
    statuses periodically; instead, it will try to derive allocation state from the workers that connect
    to it from allocations.
  • When adding a new allocation queue, HyperQueue will now try to immediately submit a job into the queue
    to quickly test whether the entered configuration is correct. If you want to avoid this behaviour, you
    can use the --no-dry-run flag for hq alloc add <pbs/slurm>.
  • If too many submissions (10) or running allocations (3) fail in succession, the corresponding
    allocation queue will be automatically removed to avoid error loops.
  • hq alloc events command has been removed.
  • The --max-kept-directories parameter for allocation queues has been removed. HyperQueue will now keep
    the 20 most recent allocation directories across all allocation queues.

Fixes

  • HQ will no longer warn that stdout/stderr path does not contain the %{TASK_ID} placeholder
    when submitting array jobs if the placeholder is contained within the working directory path and
    stdout/stderr contains the %{CWD} placeholder.

v0.9.0-rc2

24 Feb 13:34
Pre-release

HyperQueue 0.9.0-rc2

New features

Tasks

  • A task may be started with a temporary directory that is automatically deleted when the task
    finishes (flag --task-dir).

CLI

  • You can now use the hq task list <job-selector> command to display a list of tasks across multiple jobs.
  • Add --filter flag to worker list to allow filtering workers by their status.

Changes

Automatic allocation

  • When adding a new allocation queue, HyperQueue will now try to immediately submit a job into the queue
    to quickly test whether the entered configuration is correct. If you want to avoid this behaviour, you
    can use the --no-dry-run flag for hq alloc add <pbs/slurm>.
  • The automatic allocator will now be invoked much less frequently, which should reduce the load
    on the underlying HPC job manager (e.g. PBS). You might thus see delays of up to 10 minutes before
    the HQ allocation list displays updated information or before a new allocation is submitted.
    We plan to rework the automatic allocator in future versions to allow more frequent updates while
    avoiding generating too many requests to the HPC job manager.

Fixes

  • HQ will no longer warn that stdout/stderr path does not contain the %{TASK_ID} placeholder
    when submitting array jobs if the placeholder is contained within the working directory path and
    stdout/stderr contains the %{CWD} placeholder.
  • The automatic allocator will query PBS allocation statuses less often. It will now ask for the
    status of all allocations in an allocation queue with a single qstat call, and it also contains a
    backoff mechanism that slows down new allocations if there are submission errors. If too many
    submissions (10) or running allocations (3) fail in succession, the corresponding allocation queue
    will be automatically removed.

v0.9.0-rc1

17 Feb 09:57
Pre-release

HyperQueue 0.9.0-rc1

New features

Tasks

  • A task may be started with a temporary directory that is automatically deleted when the task
    finishes (flag --task-dir).

CLI

  • You can now use the hq task list <job-selector> command to display a list of tasks across multiple jobs.
  • Add --filter flag to worker list to allow filtering workers by their status.

Changes

Automatic allocation

  • When adding a new allocation queue, HyperQueue will now try to immediately submit a job into the queue
    to quickly test whether the entered configuration is correct. If you want to avoid this behaviour, you
    can use the --no-dry-run flag for hq alloc add <pbs/slurm>.

Fixes

  • HQ will no longer warn that stdout/stderr path does not contain the %{TASK_ID} placeholder
    when submitting array jobs if the placeholder is contained within the working directory path and
    stdout/stderr contains the %{CWD} placeholder.
  • The automatic allocator will query PBS allocation statuses less often. It will now ask for the
    status of all allocations in an allocation queue with a single qstat call, and it also contains a
    backoff mechanism that slows down new allocations if there are submission errors. If too many
    submissions (50) or allocations (10) fail in succession, the corresponding allocation queue will be
    automatically removed.