Releases: It4innovations/hyperqueue

v0.17.0

01 Nov 10:12

HyperQueue 0.17.0

Breaking change

Memory resource in megabytes

  • The automatically detected resource "mem", which represents the size of a worker's RAM, now uses megabytes as its unit,
    i.e. --resource mem=100 now asks for 100 MiB (previously 100 bytes).
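
    For example, under the new unit, asking for 1 GiB of RAM looks like this (a sketch; it reuses the
    --resource flag shown above):

    $ hq submit --resource mem=1024 /my-program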

New features

Non-integer resource requests

  • You may now ask for a non-integer amount of a resource, e.g. 0.5 of a GPU.
    This enables resource sharing at the logical level of the HyperQueue scheduler and allows the remaining
    part of the resource to be utilized by other tasks.
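
    For example, to ask for half of a GPU (a sketch; it assumes a worker that exposes the automatically
    detected gpus/nvidia resource):

    $ hq submit --resource gpus/nvidia=0.5 /my-program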

Job submission

  • You can now specify cleanup modes when passing stdout/stderr paths to tasks. The cleanup mode decides what should
    happen to the file once the task has finished executing. Currently, a single cleanup mode is implemented, which removes
    the file if the task has finished successfully:

    $ hq submit --stdout=out.txt:rm-if-finished /my-program

Fixes

  • Fixed a crash when a task fails during its initialization

Artifact summary:

  • hq-v0.17.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.17.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.17.0-rc1

25 Oct 11:52
Pre-release

HyperQueue 0.17.0-rc1

The release notes are identical to v0.17.0 above.

Artifact summary:

  • hq-v0.17.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.17.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.16.0

12 Jul 19:30

HyperQueue 0.16.0

New features

Pregenerating access files

  • Via the command hq server generate-access you can pre-create an access file that can later be used for starting the
    server and connecting workers and clients. This is useful in cloud environments.
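
    A minimal sketch of the intended workflow (the file name is illustrative, and the --access-file flag of
    hq server start is an assumption, not confirmed by these notes):

    $ hq server generate-access access.json        # pre-create the access file
    $ hq server start --access-file=access.json    # start the server using it (flag assumed)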

Job submission

  • A new command hq job forget <job-selector> has been introduced. It can be used to completely forget a job, and thus
    reduce the memory usage of the HQ server. It is especially useful if you submit a large number of jobs and keep the
    server running for a long time.
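
    For example (the range selector is illustrative; these notes only introduce the command itself):

    $ hq job forget 1-100    # forget jobs 1 through 100 (selector syntax assumed)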

Automatic allocation

  • Autoalloc can now execute a custom shell command/script on each worker node before the worker starts and after the
    worker stops. You can use this feature e.g. to initialize some data or load software modules for each worker node.

    $ hq alloc add pbs --time-limit 30m \
      --worker-start-cmd "/project/xxx/init-node.sh" \
      --worker-stop-cmd "/project/xxx/cleanup-node.sh"
  • You can now set a time limit for workers spawned in allocations with the --worker-time-limit flag. You can use this
    flag to make workers stop sooner, e.g. to give more headroom for a --worker-stop-cmd command to execute
    before the allocation is terminated. If you do not use this parameter, the worker time limit will be set to the
    time limit of the allocation.

    Example:

    $ hq alloc add pbs --time-limit 1h --worker-time-limit 58m --worker-stop-cmd "/project/xxxx/slow-command.sh"

    In this case, the allocation will run for one hour, but the HQ worker will be stopped after 58 minutes (unless it is
    stopped sooner because of idle timeout). The worker stop command will thus have at least two minutes to execute.

Changes

Access file

The format of the access file has changed. This is mostly an internal change, but you may encounter a parsing error when
connecting an old client/worker to a new server. (Connecting a new client/worker to an old server will give you a proper
error message.)

Artifact summary:

  • hq-v0.16.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.16.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.16.0-rc1

08 Jul 18:12
Pre-release

HyperQueue 0.16.0-rc1

The release notes are identical to v0.16.0 above.

Artifact summary:

  • hq-v0.16.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.16.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.15.0

17 Apr 19:54

HyperQueue 0.15.0

Breaking changes

  • NVIDIA GPUs are now automatically detected under the resource name gpus/nvidia, instead of
    just gpus!
    If you have been using the gpus resource name, you should update your scripts.
    See more details below.

New features

Resource management

  • You can now specify multiple resource configurations for one task, e.g. "1 cpu and 1 gpu" OR "4 cpus". The scheduler
    considers both configurations in task planning. For example, assume that we have many tasks with the mentioned
    configuration and a worker with 16 cpus and 4 gpus. The tasks will fully utilize the node: 4 tasks will run in the
    configuration with a gpu and 3 tasks will run in the cpu-only mode.

  • A Job Definition File is a TOML file that can define a job.
    It allows you to submit complex jobs without using the Python API (dependencies, resource variants, ...).

    $ hq job submit-file myfile.toml
  • You can now specify (indexed) resource values provided by workers as strings (previously only
    integers were allowed). Notably, automatic detection of Nvidia GPUs specified with string UUIDs
    now works.

    $ hq worker start --resource="res1=[foo, bar]"
  • HyperQueue now provides built-in support for AMD GPUs. For this reason, the default name of GPU
    resources that are automatically detected on a worker has been changed from gpus to gpus/nvidia
    for NVIDIA GPUs. AMD GPUs are now autodetected as gpus/amd. In the future, we intend to create a way
    to ask for any GPU resource (e.g. --resource=gpus=2), regardless of its type.
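
    For example, a task that previously requested --resource gpus=1 should now ask for the vendor-specific
    resource name:

    $ hq submit --resource gpus/nvidia=1 /my-program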

  • AMD GPUs are now automatically detected in workers from the environment variable ROCR_VISIBLE_DEVICES.

  • The allowed characters for resource names have changed. A name now has to begin with an ASCII letter,
    and it can only contain ASCII letters, ASCII digits and the slash (/) symbol. This restriction is
    introduced for better alignment with shells, which typically do not support complicated variable names.
    HQ passes resource names to executed tasks through environment variables, so it has to take this
    into account. Note that the / symbol in a resource name will be normalized to _ when being passed
    to a task.
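
    For example, a task requesting gpus/nvidia will see gpus_nvidia (not gpus/nvidia) in the names of the
    resource-related environment variables that HQ sets (the grep pattern below is illustrative):

    $ hq submit --resource gpus/nvidia=1 -- bash -c 'env | grep gpus_nvidia'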

  • hq task info now shows more information

Changes

Job submission

  • The default path for stdout and stderr files has been changed from %{SUBMIT_DIR}/job-%{JOB_ID}/%{TASK_ID}.[stdout/stderr]
    to %{CWD}/job-%{JOB_ID}/%{TASK_ID}.[stdout/stderr]. Note that the default value for the working
    directory (%{CWD}) is set to the submission directory, so if you have used the defaults before,
    nothing will change for you. Stdout and stderr paths are now also resolved relative to the working
    directory of the given task, not to the submit directory.
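
    For illustration, with the new defaults a job submitted from /home/user writes its output like this
    (the job/task IDs and paths are illustrative, derived from the template above):

    $ cd /home/user
    $ hq submit echo hello
    # task 0 of job 1 writes to /home/user/job-1/0.stdout and /home/user/job-1/0.stderr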

Artifact summary:

  • hq-v0.15.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.15.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.15.0-rc1

12 Apr 20:14
Pre-release

HyperQueue 0.15.0-rc1

The release notes are identical to v0.15.0 above.

Artifact summary:

  • hq-v0.15.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.15.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.14.0

02 Feb 10:28

HyperQueue 0.14.0

New features

CLI

  • #545 Add a new command hq job summary,
    which displays the number of jobs in each job state.

Platforms

  • HQ can now be compiled for Raspberry Pi

Fixes

Worker

  • #539 Fix connection of worker to server
    in the presence of both IPv4 and IPv6 addresses.

Job submission

  • #540 Parse all arguments from the shebang
    in a directives file (e.g. #!/bin/bash -l).
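
    For example (a sketch; the #HQ directive line and the script name are illustrative):

    $ cat script.sh
    #!/bin/bash -l
    #HQ --cpus=2
    echo "running in a login shell"
    $ hq submit script.sh    # both the -l shebang argument and the #HQ directives are honored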

Streaming

  • Fixed a bug in closing streaming when tasks are very short and synchronized.

Artifact summary:

  • hq-v0.14.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.14.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.14.0-rc1

29 Jan 18:43
Pre-release

HyperQueue 0.14.0-rc1

The release notes are identical to v0.14.0 above.

Artifact summary:

  • hq-v0.14.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.14.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.13.0

04 Nov 09:59

HyperQueue 0.13.0

New features

Resource management

  • Almost complete rewrite of resource management.
    CPUs and other resources were unified: the most visible change is that "cpus" can now be defined like any
    other resource, and resources can now be defined in groups (NUMA-like resources).
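
    For example, a worker resource split into two groups might be defined like this (a sketch; the
    nested-list syntax is an assumption extrapolated from the list syntax described under Changes below):

    $ hq worker start --resource "foo=[[0,1],[2,3]]"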

  • Many improvements in the scheduler: improved schedules for multi-resource requests,
    better behavior on non-heterogeneous clusters,
    and better interaction between resources and priorities.

Automatic allocation

  • #467 You can now pause (and resume)
    autoalloc queues using hq alloc pause and hq alloc resume.
    Paused queues will not submit new allocations into the selected job manager; they can be resumed later.
    When an autoalloc queue hits too many submission or worker execution errors, it will now be paused
    instead of removed.
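
    For example (assuming queues are addressed by the ID shown in hq alloc list):

    $ hq alloc pause 1     # stop submitting new allocations for queue 1
    $ hq alloc resume 1    # resume the queue later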

Tasks

  • HQ now allows you to limit how many times a task may be in a running state when its worker is lost
    (such a task is a potential source of the worker's crash).
    If the limit is reached, the task is marked as failed.
    The limit can be configured with the --crash-limit flag of submit.
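
    For example (the flag name comes from these notes; the value is illustrative):

    $ hq submit --crash-limit=3 /my-program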

  • Groups of workers have been introduced. A multi-node task is now started only on workers from the same group.
    By default, workers are grouped by PBS/Slurm allocations, but this can be configured manually.
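
    For example, workers can be placed into the same group manually when they are started (a sketch; the
    --group flag name is an assumption based on this feature):

    $ hq worker start --group my-group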

Changes

Resource management

  • The --cpus=no-ht option has been changed to the flag --no-hyper-threading.
  • The explicit list definition of a resource was changed from --resource xxx=list(1,2,3) to --resource xxx=[1,2,3]
    (a result of the unification of CPUs with other resources).
  • Python API: the attribute generic in ResourceRequest has been renamed to resources.

Tasks

  • #461 When a task is cancelled, times out,
    or its worker is killed, HyperQueue now tries to make sure that both the task and any processes that
    it has spawned are also terminated.
  • #480 You can now select multiple tasks in hq task info.

Artifact summary:

  • hq-v0.13.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.13.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.

v0.13.0-rc1

02 Nov 15:18
Pre-release

HyperQueue 0.13.0-rc1

The release notes are identical to v0.13.0 above.

Artifact summary:

  • hq-v0.13.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.13.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.