Docs updated
spirali committed Nov 16, 2024
1 parent 896a26d commit d026836
Showing 3 changed files with 114 additions and 62 deletions.
30 changes: 19 additions & 11 deletions CHANGELOG.md
@@ -2,7 +2,12 @@

### Changes

* `hq event-log` command renamed to `hq journal`

### New features

* Added `hq journal prune` for pruning the journal file.
* Added `hq journal flush` for forcing the server to flush the journal.

## v0.20.0

@@ -11,13 +16,16 @@
* It is now possible to dynamically submit new tasks into an existing job (we call this concept "Open jobs").
See [Open jobs documentation](https://it4innovations.github.io/hyperqueue/stable/jobs/openjobs/)

* Worker streaming. Before, you could stream task stderr/stdout to the server over the network using the `--log`
  parameter of `hq submit`. This approach had various issues and was not scalable. Therefore, we have replaced this
  functionality with worker streaming, where the streaming of task output to a set of files on disk is performed by
  workers instead. This new streaming approach creates more files than the original solution (where it was always
  one file per job), but the number of files stays small and independent of the number of executed tasks. The new
  architecture also allows parallel I/O writing and storing of multiple job streams in one stream handle. You can use
  worker streaming using the `--stream` parameter of `hq submit` (see the sketch after this list). Check out the
  documentation for more information.

* Optimization of journal size

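As an illustration of worker streaming, here is a minimal sketch (the directory path is a placeholder, and the exact combination of flags is an assumption based on the notes above):

```bash
# Submit a 1000-task array whose stdout/stderr is streamed by the
# workers into files under the given directory
$ hq submit --stream=/path/to/stream-dir --array=1-1000 ./my-program
```
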
@@ -607,7 +615,7 @@ would pass `OMP_NUM_THREADS=4` to the executed `<program>`.
is `hq submit`, which is now a shortcut for `hq job submit`. Here is a table of changed commands:

| **Previous command** | **New command** |
|----------------------|--------------------|
| `hq jobs` | `hq job list` |
| `hq job` | `hq job info` |
| `hq resubmit` | `hq job resubmit` |
@@ -723,18 +731,18 @@ would pass `OMP_NUM_THREADS=4` to the executed `<program>`.
* Generic resource management has been added. You can find out more in
the [documentation](https://it4innovations.github.io/hyperqueue/stable/jobs/gresources/).
* HyperQueue can now automatically detect how many Nvidia GPUs are present on a worker node.
* You can now submit a task array where each task will receive one element of a JSON array using
`hq submit --from-json`. You can find out more in
the [documentation](https://it4innovations.github.io/hyperqueue/stable/jobs/arrays/#json-array).
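As an illustration, here is a sketch of submitting from a JSON array (it assumes, per the documentation, that each task receives its array element in the `HQ_ENTRY` environment variable; the file name is a placeholder):

```bash
# input.json contains a JSON array, e.g. [1, 2, 3];
# one task is created per element, exposed via $HQ_ENTRY
$ hq submit --from-json=input.json -- bash -c 'echo "$HQ_ENTRY"'
```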
### Changes
* There have been a few slight CLI changes:
    * `hq worker list` no longer has `--offline` and `--online` flags. It will now display only running
      workers by default. If you also want to show offline workers, use the `--all` flag.
    * `hq alloc add` no longer has a required `--queue/--partition` option. The PBS queue/Slurm partition
      should now be passed as a trailing argument after `--`: `hq alloc add pbs -- -qqprod`.
* Server subdirectories generated for each run of the HyperQueue server are now named with a numeric ID instead of
a date.
* The documentation has been [rewritten](https://it4innovations.github.io/hyperqueue).
@@ -805,4 +813,4 @@ would pass `OMP_NUM_THREADS=4` to the executed `<program>`.

* Job arrays
* CPU management
* --stdout/--stderr configuration in submit
20 changes: 17 additions & 3 deletions docs/deployment/server.md
@@ -76,13 +76,15 @@ or using a terminal multiplexer like [tmux](https://en.wikipedia.org/wiki/Tmux).

## Resuming stopped/crashed server

The server supports resilience, which allows it to restore its state after it is stopped or if it crashes. To enable
resilience, you can tell the server to log events into a *journal* file, using the `--journal` flag:

```bash
$ hq server start --journal /path/to/journal
```

If the server is stopped or it crashes, and you use the same command to start the server (using the same journal file
path), it will continue from the last point:

```bash
$ hq server start --journal /path/to/journal
```

@@ -99,6 +101,7 @@ have to be connected to the server after it restarts.
after resuming the server, the task will not be computed after a server restart.

### Exporting journal events

If you'd like to programmatically analyze events that are stored in the journal file, you can
export them to JSON using the following command:

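For example (a sketch; passing the journal path as a positional argument is an assumption, see `hq journal export --help` for the exact interface):

```bash
$ hq journal export /path/to/journal
```
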
@@ -110,6 +113,7 @@

The events will be read from the provided journal and printed to `stdout` encoded as JSON, with one
event per line (this corresponds to line-delimited JSON, i.e. [NDJSON](http://ndjson.org/)).

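Because the output is line-delimited, it can be post-processed with standard line-oriented tools. A minimal sketch, assuming `jq` is installed and that each event carries a top-level `event` field (the schema is unstable, so the field name is an assumption):

```bash
# Tally journal events by their (assumed) top-level "event" field
$ hq journal export /path/to/journal | jq -c '.event' | sort | uniq -c
```
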
You can also directly stream events in real-time from the server using the following command:

```bash
$ hq journal stream
```
@@ -119,6 +123,16 @@ $ hq journal stream
The JSON format of the journal events and their definitions are currently unstable and may change
with a new HyperQueue version.

### Pruning journal

The command `hq journal prune` removes all completed jobs and disconnected workers from the journal file.
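
For example (a minimal sketch; it assumes no further arguments are needed, see `hq journal prune --help` for the exact interface):

```bash
$ hq journal prune
```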

### Flushing journal

The command `hq journal flush` forces the server to flush the journal.
It is mainly useful for testing purposes, or if you are going to run `hq journal export` on
a live journal (however, it is usually better to use `hq journal stream`).
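
For example (a minimal sketch; it assumes a running server that the client can reach):

```bash
$ hq journal flush
```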

## Stopping server

You can stop a running server with the following command:
@@ -127,4 +141,4 @@

```bash
$ hq server stop
```

When a server is stopped, all running jobs and connected workers will be immediately stopped.