Introduce dev-doc for processes management
ilab should be able to run detached processes, re-attach to them, and monitor them

Signed-off-by: Charlie Doern <[email protected]>
cdoern committed Nov 22, 2024
1 parent 29c7497 commit fb61d35
Showing 2 changed files with 71 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .spellcheck-en-custom.txt
@@ -28,6 +28,7 @@ Containerfile
cpp
cuBLAS
CUDA
ctrl
customizations
CVE
CVEs
@@ -121,6 +122,7 @@ Params
Pareja
PEFT
Pereira
PID
PlantUML
PLOS
pluggable
@@ -171,7 +173,9 @@ Standup
subcommand
subcommands
subdirectory
subprocess
Sudalairaj
systemd
Taj
tatsu
TBD
@@ -190,6 +194,8 @@ UI
ui
unquantized
USM
UUID
UUIDs
UX
venv
Vishnoi
65 changes: 65 additions & 0 deletions docs/cli/ilab-processes.md
@@ -0,0 +1,65 @@
# Processes in InstructLab

The ability to detach from processes is crucial to the user experience of InstructLab. However, multiprocessing, process management, and process monitoring are complex topics.

It is important to introduce this concept as simply as possible, then expand on state reporting, logging, and other features as we go along.

## Phased approach to InstructLab Processes

This document describes phase 1 of implementing processes in InstructLab, the "ilab simple process management system". It will depend purely on Python packages, PID tracking, and log files to create the experience of detachable processes. The key concept is the UUID, which would allow a future REST API to keep track of InstructLab processes using these unique identifiers.

We can re-visit all this in phase 2, when we discuss if we want to utilize something like systemd or a more in-depth process-monitor repo to track processes.

### Phase 1

Phase 1 would focus on adding the concept of detaching from processes. This would be implemented in `ilab 0.23.0`.

Process management would only apply to `ilab data generate` and `ilab model train`.

The workflow would allow for:

- `ilab data generate -dt` (run a detached generation process)
- `ilab model train -dt` (run a detached training process)

`ilab process list`

```console
+------------------------------+-------+------------+-------------+---------+
| Process UUID                 | PID   | Type       | Duration    | Status  |
+------------------------------+-------+------------+-------------+---------+
| train-2423813fhthfbnaj       | 33451 | training   | 8 hr 32 min | Running |
| generation-6768769599sdbjklm | 35543 | generation | 3 hr 15 min | Running |
| generation-1345661hlsmma     | 45424 | generation | 22 min      | Errored |
+------------------------------+-------+------------+-------------+---------+
```
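
The Duration column in the listing could be computed from a start time stored with each registry entry. The following is only a minimal sketch: `format_duration` is a hypothetical helper, and this document does not specify how durations would be tracked.

```python
def format_duration(seconds: float) -> str:
    """Render elapsed seconds in the "8 hr 32 min" style used in the listing."""
    minutes = int(seconds // 60)          # drop leftover seconds
    hours, minutes = divmod(minutes, 60)  # split into whole hours and minutes
    if hours:
        return f"{hours} hr {minutes} min"
    return f"{minutes} min"
```

For example, `format_duration(8 * 3600 + 32 * 60)` gives `8 hr 32 min`.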

`ilab process attach <UUID>`

This command would re-attach to the given process, allowing the user to view the live logs of the process. `attach` would tail the log file and listen for user input to kill the process.

At first, these commands will be implemented very simply, using the following architecture:

1. The backbone of this process management will be the Python `multiprocessing` package, using `Process` to start the detached commands and `Manager` to maintain an in-code dictionary of the existing processes.
2. A detached process will be re-attachable by tailing its log file, after which the user can ctrl+c the process as normal via `KeyboardInterrupt`.
3. A `Manager.dict` process registry will be maintained to track UUIDs created via the `uuid` Python package, the PID of the actual process, and a `log_file` that the process writes its logs to so that the user can re-attach. This log file directory will be tracked using our `DEFAULTS` package and will be standard throughout releases.
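
The registry described above might look roughly like the sketch below. The function names, the entry fields, and the `<type>-<hex>` ID scheme are assumptions made for illustration, not the actual implementation.

```python
import multiprocessing
import uuid
from typing import Callable, MutableMapping


def register_process(registry: MutableMapping, kind: str, pid: int, log_file: str) -> str:
    """Add an entry to the registry and return its UUID."""
    # Hypothetical ID scheme: "<kind>-<random hex>", loosely mirroring the listing.
    process_uuid = f"{kind}-{uuid.uuid4().hex}"
    # Store a plain dict so the entry can be pickled into a Manager-backed dict.
    registry[process_uuid] = {"pid": pid, "type": kind, "log_file": log_file}
    return process_uuid


def start_detached(registry: MutableMapping, kind: str,
                   target: Callable[[str], None], log_file: str) -> str:
    """Run target(log_file) in a separate process and register it."""
    proc = multiprocessing.Process(target=target, args=(log_file,))
    proc.start()  # proc.pid is None until the process has been started
    return register_process(registry, kind, proc.pid, log_file)
```

In the design above, `registry` would be a `multiprocessing.Manager().dict()`; a plain `dict` is enough to try the functions out. The sketch leaves open how the registry and the child process outlive a single CLI invocation.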

The general flow would be:

1. A user runs `ilab data generate -dt`.
2. A UUID, PID, and log file are added to the process registry.
3. The command would print the UUID of the SDG run and exit.
4. A user could attach to this process using `ilab process attach <UUID>`.
5. This command would look in the process registry for the PID and/or UUID, get the log file, tail the log file, and listen for a ctrl+c keyboard interrupt.
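
The attach step could be sketched as follows. `follow` and `attach` are hypothetical names, the registry entry layout (`pid`, `log_file`) is assumed, and forwarding ctrl+c as `SIGINT` on a POSIX system is one possible choice rather than a decided behavior.

```python
import os
import signal
import time
from typing import Iterator, Mapping


def follow(log_file: str, poll_seconds: float = 0.5, stop_at_eof: bool = False) -> Iterator[str]:
    """Yield lines from log_file, waiting for new ones much like `tail -f`."""
    with open(log_file, "r", encoding="utf-8") as handle:
        while True:
            line = handle.readline()
            if line:
                yield line.rstrip("\n")
            elif stop_at_eof:
                return  # useful for one-shot reads and tests
            else:
                time.sleep(poll_seconds)  # nothing new yet; poll again


def attach(registry: Mapping, process_uuid: str) -> None:
    """Print the live log of a registered process until the user hits ctrl+c."""
    entry = registry[process_uuid]
    try:
        for line in follow(entry["log_file"]):
            print(line)
    except KeyboardInterrupt:
        # Assumption: ctrl+c while attached interrupts the tracked process.
        os.kill(entry["pid"], signal.SIGINT)
```

Polling with `readline` keeps the sketch dependency-free; a real implementation may need to handle partially written lines and log rotation.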

This allows us to detach from processes while they keep running in the background, and to maintain log files, all without using anything other than UUID and subprocess.

#### Log file management

If the various libraries already produce log files, those will be used in this scenario. If they do not, InstructLab will manage writing process logs to disk. Regardless of whether the libraries maintain their own log files, InstructLab will need to co-locate the log files in a centralized directory.

If a log file exists, it will be copied to the following location and renamed to match this format:

`~/.local/share/instructlab/logs/<command_name>/<command_name>-<timestamp>.log`

If the log file does not exist, InstructLab will create one with this format. Libraries that already write logs are responsible for standardizing where those logs are stored, so the CLI can access them in a uniform fashion and copy them to the proper directory.
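
As a rough illustration of the co-location rule, the sketch below builds the destination path and either copies an existing library log or creates an empty one. The names `LOGS_DIR`, `process_log_path`, and `colocate_log`, as well as the timestamp format, are assumptions; the document only specifies `<timestamp>` and says the directory would come from `DEFAULTS`.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical default; the real value would come from DEFAULTS.
LOGS_DIR = Path.home() / ".local" / "share" / "instructlab" / "logs"


def process_log_path(command_name: str, started: datetime, logs_dir: Path = LOGS_DIR) -> Path:
    """Build <logs_dir>/<command_name>/<command_name>-<timestamp>.log."""
    stamp = started.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    return logs_dir / command_name / f"{command_name}-{stamp}.log"


def colocate_log(command_name: str, started: datetime,
                 library_log: Optional[Path] = None, logs_dir: Path = LOGS_DIR) -> Path:
    """Copy a library's log into the central directory, or create a new one."""
    dest = process_log_path(command_name, started, logs_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if library_log is not None and library_log.exists():
        shutil.copy2(library_log, dest)  # copy and rename the existing log
    else:
        dest.touch()  # no library log: InstructLab creates its own file
    return dest
```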
