add manual training with entrypoint instruction
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
sunya-ch committed Jul 24, 2024
1 parent 4bdcc9c commit d76a9dc
Showing 2 changed files with 88 additions and 0 deletions.
6 changes: 6 additions & 0 deletions model_training/README.md
@@ -38,6 +38,7 @@ Please confirm the following requirements:
## 2. Run benchmark and collect metrics
### With benchmark automation and pipeline
There are two options to run the benchmark and collect the metrics: [CPE-operator](https://github.com/IBM/cpe-operator) with a manual script and [Tekton Pipeline](https://github.com/tektoncd/pipeline).
> The CPE operator is slated for deprecation. We are transitioning to automating the collection and training processes through the Tekton pipeline. Nevertheless, the CPE operator may still be considered for customized benchmarks that require performance values per sub-workload within the benchmark suite.
@@ -46,6 +47,11 @@
### [CPE Operator Instruction](./cpe_script_instruction.md)
### With manual execution
In addition to the two automation approaches above, you can manually run your own benchmarks, then collect, train, and export the models using the entrypoint `cmd/main.py`.
### [Manual Metric Collection and Training with Entrypoint](./cmd_instruction.md)
## Clean up
### For kind-for-training cluster
82 changes: 82 additions & 0 deletions model_training/cmd_instruction.md
@@ -0,0 +1,82 @@
# Manual Metric Collection and Training with Entrypoint

## 1. Collect metrics
Without benchmark/pipeline automation, Kepler metrics can be collected with the `query` function using either of the following options.
### 1.1. by defining start time and end time

```bash
# value setting
BENCHMARK= # name of the benchmark (will generate [BENCHMARK].json to save start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
START_TIME= # format date +%Y-%m-%dT%H:%M:%SZ
END_TIME= # format date +%Y-%m-%dT%H:%M:%SZ
COLLECT_ID= # any unique id e.g., machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```
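
For example, the timestamps for a recent window can be generated with `date` in the expected UTC format; the benchmark name and the 30-minute window below are illustrative values, not defaults.

```bash
# illustrative example: query a 30-minute window ending now
BENCHMARK=sample_benchmark
PROM_URL=http://localhost:9090
COLLECT_ID=$(hostname)
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
START_TIME=$(date -u -d "30 minutes ago" +%Y-%m-%dT%H:%M:%SZ)  # GNU date; on macOS: date -u -v-30M +%Y-%m-%dT%H:%M:%SZ

DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --start-time $START_TIME --end-time $END_TIME --id $COLLECT_ID
```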

### 1.2. by defining the last interval from the execution time

```bash
# value setting
BENCHMARK= # name of the benchmark (will generate [BENCHMARK].json to save start and end time for reference)
PROM_URL= # e.g., http://localhost:9090
INTERVAL= # in seconds
COLLECT_ID= # any unique id e.g., machine name

# query execution
DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```
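
For example, to cover roughly the last hour of metrics, `INTERVAL` can be set to `3600`; the values below are illustrative.

```bash
# illustrative example: query the last hour of metrics
BENCHMARK=sample_benchmark
PROM_URL=http://localhost:9090
INTERVAL=3600           # last 3600 seconds (1 hour)
COLLECT_ID=$(hostname)

DATAPATH=/path/to/workspace python cmd/main.py query --benchmark $BENCHMARK --server $PROM_URL --output kepler_query --interval $INTERVAL --id $COLLECT_ID
```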

### Output:
Three files will be created in `/path/to/workspace`:
- `kepler_query.json`: raw Prometheus query response
- `<COLLECT_ID>.json`: machine system features (spec)
- `<BENCHMARK>.json`: an item containing startTimeUTC and endTimeUTC
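
As a quick sanity check after the query step, the workspace can be listed; the file names below simply follow the placeholders above.

```bash
# list the workspace; expected files (names follow the placeholders above):
#   kepler_query.json  <COLLECT_ID>.json  <BENCHMARK>.json
ls /path/to/workspace
```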

## 2. Train models

```bash
# value setting
PIPELINE_NAME= # any unique name for the pipeline (one pipeline can accumulate data from multiple COLLECT_IDs)

# train execution
# require COLLECT_ID from collect step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```
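
A filled-in example is shown below; the pipeline name is an arbitrary illustration, and `COLLECT_ID` must match the one used in the collect step.

```bash
# illustrative example: train a pipeline from the collected query data
PIPELINE_NAME=my-train-pipeline
COLLECT_ID=$(hostname)   # must match the id used in the collect step

DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py train --pipeline-name $PIPELINE_NAME --input kepler_query --id $COLLECT_ID
```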

## 3. Export models
The export function archives the models from the trained pipeline whose error is below the threshold, and generates a report in a format ready to push to kepler-model-db.

### 3.1. exporting the trained pipeline with BENCHMARK

The benchmark file is created by the CPE operator or by step 1.1 or 1.2.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # github account of publisher

# export execution
# require BENCHMARK from collect step
# require PIPELINE_NAME from train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```
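
For example, with illustrative values (the publisher account and paths below are placeholders, not defaults):

```bash
# illustrative example: export models whose error is below the threshold
BENCHMARK=sample_benchmark        # same benchmark name used in the collect step
PIPELINE_NAME=my-train-pipeline   # same pipeline name used in the train step
EXPORT_PATH=/path/to/kepler-model-db/models
PUBLISHER=my-github-account

DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --benchmark $BENCHMARK --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true
```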

### 3.2. exporting the trained models without BENCHMARK

If the data is collected by Tekton, no benchmark file is created. In this case, manually set `--collect-date` instead of the `--benchmark` parameter.

```bash
# value setting
EXPORT_PATH= # /path/to/kepler-model-db/models
PUBLISHER= # github account of publisher
COLLECT_DATE= # collect date

# export execution
# require COLLECT_DATE instead of BENCHMARK (no benchmark file is created when collecting with Tekton)
# require PIPELINE_NAME from train step
DATAPATH=/path/to/workspace MODEL_PATH=/path/to/workspace python cmd/main.py export --pipeline-name $PIPELINE_NAME -o $EXPORT_PATH --publisher $PUBLISHER --zip=true --collect-date $COLLECT_DATE
```
