Optimize Run + Design Except Feature + Various (#70)
## Optimize config/dir creation
As part of the setup, before running experiments, we create all working directories and place a `config.json` in each of them.
Until now, this relied on pure Ansible, which becomes a bottleneck when a suite contains many jobs.
A custom module now performs this step more efficiently.
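
As a rough illustration of the idea, creating every job directory and writing its `config.json` in a single Python loop avoids one Ansible task invocation per job. This is only a minimal sketch: the function name `create_job_dirs` and the `run_0/rep_0` directory layout are hypothetical and are not taken from the actual module.

```python
import json
import tempfile
from pathlib import Path


def create_job_dirs(base_dir, jobs):
    """Create one working directory per job and write its config.json.

    `jobs` maps a relative directory name (hypothetical layout) to the
    config dict of that job. Returns the paths of the written files.
    """
    written = []
    for rel_dir, config in jobs.items():
        job_dir = Path(base_dir) / rel_dir
        # parents=True creates intermediate folders, exist_ok=True makes reruns safe
        job_dir.mkdir(parents=True, exist_ok=True)
        config_path = job_dir / "config.json"
        config_path.write_text(json.dumps(config, indent=2))
        written.append(config_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        jobs = {
            "run_0/rep_0": {"vector_size": 10, "seed": 1234},
            "run_1/rep_0": {"vector_size": 20, "seed": 1234},
        }
        for path in create_job_dirs(tmp, jobs):
            print(path.relative_to(tmp))
```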

## Optimize result fetching
We replaced the slow fetching of results with an additional custom module.
This is more efficient because `tsp` and `slurm` no longer need to process completed job ids one by one.
Instead, they can report a list of finished job ids whose results can be downloaded.
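
The batching idea can be sketched as follows, assuming the scheduler reports the set of finished jobs in a single query. The job id fields (`exp_name`, `exp_run`, `exp_run_rep`) follow the state files in this commit. The helper `split_finished` is hypothetical and does not reflect the module's real interface.

```python
def job_key(job):
    """Identify a job by experiment name, run index, and repetition index."""
    return (job["exp_name"], job["exp_run"], job["exp_run_rep"])


def split_finished(job_ids, finished_keys):
    """Partition job ids into (finished, unfinished) in a single pass.

    `finished_keys` is the set of job keys the scheduler reported as done.
    """
    finished, unfinished = [], []
    for job in job_ids:
        (finished if job_key(job) in finished_keys else unfinished).append(job)
    return finished, unfinished


if __name__ == "__main__":
    jobs = [
        {"exp_name": "format_cross", "exp_run": i, "exp_run_rep": 0}
        for i in range(4)
    ]
    done = {("format_cross", 0, 0), ("format_cross", 2, 0)}
    finished, unfinished = split_finished(jobs, done)
    print([j["exp_run"] for j in finished], [j["exp_run"] for j in unfinished])
```

All results for the `finished` list can then be downloaded together instead of once per job id.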


## New Experiment Design Feature: `except_filters`
We added a basic implementation of #71 that allows filtering out certain configuration combinations from an experiment design.
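
The semantics can be sketched in a few lines of Python. The example mirrors the `format_cross` experiment from `example03-format.yml` in this commit, under the assumption that `vector_size` takes the levels 10, 20, 30, and 40. The helpers `matches` and `apply_except_filters` are illustrative names, not the suite's actual implementation.

```python
from itertools import product


def matches(config, flt):
    """Return True if every (possibly nested) entry of the filter is in config."""
    for key, expected in flt.items():
        if key not in config:
            return False
        if isinstance(expected, dict):
            if not isinstance(config[key], dict) or not matches(config[key], expected):
                return False
        elif config[key] != expected:
            return False
    return True


def apply_except_filters(configs, except_filters):
    """Drop every config that matches at least one filter."""
    return [c for c in configs if not any(matches(c, f) for f in except_filters)]


vector_sizes = [10, 20, 30, 40]  # assumed factor levels
apps = ["app1", "app2", "app3"]
runs = [
    {"vector_size": v, "app": {"name": a}, "seed": 1234}
    for v, a in product(vector_sizes, apps)
]
except_filters = [
    {"vector_size": 40, "app": {"name": "app2"}},
    {"vector_size": 40, "app": {"name": "app3"}},
]
kept = apply_except_filters(runs, except_filters)
print(len(runs), len(kept))  # prints: 12 10
```

The cross product yields 12 runs, and the two filters remove the combinations of `vector_size` 40 with `app2` and `app3`, leaving 10.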

## New Super ETL Feature: `pipelines` filter
When running a super ETL via `make`, it is now possible to run only a subset of pipelines (e.g., `pipelines="a b"`).
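
On the Python side, such a filter could be wired up roughly as below. The flags `--config`, `--output_path`, and `--pipelines` appear in the Makefile change of this commit. The parsing with `nargs="+"` and the helper `select_pipelines` are assumptions for illustration and may differ from what `super_etl.py` actually does.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="super etl (sketch)")
    parser.add_argument("--config", required=True)
    parser.add_argument("--output_path", required=True)
    # `--pipelines a b` yields ["a", "b"]; omitting the flag selects everything
    parser.add_argument("--pipelines", nargs="+", default=None)
    return parser.parse_args(argv)


def select_pipelines(all_pipelines, selected):
    """Return only the requested pipelines, or all of them if none were requested."""
    if selected is None:
        return dict(all_pipelines)
    unknown = [name for name in selected if name not in all_pipelines]
    if unknown:
        raise ValueError(f"unknown pipelines: {unknown}")
    return {name: all_pipelines[name] for name in selected}


if __name__ == "__main__":
    args = parse_args(
        ["--config", "demo_plots", "--output_path", "out", "--pipelines", "a", "b"]
    )
    pipelines = {"a": "...", "b": "...", "c": "..."}  # placeholder pipeline definitions
    print(list(select_pipelines(pipelines, args.pipelines)))
```

Rejecting unknown names early is a design choice in this sketch. It avoids silently producing no output when a pipeline name is misspelled.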


## ETL Extract Optimization
We only flatten results that require flattening, which speeds up the processing.
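
A minimal sketch of this kind of optimization is shown below, assuming results are dictionaries and that flattening means turning nested keys into dotted names such as `app.name`. The function names are hypothetical and the real extract step may work differently.

```python
def needs_flattening(record):
    """A record needs flattening only if at least one value is a nested dict."""
    return any(isinstance(value, dict) for value in record.values())


def flatten(record, prefix=""):
    """Recursively turn nested dicts into a flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def extract(records):
    """Flatten only the records that contain nested values; pass the rest through."""
    return [flatten(r) if needs_flattening(r) else r for r in records]


if __name__ == "__main__":
    records = [
        {"latency": 14.9, "seed": 1234},                      # already flat
        {"app": {"name": "app1"}, "vector_size": 10},         # nested
    ]
    print(extract(records))
```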


## ETL Step
We add a new ETL Loader that allows storing a data frame as a pickle.

## Option to save ETL results to Notion
We add a utility function that allows storing the results of a loader to a Notion page.



---------

Co-authored-by: Hidde L <[email protected]>
nicolas-kuechler and hiddely authored Aug 11, 2023
1 parent 4676b5d commit 65f3ebc
Showing 73 changed files with 707 additions and 648 deletions.
7 changes: 6 additions & 1 deletion Makefile
@@ -32,6 +32,10 @@ ifdef id
myid=--id $(id)
endif

ifdef pipelines
mypipelines=--pipelines $(pipelines)
endif

# on `make` and `make help` list all targets with information
help:
@echo 'Running Experiments'
@@ -48,6 +52,7 @@ help:
@echo ' make etl-design suite=<SUITE> id=<ID> - same as `make etl ...` but uses the pipeline from the suite design instead of results'
@echo ' make etl-all - run etl pipelines of all results'
@echo ' make etl-super config=<CONFIG> out=<PATH> - run the super etl to combine results of multiple suites (for <CONFIG> e.g., demo_plots)'
@echo ' make etl-super ... pipelines="<P1> <P2>" - run only a subset of pipelines in the super etl'
@echo 'Clean ETL'
@echo ' make etl-clean suite=<SUITE> id=<ID> - delete etl results from specific suite (can be regenerated with make etl ...)'
@echo ' make etl-clean-all - delete etl results from all suites (can be regenerated with make etl-all)'
@@ -170,7 +175,7 @@ etl-all: install
# e.g., make etl-super config=demo_plots out=/home/kuenico/dev/doe-suite/tmp
etl-super: install
@cd $(does_config_dir) && \
poetry run python $(PWD)/doespy/doespy/etl/super_etl.py --config $(config) --output_path $(out)
poetry run python $(PWD)/doespy/doespy/etl/super_etl.py --config $(config) --output_path $(out) $(mypipelines)

# delete etl results for a specific `suite` and `id` (can be regenerated with `make etl suite=<SUITE> id=<ID>`)
etl-clean: install
5 changes: 3 additions & 2 deletions ansible.cfg
@@ -13,11 +13,12 @@ any_errors_fatal_setup = true
any_errors_fatal_experiments = false



# TODO [nku] can I control this cfg from the makefile? `make run suite=abc id=new` vs `make run-v suite=abc id=new` vs `make run-vvvv suite=abc id=new`
stdout_callback = community.general.selective


# speedup by using ssh pipelining
pipelining = True
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=1200
ssh_args = -o ControlMaster=auto -o ControlPersist=300s
7 changes: 4 additions & 3 deletions demo_project/demo_latency.py
@@ -24,9 +24,10 @@ def main():

print("Measuring Latency...")
data = {}
data["latency"] = a * args.size + random.uniform(
-1, 1
) # latency depends linear on size + some noise for reps
noise = 0.93290707138428
#noise = random.uniform(-1, 1)
data["latency"] = a * args.size + noise # latency depends linear on size + some noise for reps


time.sleep(10) # wait 10 seconds

28 changes: 23 additions & 5 deletions demo_project/doe-suite-config/designs/example03-format.yml
@@ -9,6 +9,8 @@
#
# The `cross` format uses the keyword `$FACTOR$` as a YAML key,
# while the `factor list` uses `$FACTOR$` as a YAML value and expects a corresponding level in the `factor_levels` list.
#
# The `except_filters` construct can be used to ignore specific combinations of configuration (e.g., the cross product between two factors except a specific combination should be skipped)

# experiment in the pure `cross` format
format_cross:
@@ -25,8 +27,16 @@ format_cross:
name:
$FACTOR$: [app1, app2, app3] # varied parameter between runs (factor)
# hyperparam: X -> not used in this experiment
except_filters:
# we ignore the combination of vector_size 40 with app2 and app3 and only run it with app1
- vector_size: 40
app:
name: app2
- vector_size: 40
app:
name: app3
#
# The experiment `format_cross` results in 12 runs:
# The experiment `format_cross` results in 10 runs:
# - {"vector_size": 10, "app.name": app1, "seed": 1234}
# - {"vector_size": 10, "app.name": app2, "seed": 1234}
# - {"vector_size": 10, "app.name": app3, "seed": 1234}
@@ -40,8 +50,8 @@ format_cross:
# - {"vector_size": 30, "app.name": app3, "seed": 1234}

# - {"vector_size": 40, "app.name": app1, "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "seed": 1234}
# - {"vector_size": 40, "app.name": app3, "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "seed": 1234} -> Ignored by except_filters
# - {"vector_size": 40, "app.name": app3, "seed": 1234} -> Ignored by except_filters


# experiment in the pure `level list` format
@@ -102,6 +112,14 @@ format_mixed:
- app:
name: app3
hyperparam: 5
except_filters:
# we ignore the combination of vector_size 40 with app2 and app3 and only run it with app1
- vector_size: 40
app:
name: app2
- vector_size: 40
app:
name: app3

# The mix between `cross` and `level-list` is the most flexible because it allows to define $FACTORS$
# for which we want to create the cross product (e.g., `vector_size`) and
@@ -125,8 +143,8 @@ format_mixed:
# - {"vector_size": 30, "app.name": app3, "app.hyperparam": 5 , "seed": 1234}

# - {"vector_size": 40, "app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "app.hyperparam": 10 , "seed": 1234}
# - {"vector_size": 40, "app.name": app3, "app.hyperparam": 5 , "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "app.hyperparam": 10 , "seed": 1234} -> ignored by except_filters
# - {"vector_size": 40, "app.name": app3, "app.hyperparam": 5 , "seed": 1234} -> ignored by except_filters

$ETL$:
check_error: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log
@@ -20,6 +20,7 @@ minimal:
- '!'
factor_levels:
- {}
except_filters: []
$ETL$:
check_error:
experiments:
@@ -1,7 +1,7 @@
,suite_name,suite_id,exp_name,run,host_type,host_idx,factor_columns,source_file,opt,out,payload_size_mb,$CMD$.small,latency_mean,latency_min,latency_max,latency_std,latency_count
0,example02-single,$expected,experiment_1,0,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,10,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 10 --out json'}],14.9329070714,14.9329070714,14.9329070714,0.0,2
1,example02-single,$expected,experiment_1,1,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,10,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 10 --out json'}],27.9329070714,27.9329070714,27.9329070714,0.0,2
2,example02-single,$expected,experiment_1,2,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,20,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 20 --out json'}],28.9329070714,28.9329070714,28.9329070714,0.0,2
3,example02-single,$expected,experiment_1,3,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,20,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 20 --out json'}],54.9329070714,54.9329070714,54.9329070714,0.0,2
4,example02-single,$expected,experiment_1,4,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,30,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 30 --out json'}],42.9329070714,42.9329070714,42.9329070714,0.0,2
5,example02-single,$expected,experiment_1,5,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,30,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 30 --out json'}],81.9329070714,81.9329070714,81.9329070714,0.0,2
0,example02-single,$expected,experiment_1,0,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,10,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 10 --out json'}],14.932907071384278,14.932907071384278,14.932907071384278,0.0,2
1,example02-single,$expected,experiment_1,1,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,10,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 10 --out json'}],27.932907071384278,27.932907071384278,27.932907071384278,0.0,2
2,example02-single,$expected,experiment_1,2,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,20,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 20 --out json'}],28.932907071384278,28.932907071384278,28.932907071384278,0.0,2
3,example02-single,$expected,experiment_1,3,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,20,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 20 --out json'}],54.93290707138428,54.93290707138428,54.93290707138428,0.0,2
4,example02-single,$expected,experiment_1,4,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,True,json,30,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt True --size 30 --out json'}],42.93290707138428,42.93290707138428,42.93290707138428,0.0,2
5,example02-single,$expected,experiment_1,5,small,0,"['payload_size_mb', 'opt']",demo_latency_out.json,False,json,30,[{'main': '/cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/.venv/bin/python /cluster/home/kunicola/doe-suite/example_nku/example02-single/code/demo_project/demo_latency.py --opt False --size 30 --out json'}],81.93290707138428,81.93290707138428,81.93290707138428,0.0,2
@@ -22,6 +22,7 @@ experiment_1:
- false
factor_levels:
- {}
except_filters: []
experiment_2:
n_repetitions: 3
common_roles: []
@@ -45,6 +46,7 @@ experiment_2:
other: '[0, 1]'
factor_levels:
- {}
except_filters: []
$ETL$:
pipeline1:
experiments:
@@ -1,8 +1,8 @@
---

exp_job_ids: [{'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 0, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 1, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 2, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 3, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 4, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 5, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 6, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 7, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 8, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 9, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 10, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 11, 'exp_run_rep': 0}]
exp_job_ids: [{'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 0, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 1, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 2, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 3, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 4, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 5, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 6, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 7, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 8, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 9, 'exp_run_rep': 0}]
exp_job_ids_unfinished: [] # pending + queued + running
exp_job_ids_pending: []
exp_job_ids_queued: []
exp_job_ids_running: []
exp_job_ids_finished: [{'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 0, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 1, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 2, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 3, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 4, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 5, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 6, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 7, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 8, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 9, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 10, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 11, 'exp_run_rep': 0}]
exp_job_ids_finished: [{'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 0, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 1, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 2, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 3, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 4, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 5, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 6, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 7, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 8, 'exp_run_rep': 0}, {'suite': 'example03-format', 'suite_id': '$expected', 'exp_name': 'format_cross', 'exp_run': 9, 'exp_run_rep': 0}]
@@ -2,14 +2,14 @@
"$CMD$": {
"small": [
{
"main": "echo \"run app=app2 with hyperparam=10 vec=40 seed=1234\""
"main": "echo \"run app=app3 with hyperparam=5 vec=10 seed=1234\""
}
]
},
"app": {
"hyperparam": 10,
"name": "app2"
"hyperparam": 5,
"name": "app3"
},
"seed": 1234,
"vector_size": 40
"vector_size": 10
}
@@ -1 +1 @@
run app=app2 with hyperparam=10 vec=40 seed=1234
run app=app3 with hyperparam=5 vec=10 seed=1234
@@ -2,7 +2,7 @@
"$CMD$": {
"small": [
{
"main": "echo \"run app=app3 with hyperparam=5 vec=10 seed=1234\""
"main": "echo \"run app=app3 with hyperparam=5 vec=20 seed=1234\""
}
]
},
@@ -11,5 +11,5 @@
"name": "app3"
},
"seed": 1234,
"vector_size": 10
"vector_size": 20
}
@@ -1 +1 @@
run app=app3 with hyperparam=5 vec=10 seed=1234
run app=app3 with hyperparam=5 vec=20 seed=1234
@@ -2,7 +2,7 @@
"$CMD$": {
"small": [
{
"main": "echo \"run app=app3 with hyperparam=5 vec=20 seed=1234\""
"main": "echo \"run app=app3 with hyperparam=5 vec=30 seed=1234\""
}
]
},
@@ -11,5 +11,5 @@
"name": "app3"
},
"seed": 1234,
"vector_size": 20
"vector_size": 30
}
@@ -1 +1 @@
run app=app3 with hyperparam=5 vec=20 seed=1234
run app=app3 with hyperparam=5 vec=30 seed=1234