Configuration file
A configuration file that defines the pipeline needs to be supplied to doepipeline at runtime. The config file is written in YAML. An example config file is available below.
The following key-value pairs are required in the config file:
design
Required. Specifies the factors and responses to investigate, as well as which design type to use in the optimization phase. Valid keys in design are:
- type: Required. Design to use. Currently, central composite faced (CCF) is available.
- factors: Required. Specify one or more factors.
  - <factor-name>: Keys are the names used for the factors, and are also used for substitution into scripts. Values are specified below.
- responses: Required. Specify one or more responses.
  - <response-name>: Keys are the names used for the responses. Values are specified below.
<factor-name>
Specification of a factor. Valid keys specifying factors are:
- type: Optional. The type is quantitative if not specified. Valid values are:
  - quantitative: Default. Numeric factor which can take real values.
  - ordinal: Numeric factor constrained to integer values.
  - categorical: Categorical values. The values the factor can take must be specified (see 'values' below).
- low_init: Required for numeric factors. Low starting value.
- high_init: Required for numeric factors. High starting value.
- max: Required for numeric factors. Maximum (global) allowed value.
- min: Required for numeric factors. Minimum (global) allowed value.
- values: Required for categorical factors. List of values that the factor is constrained to.
- screening_levels: Optional for numeric factors. Number of levels investigated during the screening phase. Default is 5. Different factors may have different values for screening levels.
The low_init and high_init values are only used when the --skip-screening flag is set in the doepipeline command. They set the initial design space for the optimization design (see figure 1 b in the publication), as this must be smaller than when using the GSD. The min and max values define the global design space, which is spanned by the GSD design space in the screening phase, and which the designs in the optimization phase must always stay within. However, we always recommend running the screening step.
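The factor rules above can be checked before a run. The following is a minimal sketch, not part of doepipeline: the function name `check_factor` is hypothetical, and the ordering check (`min <= low_init < high_init <= max`) is an assumption inferred from the statement that optimization designs must stay within the global design space.

```python
NUMERIC_KEYS = ("low_init", "high_init", "min", "max")
FACTOR_TYPES = ("quantitative", "ordinal", "categorical")


def check_factor(name, spec):
    """Return a list of problems found in one factor specification."""
    # 'type' is optional and defaults to quantitative, as documented above.
    factor_type = spec.get("type", "quantitative")
    if factor_type not in FACTOR_TYPES:
        return [f"{name}: unknown type {factor_type!r}"]

    if factor_type == "categorical":
        if not spec.get("values"):
            return [f"{name}: categorical factors need a non-empty 'values' list"]
        return []

    # Quantitative and ordinal factors are both numeric.
    missing = [key for key in NUMERIC_KEYS if key not in spec]
    if missing:
        return [f"{name}: missing required key(s): {', '.join(missing)}"]

    # Assumption (not stated explicitly in the documentation): the initial
    # range is ordered and lies inside the global [min, max] range.
    if not spec["min"] <= spec["low_init"] < spec["high_init"] <= spec["max"]:
        return [f"{name}: expected min <= low_init < high_init <= max"]
    return []


if __name__ == "__main__":
    factors = {
        "FactorA": {"min": 0.0, "max": 40.0, "low_init": 0.0, "high_init": 20.0},
        "FactorB": {"min": 0, "max": 10, "low_init": 3, "type": "ordinal"},
        "FactorC": {"type": "categorical", "values": ["no_filter", "lenient_filter"]},
    }
    for factor_name, factor_spec in factors.items():
        for problem in check_factor(factor_name, factor_spec):
            print(problem)
```

In this example only FactorB is reported, because its high_init key is missing.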
<response-name>
Specification of a response. Valid keys specifying responses are:
- criterion: Required for all responses. Valid values are:
  - maximize/minimize: Response will be maximized or minimized, respectively.
  - target: Reach a target value (the optimum is neither above nor below the value).
- target: Required when there are multiple responses.
- low_limit: Required when there are multiple responses, for responses with criterion target or maximize; optional otherwise. Indicates the lowest acceptable value.
- high_limit: Required when there are multiple responses, for responses with criterion target or minimize; optional otherwise. Indicates the highest acceptable value.
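The conditional requirements for responses are easy to get wrong when a second response is added. The sketch below encodes them as listed above. It is an illustration only: `check_responses` is a hypothetical helper, and the response names and numbers in the usage example are made up.

```python
CRITERIA = ("maximize", "minimize", "target")


def check_responses(responses):
    """Return a list of problems found in the 'responses' mapping."""
    problems = []
    multiple = len(responses) > 1
    for name, spec in responses.items():
        criterion = spec.get("criterion")
        if criterion not in CRITERIA:
            problems.append(f"{name}: 'criterion' must be one of {', '.join(CRITERIA)}")
            continue
        if not multiple:
            # With a single response, only 'criterion' is required.
            continue
        required = ["target"]
        if criterion in ("target", "maximize"):
            required.append("low_limit")
        if criterion in ("target", "minimize"):
            required.append("high_limit")
        for key in required:
            if key not in spec:
                problems.append(
                    f"{name}: '{key}' is required when there are multiple responses"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical two-response setup; ResponseB lacks its required keys.
    responses = {
        "ResponseA": {"criterion": "maximize", "target": 0.95, "low_limit": 0.8},
        "ResponseB": {"criterion": "minimize"},
    }
    for problem in check_responses(responses):
        print(problem)
```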
results_file
Required. Indicates the name of the file containing the results from each pipeline run. A results file will be produced in the working directory for each experiment. The results file must contain the response values in the following format:
RESPONSE_1,VALUE
RESPONSE_2,VALUE
RESPONSE_N,VALUE
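A final pipeline step could produce this format with a few lines of Python. This is a minimal sketch under assumptions: the helper names `write_results` and `read_results` and the response values are illustrative, and the file is written to a temporary directory rather than a real working directory.

```python
import csv
import tempfile
from pathlib import Path


def write_results(path, responses):
    """Write one 'RESPONSE,VALUE' line per response."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for name, value in responses.items():
            writer.writerow([name, value])


def read_results(path):
    """Read a results file back into a {response: value} dictionary."""
    with open(path, newline="") as handle:
        return {name: float(value) for name, value in csv.reader(handle)}


# Demonstration with made-up values in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    results_path = Path(workdir) / "my_results.txt"
    write_results(results_path, {"ResponseA": 0.93, "ResponseB": 12.5})
    written = results_path.read_text()
    parsed = read_results(results_path)

print(written, end="")
```

The response names written to the file should match the names given under responses in the design.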
working_directory
Required. Root directory which will contain the results from all iterations and experiments.
pipeline
Required. Ordered list specifying the order of pipeline steps/jobs. Values are <job-name>.
<job-name>
Specification of a pipeline step/job that will be run in the pipeline. Optionally, factors will be substituted. For each step specified in pipeline there must be a <job-name> specification.
- script: Required. String containing the command that will be executed in this step. May use substitution (see below) to parameterize commands. The contents of script are formatted (factors substituted) and written to a bash file that will be executed by doepipeline.
- factors: Optional. Contains a mapping of the factors that should be used in the current experiment. Valid keys are:
  - <factor-name>: One of the factors specified in design. Each factor must carry the following key-value pair to indicate that the value of the factor should be substituted into the script.
    - substitute: true: Factor will be substituted using templating.
- slurm: Optional. When specified, doepipeline will submit the script to the SLURM workload management system. Contains the SLURM-specific parameters to be used in the submission of the script. The bash script will contain the specified SLURM parameters and be submitted with sbatch. See the specification for 'MySecondJob' in the example configuration file below.
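To make the slurm key more concrete, the sketch below shows one way a mapping of SLURM parameters could end up as `#SBATCH` directives at the top of a bash script. This is an assumption about the general shape of such a script, not doepipeline's actual output: the function `sbatch_script` is hypothetical, and the rule that single-letter keys become short options (`-c 2`) while longer keys become long options (`--key=value`) is chosen here for illustration.

```python
def sbatch_script(command, slurm):
    """Build the text of a bash script with #SBATCH directives (illustrative)."""
    lines = ["#!/bin/bash"]
    for option, value in slurm.items():
        if len(option) == 1:
            # Short option, e.g. 'c: 2' becomes '#SBATCH -c 2'.
            lines.append(f"#SBATCH -{option} {value}")
        else:
            # Long option, e.g. 'time: 00:40:00' becomes '#SBATCH --time=00:40:00'.
            lines.append(f"#SBATCH --{option}={value}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    slurm_parameters = {"c": 2, "t": "00:40:00", "o": "second_job_slurm.out"}
    print(sbatch_script("my_program --parameter1 5", slurm_parameters), end="")
```

A script of this form would then be submitted with sbatch, as described for the slurm key.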
doepipeline uses a simple templating system for substituting factors and other values into scripts. Factors that should be substituted are wrapped in {% ... %}.
Factors are substituted using their names as specified in the design. For example, if the value of FactorA should be passed as an argument to my_script.sh, the script specified in the pipeline step should be written as my_script.sh {% FactorA %}. doepipeline will then replace the template with the value of the factor according to the experimental design.
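The substitution described above can be pictured with a short regular-expression sketch. It is not doepipeline's implementation: the `render` function and the choice to raise an error for unknown names are assumptions made for illustration.

```python
import re

# Matches '{% Name %}' with optional whitespace around the name.
TEMPLATE = re.compile(r"\{%\s*(\w+)\s*%\}")


def render(script, values):
    """Replace every '{% name %}' in script with the matching value."""

    def replace(match):
        name = match.group(1)
        if name not in values:
            # Assumption: an unknown name is treated as an error here.
            raise KeyError(f"no value for template variable {name!r}")
        return str(values[name])

    return TEMPLATE.sub(replace, script)


command = render("my_script.sh {% FactorA %}", {"FactorA": 12.5})
print(command)  # my_script.sh 12.5
```

Note that in this sketch a placeholder without its closing `%}` does not match the pattern and is left in the script unchanged, so it is worth checking that every placeholder is closed.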
Example configuration file
design:
  # The type of design to use.
  type: ccf
  factors:
    # Specify each factor that you wish to use in the optimization.
    # Change the names to your liking.
    FactorA:
      min: 0.0  # Global minimum value.
      max: 40.0  # Global maximum value.
      low_init: 0.0  # Initial low value.
      high_init: 20.0  # Initial high value.
      type: quantitative  # The type of factor.
    FactorB:
      min: 0
      max: 10
      low_init: 3
      high_init: 7
      type: ordinal
    FactorC:
      # For a categorical (qualitative) factor, specify the different categories like this:
      values:
        - 'stringent_filter'
        - 'lenient_filter'
        - 'no_filter'
      type: categorical
  responses:
    # Specify each response that you wish to use in the optimization.
    # Change the names to your liking.
    ResponseA:
      criterion: maximize  # Maximize/minimize
# File where final results are written.
results_file: my_results.txt
# The working directory where all run files will be written.
working_directory: ~/my_work_directory
pipeline:
  # Specifies order of pipeline steps/jobs. These names must match the job-names below.
  - MyFirstJob
  - MySecondJob
  - MyThirdJob
MyFirstJob:
  # The script that will be executed is specified here.
  script: >
    bash my_first_script.sh --parameter {% FactorA %}
  factors:
    # The factors must match the factors in the design section above.
    FactorA:
      substitute: true
MySecondJob:
  # The script can be multi-line. First activate a conda environment, then execute my_program.
  script: >
    activate my_conda_env && \
    my_program \
    --parameter1 {% FactorB %} \
    --parameter2 {% FactorC %}
  factors:
    FactorB:
      substitute: true
    FactorC:
      substitute: true
  # This pipeline step should be run through SLURM.
  # Set the slurm parameters as you would otherwise do.
  slurm:
    A: <PROJECT>
    c: 2
    t: 00:40:00
    o: second_job_slurm.out
    e: second_job_slurm.error
MyThirdJob:
  # Make sure to save the response to the results_file.
  script: >
    python make_output.py -o {% results_file %}