first version of new runner explanation
HannaMohr committed Nov 7, 2024
1 parent c8745ca commit 2418b41
Showing 1 changed file with 31 additions and 37 deletions.
68 changes: 31 additions & 37 deletions doc/intro.rst
@@ -247,7 +247,7 @@ the reason for the contract enforced on the Optimizee constructor

Note that all the (non-exploring) parameters to the `Optimizer` are passed in to its constructor through a
:func:`~collections.namedtuple` to keep the parameters documented. For examples see :class:`.GeneticAlgorithmParameters`
or :class:`.SimulatedAnnealingParameters`
or :class:`.CrossEntropyParameters`

The :meth:`~l2l.optimizers.optimizer.Optimizer.post_process` function:
----------------------------------------------------------------------
@@ -295,43 +295,28 @@ logging and recording. See the source of :file:`bin/l2l-template.py` for more de

Execution setup
~~~~~~~~~~~~~~~
The L2L framework works with JUBE in order to deploy the execution of the different instances of the optimizee on
the available computational resources. This requires that the trajectory contains a parameter group called JUBE_params
The L2L framework works together with workers from a runner class to distribute the execution of the different instances of the optimizee across
the available computational resources. This requires that the trajectory contains a parameter group called ``runner_params``,
which contains the details needed to execute the program correctly.

**Mandatory** steps to define the execution of the optimizees:
1. Add a parameter group to the :obj: traj called JUBE_params using its :meth: f_add_parameter_group.
2. Setup the execution command :attr: exec by using the trajectory :meth: f_add_parameter_to_group.
Add parameter to group receives three parameters, which in this case should be specified as:
group_name=JUBE_params, key="exec", val=<execution command string>
This <execution command string> will be used to launch individual optimizees. An example of a simple call without using MPI calls
is: "python " + os.path.join(paths.simulation_path, "run_files/run_optimizee.py"
3. Setup the ready and working paths :attr: exec by using the trajectory :meth: f_add_parameter_to_group.
Add parameter to group receives three parameters, which in this case should be specified as:
group_name=JUBE_params, key="paths", val=<path object>
<path object> should contain the root working path. An example of this path is:
paths = Paths(name, dict(run_num='test'), root_dir_path=<root_dir_path>, suffix="-example")

In order to launch simulations on a laptop or a local cluster without a scheduler, only the mandatory parameters must
be specified. These parameters are part of the template.

To launch the simulations on a cluster with a scheduler, the following optional parameters must be defined. They currently match
slurm but this can also be adjusted to other schedulers.
1. Name of the scheduler, :atr: "scheduler", e.g. "Slurm"
2. Command to submit jobs to the schedulers, :atr: "submit_cmd", e.g. "sbatch"
3. Template file for the particular scheduler, :atr: "job_file", e.g. "job.run"
4. Number of nodes to request for each run, :atr: "nodes", e.g. "1"
5. Requested time for the compute resources, :atr: "walltime", e.g. "00:01:00"
6. MPI Processes per node, :atr: "ppn", e.g. "1"
7. CPU cores per MPI process, :atr: "cpu_pp", e.g. "1"
8. Threads per process, :atr: "threads_pp", e.g. "1"
9. Type of emails to be sent from the scheduler, :atr: "mail_mode", e.g. "ALL"
10. Email to notify events from the scheduler, :atr: "mail_address", e.g. "[email protected]"
11. Error file for the job, :atr: "err_file", e.g. "stderr"
12. Output file for the job, :atr: "out_file", e.g. "stdout"
13. MPI Processes per job, :atr: "tasks_per_job", e.g. "1"

See the :file: 'l2l-template-scheduler.py' for a base file with all these parameters.
**Mandatory** steps to define the execution of the optimizees, if you do not want to use the default parameters:

1. Create a dictionary that contains the runner parameters:

   * **srun**: The ``srun`` command used to execute an individual when the program runs in parallel.
     The default is an empty string, which means that the program is executed locally only.
   * **exec**: The command used to execute an individual. The default is ``python``.
   * **max_workers**: The maximum number of workers. The user must choose this value based on
     how many compute resources were requested and how many each individual requires.
     For example, if a total of 1 node with 100 cores was requested and an individual requires 50 cores,
     at most 2 workers may be used. The default is 32 workers.
   * **work_path**: The path of the workspace. The results of the simulation,
     the trajectories and the logs of the individual workers are stored here. The default is the ``root_dir_path``
     of the experiment.
   * **path_obj**: Stores the path object.

2. Pass the dictionary to the experiment when calling ``experiment.prepare_experiment(runner_params=params)``.

See the :file:`bin/l2l-template.py` for a base file with all these parameters.
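
The runner parameters described above can be sketched as a plain dictionary. This is a minimal illustration, not the template's actual content: the helper ``max_workers_for``, the ``results`` directory and the chosen values are assumptions made for the example, and ``path_obj`` is left out because its construction is not described here. The helper only reproduces the arithmetic of the ``max_workers`` example (100 cores requested, 50 cores per individual).

```python
import os


def max_workers_for(total_cores, cores_per_individual):
    """Hypothetical helper: number of individuals that fit on the requested cores at once."""
    if cores_per_individual <= 0:
        raise ValueError("cores_per_individual must be positive")
    return total_cores // cores_per_individual


# The keys follow the list above; the values are illustrative assumptions.
runner_params = {
    "srun": "",        # empty string: individuals are executed locally
    "exec": "python",  # command used to execute one individual
    "max_workers": max_workers_for(total_cores=100, cores_per_individual=50),
    "work_path": os.path.join(os.getcwd(), "results"),  # hypothetical workspace location
}

# Assumed usage, following step 2 above (requires an existing experiment object):
# experiment.prepare_experiment(runner_params=runner_params)
```

Deriving ``max_workers`` from the requested resources, rather than hard-coding it, keeps the value consistent when the allocation changes.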

Examples
********
@@ -348,12 +333,21 @@ Data postprocessing

Todo...

.. _checkpointing:

Checkpointing
*************

Currently, checkpointing is only available for genetic algorithms.
A generation from a previous simulation can be read in, and the optimization continues from that generation.
For an example, see :file:`bin/l2l-fun-ga-checkpoint.py`.

.. _parallelization:

Parallelization
***************

We also support running different instances of the experiments on different cores and hosts using Jube.
We also support running different instances of the experiments on different cores.


.. _logging: