
How to set up a case and customize the PE layout


Calling case.setup

After creating a case with create_newcase, call the case.setup command from $CASEROOT. To see the options to case.setup, use the --help option. Calling case.setup creates additional files and directories in $CASEROOT, including the case run script ($CASE.run).

If env_mach_pes.xml variables need to be changed after case.setup has been called, first run case.setup -clean, which removes $CASEROOT/$CASE.run, and then rerun case.setup before building and running the model.
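A minimal sketch of this workflow from $CASEROOT, assuming the per-component xmlchange shorthand (e.g. NTASKS_ATM); exact option names can vary between CIME versions:

    ./case.setup                  # initial setup after create_newcase
    ./xmlchange NTASKS_ATM=128    # modify an env_mach_pes.xml variable
    ./case.setup -clean           # removes $CASEROOT/$CASE.run
    ./case.setup                  # regenerate the setup with the new PE layout
    ./case.build                  # rebuild before running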

(Also see the Section called *BASICS: What are the directories and files in my case directory?* in Chapter 6.)

Changing the PE layout

The file env_mach_pes.xml determines the number of processors and OpenMP threads for each component, the number of instances of each component, and the layout of the components across the hardware processors. Optimizing the throughput and efficiency of a CIME experiment often involves customizing the processor (PE) layout for load balancing. CIME provides significant flexibility with respect to the layout of components across different hardware processors. In general, the CIME components -- atm, lnd, ocn, ice, glc, rof, wav, and cpl -- can run on overlapping or mutually unique processors. While each component is associated with a unique MPI communicator, the CIME driver runs on the union of all processors and controls the sequencing and hardware partitioning. The processor layout of each component is determined by three settings: the number of MPI tasks, the number of OpenMP threads per task, and the root MPI processor number from the global set.

The entries in env_mach_pes.xml have the following meanings:

Table 2-3. env_mach_pes.xml entries
XML entry  Description
NTASKS     The total number of MPI tasks; a negative value indicates nodes rather than tasks.
NTHRDS     The number of OpenMP threads per MPI task.
ROOTPE     The global MPI task of the component root task; a negative value indicates nodes rather than tasks.
PSTRID     The stride of MPI tasks across the global set of PEs (for now, set to 1).
NINST      The number of component instances, which are spread evenly across NTASKS.
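
These entries can be inspected and modified from $CASEROOT with the xmlquery and xmlchange tools. A minimal sketch, assuming the per-component shorthand names (e.g. NTASKS_ATM) accepted by CIME's XML tools:

    ./xmlquery NTASKS_ATM            # show the number of MPI tasks for the atmosphere
    ./xmlquery NTHRDS_ATM            # show the number of OpenMP threads per task
    ./xmlchange NTASKS_ATM=16        # change the number of MPI tasks
    ./xmlchange NTHRDS_ATM=4         # change the number of OpenMP threads per task
    ./xmlchange ROOTPE_ATM=32        # change the root MPI task of the atmosphere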

For example, if a component has NTASKS=16, NTHRDS=4, and ROOTPE=32, then it will run on 64 hardware processors using 16 MPI tasks and 4 threads per task, starting at global MPI task 32. Each CIME component has corresponding entries for NTASKS, NTHRDS, ROOTPE, and NINST in env_mach_pes.xml. There are some important things to note:

  • NTASKS must be greater than or equal to 1 (one), even for inactive (stub) components.
  • NTHRDS must be greater than or equal to 1 (one). Setting NTHRDS to 1 generally means threading parallelization is off for that component. NTHRDS should never be set to zero.
  • The total number of hardware processors allocated to a component is NTASKS * NTHRDS.
  • The coupler processor inputs specify the PEs used for coupler computations such as mapping, merging, diagnostics, and flux calculation. This is distinct from the driver, which always runs on the union of all processors to manage model concurrency and sequencing.
  • The root processor is set relative to the MPI global communicator, not the hardware processor count. An example is given below.
  • The layout of components on processors has no impact on the science. The scientific sequencing is hardwired into the driver, so changing processor layouts does not change intrinsic coupling lags or coupling sequencing. One important point is that, for a fully active configuration, the atmosphere component is hardwired in the driver to never run concurrently with the land or ice component. Performance improvements associated with processor-layout concurrency are therefore constrained in this case: there is never a performance reason not to overlap the atmosphere component with the land and ice components. Beyond that constraint, the land, ice, coupler, and ocean models can run concurrently, and the ocean model can also run concurrently with the atmosphere model.
  • If all components have identical NTASKS, NTHRDS, and ROOTPE settings, all components will run sequentially on the same hardware processors (a sketch of such a layout is shown below).
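
For example, a fully sequential layout like the one described in the last point above could be set with a sketch like the following (again assuming the per-component shorthand, e.g. NTASKS_ATM, and a 128-task allocation chosen purely for illustration):

    # run every component sequentially on the same 128 MPI tasks, one thread each
    for comp in ATM LND ICE OCN GLC ROF WAV CPL; do
        ./xmlchange NTASKS_${comp}=128
        ./xmlchange NTHRDS_${comp}=1
        ./xmlchange ROOTPE_${comp}=0
    done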

An important but often misunderstood point is that the root processor for any given component is set relative to the MPI global communicator, not the hardware processor counts. For instance, consider the following example:

    NTASKS(ATM)=16  NTHRDS(ATM)=4  ROOTPE(ATM)=0
    NTASKS(OCN)=64  NTHRDS(OCN)=1  ROOTPE(OCN)=16

The atmosphere and ocean will run concurrently, each on 64 hardware processors, with the atmosphere running on MPI tasks 0-15 and the ocean running on MPI tasks 16-79. The first 16 tasks are each threaded 4 ways for the atmosphere. CIME ensures that the batch submission script ($CASE.run) automatically requests 128 hardware processors: the first 16 MPI tasks are laid out on the first 64 hardware processors with a stride of 4, and the next 64 MPI tasks are laid out on the second set of 64 hardware processors. If you had instead set ROOTPE(OCN)=64 in this example, a total of 176 processors would have been requested; the atmosphere would have been laid out on the first 64 hardware processors in 16x4 fashion, the ocean model on hardware processors 113-176, and hardware processors 65-112 would have been allocated but completely idle.
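
This layout could be reproduced with xmlchange (a sketch, assuming the per-component shorthand; the comments track how the 128 hardware processors are requested):

    # atmosphere: MPI tasks 0-15, 4 threads each -> hardware processors 1-64
    ./xmlchange NTASKS_ATM=16
    ./xmlchange NTHRDS_ATM=4
    ./xmlchange ROOTPE_ATM=0
    # ocean: MPI tasks 16-79, 1 thread each -> hardware processors 65-128
    ./xmlchange NTASKS_OCN=64
    ./xmlchange NTHRDS_OCN=1
    ./xmlchange ROOTPE_OCN=16
    # regenerate the run script so it requests the new total of 128 hardware processors
    ./case.setup -clean && ./case.setup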

Note: env_mach_pes.xml cannot be modified after "./case.setup" has been invoked without first invoking "case.setup -clean". For an example of changing pes, see the Section called *BASICS: How do I change processor counts and component layouts on processors?* in Chapter 6.
