
User guide

Larsoft common batch and workflow tools are contained in the ups product larbatch, which is built and distributed as part of larsoft. Larbatch tools are built on top of the Fermilab jobsub_lite batch submission tools. For general information about jobsub_lite and the Fermilab batch system, refer to articles on the jobsub_lite wiki and the FIFE wiki.

No other part of larsoft depends on larbatch, and larbatch is not set up as a dependent of the larsoft umbrella ups product. Rather, larbatch is intended to be a dependent of experiment-specific ups products (see this article for instructions on configuring larbatch for a specific experiment).

After setting up ups product larbatch, several executable scripts and python modules are available on the execution path and python path. Here is a list of the more important ones.

  • project.py An executable python script that is the main entry point for user interaction. More information can be found below.
  • project_utilities.py A python module, imported by project.py, that implements some of the workflow functionality. End users would not normally interact directly with this module. However, a significant aspect of project_utilities.py is that it supplies hooks for providing experiment-specific implementations of some functionality, as described in an accompanying article on this wiki.
  • condor_lar.sh The main batch script. Condor_lar.sh is a general purpose script that manages a single invocation of an art framework program (lar executable). Condor_lar.sh sets up the run-time environment, fetches input data, interacts with sam, and copies output data. End users are not expected to invoke condor_lar.sh directly. However, one can get a general idea of its features and capabilities by typing condor_lar.sh -h to view the built-in documentation, or by reading the file header.
  • condor_start_project.sh Batch script for starting a sam project.
  • condor_stop_project.sh Batch script for stopping a sam project.

Project.py is used in conjunction with an XML-format project definition file (see below). The concept of a project, as understood by project.py and as defined by the project definition file, is a multistage linear processing chain involving a specified number of batch workers at each stage.

Refer to the header of project.py or type project.py --help. The internal documentation is always kept up to date when project.py command line options are changed.

In a typical invocation of project.py, one specifies the project file (via option --xml), the stage name (via option --stage), and one or more action options. Here are some use cases for invoking project.py; a typical action sequence is sketched after the list.

  • project.py -h or project.py --help Print built-in help (lists all available command line options).
  • project.py -xh or project.py --xmlhelp Print built-in xml help (lists all available elements that can be included in project definition file).
  • project.py --xml xml-name --status Print global summary status of the project.
  • project.py --xml xml-name --stage stage-name --submit Submit batch jobs for specified stage.
  • project.py --xml xml-name --stage stage-name --check Check results from specified stage (identifies failed jobs). This action assumes that the art program produces an artroot output file.
  • project.py --xml xml-name --stage stage-name --checkana Check results from specified stage (identifies failed jobs). This version of the check action skips some checks done by --check that only make sense if the art program produces an artroot output file. Use this action to check results from an analyzer-only art program.
  • project.py --xml xml-name --stage stage-name --makeup Submit makeup jobs for failed jobs, as identified by a previous --check or --checkana action.
  • project.py --xml xml-name --stage stage-name --clean Delete output for the specified stage and later stages. This option can be combined with --submit.
  • project.py --xml xml-name --stage stage-name --declare Declare successful artroot files to sam.
  • project.py --xml xml-name --stage stage-name --upload Upload successful artroot files to enstore.
  • project.py --xml xml-name --stage stage-name --define Create sam dataset definition.
  • project.py --xml xml-name --stage stage-name --audit Check the completeness and correctness of a processing stage using sam parentage information. For this action to work, input and output files must be declared to sam.
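A common workflow is to submit a stage, check the results after the batch jobs finish, and then submit makeup jobs for any failures. Here is a minimal sketch, assuming a hypothetical project file myproject.xml with a stage named reco.

    project.py --xml myproject.xml --stage reco --submit
    # ... wait for the batch jobs to finish ...
    project.py --xml myproject.xml --stage reco --check
    project.py --xml myproject.xml --stage reco --makeup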

Project.py has a GUI interface called projectgui.py, which is invoked as follows.

    projectgui.py xml-files

Essentially all of the functionality that is available via the command line interface of project.py can be accessed from the GUI, in (hopefully) obvious fashion.

The general structure of the project file is that it is an XML file that contains a single root element of type project (enclosed in <project name=project-name>...</project>). Inside the project element, there are additional subelements, including one or more stage subelements (enclosed in <stage name=stage-name>...</stage>). Each stage element defines a group of batch jobs that are submitted together by a single invocation of jobsub_submit.
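Putting these pieces together, the overall shape of a project file is sketched below. The element values are placeholders, not working settings; the individual elements are described in the following sections.

    <?xml version="1.0"?>
    <!DOCTYPE project [
    <!ENTITY release "v02_05_01">
    ]>
    <project name="myproject">
      <larsoft>
        <tag>&release;</tag>
        <qual>e6:prof</qual>
      </larsoft>
      <stage name="gen">
        ...
      </stage>
      <stage name="reco">
        ...
      </stage>
    </project>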

Example XML project files used by microboone can be found in the ubutil product.

Refer to the header of project.py or type project.py --xmlhelp. The internal documentation is always kept up to date when XML constructs are added or changed.

The initial lines of an XML project file should follow a standard pattern. Here is a typical example header.

    <?xml version="1.0"?>
    <!DOCTYPE project [
    <!ENTITY release "v02_05_01">
    <!ENTITY file_type "mc">
    <!ENTITY run_type "physics">
    <!ENTITY name "prod_eminus_0.1-2.0GeV_isotropic_uboone">
    <!ENTITY tag "mcc5.0">
    ]>

The significance of the header elements is as follows.

  • The XML version. Copy the above version line exactly, namely,
        <?xml version="1.0"?>
  • The document type (DOCTYPE keyword). The argument following the DOCTYPE keyword specifies the "root element" of the XML file, and should always be project.
  • Entity definitions Entity definitions, which occur inside the DOCTYPE section, are XML aliases. Any string that occurs repeatedly inside an XML file is a candidate for being defined as an entity. Entities can be substituted inside the body of the XML file by enclosing the entity name inside &...; (e.g. &release;), as shown in the example below.
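For example, with the header shown above, the name entity can be referenced in the body of the project file:

    <project name="&name;">
      ...
    </project>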

Each project definition file should contain a single project element enclosed in <project name=project-name>...</project>. The name attribute of the project element is required.

The content of the project element consists of other XML subelements, including the following.

  • A single subelement with tag larsoft, which defines the run-time environment.
  • Option subelements.
  • One or more stage subelements.

Each project element is required to contain a single subelement with tag larsoft (enclosed in <larsoft>...</larsoft>). The larsoft subelement defines the batch run-time environment. The larsoft subelement may contain simple text subelements, of which there are currently three:

  • <tag>...</tag>
    Larsoft release version.
  • <qual>...</qual>
    Larsoft release qualifier.
  • <local>...</local>
    Path of user's local test release directory or tarball. Tarballs are preferred over bare directories. Here is a command that will make a suitable tarball called local.tar from your local release.
        tar -C $MRB_INSTALL -czf local.tar .

NOTE: all elements of the larsoft specification for all stages of a project must be in this section. If you want different versions for different stages, they must be separate projects. The local subelement is optional. Here is how a typical larsoft subelement might appear in a project definition file.

    <larsoft>
      <tag>&release;</tag>
      <qual>e6:prof</qual>
    </larsoft>

Note in this example that the larsoft version is defined by an entity release, which should be defined in the DOCTYPE section.
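If a local test release is used, the larsoft subelement might instead look like the following. The tarball path is a hypothetical example.

    <larsoft>
      <tag>&release;</tag>
      <qual>e6:prof</qual>
      <local>/path/to/local.tar</local>
    </larsoft>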

Project options are text subelements of the project element with tags other than larsoft or stage. Here are some project options (this was the full list when this wiki was written); a combined example follows the list. The full list of project options (and all defined XML constructs) can always be found by typing project.py --xmlhelp.

  • <group>...</group> Should contain the standard experiment name (for microboone use uboone). If missing, environment variable $GROUP is used.
  • <numevents>...</numevents> Total number of events to process. This parameter is only useful for generator jobs, when it can be used to specify the total number of generated events among all workers. Otherwise (for non-generator jobs), you should usually specify the number of events to be a large number so that all input events are processed.
  • <numjobs>...</numjobs> Number of parallel worker jobs (default 1). Can be overridden in individual stages.
  • <maxfilesperjob>...</maxfilesperjob> Maximum number of files to deliver to a single job. Useful in case you want to limit output file size or keep 1 -> 1 correlation between input and output.
    Can be overwritten by <stage><maxfilesperjob>. This parameter only has an effect if input is from sam.
  • <os>...</os> Comma-separated list of allowed batch OSes (e.g. SL5,SL6). This option is passed directly to jobsub_submit command line option --OS. Default is jobsub decides.
  • <resource>...</resource> Specify jobsub resources (command line option --resource-provides=usage_model=). Default is DEDICATED,OPPORTUNISTIC. For OSG specify OFFSITE. Can be overridden in individual stages.
  • <server>...</server> Specify jobsub server. Expert option, usually not needed.
  • <site>...</site> OSG site(s) (comma-separated list). Use with <resource>OFFSITE</resource>. Default is jobsub decides, which usually means any site.
  • <filetype>...</filetype> Sam file type (e.g. data or mc). Default none.
  • <runtype>...</runtype> Sam run type (e.g. physics). Default none.
  • <merge>...</merge> Histogram merging program. Default hadd -T. Can be overridden in each stage.
  • <fcldir>...</fcldir> Specify additional directories in which to search for top-level fcl job files. Project.py searches $FHICL_FILE_PATH and the current directory by default.
  • <memory>...</memory> Specify the amount of memory needed for each job. This number is specified in megabytes (MB), but do NOT put a unit in the xml file (e.g. <memory>2000</memory>).
  • <jobsub>...</jobsub> Specify arbitrary jobsub_submit options for inclusion on the jobsub_submit command line. You can specify the run time of the jobs by including --expected-lifetime on this line (e.g. <jobsub> --expected-lifetime=8h </jobsub>). Unlike <memory>, a unit is required here (m=minutes, h=hours).
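As a combined illustration, the project-level options might look like the following. The values shown are hypothetical.

    <group>uboone</group>
    <numevents>10000</numevents>
    <numjobs>100</numjobs>
    <resource>DEDICATED,OPPORTUNISTIC</resource>
    <memory>2000</memory>
    <jobsub> --expected-lifetime=8h </jobsub>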

Each project element should contain one or more stage subelements enclosed in <stage name=stage-name>...</stage>. The name attribute of the stage subelement is required, and should be different for each stage. The stage element should contain stage options in the form of simple text subelements. Here are the stage options; a complete example of a stage element follows these lists.

  • <fcl>...</fcl> Top-level fcl job file (required). Can be specified as full or relative path.
  • <outdir>...</outdir> Output directory full path (required). Data output files are copied here when batch worker finishes. The output directory should be accessible interactively on the submitting node and grid-write-accessible via ifdh cp from the batch worker.
  • <logdir>...</logdir> Log file directory full path (required). Non-data output files are copied here when batch worker finishes. The log file directory should be accessible interactively on the submitting node and grid-write-accessible via ifdh cp from the batch worker. Recommended best practice is to make this the same directory as <outdir>...</outdir>.
  • <workdir>...</workdir> This is a temporary directory where files are assembled prior to batch submission that need to be copied to the batch worker. The batch job will fetch them from this directory. The work directory should be accessible interactively on the submitting node and grid-read-accessible via ifdh cp from the batch worker.
  • <bookdir>...</bookdir> A directory that is accessible and has fast i/o on the submitting node. Does not need to be grid-accessible. All files created or checked by project.py on the submitting node are stored here. This element is optional. If not specified, same as <logdir>. Specifying this element can greatly reduce accesses to dCache (if <logdir> is on dCache).
  • <numjobs>...</numjobs> Number of parallel worker jobs. If not specified, inherit from project options.
  • <targetfilesize>...</targetfilesize> If specified, this option may override the number of workers (option numjobs) in the downward direction to achieve the estimated target file size.

The following options deal with where this processing stage gets its input data. Specify no more than one input option; an example follows this list. You can also omit all input options, in which case output data from the previous stage is pipelined to this stage, or there is no input.

  • <schema>...</schema> Specify the transfer protocol that is used for input files. In general, XROOTD is the preferred transfer protocol and is specified with <schema>root</schema>. Use <schema>gsiftp</schema> to specify gridftp.
  • <inputfile>...</inputfile> Specify a single input file full path.
  • <inputlist>...</inputlist> Specify input file list (a file containing a list of input files, one per line, full path).
  • <inputdef>...</inputdef> Specify input sam dataset definition.
  • <inputmode>...</inputmode> Specify input mode, which can be textfile (written without quotes in the xml file), or omitted. Use this option with <inputfile>...</inputfile> or <inputlist>...</inputlist> together with art producer module TextFileGen. TextFileGen should be configured at location physics.producers.generator in the job configuration fcl file. One example fcl is prodtext.fcl. The number of files in the inputlist should match the number of grid jobs.
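For example, a stage that reads its input from a sam dataset definition via xrootd might contain the following. The dataset definition name is hypothetical.

    <schema>root</schema>
    <inputdef>prod_gen_mydataset</inputdef>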

The following options allow job customizations by user-written scripts. The script location should be specified as an absolute or relative path (relative to the current directory where project.py is invoked). Any specified job customization scripts are copied to the work directory and from there are copied to the batch worker. A sketch of an initialization source script follows this list.

  • <initscript>...</initscript> Worker initialization script (condor_lar.sh --init-script).
  • <initsource>...</initsource> Worker initialization source script (condor_lar.sh --init-source).
  • <endscript>...</endscript> Worker finalization script (condor_lar.sh --end-script).
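As a sketch only, an initialization source script might export extra environment variables before the lar executable runs. The file name and variable below are hypothetical examples, not part of larbatch.

    # myinit.sh: sourced on the batch worker via condor_lar.sh --init-source
    export MYEXPERIMENT_EXTRA_OPTION=1

The corresponding stage element would then include <initsource>myinit.sh</initsource>.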

Additional options.

  • <defname>...</defname> Sam dataset definition name for art output files.
  • <anadefname>...</anadefname> Sam dataset definition name for analysis output files.
  • <datatier>...</datatier> Sam data tier for art output files.
  • <anadatatier>...</anadatatier> Sam data tier for analysis output files.
  • <merge>...</merge> Histogram merging program. If not specified, inherit from project options.
  • <resource>...</resource> Specify jobsub resources (command line option --resource-provides=usage_model=). If not specified, inherit from project options.
  • <lines>...</lines> Specify arbitrary condor commands via jobsub_submit --lines= (expert option).
  • <site>...</site> OSG site(s) (comma-separated list). If not specified, inherit from project options.
  • <output>...</output> Specify output file name.
  • <TFileName>...</TFileName> Specify TFileName.
  • <jobsub>...</jobsub> Arbitrary jobsub_submit command line options (space-separated list).
  • <jobsub_start>...</jobsub_start> Specify arbitrary jobsub_submit options for inclusion on the jobsub_submit command line for sam start/stop jobs. Recommended option is --expected-lifetime=short.
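Putting the stage options together, a typical stage element might look like the following. The fcl file, directory paths, and dataset name are hypothetical examples.

    <stage name="reco">
      <fcl>reco.fcl</fcl>
      <outdir>/pnfs/uboone/scratch/users/username/reco</outdir>
      <logdir>/pnfs/uboone/scratch/users/username/reco</logdir>
      <workdir>/pnfs/uboone/scratch/users/username/work/reco</workdir>
      <numjobs>50</numjobs>
      <inputdef>prod_gen_mydataset</inputdef>
      <datatier>reconstructed</datatier>
    </stage>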

If you want to generate single particle samples using larbatch and store the output in tape-backed storage with SAM4Users, you can follow these instructions: MicroBooNE Single Particle Generation Example MCC8

If you want to run reconstruction and anatree on a MicroBooNE sample that is part of a SAM dataset definition, you can follow these instructions: MicroBooNE Reco Processing Example MCC8