Add converted shell guidelines from Confluence
cxzk committed Jan 26, 2024
1 parent 21fd2ef commit 49b7a3a
Showing 14 changed files with 374 additions and 10 deletions.
24 changes: 24 additions & 0 deletions shell/foreword.rst
@@ -0,0 +1,24 @@
========
Foreword
========

We have a large collection of (mostly ksh93) shell scripts supporting
the IFS, which have grown somewhat organically to a high level of
complexity, with a variety of different styles and approaches. Although
rules have been suggested from multiple sources over the years, these
have generally been somewhat ad hoc and inconsistently applied, with
varying degrees of applicability to modern systems.

Further, the choice of ksh93 as a "standard" shell is an increasingly
niche one, carrying risks that (a) it may not be readily available on
newer platforms (e.g. in the context of DestinE), and (b) it's likely to
be increasingly unfamiliar to new starters.

Following discussion with stakeholders, what is presented here aims to
be a coherent set of standards and guidance for shell scripts (as we
already have, for example, for Fortran), promoting a consistent, structured,
and modern approach that is applicable across research, development,
testing and operational environments. While they should be sufficient to
start making improvements, it is expected that they will be further
refined and extended over time.
25 changes: 25 additions & 0 deletions shell/guidelines/dependencies.rst
@@ -0,0 +1,25 @@
Available commands and dependencies
-----------------------------------

Most of the time, our scripts currently run on a constrained set of
GNU/Linux-based systems, where the availability of various GNU
extensions (both additional commands and extra options to standard
ones) can be assumed on top of the commands specified by POSIX.

However, there are exceptions - for example, OpenIFS usage, or local
IFS development and testing, on non-GNU or non-Linux platforms (notably
macOS) that are nevertheless Unix-like and POSIX-compliant. (HPC
systems before the Cray were also not necessarily GNU/Linux-based.)

Scripts should therefore avoid GNU-specific commands or extensions
when an equally suitable POSIX command or syntax exists. Where there is
significant benefit, however (in terms of simplicity, clarity,
performance, reliability, etc.), GNU extensions may be used, but the
relevant GNU tools should be clearly documented as dependencies of the
script in case they need to be installed explicitly on non-GNU
platforms.
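
As an illustration of the kind of substitution intended (the file and
pattern names here are purely hypothetical), GNU ``sed -i`` edits a
file in place but is not part of POSIX and behaves differently under
BSD/macOS ``sed``, whereas a portable equivalent writes to a temporary
file and renames it:

::

    # GNU-only: in-place edit (not POSIX; different syntax on BSD/macOS sed)
    # sed -i 's/old/new/' settings.txt

    # POSIX-portable alternative
    sed 's/old/new/' settings.txt > settings.txt.tmp &&
        mv settings.txt.tmp settings.txt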

More broadly, *any* external command (or package of them) that's not
part of the POSIX standard should be documented as a dependency of the
script and/or the package of which it is part.
7 changes: 7 additions & 0 deletions shell/guidelines/ecflow/index.rst
@@ -0,0 +1,7 @@
ecFlow task wrappers
--------------------
.. toctree::
:maxdepth: 1

structure
telemetry
62 changes: 62 additions & 0 deletions shell/guidelines/ecflow/structure.rst
@@ -0,0 +1,62 @@
Structure
~~~~~~~~~

These are not “pure” shell scripts, but are preprocessed first by
ecFlow. In general, they should be simple shell wrappers (ideally
generated by pyFlow, but that is beyond the scope of this document)
around a standalone script (or a short sequence of such scripts with
minimal logic), passing through any ecFlow-variable-derived options via
command-line arguments. Where many such variables are needed, they may
be placed in a configuration file in the task's temporary working
directory that is then passed as a single command-line argument.
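
A minimal sketch of the intended shape is shown below (the script name,
option names and ecFlow variables are illustrative assumptions, and
``head.h`` / ``tail.h`` stand in for whatever standard boilerplate
includes are actually used):

::

    %include <head.h>

    # Pass ecFlow-derived values through as command-line arguments to a
    # standalone script that can also be run and tested outside ecFlow.
    process_obs.sh \
        --cycle  "%CYCLE%" \
        --expver "%EXPVER%" \
        --wdir   "%TASK_WORK_DIR%"

    %include <tail.h>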

Options that come not from ecFlow variables but directly from a
suite-level (or sub-suite component-level) configuration file (e.g.
current ``config.*.h`` or their successor) should be passed by one of
the following, depending on how many need to be supplied:

#. individually as command-line arguments, if only a few are required;
#. via a temporary configuration file as with ecFlow-derived variables,
if there are too many to pass individually;
#. by directly passing the entire higher-level configuration file, if a
large portion of its content is required;
#. by passing a coherent sub-portion of the higher-level configuration
structure, if a hierarchical configuration system is introduced in
future.

In very limited cases, but only where it corresponds to a
well-established interface (e.g. specifying paths to certain tools),
options may be used to set environment variables instead; however, as
these are effectively a form of global variable, this should not be the
norm, and should *never* be done for options that may vary between
components of a suite. Exporting the entire content of ``config.*.h`` to
the environment via ``set -a`` / ``-o allexport`` is strongly deprecated.

In all cases, the interface to the called script should be well
documented, so that the script which actually does the work can be
tested outside of ecFlow. This called script must correctly report any
failure via a non-zero return code to the task wrapper.

In light of the above, *only* the header and footer of the task
wrappers, or the boilerplate code directly ``%include``\ d in them,
should include ecFlow substitutions (``%VAR%``, further ``%include``\ s,
etc.). Called or sourced scripts should not include such syntax, as this
makes them impossible to use or test outside of ecFlow. *If* they are
deployed via a construct like this in the task wrapper:

::

    cat >"$TMPDIR/script" <<\EOF
    %includenopp <script>
    EOF
    chmod +x "$TMPDIR/script"

then this should *always* be done using ``%includenopp`` rather than
``%include`` to prevent such substitutions and the need to "double-up"
real ``%`` characters in the script. Similarly, this should be done with
``<<\`` rather than ``<<`` to prevent shell substitutions during
deployment (which should only happen at runtime). *However, alternative
future mechanisms for script deployment are possible; they are outside
the scope of this document and will be considered alongside the wider
evolution of workflow code for suite generation and deployment.*
39 changes: 39 additions & 0 deletions shell/guidelines/ecflow/telemetry.rst
@@ -0,0 +1,39 @@
ecFlow telemetry and trapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is essential that ecFlow task wrappers call ``ecflow_client --init``
at the start and ``--complete`` on successful completion, and implement
proper trapping to ensure an ``--abort`` on all failures. This should
generally be done by standard “boilerplate” included at the start and
end of the task wrapper, rather than in an ad-hoc way. Trapping should
be set up as early as possible, and special care taken to ensure that
any preceding setup/configuration code will not cause the script to exit
without calling ``ecflow_client --abort``.
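
For illustration only, a minimal sketch of such boilerplate follows;
the real version should come from the standard, centrally maintained
includes, and the exact variables, signal list and abort reason will
differ:

::

    # --- header boilerplate (ecFlow-preprocessed) ---
    export ECF_NAME=%ECF_NAME% ECF_PASS=%ECF_PASS%
    export ECF_HOST=%ECF_HOST% ECF_PORT=%ECF_PORT%
    ecflow_client --init="$$"

    abort() { ecflow_client --abort="trap"; exit 1; }
    trap abort ERR                   # fires when a command fails
    trap abort HUP INT QUIT TERM     # common signals

    # ... task body ...

    # --- footer boilerplate ---
    trap - ERR HUP INT QUIT TERM
    ecflow_client --complete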

It is not normally necessary for scripts *called from* or *sourced from*
the task wrapper to implement trapping themselves, unless required for
their own internal cleanup – if a called script fails or receives a
signal, the task wrapper will see this as a non-zero return code, which
it can either handle or abort on explicitly, or allow “\ ``set -e``\ ”
to invoke the wrapper's traps.

Sourced scripts, all functions in Bash, and POSIX ``()`` functions in
Korn shell inherit the trapping environment in which they are called,
and failures within them behave like failures in the body of the
script. Changes to traps propagate back out to the calling environment,
so the original traps must be carefully restored on all exit paths.
Such scripts and functions should therefore not change the traps unless
this is precisely their intended and documented purpose.

Called scripts, and ``function``-keyword functions in Korn shell, have
their own trapping environment, so they may freely implement local trap
handlers without restoring them on exit. This is not necessary simply
to maintain correct trapping in the task as a whole, because the
calling script will see and handle the non-zero return code as outlined
above. While trapping within a standalone called script may be safe and
useful, local traps should *not* be set within functions unless a
specific exception is agreed, because this is Korn-shell-specific
behaviour: if the script is run in Bash, ``function``-keyword functions
behave the same as POSIX ``()`` functions, and changes to the traps
will propagate back out to the caller, potentially breaking the task
wrapper's trapping.
18 changes: 18 additions & 0 deletions shell/guidelines/google.rst
@@ -0,0 +1,18 @@
Google Shell Style Guide
------------------------

The `Google Shell Style
Guide <https://google.github.io/styleguide/shellguide.html>`__ provides
an existing well-thought-out set of detailed guidelines for shell
scripting (and there are very few others to be found) that we have
chosen to adopt as a foundation for our purposes. **It is recommended to
follow this guide as a baseline. For conciseness, the details are not
reproduced here, but the guide should be read in conjunction with this
document.**

The `ChromiumOS
extensions <https://chromium.googlesource.com/chromiumos/docs/+/HEAD/styleguide/shell.md>`__
to the general Google guidance also offer some useful additions which
should be considered (in particular around shell arithmetic, defaulting
variables, ``printf`` vs ``echo`` and argument parsing with
``getopts``).
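
By way of illustration only, a rough sketch of the flavour of those
points (not a substitute for reading the guides; the variable and
command names are placeholders):

::

    # Arithmetic with $(( ... )) rather than expr or let
    count=$(( count + 1 ))

    # Defaulting a variable if unset or empty
    : "${OUTPUT_DIR:=/tmp}"

    # printf rather than echo for anything beyond trivial output
    printf '%s\n' "${message}"

    # Argument parsing with getopts
    while getopts 'vo:' opt; do
      case "${opt}" in
        v) verbose=1 ;;
        o) output="${OPTARG}" ;;
        *) echo "usage: $0 [-v] [-o output]" >&2; exit 2 ;;
      esac
    done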
19 changes: 19 additions & 0 deletions shell/guidelines/index.rst
@@ -0,0 +1,19 @@
=====================================
Detailed guidelines for shell scripts
=====================================

New scripts should follow these guidelines, except where dependence on
legacy scripts prevents this; legacy scripts will be migrated over a
period of time according to a managed programme.

.. toctree::
:maxdepth: 1
:numbered:

when
which
options
google
ecflow/index
tools
dependencies
48 changes: 48 additions & 0 deletions shell/guidelines/options.rst
@@ -0,0 +1,48 @@
Preferred shell options
-----------------------

The following are proposed as a standard set of options to enable in all
scripts, except where there is a clear and documented need to do
otherwise:

- ``set -e`` / ``-o errexit``: exit immediately when a command fails.
  This reduces the verbosity of error handling, acting as a limited
  form of exceptions rather than C-style checking of every return code.
- ``set -u`` / ``-o nounset``: exit immediately when using an undefined
variable. This helps to catch the use of undefined or mistyped
variables, similar to using "``implicit none``" in Fortran.
- ``set -o pipefail``: consider a pipeline failed if *any* of its
commands fail, rather than ignoring errors in all but the *last*
command.
- ``shopt -s inherit_errexit``: this disables the Bash peculiarity
where "``set -e``" gets reset within ``$(...)`` command
substitutions.

These can be combined for brevity:

::

set -eu -o pipefail
shopt -s inherit_errexit

The following are worthy of consideration in cases where they are
appropriate:

- ``shopt -s extglob``: allow various extended globbing syntaxes.
- ``shopt -s failglob``: exit immediately if a glob matches no
filenames.
- ``shopt -s nullglob``: globs that match no files expand to an empty
string rather than themselves.
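
For example (with an illustrative file pattern), ``nullglob`` avoids
the classic surprise where an unmatched glob is passed through
literally:

::

    shopt -s nullglob
    files=( *.grib )
    if (( ${#files[@]} == 0 )); then
        echo "no GRIB files found" >&2
        exit 1
    fi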

Where verbose tracing of the script's execution is desired,
"``set -x`` / ``-o xtrace``" should be used.

More rarely, e.g. when tracking down hard-to-find parsing errors,
"``set -v`` / ``-o verbose``" may also be useful to output a trace as
the script is *read* rather than as it is executed, but this should not
normally be left on in production scripts.

"``set -a`` / ``set -o allexport``" may be used sparingly and locally,
if a whole list of variables is to be exported, but should *always* be
reset immediately afterwards to avoid polluting the environment with
unnecessary variables from the rest of the script.
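
For example (``exp_config.sh`` being a hypothetical file of plain
``VAR=value`` assignments):

::

    set -a                # export every variable assigned from here on
    . ./exp_config.sh
    set +a                # switch allexport off again immediately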
20 changes: 20 additions & 0 deletions shell/guidelines/principles.rst
@@ -0,0 +1,20 @@
==================
Guiding principles
==================

These indicate the high-level direction of what these guidelines are
intended to promote, with the detail of how to achieve this covered in
the subsequent sections.

#. Use the most suitable tool for the job, without excessive variety.
#. Consistency is more important than any particular stylistic choice.
#. Reusable modular components rather than duplication or repetition.
#. Robustness against run-time failures (both standalone and under
ecFlow).
#. Maximise use of automated check/lint/formatting tools.
#. Avoid either sticking with the past, or adopting the latest thing,
for the sake of it.
#. Make it *easier* to write and debug scripts that are needed, rather
than getting in the way of scientific or technical work.
#. Enable a gradual/piecemeal transition of existing scripts rather than
a "big bang" rewrite of everything at once.
33 changes: 33 additions & 0 deletions shell/guidelines/tools.rst
@@ -0,0 +1,33 @@
Use of shellcheck and shfmt
---------------------------

The "`shellcheck <https://www.shellcheck.net/wiki/Home>`__" tool can
catch many undesirable practices in shell scripts, including those
discouraged in the Google guide. All scripts should pass its validation
with all checks enabled, with warnings *sparingly* disabled per line or
file only where really necessary.
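
For example, a run with the optional checks enabled, and a narrowly
scoped per-line waiver (the check code, script and command names are
illustrative):

::

    shellcheck --enable=all myscript.sh

    # In the script, directly above the offending line:
    # the word splitting of the flags is intentional here
    # shellcheck disable=SC2086
    run_model ${EXTRA_FLAGS}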

The "`shfmt <https://github.com/mvdan/sh>`__" tool can be used to
auto-format shell scripts (much like "black" does for Python) in a
manner consistent with the Google guide, and (unless there are strong
arguments for an exception) all scripts should follow this with an
appropriate set of options (``-i 2 -ci -s``). Editor configuration files
should also be available to support correct formatting as scripts are
written.
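
For example, checking formatting (e.g. in CI) versus rewriting a file
in place, assuming the option set above:

::

    shfmt -i 2 -ci -s -d myscript.sh    # print a diff of required changes
    shfmt -i 2 -ci -s -w myscript.sh    # rewrite the file in place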

There will be a need for some wrapping around these tools: (a) to define
the prescribed set of options to use, (b) to support shell scripts
embedded in ``.ecf`` task wrappers, and (c) to provide appropriate
skeleton ``config.*.h`` files so that the config variables are not
falsely reported as undefined. These wrappers will probably live either
directly in ifs-scripts, or in ifs-git-tools (like "git ifsnorms" for
Fortran).

Attention should be paid during pull request reviews to whether any new
exceptions to the rules enforced by these tools are justified.

At some point, when we have a "clean" set of scripts, we could consider
enforcing these checks via Git commit hooks, but that's a decision for
the future rather than now. In the meantime, we should aim to check for
regressions against them (i.e. new violations that weren't already
there) via the continuous integration system.
33 changes: 33 additions & 0 deletions shell/guidelines/when.rst
@@ -0,0 +1,33 @@
When (not) to use shell
-----------------------

Shell scripting should be primarily considered for individual short
simple scripts, with more structured languages preferred for larger and
more complex code.

Shell is particularly suited to being the top-level wrapper or glue that
sets up the appropriate environment before executing a more complex
program, or a simple sequence of them. (For example, an ecFlow task
wrapper that extracts the relevant parameters from ecFlow variables and
passes them to a complex executable; or something like ``run_parallel``
which sets up the appropriate environment for srun.)
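
A sketch of that division of labour (all names here are purely
illustrative):

::

    #!/bin/bash
    set -eu -o pipefail

    # Environment setup and simple sequencing: shell's strength
    export OMP_NUM_THREADS=8

    # Complex configuration/text processing: delegate to Python
    python3 generate_namelists.py --config "${CONFIG_FILE}" --out "${WORKDIR}"

    # Launch the heavy lifting
    srun ./model_executable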

However, it is much less suitable for complex logic and data/text
processing. Tasks like the generation of one set of configuration files
from another (e.g. generating Fortran namelists based on a suite or
experiment configuration, possibly modulated by knowledge of current
state) can generally be expressed more clearly and robustly by calling
out to a more structured language such as Python. (Conversely, Python is
*not* generally a good choice as the wrapper which forks subprocesses,
even more so when parallelisation is required.)

It is therefore recommended that, while shell may be the appropriate
choice for the outermost layer and simple subtasks, this should delegate
to something else (typically Python, unless it’s handled by a compiled
language) for complex processing and logic.

Separate guidance will also be developed for Python, but in general
Python components need to have good interfaces, test coverage and
documentation to ensure that (as with compiled Fortran code) they are
unlikely to need on-the-fly debugging or internal modification in an
operational context.
30 changes: 30 additions & 0 deletions shell/guidelines/which.rst
@@ -0,0 +1,30 @@
Which shell to use
------------------

While there are a variety of different shells around, each with its own
nice features, no other shell has achieved the ubiquity of Bash. It is
the default login shell on most modern Linux systems, leading to
widespread familiarity, and readily available on almost all other
Unix-like platforms.

Greater portability *can* be achieved by sticking to POSIX-standard
shell syntax, which should behave the same in any POSIX-compatible
shell, but this comes at the cost of an extremely restricted feature set
(lacking arrays, ``[[ ... ]]`` conditional syntax, function-local
variables, etc.). Outside of very simple scripts that may require
portability to exotic or embedded systems which do not support Bash,
this is unlikely to be a good trade-off.
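
The kind of Bash functionality given up under a POSIX-only restriction
includes, for example (variable and function names are illustrative):

::

    # Arrays, with safe iteration over elements containing spaces
    members=( "ocean waves" "sea ice" "atmosphere" )
    for m in "${members[@]}"; do
        printf '%s\n' "${m}"
    done

    # [[ ... ]] conditionals with pattern matching and no word splitting
    if [[ "${filename}" == *.grib ]]; then
        process_grib "${filename}"
    fi

    # Function-local variables
    cleanup() {
        local tmpdir="$1"
        rm -rf -- "${tmpdir}"
    }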

It might be possible to define a supported set of shells, and restrict
to a common subset of their extended functionality and syntax (possibly
with a function library to encapsulate differences). However, it is not
clear *which* other shells ought to be included, and doing so may come
at the price of significant complexity in both design and ongoing
testing. (Such an approach might, however, have a *short-term* role
while new scripts co-exist with legacy ksh ones through common
includes, as discussed in the migration plan.)

**It is therefore recommended that Bash (version 4.4+) is the default
choice for all shell scripts**, except in rare cases where they must run
on esoteric systems where only a POSIX-compatible ``/bin/sh`` can be
assumed to be available.
16 changes: 16 additions & 0 deletions shell/index.rst
@@ -0,0 +1,16 @@
==============================
IFS shell scripting guidelines
==============================

.. toctree::
:maxdepth: 2

foreword
guidelines/index

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
10 changes: 0 additions & 10 deletions shell/rules/I1.rst

This file was deleted.
