Python command line tools for working with YAML and similar structured text data, eg, round-trip conversion between XML and YAML, preserving attributes and comments (with minor corrections). The default file encoding for all types is UTF-8 without a BOM. Now includes more console entry points to grep or sort interesting YAML files (eg, lists of rules found in the SCAP Security Guide) and support for more input file types to ingest SSG and other upstream data, eg, NIST oscal-content.
Available modules, console commands, and scripts:
- ymltoxml - YAML / XML round-trip conversion and cleanup
- yasort - sort large lists in YAML files
- yagrep - grep for keys/values in YAML files
- oscal (WIP) - ingest NIST 800-53 content in multiple formats
Experimental "demo" scripts:
- analyze_control_ids.py - analyze control ID sets with optional fuzzy match
- analyze_ssg_controls.py - analyze NIST controls from SSG content
For the above "demo" scripts, check the top of the source file for any knobs adjustable via environment variables, eg:
FILE = os.getenv(
"ID_FILE",
default="tests/data/OE-expanded-profile-all-ids.txt",
)
SSG_PATH = os.getenv("SSG_PATH", default="ext/content/controls")
DEBUG = int(os.getenv("DEBUG", default=0))
Note
Text files with a single "column" of IDs have a special read mode but the files must be named with the string "ids" in the filename and the extension ".txt" for this mode to be used (as in the example above).
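The filename rule in the note above can be sketched as a small check. This is only an illustration: `uses_id_read_mode` is a hypothetical helper name, not something taken from the package, and the case-sensitive match is an assumption.

```python
import os
from pathlib import Path

# Same knob pattern as the demo scripts: environment value first, then a default.
ID_FILE = os.getenv("ID_FILE", default="tests/data/OE-expanded-profile-all-ids.txt")


def uses_id_read_mode(path):
    """Return True if the filename contains "ids" and ends with ".txt".

    Hypothetical helper; matching is assumed to be case-sensitive.
    """
    name = Path(path).name
    return name.endswith(".txt") and "ids" in name


print(ID_FILE, uses_id_read_mode(ID_FILE))
```

Setting `ID_FILE` in the environment before running a demo script would swap in a different input file without editing the source.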
This package is not yet published on PyPI, so use one of the following commands to install yaml-tools on any platform. Install from the main branch:
$ pip install https://github.com/sarnold/yaml-tools/archive/refs/heads/main.tar.gz
or use this command to install a specific release version:
$ pip install https://github.com/sarnold/yaml-tools/releases/download/0.4.0/yaml_tools-0.4.0.tar.gz
The yaml_tools package provides the modules shown above as well as
module-specific reference configuration files with defaults for all values.
If you'd rather work from the source repository, it supports the common idiom to install it on your system in a virtual env after cloning:
$ python3 -m venv env
$ source env/bin/activate
(env) $ pip install .
(env) $ ymltoxml --version
(env) $ ymltoxml --dump-config
(env) $ deactivate
The alternative to python venv is the tox test driver. If you have it
installed already, see the example tox commands below.
The current version supports minimal command options; if no options are provided, the only required arguments are one or more files of a single type:
$ ymltoxml
usage: ymltoxml [-h] [--version] [-v] [-d] [-s] [-i [FILE]] [-o [FILE]] [FILE ...]

Transform YAML to XML and XML to YAML

positional arguments:
  FILE                  Process input file(s) to target extension (default: None)

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Display more processing info (default: False)
  -d, --dump-config     Dump default configuration file to stdout (default: False)
  -s, --save-config     save active config to default filename (.ymltoxml.yml) and exit (default: False)
  -i [FILE], --infile [FILE]
                        Path to single input file (use with --outfile) (default: None)
  -o [FILE], --outfile [FILE]
                        Path to single output file (use with --infile) (default: None)
- for processing individual files/paths, use the --infile option, either with or without the --outfile option
- for processing multiple files, pass all files as arguments (paths can be relative or absolute)
- when passing input files as arguments, the output file names/paths are the same as the input files but with the (new) output extension
By default, ymltoxml processes one or more input files passed as command
arguments, typically in the current directory. The --infile option, however,
processes only a single file path, optionally with an output file path, and
takes no extra (file) arguments.
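The output naming rule for file arguments (same path, new extension) can be sketched as follows. This is a minimal illustration, not the ymltoxml implementation: `output_path` and the `TARGET_EXT` mapping are hypothetical, and the real target extensions come from the configuration file.

```python
from pathlib import Path

# Hypothetical extension mapping; the real target extensions come from the
# ymltoxml configuration file.
TARGET_EXT = {".xml": ".yaml", ".yaml": ".xml", ".yml": ".xml"}


def output_path(infile):
    """Return the input path with its extension swapped for the target one."""
    path = Path(infile)
    return path.with_suffix(TARGET_EXT[path.suffix.lower()])


for name in ["dialects/common.xml", "work/profile.yaml"]:
    print(name, "->", output_path(name))
```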
The main processing tweaks for yml/xml output formatting are specified
in the default configuration file; if you need to change something, you
can use your own config file in the working directory. Note the local
copy must be named .ymltoxml.yaml. To get a copy of the default
configuration file, do:
$ cd path/to/work/dir/
$ ymltoxml --save-config
$ $EDITOR .ymltoxml.yaml
A new module is now included for searching keys and values in
YAML files. The yagrep script also has its own built-in config
file, which can be copied and edited as shown above. In this case the
script is intended to feel more-or-less like grep so the default
config should Just Work. That said, the script uses the dpath
Python library, so you may need to change the default "path" separator
if your data has keys containing forward slashes (see the upstream
docs for details).
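To see why the separator matters, here is a minimal pure-Python stand-in for a separator-delimited path lookup. It does not use dpath itself, and `get_path` and the sample data are hypothetical; it only shows how a key containing a forward slash collides with a "/" separator.

```python
def get_path(data, path, separator="/"):
    """Walk nested dicts using a separator-delimited path (stand-in for a dpath-style lookup)."""
    node = data
    for key in path.split(separator):
        node = node[key]
    return node


data = {"files": {"etc/ssh/sshd_config": {"mode": "0600"}}}

# With "/" the key "etc/ssh/sshd_config" is split into three segments and the lookup fails.
try:
    get_path(data, "files/etc/ssh/sshd_config/mode")
except KeyError as exc:
    print("lookup failed on segment:", exc)

# With ";" as the separator the slash-containing key is treated as a single segment.
print(get_path(data, "files;etc/ssh/sshd_config;mode", separator=";"))
```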
General usage guidelines:
- use the -f (filter) arg to search for a value string
- follow the (json) output from above to find the key name
- then use the -l (lookup) arg to extract the values for the key
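The filter-then-lookup workflow listed above can be sketched on plain nested data. This is a rough illustration under stated assumptions, not how yagrep is implemented: `filter_values`, `lookup_key`, and the sample data are all hypothetical.

```python
def filter_values(node, text):
    """Keep only the branches whose leaf values contain text (hypothetical filter step)."""
    if isinstance(node, dict):
        kept = {k: filter_values(v, text) for k, v in node.items()}
        return {k: v for k, v in kept.items() if v not in (None, {}, [])}
    if isinstance(node, list):
        kept = [filter_values(v, text) for v in node]
        return [v for v in kept if v not in (None, {}, [])]
    return node if text in str(node) else None


def lookup_key(node, key):
    """Collect the value of every occurrence of key (hypothetical lookup step)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(lookup_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(lookup_key(item, key))
    return found


data = {
    "controls": [
        {"id": "AC-1", "rules": ["sshd_set_idle_timeout"]},
        {"id": "AU-2", "rules": ["audit_rules_login_events"]},
    ]
}

# Step 1: filter on a value string and read the key name off the result.
print(filter_values(data, "sshd"))
# Step 2: look up every value stored under that key.
print(lookup_key(data, "rules"))
```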
Useful yagrep config file settings:
Setting | Description |
---|---|
default_separator: | change the path separator to something like ; if data has forward slashes |
output_format: | set the output format to raw for unformatted output |
$ yagrep -h
usage: yagrep [-h] [--version] [-v] [-d] [-s] [-f | -l] TEXT FILE [FILE ...]

Search in YAML files for keys and values.

positional arguments:
  TEXT               Text string to look for (one-only, required) (default: None)
  FILE               Look in file(s) for text string (at least one, required) (default: None)

options:
  -h, --help         show this help message and exit
  --version          show program's version number and exit
  -v, --verbose      Display more processing info (default: False)
  -d, --dump-config  Dump default configuration file to stdout (default: False)
  -s, --save-config  save active config to default filename (.yagrep.yml) and exit (default: False)
  -f, --filter       Filter out data not matching input string (no paths) (default: False)
  -l, --lookup       Lookup by key and return list of values for any matches (default: False)
Yet another helper script is included for sorting large (YAML) lists.
The yasort script also uses its own configuration file, creatively
named .yasort.yaml. The config file handling described above for
ymltoxml and yagrep applies equally to this config file.
$ yasort -h
usage: yasort [-h] [--version] [-v] [-d] [-s] [FILE ...]

Sort YAML lists and write new files.

positional arguments:
  FILE               Process input file(s) to target directory (default: None)

options:
  -h, --help         show this help message and exit
  --version          show program's version number and exit
  -v, --verbose      Display more processing info (default: False)
  -d, --dump-config  Dump default configuration file to stdout (default: False)
  -s, --save-config  save active config to default filename (.yasort.yml) and exit (default: False)
All of the optional arguments for yasort are essentially orthogonal to
sorting, thus the only required argument for normal usage is one or more
input files. All of the user settings are in the default configuration file
shown below; use the --save-config option to create your own config file.
Default yasort.yaml:
---
# comments should be preserved
file_encoding: 'utf-8'
default_yml_ext: '.yaml'
output_dirname: 'sorted-out'
default_parent_key: 'controls'
default_sort_key: 'rules'
has_parent_key: true
preserve_quotes: true
process_comments: false
mapping: 4
sequence: 6
offset: 4
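As a rough sketch of how the sort-related defaults above fit together, the following sorts the list stored under the sort key, either inside each item of the parent key or at the top level. This is an assumption about the data layout (a parent key holding a list of mappings), not the yasort implementation, and `sort_lists` is a hypothetical name; plain dicts stand in for loaded YAML.

```python
def sort_lists(data, sort_key="rules", parent_key="controls", has_parent_key=True):
    """Sort the list under sort_key in place; with a parent key, sort it in each parent item."""
    # Assumption: the parent key maps to a list of mappings.
    items = data[parent_key] if has_parent_key else [data]
    for item in items:
        if isinstance(item.get(sort_key), list):
            item[sort_key].sort()
    return data


profile = {
    "controls": [
        {"id": "AC-1", "rules": ["sshd_b", "sshd_a"]},
        {"id": "AU-2", "rules": ["audit_z", "audit_c"]},
    ]
}
print(sort_lists(profile))

# Without a parent key the sort target is a top-level list.
print(sort_lists({"rules": ["b", "c", "a"]}, has_parent_key=False))
```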
NIST control ID munging
The demo scripts and oscal module illustrate various forms of control ID
normalization in order to match control IDs from multiple input sources.
Currently, there are two primary ID formats, and which one to use is mainly
a user choice:
- AC-01(01)(a) - uppercase with parentheses
- ac-01.01.a - lowercase with dots
Nested controls follow the upstream pattern of alternating lettered and numbered sub-bullets for each level, and the latest rev5 controls add leading zeros.
The xform_id function is idempotent with the following caveats:
- extraneous whitespace is always dropped
- leading zeros are added to single digit values where needed
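The two ID formats and the caveats above can be illustrated with a small sketch. This is not the package's xform_id function; `to_dotted` and `to_paren` are hypothetical helpers that only mirror the documented behavior (whitespace dropped, single digits zero-padded, repeated application changes nothing).

```python
import re


def _split_id(ctl_id):
    """Drop whitespace, split on separators, and zero-pad single-digit parts."""
    compact = re.sub(r"\s+", "", ctl_id)
    tokens = [tok for tok in re.split(r"[().\-]", compact) if tok]
    if len(tokens) < 2:
        raise ValueError(f"not a control ID: {ctl_id!r}")
    family, parts = tokens[0], tokens[1:]
    return family, [p.zfill(2) if p.isdigit() else p.lower() for p in parts]


def to_dotted(ctl_id):
    """Return the lowercase-with-dots form, eg, ac-01.01.a"""
    family, parts = _split_id(ctl_id)
    return f"{family.lower()}-{'.'.join(parts)}"


def to_paren(ctl_id):
    """Return the uppercase-with-parentheses form, eg, AC-01(01)(a)"""
    family, parts = _split_id(ctl_id)
    return f"{family.upper()}-{parts[0]}" + "".join(f"({p})" for p in parts[1:])


print(to_dotted("AC-1 (1)(a)"))
print(to_paren("ac-01.01.a"))
```

Because both helpers parse either input format, applying one to its own output returns the same string, which is the sense of "idempotent" used above.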
XML <==> YAML conversion
We mainly test ymltoxml on mavlink XML message definitions and NIST/SSG content files, so round-trip conversion may not work at all on arbitrarily complex XML files with namespaces, etc. The current round-trip is not exact, due to the following:
- missing encoding is added to version tag
- leading/trailing whitespace in text elements and comments is not preserved
- XML - elements with self-closing tags are converted to full closing tags
- XML - empty elements on more than one line are not preserved
For the files tested (eg, mavlink) the end result is cleaner/shinier XML.
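Two of the differences listed above (self-closing tags and surrounding whitespace) can be reproduced with the standard library alone. This sketch uses xml.etree.ElementTree rather than ymltoxml itself, and the sample XML is made up; it only shows what those changes look like.

```python
import xml.etree.ElementTree as ET

src = "<msg><field name='x'/><note>  padded text  </note></msg>"
root = ET.fromstring(src)

# Strip leading/trailing whitespace from text nodes, as a lossy round-trip would.
for elem in root.iter():
    if elem.text is not None:
        elem.text = elem.text.strip()

# short_empty_elements=False writes <field ...></field> instead of <field .../>.
out = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(out)
```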
The following covers two types of workflows, one for tool usage in other (external) projects, and one for (internal) tool development.
The ymltoxml tools are intended to be part of a larger workflow, ie, developing custom mavlink message dialects and generating/deploying the resulting mavlink language interfaces. To be more specific, for this example we use a mavlink-compatible component running on a micro-controller, thus the target language bindings are C and C++.
Tool requirements for the full mavlink workflow:
- initially just recent pymavlink, Python, and Tox
Both mavlink and pymavlink require a (host) GCC toolchain for full builds; however, the basic workflow to generate mavlink library headers requires only Git, Python, and Tox.
The yasort/yagrep tools are also intended to be part of a larger workflow, mainly working with SCAP content, ie, the scap-security-guide source files (or just content). They are currently used to sort profiles with large numbers of rules, as well as to create control files and analyze existing controls.
The yasort configuration file defaults are based on existing yaml structure, but feel free to change them for another use case. To adjust how the sorting works, make a local config file (see above) and edit the following options as needed:
Setting | Description |
---|---|
output_dirname: | directory for output file(s) |
default_parent_key: | parent key if sort target is sublist |
default_sort_key: | the key you want to sort |
has_parent_key: | set true if sorting a sublist |
default_yml_ext: | change the output file extension |
The rest of the options are for YAML formatting/flow style (see the ruamel documentation for formatting details).
As long as you have git and at least Python 3.8, then the "easy" dev workflow is to clone this repository and install Tox via your system package manager, eg:
$ sudo apt-get update
$ sudo apt-get install tox
After cloning this repository, you can run the repo checks with the
tox command. It will build a virtual python environment with
all the dependencies and run the specified commands, eg:
$ git clone https://github.com/sarnold/yaml-tools.git
$ cd yaml-tools/
$ tox -e py
The above will run the tests using your (default) system Python; to specify the Python version and host OS type, run something like:
$ tox -e py39-linux
To generate a coverage file, run something like the following:
$ tox -e py,coverage
Additional tox commands:
- tox -e changes - (re)generate the changelog file
- tox -e conv - round-trip conversion test on mavlink dialect
- tox -e dev - pip "developer" install
- tox -e style - will run flake8 style checks
- tox -e lint - will run pylint (somewhat less permissive than PEP8/flake8 checks)
- tox -e mypy - will run mypy import and type checking
- tox -e isort - will run isort import checks
- tox -e clean - will remove temporary test files
To build/lint the api docs, use the following tox commands:
- tox -e docs - build the documentation using sphinx and the api-doc plugin
- tox -e docs-lint - build the docs and run the sphinx link checking
Some required Linux environments can be old enough to require "special"
handling (eg, Ubuntu bionic or focal) as shown below. If the Tox package
version available via the system package manager is less than 4.x, it may be
possible to install the latest Tox version into XDG $HOME directories, eg,
$HOME/.local/bin. Note that Ubuntu allows this but Debian frowns on it,
so this option may not be available.
If the above procedure is supported, try something like the following:
$ sudo apt-get update
$ sudo apt-get install python3-venv python3-pip
$ sudo apt-get purge python3-virtualenv
$ python3 -m pip install -U pip
$ source ~/.profile
$ which pip3
/home/user/.local/bin/pip3
$ pip3 install tox
$ tox --version
Ensure the above version is 4.x and the path displayed starts with
$HOME/.local.
We use the gitchangelog action to generate our changelog file and GH
Release page, as well as the gitchangelog commit message prefix "tag"
modifiers to help it categorize/filter commits for a tidier changelog.
Please use the appropriate ACTION modifiers in any Pull Requests. Some
examples of commit message summary "tags" are shown in the .gitchangelog.rc
file and reproduced below:
new: usr: support of bazaar implemented
chg: re-indentend some lines !cosmetic
new: dev: updated code to be compatible with last version of killer lib.
fix: pkg: updated year of licence coverage.
new: test: added a bunch of test around user usability of feature X.
fix: typo in spelling my name in comment. !minor
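The shape of these summaries (an action prefix, an optional audience prefix, the message, and optional trailing "!" tags) can be sketched with a small parser. This is only an illustration and not part of gitchangelog or this repo: `parse_summary` is a hypothetical helper, and the prefix sets are limited to the values that appear in the examples above.

```python
import re

# Prefix sets are taken only from the examples shown above; the real
# .gitchangelog.rc may accept more values (assumption).
SUMMARY_RE = re.compile(
    r"^(?P<action>new|chg|fix):\s*"
    r"(?:(?P<audience>usr|dev|pkg|test):\s*)?"
    r"(?P<message>.*?)"
    r"(?P<tags>(?:\s*!\w+)*)\s*$"
)


def parse_summary(summary):
    """Split a commit summary into action, audience, message, and !tags."""
    match = SUMMARY_RE.match(summary)
    if match is None:
        return None
    return {
        "action": match["action"],
        "audience": match["audience"],
        "message": match["message"],
        "tags": re.findall(r"!(\w+)", match["tags"]),
    }


print(parse_summary("chg: re-indentend some lines !cosmetic"))
print(parse_summary("new: usr: support of bazaar implemented"))
```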
See the following docs page (or generate-changelog.rst on GitHub) for more details.
This repo is also pre-commit enabled for various linting and format checks. The checks run automatically on commit and will fail the commit (if not clean) with some checks performing simple file corrections.
If other checks fail on commit, the failure display should explain the error
types and line numbers. Note you must fix any fatal errors for the
commit to succeed; some errors should be fixed automatically (use
git status and git diff to review any changes).
See the following pages for more information on gitchangelog and pre-commit.
You will need to install pre-commit before contributing any changes; installing it using your system's package manager is recommended. Otherwise, install it with pip into your usual virtual environment using something like:
$ sudo emerge pre-commit
--or--
$ pip install pre-commit
then install it into the repo you just cloned:
$ git clone https://github.com/sarnold/yaml-tools
$ cd yaml-tools/
$ pre-commit install
It's usually a good idea to update the hooks to the latest version:
$ pre-commit autoupdate
This project is licensed under the LGPL license - see the LICENSE file for details.