The Auditree data gathering and reporting tool.
Auditree harvest
is a command line tool that assists with the gathering and
formatting of data into human readable reports. Auditree harvest
allows
a user to easily retrieve historical raw data, in bulk, from a Git repository
and optionally format that raw data to meet reporting needs. Auditree
harvest
is meant to retrieve and report on historical evidence from an
evidence locker. It is, however, not limited to just processing evidence. Any
file found in a Git repository hosting service can be processed by Auditree harvest
.
- Supported for execution on OSX and LINUX.
- Supported for execution with Python 3.6 and above.
Python 3 must be installed, it can be downloaded from the Python site or installed using your package manager.
Python version can be checked with:
python --version
or
python3 --version
The harvest
tool is available for download from PyPI.
It is best practice, but not mandatory, to run harvest
from a dedicated Python
virtual environment. Assuming that you have the Python virtualenv
package already installed, you can create a virtual environment named venv
by
executing virtualenv venv
which will create a venv
folder at the location of
where you executed the command. Alternatively you can use the python venv
module
to do the same.
python3 -m venv venv
Assuming that you have a virtual environment and that virtual environment is in
the current directory then to install a new instance of harvest
, activate
your virtual environment and use pip
to install harvest
like so:
. ./venv/bin/activate
pip install auditree-harvest
As we add functionality to harvest
users will want to upgrade their harvest
package regularly. To upgrade harvest
to the most recent version do:
. ./venv/bin/activate
pip install auditree-harvest --upgrade
See pip documentation for additional options for using pip
.
Since Auditree harvest
interacts with Git repositories, it requires Git remote
hosting service credentials in order to do its thing. Auditree harvest
will by
default look for a username
and token
in a ~/.credentials
file. You can
override the credentials file location by using the --creds
option on a harvest
CLI execution. Valid section headings include github
, github_enterprise
, bitbucket
,
and gitlab
. Below is an example of the expected credentials entry.
[github]
username=your-gh-username
token=your-gh-token
To collate historical versions of a file from a Git repository hosting service
like Github, provide the repository URL (repo
positional argument), the
relative path to the file within the remote repository including the file name
(filepath
positional argument) and an optional date range (--start
and --end
arguments). You can also, optionally, provide the local Git repository path
(--repo-path
argument), if the repository already exists locally and you wish
to override the remote repository download behavior.
harvest collate https://github.com/org-foo/repo-bar /raw/baz/baz.json --start 20191201 --end 20191212 --repo-path ./bar-repo
- File versions are written to the current local directory where
harvest
was executed from. - File versions are prefixed by the commit date in
YYYYMMDD
format. - File versions are gathered with daily granularity.
- Only the latest version of a file for a given day is retrieved.
- If a file did not change on a date then no file version is written for that date. Instead the latest version prior to that date serves as the version of that file for that date.
- If you don't provide a
--start
and an--end
then the latest version of a file is retrieved. - If you only provide a
--start
date file versions from the start date to the current date are retrieved. - If you only provide an
--end
date the latest version of a file for the end date is retrieved.
To run a report using content contained in a Git repository hosting service
like Github, provide the repository URL (repo
positional argument), the report
package (package
), the report name (name
positional argument) and include
any configuration that the report requires (--config
) as a JSON string. You
can also, optionally, provide the local Git repository path (--repo-path
argument), if the repository already exists locally and you wish to override
the remote repository download behavior.
harvest report https://github.com/org-foo/repo-bar auditree_arboretum check_results_summary --config '{"start":"20191212","end":"20191221"}'
To see a full summary of available reports within any package (like auditree-arboretum
) do:
harvest reports auditree_arboretum --list
To see details on a specific report that include usage example do something like:
harvest reports auditree_arboretum --detail check_results_summary
Reports should be hosted with the fetchers/checks that collect the evidence for
the reports process. Within auditree-arboretum
this means the code lives in the
appropriate provider directory. Contributing common harvest reports are as follows:
- Adhere to the auditree-arboretum contribution guidelines - TODO add link.
- Reports go in the "reports" folder by provider.
- Create a python module with a class that extends the BaseReporter
class.
- The
harvest
CLI will use the report module name as the name of the report (sans the .py extention). - Only one report class per report module is permitted.
- The
- In the new report class the expectations are as follows:
- Provide a module level docstring that contains:
- A single line summary
- A detailed description of the report that includes evidence/files being processed and expected configuration
- At least one usage example
- Use the check results summary report docstring as an example/template.
harvest
uses this docstring to display available reports and their details to the user.
- Provide/Override the
report_filename
property to return the name of the report (including extension).harvest
uses this property to apply a report template (if desired) and to determine which writer function to use when writing the report to a file. Use the check results summary report report_filename property and the Python packages summary report report_filename property as examples. - Provide/Override the
generate_report
method. This is where you put your evidence processing and report formatting logic. Use the check results summary report generate_report method as an example.harvest
takes the optional--config
command line argument as a JSON string when executing a report, converts it to a dictionary and attaches it as theconfig
attribute to your report object. Use the report object'sconfig
attribute in thegenerate_report
method if you plan to have report specific configuration options.- Your report object also has a method that retrieves an evidence file for
a given date. Use the report object's
get_file_content
method when retrieving evidence from an evidence locker. - Generating CSV reports:
harvest
uses the Python CSV writer to write out the report file. So be sure that yourgenerate_report
method returns a list of dictionaries that adheres to the expectations of the Python CSV writer.
- Generating reports from a Jinja2 template:
- Add a report template named the same as your
report_filename
property with a.tmpl
extension.harvest
will start to look for the template in the same directory as the report module. So as long as it exists within that directory structure,harvest
will find it. Use python_packages_summary.md.tmpl as an example. harvest
will look for this template file as part of your report processing and, if found, will pass yourgenerate_report
returned content through the template logic.- Your
generate_report
returned content should be a dictionary with everything necessary for your report template to render the desired report appropriately. - The report template can access the "raw" content generated by
generate_report
through a dictionary nameddata
and also has access to the report's attributes through thereport
object. Use python_packages_summary.md.tmpl as an example.
- Add a report template named the same as your
- Generating reports without templates:
- You just want to generate report content directly from
generate_report
? No problem. Just generate a string as the report content or a list of strings as the rows of the report content andharvest
will do the rest.
- You just want to generate report content directly from
- Provide a module level docstring that contains:
If you find that you have a specific reporting need that does not fit in as a common
harvest
report, no problem. Just develop the report in a separate repo/project
following the same guidelines as above. As long as the package is importable by
python and you tell harvest
what package to look for your report(s) in via the CLI,
it will handle the rest.