Add CLI command to dump inputs/outputs of CalcJob/WorkChain #6276

Merged
merged 30 commits into from
May 27, 2024
30 commits
6ebfe8d
CLI: Add dumping functionality.
GeigerJ2 Apr 11, 2024
d8bb818
CLI: Add dumping functionality.
GeigerJ2 Apr 11, 2024
19922fd
Fix process dump and docstring formatting
GeigerJ2 Apr 11, 2024
863cc61
Echo missing plugin for `--use-presubmit`
GeigerJ2 Apr 15, 2024
c2244af
Some cleanup and refactor
GeigerJ2 Apr 18, 2024
e369359
Big refactor to make code more concise
GeigerJ2 Apr 18, 2024
f5adf17
Add first version of `--flat` option.
GeigerJ2 Apr 22, 2024
1075416
Shelfed `--use-presubmit` option
GeigerJ2 Apr 22, 2024
671538c
Updated tests for calcjob_dumps after code changes
GeigerJ2 Apr 24, 2024
009b2aa
Finalized tests apart from YAML dumping
GeigerJ2 Apr 25, 2024
5b0e97f
Output of `process status/report/show/` to README
GeigerJ2 Apr 25, 2024
5dac992
Naming: `dump` -> `process_dump` in `cmd_process`
GeigerJ2 Apr 25, 2024
9b0877d
Moved logic for calcjob_io_paths to own function
GeigerJ2 Apr 25, 2024
af4a03e
test
GeigerJ2 Apr 26, 2024
a1930cb
♻️ First working OOP-version using `ProcessDumper`
GeigerJ2 Apr 26, 2024
f7a3f00
All functionality of the Python API fully tested
GeigerJ2 Apr 29, 2024
fc3c181
Add documentation in `How to work with data`
GeigerJ2 Apr 29, 2024
b5a34a9
Fix check for `CalculationNode` in `dump`.
GeigerJ2 Apr 29, 2024
b5e1a4e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2024
5464a71
Fix annotations for 3.9 test suite
GeigerJ2 Apr 29, 2024
07ac0e1
Fix `dump_node_yaml` before `CalcJob` dump
GeigerJ2 May 2, 2024
a62f73c
Fix failing test due to missing metadata YAML
GeigerJ2 May 7, 2024
8db2e05
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2024
1cbe414
Final cleanup
GeigerJ2 May 15, 2024
ae9a912
Add `_workflow_dump` and change link node dumping
GeigerJ2 May 16, 2024
ac2acb4
Updated tests
GeigerJ2 May 16, 2024
deb8867
Resolve comments of reviews by Seb and Alex
GeigerJ2 May 19, 2024
8cf53af
Remove underscores for metadata.yaml properties
GeigerJ2 May 21, 2024
74facc2
Update documentation
GeigerJ2 May 22, 2024
497b14c
README as `.md` and wrap `cmd_process` outputs
GeigerJ2 May 23, 2024
64 changes: 64 additions & 0 deletions docs/source/howto/data.rst
@@ -78,6 +78,70 @@ Ways to find and retrieve data that have previously been imported are described
If none of the currently available data types, as listed by ``verdi plugin list``, seem to fit your needs, you can also create your own custom type.
For details refer to the next section :ref:`"How to add support for custom data types"<topics:data_types:plugin>`.

.. _how-to:data:dump:

Dumping data to disk
--------------------

.. versionadded:: 2.6

It is now possible to dump your executed workflows to disk in a hierarchical directory tree structure. This can be
particularly useful if one is not yet familiar with the ``QueryBuilder`` or wants to quickly explore input/output files
using existing shell scripts or common terminal utilities, such as ``grep``. The dumping can be achieved with the command:

.. code-block:: shell

    verdi process dump <pk>

For our beloved ``MultiplyAddWorkChain``, we obtain the following:

.. code-block:: shell

    $ verdi process dump <pk> -p dump-multiply_add
    Success: Raw files for WorkChainNode <pk> dumped into folder `dump-multiply_add`.

.. code-block:: shell

    $ tree -a dump-multiply_add
    dump-multiply_add
    ├── README.md
    ├── .aiida_node_metadata.yaml
    ├── 01-multiply
    │   ├── .aiida_node_metadata.yaml
    │   └── inputs
    │       └── source_file
    └── 02-ArithmeticAddCalculation
        ├── .aiida_node_metadata.yaml
        ├── inputs
        │   ├── .aiida
        │   │   ├── calcinfo.json
        │   │   └── job_tmpl.json
        │   ├── _aiidasubmit.sh
        │   └── aiida.in
        └── outputs
            ├── _scheduler-stderr.txt
            ├── _scheduler-stdout.txt
            └── aiida.out

The ``README.md`` file describes the directory structure and provides useful information about the top-level process.
Numbered subdirectories are created for each step of the workflow, resulting in the ``01-multiply`` and
``02-ArithmeticAddCalculation`` folders. The raw calculation input and output files ``aiida.in`` and ``aiida.out`` of
the ``ArithmeticAddCalculation`` are placed in ``inputs`` and ``outputs``, respectively. In addition, ``inputs``
contains the submission script ``_aiidasubmit.sh``, and ``outputs`` contains the scheduler stdout and stderr,
``_scheduler-stdout.txt`` and ``_scheduler-stderr.txt``. Lastly, the source code of the ``multiply``
``calcfunction``, which represents the first step of the workflow, is contained in the ``source_file``.

Upon having a closer look at the directory, we also find the hidden ``.aiida_node_metadata.yaml`` files, which are
created for every ``ProcessNode`` and contain additional information about the ``Node``, the ``User``, and the
``Computer``, as well as the ``.aiida`` subdirectory with machine-readable AiiDA-internal data in JSON format.
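
Because the dump is a plain directory tree, it can also be searched from Python without touching the database. The following is a minimal sketch, not part of AiiDA: ``find_in_dump`` is a hypothetical helper, and the file contents written into the temporary directory are made up so that the example runs without an AiiDA installation. It mimics a ``grep`` over all files below the ``outputs`` directories of the layout shown above.

```python
import tempfile
from pathlib import Path


def find_in_dump(dump_root: Path, needle: str, subdir: str = "outputs") -> list[tuple[Path, int, str]]:
    """Return (relative path, line number, line) for every line containing ``needle``.

    Hypothetical helper: only files located below a directory named ``subdir`` are searched.
    """
    matches = []
    for path in sorted(dump_root.rglob("*")):
        # Skip directories and any file that is not somewhere below ``subdir``.
        if not path.is_file() or subdir not in path.relative_to(dump_root).parts[:-1]:
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                matches.append((path.relative_to(dump_root), number, line))
    return matches


# Build a small stand-in for the dumped tree (file contents are invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dump-multiply_add"
    outputs = root / "02-ArithmeticAddCalculation" / "outputs"
    outputs.mkdir(parents=True)
    (outputs / "aiida.out").write_text("5\n")
    (outputs / "_scheduler-stderr.txt").write_text("")
    inputs = root / "02-ArithmeticAddCalculation" / "inputs"
    inputs.mkdir()
    (inputs / "aiida.in").write_text("echo $((2 + 3))\n")

    hits = find_in_dump(root, "5")
    for rel_path, number, line in hits:
        print(f"{rel_path}:{number}: {line}")
```

Restricting the search to one subdirectory name keeps input files and the hidden ``.aiida`` folder out of the results; passing ``subdir="inputs"`` would search the other side instead.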

Since child processes are explored recursively, arbitrarily complex, nested workflows can be dumped. As already seen
above, the ``-p`` flag lets you specify a custom dumping path. If none is provided, the path is generated automatically
from the ``process_label`` (or ``process_type``) and the ``pk``. In addition, the command provides the ``-o`` flag to
overwrite existing directories, the ``-f`` flag to dump all files for each ``CalculationNode`` of the workflow in a flat
directory structure, and the ``--include-inputs/--exclude-inputs`` (``--include-outputs/--exclude-outputs``) flags to
control whether the linked node inputs (outputs) of each ``CalculationNode`` of the workflow are additionally dumped
into ``node_inputs`` (``node_outputs``) subdirectories. By default, node inputs are included and node outputs are
excluded. For a full list of available options, call :code:`verdi process dump --help`.
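
The interplay between the automatically generated path and the overwrite flag can be illustrated with a short sketch. Everything here is an assumption made for illustration: ``default_dump_path`` and ``prepare_dump_dir`` are hypothetical helpers, the exact naming pattern is a guess based only on the statement that the path is built from the ``process_label`` and the ``pk``, and the ``pk`` value 42 is a placeholder. The real command may apply additional safety checks before deleting anything.

```python
import shutil
import tempfile
from pathlib import Path


def default_dump_path(process_label: str, pk: int) -> Path:
    """Hypothetical naming scheme that combines the process label and the pk."""
    return Path(f"dump-{process_label}-{pk}")


def prepare_dump_dir(path: Path, overwrite: bool = False) -> Path:
    """Create an empty directory at ``path``, refusing to clobber it unless ``overwrite`` is set."""
    if path.exists():
        if not overwrite:
            raise FileExistsError(f"`{path}` already exists and overwrite is False.")
        shutil.rmtree(path)  # discard the previous dump entirely
    path.mkdir(parents=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / default_dump_path("MultiplyAddWorkChain", 42)  # 42 is a placeholder pk
    prepare_dump_dir(target)
    (target / "README.md").write_text("old dump\n")

    # A second dump into the same location is refused without the overwrite switch.
    try:
        prepare_dump_dir(target)
    except FileExistsError:
        refused = True
    else:
        refused = False

    # With overwrite enabled, the old content is removed and the directory recreated.
    prepare_dump_dir(target, overwrite=True)
    remaining = [p.name for p in target.iterdir()]
```

Raising ``FileExistsError`` rather than silently reusing the directory matches the error that the command-line wrapper in this pull request catches and reports.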

.. _how-to:data:import:provenance:

1 change: 1 addition & 0 deletions docs/source/reference/command_line.rst
@@ -367,6 +367,7 @@ Below is a list with all available subcommands.
    Commands:
      call-root  Show root process of the call stack for the given processes.
      dump       Dump process input and output files to disk.
      kill       Kill running processes.
      list       Show a list of running or terminated processes.
      pause      Pause running processes.
84 changes: 84 additions & 0 deletions src/aiida/cmdline/commands/cmd_process.py
@@ -481,3 +481,87 @@ def process_repair(manager, broker, dry_run):
            if pid not in set_process_tasks:
                process_controller.continue_process(pid)
                echo.echo_report(f'Revived process `{pid}`')


@verdi_process.command('dump')
@arguments.PROCESS()
@options.PATH()
@options.OVERWRITE()
@click.option(
    '--include-inputs/--exclude-inputs',
    default=True,
    show_default=True,
    help='Include the linked input nodes of the `CalculationNode`(s).',
)
@click.option(
    '--include-outputs/--exclude-outputs',
    default=False,
    show_default=True,
    help='Include the linked output nodes of the `CalculationNode`(s).',
)
@click.option(
    '--include-attributes/--exclude-attributes',
    default=True,
    show_default=True,
    help='Include attributes in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
    '--include-extras/--exclude-extras',
    default=True,
    show_default=True,
    help='Include extras in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
    '-f',
    '--flat',
    is_flag=True,
    default=False,
    help='Dump files in a flat directory for every step of the workflow.',
)
def process_dump(
    process,
    path,
    overwrite,
    include_inputs,
    include_outputs,
    include_attributes,
    include_extras,
    flat,
) -> None:
    """Dump process input and output files to disk.

    Child calculations/workflows (also called `CalcJob`s/`CalcFunction`s and `WorkChain`s/`WorkFunction`s in AiiDA
    jargon) run by the parent workflow are contained in the directory tree as sub-folders and are sorted by their
    creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
    running `verdi process status <pk>` on the command line.

    By default, input and output files of each calculation can be found in the corresponding "inputs" and
    "outputs" directories (the former also contains the hidden ".aiida" folder with machine-readable job execution
    settings). Additional input and output files (depending on the type of calculation) are placed in the "node_inputs"
    and "node_outputs", respectively.

    Lastly, every folder also contains a hidden, human-readable `.aiida_node_metadata.yaml` file with the relevant AiiDA
    node data for further inspection.
    """

    from aiida.tools.dumping.processes import ProcessDumper

    process_dumper = ProcessDumper(
        include_inputs=include_inputs,
        include_outputs=include_outputs,
        include_attributes=include_attributes,
        include_extras=include_extras,
        overwrite=overwrite,
        flat=flat,
    )

    try:
        dump_path = process_dumper.dump(process_node=process, output_path=path)
    except FileExistsError:
        echo.echo_critical(
            'Dumping directory exists and overwrite is False. Set overwrite to True, or delete directory manually.'
        )
    except Exception as e:
        echo.echo_critical(f'Unexpected error while dumping {process.__class__.__name__} <{process.pk}>:\n ({e!s}).')

    echo.echo_success(f'Raw files for {process.__class__.__name__} <{process.pk}> dumped into folder `{dump_path}`.')
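
The docstring above states that child processes appear as sub-folders sorted by creation time, and the documentation example shows names such as `01-multiply` and `02-ArithmeticAddCalculation`. A minimal sketch of how such numbered names could be derived follows. It is not the `ProcessDumper` implementation: `ChildProcess` and `numbered_folder_names` are hypothetical names, and the timestamps are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChildProcess:
    """Hypothetical stand-in for a called process node (label plus creation time)."""

    label: str
    ctime: datetime


def numbered_folder_names(children: list[ChildProcess]) -> list[str]:
    """Return zero-padded ``NN-label`` folder names ordered by creation time."""
    ordered = sorted(children, key=lambda child: child.ctime)
    return [f"{index:02d}-{child.label}" for index, child in enumerate(ordered, start=1)]


# Children are deliberately listed out of order; the timestamps are made up.
children = [
    ChildProcess("ArithmeticAddCalculation", datetime(2024, 5, 27, 12, 0, 5)),
    ChildProcess("multiply", datetime(2024, 5, 27, 12, 0, 1)),
]
names = numbered_folder_names(children)
print(names)
```

Prefixing a zero-padded counter means that a plain alphabetical listing of the directory reproduces the execution order, at least for up to 99 steps with two digits.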
21 changes: 21 additions & 0 deletions src/aiida/cmdline/params/options/main.py
@@ -8,6 +8,8 @@
###########################################################################
"""Module with pre-defined reusable commandline options that can be used as `click` decorators."""

import pathlib

import click

from aiida.brokers.rabbitmq.defaults import BROKER_DEFAULTS
@@ -77,6 +79,8 @@
    'OLDER_THAN',
    'ORDER_BY',
    'ORDER_DIRECTION',
    'OVERWRITE',
    'PATH',
    'PAST_DAYS',
    'PAUSED',
    'PORT',
@@ -743,3 +747,20 @@ def set_log_level(_ctx, _param, value):
    is_flag=True,
    help='Print the full traceback in case an exception is raised.',
)

PATH = OverridableOption(
    '-p',
    '--path',
    type=click.Path(path_type=pathlib.Path),
    show_default=False,
    help='Base path for operations that write to disk.',
)

OVERWRITE = OverridableOption(
    '--overwrite',
    '-o',
    is_flag=True,
    default=False,
    show_default=True,
    help='Overwrite file/directory if writing to disk.',
)
4 changes: 2 additions & 2 deletions src/aiida/engine/daemon/execmanager.py
@@ -25,7 +25,7 @@

 from aiida.common import AIIDA_LOGGER, exceptions
 from aiida.common.datastructures import CalcInfo, FileCopyOperation
-from aiida.common.folders import SandboxFolder
+from aiida.common.folders import Folder, SandboxFolder
 from aiida.common.links import LinkType
 from aiida.engine.processes.exit_code import ExitCode
 from aiida.manage.configuration import get_config_option
@@ -66,7 +66,7 @@ def upload_calculation(
     node: CalcJobNode,
     transport: Transport,
     calc_info: CalcInfo,
-    folder: SandboxFolder,
+    folder: Folder,
     inputs: Optional[MappingType[str, Any]] = None,
     dry_run: bool = False,
 ) -> RemoteData | None:
1 change: 1 addition & 0 deletions src/aiida/tools/__init__.py
@@ -24,6 +24,7 @@

from .calculations import *
from .data import *
from .dumping import *
from .graph import *
from .groups import *
from .visualization import *
11 changes: 11 additions & 0 deletions src/aiida/tools/dumping/__init__.py
@@ -0,0 +1,11 @@
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved. #
# This file is part of the AiiDA code. #
# #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file #
# For further information please visit http://www.aiida.net #
###########################################################################
"""Modules related to the dumping of AiiDA data."""

__all__ = ('processes',)