Implement a better Context class #97

Merged on Nov 28, 2023 (21 commits)
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -5,11 +5,13 @@ IMPORTANT NOTES:

 1. This release will drop Python3.6 support! (#88, #95)
 2. This release will drop support for legacy task names! (#96)
+3. This release introduces dotted keys (#97)


 Features
 --------
 - #96: Drop legacy task names
+- #97: Better context class. This allows dotted keys!


 Fixes
24 changes: 23 additions & 1 deletion docs/sphinx/base-tasks.rst
@@ -79,13 +79,35 @@ current values in the ScriptEngine context. This can be used to clean data::
             - two
     # mylist is ['one', 'two']
     - base.context:
-        mylist: null # "remove" the value if mylist
+        mylist: null # "remove" the value of mylist
     - base.context:
         mylist:
             - 3
             - 4
     # mylist is now [3, 4]

+Since the ScriptEngine context will usually hold a lot of information, it is
+usually helpful to structure the parameters in nested levels. However,
+this can lead to overly verbose scripts, such as::
+
+    - base.context:
+        my:
+            deeply:
+                nested:
+                    parameter: value
+
+ScriptEngine scripts may therefore refer to context parameters
+using "dotted keys", as in::
+
+    - base.context:
+        my.deeply.nested.parameter: value
+
+This feature allows a shorter notation wherever context parameters are accessed
+by their names.
+
+.. versionadded:: 1.0
+    Dotted key access.


 ``base.context.from``
 ^^^^^^^^^^^^^^^^^^^^^
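
To illustrate the dotted-key notation documented in this hunk, here is a minimal standalone sketch of how a mapping with dotted keys could be expanded into nested levels. The function name `expand_dotted` is hypothetical and not part of ScriptEngine; only the separator `KEY_SEP = "."` is taken from `src/scriptengine/context.py` in this PR.

```python
from collections.abc import Mapping

KEY_SEP = "."  # same separator as in src/scriptengine/context.py


def expand_dotted(mapping):
    """Return a nested dict in which every dotted key is split into levels.

    Hypothetical helper for illustration, not ScriptEngine API.
    """
    result = {}
    for key, value in mapping.items():
        # Nested mappings may themselves contain dotted keys
        if isinstance(value, Mapping):
            value = expand_dotted(value)
        parts = key.split(KEY_SEP) if isinstance(key, str) else [key]
        target = result
        for part in parts[:-1]:
            # Create intermediate levels, overwriting non-dict values
            if not isinstance(target.get(part), dict):
                target[part] = {}
            target = target[part]
        target[parts[-1]] = value
    return result


short = {"my.deeply.nested.parameter": "value"}
verbose = {"my": {"deeply": {"nested": {"parameter": "value"}}}}
print(expand_dotted(short) == verbose)  # True
```

Both notations describe the same nested structure, which is why the shorter form can stand in for the verbose one in a script.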
7 changes: 3 additions & 4 deletions docs/sphinx/cli.rst
@@ -73,11 +73,10 @@ switched of with the ``--nocolor`` argument.

 As seen in the output of ``se --help`` above, ScriptEngine lists all
 available tasks in the current installation. ScriptEngine uses dynamic task
-loading (see :ref:`Concepts`) and additional task can be installed from
+loading (see :ref:`concepts:concepts`) and additional tasks can be installed from
 Python packages. In the example above, all tasks from the built-in ``base.*``
-package are available. Furthermore, the ``hpc.slurm.sbatch`` task is
-provided, which comes from the ``scriptengine-tasks-hpc`` (go `there`_) Python
-package.
+package are available. Furthermore, the ``hpc.slurm.sbatch`` task is provided,
+which comes from the ``scriptengine-tasks-hpc`` (go `there`_) Python package.

 Note that ScriptEngine task names follow a namespace scheme to prevent name
 clashes for tasks from different packages.
52 changes: 42 additions & 10 deletions docs/sphinx/concepts.rst
@@ -50,9 +50,9 @@ tasks in two ways:
 * jobs can add conditionals and loops to tasks.

 Corresponding to these two cases, jobs use the special ``do`` keyword to specify
-sequences of tasks (see :ref:`scripts:Do`), and/or ``when`` or ``loop`` clauses
-clauses to specify :ref:`scripts:conditionals` and :ref:`scripts:loops`,
-respectively.
+sequences of tasks (see :ref:`do <scripts:do>`), and/or ``when`` or ``loop``
+clauses to specify :ref:`conditionals <scripts:conditionals>` and
+:ref:`loops <scripts:loops>`, respectively.


 Scripts
@@ -106,25 +106,55 @@ An important concept in ScriptEngine is the task context, or short, the
 key, value pairs. ScriptEngine tasks can store and retrieve information from the
 context.

-When a ScriptEngine instance is created, the context is initialised. Some
-information about the execution environment is stored by the ScriptEngine
-instance in the new context. Then, it is passed to every task that is executed.
-Usually, the context will be populated with information as tasks are processed.
+When a ScriptEngine :ref:`instance <concepts:scriptengine instances>` is
+created, the context is initialised. Some information about the execution
+environment is stored by the ScriptEngine instance in the new context. Then, it
+is passed to every task that is executed. Usually, the context will be
+populated with information as tasks are processed.

 We have already seen the usage of the context in the "Hello world" example
-above. The ``context`` task stored a parameter named ``planet`` in the context
-and the ``echo`` task used the information from the context to display its
-message.
+above. The ``base.context`` task stored a parameter named ``planet`` in the
+context and the ``base.echo`` task used the information from the context to
+display its message.

 Since the context is a Python dictionary, it can store any Python data types.
 This is, for example, often used to structure information by storing further
 dictionaries in the context. Numbers and dates are other examples for useful
 data types for context information.

+The ScriptEngine task context extends the functionality of a Python dictionary
+in three important aspects:
+
+- allow for "dotted keys" in order to access nested dictionary values,
+- allow for merging of contexts with the help of `deepmerge
+  <https://deepmerge.readthedocs.io/en/latest/>`_,
+- allow storing the context to, and loading it from, a file.
+
+Dotted keys are helpful for writing ScriptEngine scripts in YAML, because they
+can substantially shorten the syntax when working with the context. See the
+description of the :ref:`base-tasks:``base.context``` task for examples. Using
+dotted keys can also simplify the access to context parameters in Jinja
+expressions.
+
+Context merging is a central concept for ScriptEngine. When tasks are executed,
+they are allowed to update the context. These updates are implemented as deep
+merges of the context dictionary, which makes it possible to add keys to nested
+levels of the dictionary, or add items to lists.
+
+Last but not least, storing the context in, and loading it from, a file allows
+ScriptEngine to achieve persistence. This makes it possible, among other things,
+to pick up the context from a previous run.
+
+.. versionadded:: 1.0
+    Dotted keys and context load/store.
+

 YAML
 ----

+Short examples of important YAML structures follow below. For further
+explanation and links, refer to the `YAML homepage <https://yaml.org/>`_.
+
 YAML syntax for lists::

     - apple
@@ -182,3 +212,5 @@ example::

 Jinja2 Templating
 -----------------
+
+`Jinja homepage and documentation <https://jinja.palletsprojects.com>`_
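
The context-merging paragraph in the concepts.rst changes describes deep merges that add keys to nested levels and add items to lists. The sketch below mimics that behaviour with a small pure-Python function. It is a simplified stand-in for `deepmerge.always_merger`, written under the assumption that dicts are merged recursively, lists are appended to, and all other values are overridden; `deep_merge` and the example data are hypothetical names, not ScriptEngine or deepmerge API.

```python
def deep_merge(base, update):
    """Merge update into base in place and return base.

    Simplified stand-in for deepmerge.always_merger (assumed semantics).
    """
    for key, value in update.items():
        if isinstance(base.get(key), dict) and isinstance(value, dict):
            # Descend into nested levels instead of replacing them
            deep_merge(base[key], value)
        elif isinstance(base.get(key), list) and isinstance(value, list):
            # Items of the update are added to the existing list
            base[key].extend(value)
        else:
            # New keys are added, everything else is overridden
            base[key] = value
    return base


context = {"model": {"name": "demo", "steps": [1, 2]}}
deep_merge(context, {"model": {"resolution": "low", "steps": [3]}})
print(context)
# {'model': {'name': 'demo', 'steps': [1, 2, 3], 'resolution': 'low'}}
```

A plain `dict.update` would instead replace the whole `model` entry, which is why a deep merge is needed for task updates that touch only part of a nested structure.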
1 change: 0 additions & 1 deletion environment.yml
@@ -8,5 +8,4 @@ dependencies:
   - jinja2
   - python-dateutil
   - deepmerge
-  - deepdiff>=5.7.0,!=6.2.0,!=6.2.1
   - pip
1 change: 0 additions & 1 deletion pyproject.toml
@@ -18,7 +18,6 @@ dependencies = [
     "importlib_metadata; python_version<'3.8'",
     "python-dateutil",
     "deepmerge",
-    "deepdiff>=5.7.0,!=6.2.0,!=6.2.1",
     "PyYAML",
     "jinja2",
 ]
1 change: 0 additions & 1 deletion setup.py
@@ -15,7 +15,6 @@
         "importlib_metadata; python_version<'3.8'",
         "python-dateutil",
         "deepmerge",
-        "deepdiff>=5.7.0,!=6.2.0,!=6.2.1",
         "PyYAML",
         "jinja2",
     ],
202 changes: 115 additions & 87 deletions src/scriptengine/context.py
@@ -1,96 +1,124 @@
-import sys
-from copy import deepcopy
+from collections import UserDict
+from collections.abc import Mapping
+from typing import Any

-from deepdiff import DeepDiff, Delta
+import yaml
 from deepmerge import always_merger

-_python_version = (sys.version_info.major, sys.version_info.minor)
+KEY_SEP = "."


-def save_copy(context):
-    """For versions other than 3.6, this is just copy.deepcopy.
-    For 3.6, we have to remove ["se"]["instance"] before deepcopying and restore
-    afterwards, because this constitutes some recursive reference that 3.6 has a
-    problem with.
+class Context(UserDict):
     """
-    if _python_version != (3, 6):
-        return deepcopy(context)
-    singleton = object()
-    try:
-        se_instance = context["se"]["instance"]
-        context["se"]["instance"] = None
-    except KeyError:
-        se_instance = singleton
-    copied_context = deepcopy(context)
-    if se_instance is not singleton:
-        copied_context["se"]["instance"] = context["se"]["instance"] = se_instance
-    return copied_context
-
-
-def context_delta(first, second):
-    """The deepdiff.Delta's __init__ and __add__ functions use copy.deepcopy
-    internally, which does not work with the SE context (see above).
-    Therefore, we ignore context["se"]["instance"] before creating Delta. This
-    means, that for Python 3.6 this item in the context is not part of the Delta,
-    even if it was changed between first and second!
-    Also, we have to create the Delta with mutate=True to avoid another deepcopy.
+    The ScriptEngine Context provides context information for Tasks
+
+    https://scriptengine.readthedocs.io/en/latest/concepts.html#task-context
+    The Context is a special dict that allows
+        * dotted keys, i.e. c["foo.bar"] is equivalent to c["foo"]["bar"]
+        * deep merges of other Mappings (via deepmerge.always_merger)
+        * save and load the data to/from a file-like object
+
+    Methods
+    -------
+    merge(other)
+        Deep merges other into the context
+
+    reset(keep=None)
+        Deletes all data from the context, except for the keys listed in 'keep'
+
+    save(stream)
+        ...
+
+    load(stream)
+        ...
     """
-    if _python_version != (3, 6):
-        return Delta(DeepDiff(first, second))
-
-    singleton = object()
-    try:
-        first_se_instance = first["se"]["instance"]
-        first["se"]["instance"] = None
-    except KeyError:
-        first_se_instance = singleton
-    try:
-        second_se_instance = second["se"]["instance"]
-        second["se"]["instance"] = None
-    except KeyError:
-        second_se_instance = singleton
-    delta = Delta(DeepDiff(first, second), mutate=True)
-    if first_se_instance is not singleton:
-        first["se"]["instance"] = first_se_instance
-    if second_se_instance is not singleton:
-        second["se"]["instance"] = second_se_instance
-    return delta
-
-
-class Context(dict):
-    pass
-
-
-class ContextUpdate:
-    def __init__(self, base=None, second=None):
-        if base is None:
-            self.merge = self.delta = None
-        elif second is None:
-            if not isinstance(base, dict):
-                raise TypeError(
-                    "First argument of ContextUpdate() must be None or a dict"
-                )
-            self.merge = base
-            self.delta = None
+
+    def __getitem__(self, key: Any) -> Any:
+        """Return x[key] where key is possibly a dotted key"""
+        def iter_getitem(data, subkey):
+            try:
+                return data[subkey]
+            except TypeError:  # dotted key with too many components
+                raise KeyError(subkey)
+            except KeyError:
+                pass
+            try:
+                first, _, remain = subkey.partition(KEY_SEP)
+            except AttributeError:
+                raise KeyError(subkey)
+            return iter_getitem(data[first], remain)
+
+        try:
+            return iter_getitem(self.data, key)
+        except KeyError as e:
+            raise KeyError(
+                f"{key} (subkey {e} not found)" if str(key) != str(e) else key
+            ) from None
+
+    def __setitem__(self, key: Any, item: Any) -> None:
+        """Set x[key]=item where key is possibly a dotted key"""
+        try:
+            keys = key.split(KEY_SEP)
+        except AttributeError:
+            self.data[key] = item
         else:
-            self.merge = None
-            self.delta = context_delta(base, second)
-
-    def __radd__(self, other):
-        if isinstance(other, dict):
-            if self.merge:
-                return always_merger.merge(other, self.merge)
-            elif self.delta:
-                return other + self.delta
-            return other
-        raise TypeError(
-            f"Unsupported operand types for +: '{type(other)}' and 'ContextUpdate'"
-        )
-
-    def __repr__(self):
-        if self.merge:
-            return f"<ContextUpdate: merge {self.merge}>"
-        elif self.delta:
-            return f"<ContextUpdate: {self.delta}>"
+            d = self.data
+            for k in keys[:-1]:
+                # allow overwriting of non-mapping keys
+                if k not in d or not isinstance(d[k], Mapping):
+                    d[k] = {}
+                d = d[k]
+            # Make sure that nested dotted keys are resolved!
+            d[keys[-1]] = Context(item).data if isinstance(item, Mapping) else item
+
+    def __contains__(self, key: object) -> bool:
+        def iter_contains(data, subkey):
+            if subkey in data:
+                return True
+            try:
+                first, _, remain = subkey.partition(KEY_SEP)
+            except AttributeError:
+                return False
+            if first in data:
+                return iter_contains(data[first], remain)
+            return False
+
+        return iter_contains(self.data, key)
+
+    def __str__(self) -> str:
+        return f"Context({self.data})"
+
+    def __add__(self, other):
+        if isinstance(other, Mapping):
+            self.merge(other)
+            return self
+        return NotImplemented
+
+    def merge(self, other):
+        if isinstance(other, Context):
+            always_merger.merge(self.data, other.data)
+        elif isinstance(other, Mapping):
+            always_merger.merge(self.data, other)
         else:
-            return "<ContextUpdate: None>"
+            raise TypeError(f"can not merge Context and {type(other).__name__}")
+
+    def reset(self, keep=None):
+        save_copy = Context()
+        for k in keep or []:
+            if k in self:
+                save_copy[k] = self[k]
+        self.data.clear()
+        for k in save_copy:
+            self[k] = save_copy[k]
+
+    def load(self, stream):
+        self.data = yaml.safe_load(stream)
+
+    def save(self, stream):
+        yaml.dump(self.data, stream, sort_keys=False)
+
+
+# from dotty_dict import Dotty
+# class Context(Dotty):
+#     def __init__(self, dictionary=None):
+#         super().__init__(dictionary or {}, no_list=True)
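
The lookup logic in `Context.__getitem__` tries the key literally at each level before splitting it at the first separator. The standalone sketch below restates that idea on plain dicts, so it runs without ScriptEngine, PyYAML or deepmerge. The function name `dotted_get` and the example data are hypothetical; the sketch follows the diff above as far as I can tell but is not the merged implementation.

```python
KEY_SEP = "."  # same separator as in the diff above


def dotted_get(data, key):
    """Look up a possibly dotted key in nested dicts (illustrative sketch)."""
    try:
        # A literal match always wins, even if the key contains dots
        return data[key]
    except TypeError:
        # data is not a mapping: the dotted key has too many components
        raise KeyError(key) from None
    except KeyError:
        pass
    if not isinstance(key, str):
        raise KeyError(key)
    # Split off the first component and continue one level down;
    # data[first] raises KeyError if that level does not exist
    first, _, remain = key.partition(KEY_SEP)
    return dotted_get(data[first], remain)


context = {"foo": {"bar": 1, "baz.qux": 2}}
print(dotted_get(context, "foo.bar"))      # 1
print(dotted_get(context, "foo.baz.qux"))  # 2
```

The second lookup succeeds because the remainder `baz.qux` is found as a literal key on the second level, so keys that already contain dots stay reachable.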