Merge pull request #32 from wpbonelli/sdd
scope pruning, copy editing, data model description fix
wpbonelli authored Sep 6, 2024
2 parents d06424f + 907e851 commit e66b09d
Showing 3 changed files with 73 additions and 150 deletions.
205 changes: 59 additions & 146 deletions docs/dev/sdd.md
@@ -5,8 +5,6 @@

- [Principles](#principles)
- [Overview](#overview)
- [Runtime](#runtime)
- [Plugins](#plugins)
- [Objects](#objects)
- [Counting the ways](#counting-the-ways)
- [Context resolution](#context-resolution)
@@ -53,118 +51,50 @@ the object model in rare instances.

## Overview

FloPy can provide a generic framework for hydrologic
models.

FloPy can consist of plugins, each defining a wrapper
for a given hydrologic program. Programs are expected
to provide an unambiguous input specification.
Ideally, the input framework might be agnostic to the
program it represents. This could make it possible to
support other programs with a consistent, unified API,
though that is not an immediate goal.

FloPy can provide a basic set of building blocks with
which a program's input parameters can be defined and
configured: **parameters** and **contexts** (groups of
parameters and possibly nested contexts). This is made
more rigorous below.

Provided an input specification for a program, FloPy
can generate an object-oriented Python interface for
it. This will consist of an **object model** (input
data model) and **IO module** (data access layer).

Once these exist, they *are* the specification —
specification documents should be derivable from them
in reverse.
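
For instance, one could derive a rough specification
document from an `attrs`-based class's own field
metadata. The class below is a made-up illustration,
not part of FloPy:

```python
from attrs import NOTHING, define, fields


@define
class Options:
    """Illustrative context; not FloPy's actual object model."""

    save_flows: bool = False
    budget_file: str = ""


# The classes themselves carry the specification: a document
# can be derived from field metadata instead of being
# maintained separately.
spec = [
    {
        "name": f.name,
        "type": f.type,
        "default": None if f.default is NOTHING else f.default,
    }
    for f in fields(Options)
]
print(spec)
```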

FloPy will provide a **plugin runtime** which accepts
a program selection and an input configuration.

The runtime will validate the latter, run the program,
report its progress, and make its results available.

### Runtime

FloPy will provide a plugin runtime whose purpose is to
wrap/run arbitrary hydrologic programs. **Simulation**
is the fundamental abstraction: we could consider the
simulation a *plan for how to execute a program*.

This is at odds with the standard terminology in MODFLOW
6, where a simulation means the runtime itself. FloPy, as
an interface to programs, could reasonably call the thing
that becomes the simulation the simulation; this seems a
benign (and maybe even appropriate) effacement of a
meaningful distinction, made for reasons of precedent and
familiarity.

A distinct abstraction could represent the "task" that
runs the program. A third could represent its output.
The latter should be derivable from the simulation, if
results are available in a given workspace, so results
can still be retrieved easily in a subsequent session,
or by someone else provided the workspace contents.

Runs could have an autogenerated GUID and an optional
name. Anonymous runs' names could default to the GUID.

Scheduling seems like it may benefit from asynchrony.
While programs should ideally make maximal use of the
resources provided to them, one might want to run more
than one single-threaded program at once, without the
need for a separate Python interpreter for each one.

An awaitable (coroutine-based) API, returning futures
instead of blocking, could allow an arbitrary number
of concurrent runs.

If this is pursued, a synchronous alternative should
be provided which runs programs directly as done now.
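
A minimal sketch of what such an awaitable API could
look like, using only the standard library. The function
names, the run-summary dictionary, and the GUID naming
are placeholders, not a committed design:

```python
import asyncio
import uuid
from pathlib import Path
from typing import List, Optional, Tuple


async def run_program(
    exe: str, workspace: Path, name: Optional[str] = None
) -> dict:
    """Run one program in its workspace and summarize the run."""
    name = name or str(uuid.uuid4())  # anonymous runs default to a GUID
    proc = await asyncio.create_subprocess_exec(
        exe,
        cwd=str(workspace),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return {"name": name, "workspace": workspace, "returncode": proc.returncode}


async def run_many(runs: List[Tuple[str, Path]]) -> List[dict]:
    """Schedule any number of runs concurrently, gathering results."""
    return await asyncio.gather(*(run_program(exe, ws) for exe, ws in runs))


def run_sync(exe: str, workspace: Path) -> dict:
    """Synchronous alternative: block on a single run."""
    return asyncio.run(run_program(exe, workspace))
```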

### Plugins

A developer will first implement a plugin supporting some
hydrologic modeling program. This *must* include an input
specification and *may* involve overriding hooks provided
by the FloPy framework to customize various behaviors. If
the program requires input in a bespoke language, or its
specification is written in such a language, the plugin
*must* include an explicit language specification, which
is used to generate a parser for the language. By default
FloPy will support standard data interchange formats
such as JSON, TOML, and YAML, as well as the MODFLOW 6
input language.
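
As a rough illustration, a toy specification for a single
package might be expressed as plain data (the format and
field names below are invented, not FloPy's actual
specification schema):

```python
# A toy, hand-written input specification for one hypothetical package.
SPEC = {
    "options": {
        "save_flows": {"type": "bool", "default": False},
        "budget_file": {"type": "path", "optional": True},
    },
    "packagedata": {
        "type": "table",
        "columns": {"cellid": "int", "head": "float"},
    },
}

# A code generator could walk a structure like this and emit
# the object model and IO layer described below.
```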

Provided a valid input specification, FloPy will generate
a Python interface for the given program. At minimum this
consists of an object model and a data access layer. The
former is user-facing, and will invoke the latter behind
the scenes to read and write plugin input configurations.

A model developer will interact with the plugin, using the
interface layer to construct a model, and with the runtime
framework, to run the model. The interface layer will also
provide access to model results.
### Objects

Ideally, an object model might have

- an intuitive, consistent, and expressive interface
to a broad range of programs

- a small core codebase and a largely autogenerated
user-facing API

- an unsurprising and uncomplicated core framework
accessible to new contributors

- consistent (and few) points of entry

- easy access to components' input specification

- easy access to components' current input data

- hierarchical namespacing and context resolution

- automatic enforcement of program invariants

- validation capabilities, and a way to detect when a
simulation is "dirty": needing validation again after
parameters have changed (see the sketch after this list)

...and more.
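
A minimal sketch of how nested, `attrs`-based contexts
with dirty tracking might look. The classes, fields, and
the `on_setattr` approach are illustrative assumptions,
not FloPy's actual object model:

```python
from attrs import define, field


def _mark_dirty(instance, attribute, value):
    """on_setattr hook: any parameter change flags the context as dirty."""
    object.__setattr__(instance, "_dirty", True)
    return value


@define(on_setattr=_mark_dirty)
class Ic:
    """Illustrative initial-conditions context (a group of parameters)."""

    strt: float = 1.0
    _dirty: bool = field(default=False, init=False, repr=False, eq=False)


@define(on_setattr=_mark_dirty)
class Gwf:
    """Illustrative model context: parameters plus a nested context."""

    name: str = "gwf"
    ic: Ic = field(factory=Ic)
    _dirty: bool = field(default=False, init=False, repr=False, eq=False)

    @property
    def dirty(self) -> bool:
        # a dirty child makes the parent dirty too
        return self._dirty or self.ic._dirty


gwf = Gwf(name="example")
gwf.ic.strt = 2.5  # editing a nested parameter...
assert gwf.dirty   # ...means the model needs validation again
```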

#### Counting the ways

@@ -243,76 +173,59 @@ specification of arbitrary structure and content to a program-agnostic data model

A **parameter** is a program input variable.

A parameter is a leaf in the **context tree**. A
context is a map of parameters and/or nested contexts.
The simulation is the root.

A parameter is a primitive value or a **composite**
of such.

Primitive parameters are **scalar** (int, float, bool,
string, path), **array-like**, or **tabular**.

> [!NOTE]
> Ideally a data model would be dependency-agnostic,
but NumPy's array primitives are nearly universal, and
we accept them as a de facto standard, especially as
NumPy has made recent advancements in type hinting.
If there is ever need to define array abstractions
of our own, we could take inspiration from
[astropy](https://github.com/astropy/astropy).

Composite parameters are **record** and **union**
(product and sum, respectively) types, as well as
**lists** of primitives or records. A record is a
named and ordered tuple of scalar parameters.

A list may constrain its elements to parameters of
a single scalar or record type, or may hold unions
of such.

A context is a map of parameters. So is a record;
the operative difference is that composites cannot
contain nested parameters. A context is a non-leaf
node in the tree which can contain both parameters
and other contexts.

We envision a nested hierarchy of `attrs`-based
classes, all acting like dictionaries, making up
the context tree. These will include composites:
strongly typed records and unions will be more
convenient to work with.

The data model can be specified roughly as:

```python
from pathlib import Path
from typing import Tuple, Union

from numpy.typing import ArrayLike
from pandas import DataFrame

Scalar = Union[bool, int, float, str, Path]
Record = Tuple[Scalar, ...]
Array = ArrayLike
Table = DataFrame
Param = Union[Scalar, Array, Table, Record]
```

An MF6 keystring can be represented as a union of
records. Period blocks are lists of record unions.
Most packages' `packagedata` blocks, on the other
hand, are tabular (regularly shaped) and can be
represented with a `DataFrame`.

TODO: how to specify the table schema/dtypes?
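
For illustration, the keystring and `packagedata` cases
might be sketched as below. The record names and fields
are invented (loosely echoing MF6 output control), and
the explicit dtypes are just one possible answer to the
schema question above:

```python
from typing import List, NamedTuple, Union

import numpy as np
from pandas import DataFrame


class Budget(NamedTuple):
    """An illustrative record: a named, ordered tuple of scalars."""

    fileout: str
    budgetfile: str


class PrintFormat(NamedTuple):
    """Another illustrative record."""

    columns: int
    width: int


# A keystring is a union of records; a period block is a
# list of such unions.
KeystringEntry = Union[Budget, PrintFormat]
PeriodBlock = List[KeystringEntry]

period: PeriodBlock = [Budget("fileout", "model.bud"), PrintFormat(10, 12)]

# A regularly shaped packagedata block maps naturally onto a
# DataFrame; explicit column dtypes pin down the table schema.
packagedata = DataFrame(
    {
        "cellid": np.array([1, 2, 3], dtype=np.int64),
        "head": np.array([1.0, 0.5, 0.25], dtype=np.float64),
    }
)
```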

A nested hierarchy of `attrs`-based classes can
form the context tree and composite parameters,
though records are simply tuples (and contexts
dictionaries) from a serializer's perspective.

It should be possible to map input specifications
for a wide range of programs onto this foundation,
not only MODFLOW 6.

#### Arrays

11 changes: 8 additions & 3 deletions docs/dev/srs.md
@@ -4,9 +4,14 @@
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Introduction](#introduction)
- [Product scope](#product-scope)
- [Product value](#product-value)
- [Intended audience:](#intended-audience)
- [Intended use:](#intended-use)
- [Use cases](#use-cases)
- [System requirements and functional requirements](#system-requirements-and-functional-requirements)
- [External interface requirements](#external-interface-requirements)
- [Non-functional requirements (NRFs)](#non-functional-requirements-nrfs)
- [Motivation](#motivation)
- [Consistency](#consistency)
- [Maintenance](#maintenance)
7 changes: 6 additions & 1 deletion flopy4/attrs.py
@@ -2,6 +2,7 @@
from typing import (
    Any,
    Optional,
    Tuple,
    TypeVar,
    Union,
)
@@ -28,7 +29,11 @@
"""A table input parameter."""


Record = Tuple[Scalar, ...]
"""A record input parameter."""


Param = Union[Scalar, Array, Table, Record]
"""An input parameter."""


