Add application design ADR (#1)
* wip

* Apply suggestions from code review

Co-Authored-By: Hector Castro <[email protected]>

* Apply more suggestions

* leave api versioning as todo

* Fix some sbt problems from initial setup

* Update CD -- feature flags and API versioning

* Remove Vagrant from README

* Note consequences, a security concern, and no bookkeeping fields
jisantuc authored Oct 2, 2019
1 parent 0ce83fe commit 65e42bf
Showing 4 changed files with 237 additions and 12 deletions.
17 changes: 7 additions & 10 deletions README.md

@@ -3,27 +3,24 @@ An API project that uses http4s and doobie

## Requirements

-- Vagrant 2.0+
-- VirtualBox 4.3+
+- `docker`
+- `docker-compose`

## Quick Setup
```
./scripts/setup
./scripts/update
```

-Then `ssh` into the machine when that is complete and start the server with `./scripts/server`. In another shell inside the VM you should be able to make a request:

`http :8080/api/hello/world`

-### VM rsync
-
-In order to sync file changes to the Vagrant VM, you'll need to run `vagrant rsync-auto` in a separate terminal while developing. Otherwise files will only be copied into the VM when it boots. `rsync` is used because when compiling inside the VM traditional file-sharing can be slow and cumbersome.

## Developing

-The VM includes [HTTPie](https://httpie.org/) and the
+It's easier to test changes to the API with [HTTPie](https://httpie.org/) and the
[HTTPie JWT Auth plugin](https://github.com/teracyhq/httpie-jwt-auth) in order to make
-authentication with APIs as seamless as possible. All following commands should be run within the VM.
+authentication with APIs as seamless as possible. All following commands assume
+you have those available.

### Nexus Repository Manager

@@ -44,4 +41,4 @@ Running migrations and other tasks are managed through `./scripts/migrate`.
### Development workflow
Usually at least two terminals should be open for doing development. The first terminal is where `sbt` should be run (`./scripts/console sbt`). In this terminal tests can be run, projects compiled, and the server assembled.

-The other terminal is where the server should be run `./scripts/server` or other one-off commands. To see changes you made to the API live you will need to first assemble the `jar` for the `api` server.
+The other terminal is where the server (`./scripts/server`) or other one-off commands should be run. To see changes you made to the API live, you will need to first `assemble` the `jar` for the `api` server.
2 changes: 1 addition & 1 deletion app-backend/build.sbt

@@ -9,7 +9,7 @@ lazy val commonSettings = Seq(
organization := "com.rasterfoundry",
name := "granary",
version := "0.0.1-SNAPSHOT",
scalaVersion := "2.12.10",
scalaVersion := "2.12.8",
scalafmtOnCompile := true,
scapegoatVersion := Versions.ScapegoatVersion,
scalacOptions := Seq(
2 changes: 1 addition & 1 deletion app-backend/project/Dependencies.scala

@@ -9,7 +9,7 @@ object Versions {
val LogbackVersion = "1.2.3"
val ScapegoatVersion = "1.3.8"
val CirceVersion = "0.12.1"
val DoobieVersion = "0.7.00.8.02"
val DoobieVersion = "0.8.2"
}

object Dependencies {
228 changes: 228 additions & 0 deletions docs/architecture/adr-0001-app-setup-and-goals.md

@@ -0,0 +1,228 @@
# Context

We need to create an application without a dependency on Raster Foundry that has
the following capabilities:

- run models on imagery
- kick off model runs when it learns about new imagery
- run models without knowing anything about the content of those models
- summarize past model results
- allow users to respond to model run events

Additionally, we'd like the application to meet several design goals that aren't
core to the pursuit of those features:

- we should deploy continuously
- infrastructure should be kept as light as possible

The main question this ADR seeks to answer is how the API will work and how
communication will flow between the external world and the API, and between the
API and the model runner. This ADR will not attempt to answer questions about
what a user interface should accomplish, since that is not germane to the
short-term (roughly three-month) uses we envision for this project.

## Specific Components

### Summary

The datamodel for the application includes three core entities, modeled here as
Scala case classes without imports (and without bookkeeping fields like `id`):

```scala
// An area of interest
case class AOI(
  // the geographic bounding box imagery must intersect with
  geom: Projected[Geometry],
  // the largest acceptable ground sampling distance for new imagery
  targetResolution: Double,
  // required wavelengths that some of the image's bands must overlap
  wavelengths: List[Double],
  // what model this AOI is associated with
  modelId: UUID
)

// A model
case class Model(
  // the name of the model
  modelName: String,
  // which job definition to use to run predictions using this model
  // the container can't be overridden for a job definition, so we must
  // know this at job submission time
  jobDefinition: String,
  // a description of the prediction command, to which a new image and
  // a notification endpoint can be appended
  command: List[String]
)

// A model run result
case class Prediction(
  // the model that this prediction is for
  modelId: UUID,
  // the AOI that this prediction is for
  aoiId: UUID,
  // an absolute uri for GeoJSON holding the results of this prediction
  uri: URI
)
```

`Model`s will be renamed to something less overloaded for developer sanity at
some later date.

### New imagery from the outside world

When new imagery becomes available from the outside world, this application will
need to match it against models that are interested in that kind of imagery. This
requires tracking some kind of metadata about the incoming imagery. We
developed a fairly complex datamodel in Raster Foundry that tracks information about
imagery, including component files for that imagery and band-level information.
However, that datamodel is not a spec, and to the best of our knowledge no one
relies on it to describe what imagery updates look like.

Instead, the API will demand that incoming imagery adheres to the [STAC electro-optical](https://github.com/radiantearth/stac-spec/tree/master/extensions/eo)
spec extension. There are several reasons to prefer using STAC for a datamodel.
These include tapping into ongoing work in the geospatial community to standardize
data exchange, making validation easier for anyone hoping to rely on this API,
and reusing our own work around a type-safe STAC datamodel in `geotrellis-server`.
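
As a rough sketch (not a committed schema), the only slice of an incoming `eo`
item that matching needs is its geometry, its ground sampling distance, and its
bands' center wavelengths. A minimal circe decoder for that slice might look like
the following; the property names (`eo:gsd`, `eo:bands`, `center_wavelength`) are
assumptions about the extension version we'd pin, and the real types should come
from the `geotrellis-server` STAC datamodel rather than a hand-rolled decoder:

```scala
import cats.implicits._
import io.circe.{Decoder, HCursor, Json}

// The slice of an incoming STAC `eo` item that matching needs
final case class IncomingImagery(
  geometry: Json,                 // raw GeoJSON geometry, to be reprojected later
  gsd: Double,                    // ground sampling distance in meters
  centerWavelengths: List[Double] // one entry per band, in micrometers
)

object IncomingImagery {
  implicit val decIncomingImagery: Decoder[IncomingImagery] =
    Decoder.instance { (c: HCursor) =>
      val properties = c.downField("properties")
      for {
        geometry    <- c.get[Json]("geometry")
        gsd         <- properties.get[Double]("eo:gsd")
        bands       <- properties.get[List[Json]]("eo:bands")
        wavelengths <- bands.traverse(_.hcursor.get[Double]("center_wavelength"))
      } yield IncomingImagery(geometry, gsd, wavelengths)
    }
}
```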

### Running models against new imagery

`POST`ing a new image to the API will cause that image to be matched against
available models. A `model` is some function from raster to raster, raster to
GeoJSON, or raster to number. For simplicity, we'll start with the raster-to-GeoJSON
case: it spares us from working out the contracts we would need to enforce for
the other kinds of models, and it is sufficient for object detection, chip
classification, and semantic segmentation tasks.
`Model`s know what sorts of imagery are acceptable for them to obtain meaningful
results:

- an array of required wavelengths
- a minimum spatial resolution
- an `AOI`

We can determine this information for any incoming imagery that adheres to the
`eo` spec using band information.
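
To make the matching rule concrete, here is a sketch of the predicate we have in
mind. It assumes the geometry intersection test against `AOI.geom` has already
been run elsewhere, and the wavelength tolerance is illustrative rather than a
decided value:

```scala
object Matching {

  /** Does new imagery satisfy an AOI's requirements? */
  def imageryMatchesAoi(
      imageWavelengths: List[Double],    // band center wavelengths of the new image
      imageGsd: Double,                  // ground sampling distance of the new image
      intersectsAoi: Boolean,            // precomputed test against AOI.geom
      requiredWavelengths: List[Double], // AOI.wavelengths
      targetResolution: Double,          // AOI.targetResolution
      wavelengthTolerance: Double = 0.05 // illustrative tolerance, in micrometers
  ): Boolean = {
    // every required wavelength must be covered by some band of the image
    val coversWavelengths = requiredWavelengths.forall { required =>
      imageWavelengths.exists(w => math.abs(w - required) <= wavelengthTolerance)
    }
    // the image must be at least as fine as the AOI's target resolution
    val fineEnough = imageGsd <= targetResolution
    intersectsAoi && coversWavelengths && fineEnough
  }
}
```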

`Model`s also need to know their own requirements to run. In particular, they
need sufficient information to fill in a `SubmitJob` request for AWS
Batch. This includes resource requirements, a docker container accessible from
within the hosting AWS account, a compute environment name, and a command to run.
We require `model`s to carry this information to avoid having to infer it or
guess at it based on heuristics.

Once we identify that a `model` should run against an image, we'll kick off an
AWS Batch job with the model's parameters. The `model`'s command _must_ accept a
STAC item with the `eo` extension as its last parameter.
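
As a sketch of what that submission could look like with the AWS SDK for Java
v2 -- the job queue name and the job-naming scheme below are placeholders, not
decisions made by this ADR:

```scala
import scala.collection.JavaConverters._

import software.amazon.awssdk.services.batch.BatchClient
import software.amazon.awssdk.services.batch.model.{ContainerOverrides, SubmitJobRequest}

object BatchSubmit {

  // Model as defined in the summary above
  final case class Model(modelName: String, jobDefinition: String, command: List[String])

  // Placeholder queue name; in practice this comes from configuration
  private val jobQueue = "granary-model-runs"

  private val batchClient = BatchClient.create()

  /** Submit a model run, appending the serialized STAC item as the command's
    * last argument per the contract described above, and return the Batch job id.
    */
  def submitRun(model: Model, stacItemJson: String): String = {
    val overrides = ContainerOverrides
      .builder()
      .command((model.command :+ stacItemJson).asJava)
      .build()

    val request = SubmitJobRequest
      .builder()
      .jobName(s"${model.modelName}-prediction")
      .jobDefinition(model.jobDefinition)
      .jobQueue(jobQueue)
      .containerOverrides(overrides)
      .build()

    batchClient.submitJob(request).jobId()
  }
}
```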

An important security concern is that the compute environment for running models
must be in a VPC with general internet access (or at least a gateway to make POST
requests to the application), but _without_ access to the database, to prevent
users, even users who've somehow compromised the database credentials, from
executing arbitrary code. If we as a team take responsibility for deployment
code, we should make sure that the resulting configuration passes this check.

### Notifying the API

Since `model`s will be run in some asynchronous workflow, they'll need a way to
notify the API. The form that notification should take is a [STAC item with the labeling extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label).
Reasons to lean on STAC here are similar to the reasons for leaning on it for new
imagery. The API will accept a `POST` to a particular model that contains a STAC
item with an asset that points to predictions on the new imagery.
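
A rough http4s sketch of that endpoint -- the route shape, path segments, and
the untyped `Json` payload are placeholders for illustration, not the final API:

```scala
import java.util.UUID

import cats.effect.IO
import io.circe.Json
import org.http4s.HttpRoutes
import org.http4s.circe._
import org.http4s.dsl.io._

object PredictionNotificationRoutes {

  /** `recordPrediction` stands in for whatever persistence we end up writing;
    * the body is treated as raw JSON here rather than a typed, validated STAC item.
    */
  def routes(recordPrediction: (UUID, Json) => IO[Unit]): HttpRoutes[IO] =
    HttpRoutes.of[IO] {
      // POST /models/{modelId}/predictions with a labeled STAC item in the body
      case req @ POST -> Root / "models" / UUIDVar(modelId) / "predictions" =>
        for {
          stacItem <- req.as[Json]
          _        <- recordPrediction(modelId, stacItem)
          resp     <- Created()
        } yield resp
    }
}
```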

It's possible that future work could allow discrimination between models that
need time to run an asynchronous process and models that can do synchronous
prediction. Since we don't know whether this is necessary or even possible, we'll
assume for now that all predictions happen asynchronously and that all updates to
the Granary API will happen via a later `POST` request.

### Notifying end users

When we create `model`s, we will also create AWS Simple Notification Service
topics for those models. For our purposes we can think of these as an
inexhaustible resource, since each AWS account is allowed 100,000 topics. At a
time in the future when we think we are in danger of going over that limit we
can think about a new strategy or ask for a service limit increase.
Whenever a model is updated -- either a new prediction run has been kicked off,
results are available, or metadata about that model has changed -- we'll send
an event to the SNS topic for that model. The application will be deployable
in any AWS account (including a client's account), so we assume that clients
will also be able to subscribe to those SNS topics and receive notifications.
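
For concreteness, here is what topic creation and publishing could look like with
the AWS SDK for Java v2; the topic naming scheme and the stringly-typed event
payload are illustrative only:

```scala
import software.amazon.awssdk.services.sns.SnsClient
import software.amazon.awssdk.services.sns.model.{CreateTopicRequest, PublishRequest}

object ModelNotifications {

  private val snsClient = SnsClient.create()

  /** Create the topic for a model and return its ARN. Calling CreateTopic again
    * with an unchanged name should return the existing topic's ARN, so this is
    * safe to call on model updates as well as on model creation.
    */
  def topicForModel(modelId: java.util.UUID): String =
    snsClient
      .createTopic(CreateTopicRequest.builder().name(s"granary-model-$modelId").build())
      .topicArn()

  /** Publish a model event -- run kicked off, results available, or metadata
    * changed -- serialized to JSON elsewhere.
    */
  def publishEvent(topicArn: String, eventJson: String): String =
    snsClient
      .publish(PublishRequest.builder().topicArn(topicArn).message(eventJson).build())
      .messageId()
}
```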

### Impact of continuous deployment

There are several consequences of the choice to aim for continuous deployment:

- Continuous integration checks passing on a PR _must_ mean that the branch is
safe to deploy. This will require more care around migrations than usual and
a more thorough exercise of the application in tests than we have in Raster
Foundry. As long as this repository is just an API, that probably means
generative testing of workflows modeled through API interaction both for
correctness and performance regressions. I believe that we already have the
tools and expertise on the team to do this.
- The application should also include smoke tests. Smoke tests will be a new
feature of CI for us. We should be able to rely on the small application scope
to ensure that we can define a reasonable standard for "not on fire."
- This repository probably shouldn't follow the `git flow` pattern, since the
relationship between `develop` and `master` would be pretty
confused (`develop = master` at all times). The default branch is currently `develop`
for consistency and to avoid the appearance of having made that choice final before
this ADR has been reviewed.
- To enable the frontend to change without the backend being ready, we will need
to work out a sensible story about feature flags. We'll also need to enable the
backend to change, even in breaking ways, without the frontend needing to know
or care. A likely candidate to solve the latter problem is API versioning (see the
sketch after this list). Later ADRs should address implementation details for both
of these problems.
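
As one possible shape for the API-versioning half -- the prefixes and the
two-version split below are purely illustrative, and a later ADR will make the
real call:

```scala
import cats.effect.IO
import org.http4s.HttpRoutes
import org.http4s.server.Router

object ApiVersioning {

  /** Mount route groups under versioned prefixes so the backend can ship a
    * breaking `v2` while `v1` keeps serving the existing frontend untouched.
    */
  def versioned(v1Routes: HttpRoutes[IO], v2Routes: HttpRoutes[IO]): HttpRoutes[IO] =
    Router(
      "/api/v1" -> v1Routes,
      "/api/v2" -> v2Routes
    )
}
```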

### Impact of keeping infrastructure as light as possible

One example of how this motivation has already played out is the choice of
how to run models. We originally considered two possibilities for how to submit
a model run.

The first option was to demand that consumers deploy some server somewhere
that accepts `POST`s of new imagery, responds with information about where the
output will be written, and eventually notifies the Granary API that it can go
fetch the imagery. The second option is what's written above.

While the option we chose requires more infrastructure -- another AWS service,
the creation of compute environments, job definitions, and roles necessary for
running batch jobs -- this is a case where the "as possible" part of the
motivation does a lot of work. Given that the service is designed to let data
scientists easily run predictions against new imagery, we had to weigh that
increase in infrastructure complexity against the presumed needs of potential
users. Since we believe those users are more likely to be comfortable creating a
docker container for running predictions than deploying a RESTful
model-as-a-service with its own (potentially) asynchronous task runner, we
chose for Granary to own this particular piece of infrastructure complexity.

A second example is in the choice of SNS as the notification service. An
alternative was to include webhooks in the application, where users would
tell Granary where they wanted notifications sent. This would have required
us to understand the availability characteristics of those endpoints as well
as to model in the application the possibility that users might want multiple
services to be notified. Instead, by pushing to SNS, we trade needing to model
and respond appropriately to an unbounded set of external resources for
consuming a small part of another AWS service's API. SNS allows us to achieve
a clear separation of concerns.

# Consequences

Because we chose continuous deployment, we'll need to:

- revert the default branch to `master`
- create a pull request template that expresses the constraints
we've committed to by choosing continuous deployment
- set up CI to deploy on every merge to `master`

We'll also need to do standard startup work for a new Scala backend. Much
of that work was free, thanks to the azavea.g8 template. However, we'll
need to create a datamodel, `DAO`s, and routes. We're good at this :tm:
now, so I don't think this needs much elaboration; most of the details
will come out in implementation.
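
For example, a first doobie `DAO` for `Prediction` might start out like the sketch
below (assuming the doobie-postgres module for the UUID mapping); the
`predictions` table and its columns are made up here and will actually be defined
by migrations, and the `Meta[URI]` instance is an assumption about how we'd map
the `uri` column:

```scala
import java.net.URI
import java.util.UUID

import doobie._
import doobie.implicits._
import doobie.postgres.implicits._

object PredictionDao {

  // Prediction as defined in the summary above; bookkeeping fields still omitted
  final case class Prediction(modelId: UUID, aoiId: UUID, uri: URI)

  // Map the uri column to and from text
  implicit val uriMeta: Meta[URI] =
    Meta[String].timap(URI.create)(_.toString)

  /** List predictions for a model. Table and column names are placeholders. */
  def listForModel(modelId: UUID): ConnectionIO[List[Prediction]] =
    sql"""
      SELECT model_id, aoi_id, uri
      FROM predictions
      WHERE model_id = $modelId
    """.query[Prediction].to[List]
}
```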

We'll also need to make a decision about where deployment code lives. Since this
is supposed to be open source and portable, I think it makes sense for us
to bring deployment into the open source repository. The downside to that is
that our open source tool will come with some vendor lock-in. On the other
hand, if deployment lives separately, our small open source tool will
probably never be deployed by anyone else.
