Add application design ADR (#1)
* wip

* Apply suggestions from code review

Co-Authored-By: Hector Castro <[email protected]>

* Apply more suggestions

* leave api versioning as todo

* Fix some sbt problems from initial setup

* Update CD -- feature flags and API versioning

* Remove Vagrant from README

* Note consequences, a security concern, and no bookkeeping fields
jisantuc authored Oct 2, 2019
1 parent 0ce83fe commit 65e42bf
Showing 4 changed files with 237 additions and 12 deletions.
17 changes: 7 additions & 10 deletions README.md

@@ -3,27 +3,24 @@ An API project that uses http4s and doobie

## Requirements

-- Vagrant 2.0+
-- VirtualBox 4.3+
+- `docker`
+- `docker-compose`

## Quick Setup
```
./scripts/setup
./scripts/update
```

-Then `ssh` into the machine when that is complete and start the server with `./scripts/server`. In another shell inside the VM you should be able to make a request:

`http :8080/api/hello/world`

-### VM rsync
-
-In order to sync file changes to the Vagrant VM, you'll need to run `vagrant rsync-auto` in a separate terminal while developing. Otherwise files will only be copied into the VM when it boots. `rsync` is used because when compiling inside the VM traditional file-sharing can be slow and cumbersome.

## Developing

-The VM includes [HTTPie](https://httpie.org/) and the
+It's easier to test changes to the API with [HTTPie](https://httpie.org/) and the
[HTTPie JWT Auth plugin](https://github.com/teracyhq/httpie-jwt-auth) in order to make
-authentication with APIs as seamless as possible. All following commands should be run within the VM.
+authentication with APIs as seamless as possible. All following commands assume
+you have those available.

### Nexus Repository Manager

@@ -44,4 +41,4 @@ Running migrations and other tasks are managed through `./scripts/migrate`.
### Development workflow
Usually at least two terminals should be open for doing development. The first terminal is where `sbt` should be run (`./scripts/console sbt`). In this terminal tests can be run, projects compiled, and the server assembled.

-The other terminal is where the server should be run `./scripts/server` or other one-off commands. To see changes you made to the API live you will need to first assemble the `jar` for the `api` server.
+The other terminal is where the server (`./scripts/server`) or other one-off commands should be run. To see changes you made to the API live, you will need to first `assemble` the `jar` for the `api` server.
2 changes: 1 addition & 1 deletion app-backend/build.sbt

@@ -9,7 +9,7 @@ lazy val commonSettings = Seq(
organization := "com.rasterfoundry",
name := "granary",
version := "0.0.1-SNAPSHOT",
scalaVersion := "2.12.10",
scalaVersion := "2.12.8",
scalafmtOnCompile := true,
scapegoatVersion := Versions.ScapegoatVersion,
scalacOptions := Seq(
2 changes: 1 addition & 1 deletion app-backend/project/Dependencies.scala

@@ -9,7 +9,7 @@ object Versions {
val LogbackVersion = "1.2.3"
val ScapegoatVersion = "1.3.8"
val CirceVersion = "0.12.1"
val DoobieVersion = "0.7.00.8.02"
val DoobieVersion = "0.8.2"
}

object Dependencies {
228 changes: 228 additions & 0 deletions docs/architecture/adr-0001-app-setup-and-goals.md

@@ -0,0 +1,228 @@
# Context

We need to create an application without a dependency on Raster Foundry that has
the following capabilities:

- run models on imagery
- kick off model runs when it learns about new imagery
- run models without knowing anything about the content of those models
- summarize past model results
- allow users to respond to model run events

Additionally, we'd like the application to meet several design goals that aren't
core to the pursuit of those features:

- we should deploy continuously
- infrastructure should be kept as light as possible

The main question this ADR seeks to answer is how the API will work and how
communication will flow between the external world and the API, and between the
API and the model runner. This ADR will not attempt to answer questions about
what a user interface should accomplish, since that is not germane to the
short-term (roughly three-month) uses we envision for this project.

## Specific Components

### Summary

The datamodel for the application includes three core entities, modeled here as
Scala case classes without imports (and without bookkeeping fields like `id`):

```scala
// An area of interest
case class AOI(
  // the geographic bounding box imagery must intersect with
  geom: Projected[Geometry],
  // the largest acceptable ground sampling distance for new imagery
  targetResolution: Double,
  // required wavelengths that some of the image's bands must overlap
  wavelengths: List[Double],
  // what model this AOI is associated with
  modelId: UUID
)

// A model
case class Model(
  // the name of the model
  modelName: String,
  // which job definition to use to run predictions using this model
  // the container can't be overridden for a job definition, so we must
  // know this at job submission time
  jobDefinition: String,
  // a description of the prediction command, to which a new image and
  // a notification endpoint can be appended
  command: List[String]
)

// A model run result
case class Prediction(
  // the model that this prediction is for
  modelId: UUID,
  // the AOI that this prediction is for
  aoiId: UUID,
  // an absolute uri for GeoJSON holding the results of this prediction
  uri: URI
)
```

`Model`s will be renamed to something less overloaded for developer sanity at
some later date.

### New imagery from the outside world

When new imagery becomes available from the outside world, this application will
need to match it against models that are interested in that kind of imagery. This
requires tracking some kind of metadata about the incoming imagery. We
developed a fairly complex datamodel in Raster Foundry that tracks information about
imagery, including component files for that imagery and band-level information.
However, that datamodel is not a spec, and to the best of our knowledge no one
relies on it to describe what imagery updates look like.

Instead, the API will demand that incoming imagery adheres to the [STAC electro-optical](https://github.com/radiantearth/stac-spec/tree/master/extensions/eo)
spec extension. There are several reasons to prefer using STAC for a datamodel.
These include tapping into ongoing work in the geospatial community to standardize
data exchange, making validation easier for anyone hoping to rely on this API,
and reusing our own work around a type-safe STAC datamodel in `geotrellis-server`.
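
As a rough sketch (not a committed schema), the only slice of an incoming `eo`
item that matching needs is its geometry, its ground sampling distance, and its
bands' center wavelengths. A minimal circe decoder for that slice might look like
the following; the property names (`eo:gsd`, `eo:bands`, `center_wavelength`) are
assumptions about the extension version we'd pin, and the real types should come
from the `geotrellis-server` STAC datamodel rather than a hand-rolled decoder:

```scala
import cats.implicits._
import io.circe.{Decoder, HCursor, Json}

// The slice of an incoming STAC `eo` item that matching needs
final case class IncomingImagery(
  geometry: Json,                 // raw GeoJSON geometry, to be reprojected later
  gsd: Double,                    // ground sampling distance in meters
  centerWavelengths: List[Double] // one entry per band, in micrometers
)

object IncomingImagery {
  implicit val decIncomingImagery: Decoder[IncomingImagery] =
    Decoder.instance { (c: HCursor) =>
      val properties = c.downField("properties")
      for {
        geometry    <- c.get[Json]("geometry")
        gsd         <- properties.get[Double]("eo:gsd")
        bands       <- properties.get[List[Json]]("eo:bands")
        wavelengths <- bands.traverse(_.hcursor.get[Double]("center_wavelength"))
      } yield IncomingImagery(geometry, gsd, wavelengths)
    }
}
```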

### Running models against new imagery

`POST`ing a new image to the API will cause that image to be matched against
available models. A `model` is some function from raster to raster, raster to
GeoJSON, or raster to number. For simplicity, we'll start with the raster-to-GeoJSON
case: it spares us from working out the contracts we would need to enforce for
the other kinds of models, and it is sufficient for object detection, chip
classification, and semantic segmentation tasks.
`Model`s know what sorts of imagery are acceptable for them to obtain meaningful
results:

- an array of required wavelengths
- a minimum spatial resolution
- an `AOI`

We can determine this information for any incoming imagery that adheres to the
`eo` spec using band information.
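
To make the matching rule concrete, here is a sketch of the predicate we have in
mind. It assumes the geometry intersection test against `AOI.geom` has already
been run elsewhere, and the wavelength tolerance is illustrative rather than a
decided value:

```scala
object Matching {

  /** Does new imagery satisfy an AOI's requirements? */
  def imageryMatchesAoi(
      imageWavelengths: List[Double],    // band center wavelengths of the new image
      imageGsd: Double,                  // ground sampling distance of the new image
      intersectsAoi: Boolean,            // precomputed test against AOI.geom
      requiredWavelengths: List[Double], // AOI.wavelengths
      targetResolution: Double,          // AOI.targetResolution
      wavelengthTolerance: Double = 0.05 // illustrative tolerance, in micrometers
  ): Boolean = {
    // every required wavelength must be covered by some band of the image
    val coversWavelengths = requiredWavelengths.forall { required =>
      imageWavelengths.exists(w => math.abs(w - required) <= wavelengthTolerance)
    }
    // the image must be at least as fine as the AOI's target resolution
    val fineEnough = imageGsd <= targetResolution
    intersectsAoi && coversWavelengths && fineEnough
  }
}
```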

`Model`s also need to know their own requirements to run. In particular, they
need sufficient information to fill in a `SubmitJob` request for AWS
Batch. This includes resource requirements, a docker container accessible from
within the hosting AWS account, a compute environment name, and a command to run.
We require `model`s to carry this information to avoid having to infer it or
guess at it based on heuristics.

Once we identify that a `model` should run against an image, we'll kick off an
AWS Batch job with the model's parameters. The `model`'s command _must_ accept a
STAC item with the `eo` extension as its last parameter.
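
As a sketch of what that submission could look like with the AWS SDK for Java
v2 -- the job queue name and the job-naming scheme below are placeholders, not
decisions made by this ADR:

```scala
import scala.collection.JavaConverters._

import software.amazon.awssdk.services.batch.BatchClient
import software.amazon.awssdk.services.batch.model.{ContainerOverrides, SubmitJobRequest}

object BatchSubmit {

  // Model as defined in the summary above
  final case class Model(modelName: String, jobDefinition: String, command: List[String])

  // Placeholder queue name; in practice this comes from configuration
  private val jobQueue = "granary-model-runs"

  private val batchClient = BatchClient.create()

  /** Submit a model run, appending the serialized STAC item as the command's
    * last argument per the contract described above, and return the Batch job id.
    */
  def submitRun(model: Model, stacItemJson: String): String = {
    val overrides = ContainerOverrides
      .builder()
      .command((model.command :+ stacItemJson).asJava)
      .build()

    val request = SubmitJobRequest
      .builder()
      .jobName(s"${model.modelName}-prediction")
      .jobDefinition(model.jobDefinition)
      .jobQueue(jobQueue)
      .containerOverrides(overrides)
      .build()

    batchClient.submitJob(request).jobId()
  }
}
```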

An important security concern is that the compute environment for running models
must be in a VPC with general internet access (or at least a gateway to make POST
requests to the application), but _without_ access to the database, to prevent
users, even users who've somehow compromised the database credentials, from
executing arbitrary code. If we as a team take responsibility for deployment
code, we should make sure that the resulting configuration passes this check.

### Notifying the API

Since `model`s will be run in some asynchronous workflow, they'll need a way to
notify the API. The form that notification should take is a [STAC item with the labeling extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/label).
Reasons to lean on STAC here are similar to the reasons for leaning on it for new
imagery. The API will accept a `POST` to a particular model that contains a STAC
item with an asset that points to predictions on the new imagery.
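
A rough http4s sketch of that endpoint -- the route shape, path segments, and
the untyped `Json` payload are placeholders for illustration, not the final API:

```scala
import java.util.UUID

import cats.effect.IO
import io.circe.Json
import org.http4s.HttpRoutes
import org.http4s.circe._
import org.http4s.dsl.io._

object PredictionNotificationRoutes {

  /** `recordPrediction` stands in for whatever persistence we end up writing;
    * the body is treated as raw JSON here rather than a typed, validated STAC item.
    */
  def routes(recordPrediction: (UUID, Json) => IO[Unit]): HttpRoutes[IO] =
    HttpRoutes.of[IO] {
      // POST /models/{modelId}/predictions with a labeled STAC item in the body
      case req @ POST -> Root / "models" / UUIDVar(modelId) / "predictions" =>
        for {
          stacItem <- req.as[Json]
          _        <- recordPrediction(modelId, stacItem)
          resp     <- Created()
        } yield resp
    }
}
```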

It's possible that future work could allow discrimination between models that
need time to run an asynchronous process and models that can do synchronous
prediction. Since we don't know whether this is necessary or even possible, we'll
assume for now that all predictions happen asynchronously and that all updates to
the Granary API will happen via a later `POST` request.

### Notifying end users

When we create `model`s, we will also create AWS Simple Notification Service
topics for those models. For our purposes we can think of these as an
inexhaustible resource, since each AWS account is allowed 100,000 topics. At a
time in the future when we think we are in danger of going over that limit we
can think about a new strategy or ask for a service limit increase.
Whenever a model is updated -- either a new prediction run has been kicked off,
results are available, or metadata about that model has changed -- we'll send
an event to the SNS topic for that model. The application will be deployable
in any AWS account (including a client's account), so we assume that clients
will also be able to subscribe to those SNS topics and receive notifications.
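
For concreteness, here is what topic creation and publishing could look like with
the AWS SDK for Java v2; the topic naming scheme and the stringly-typed event
payload are illustrative only:

```scala
import software.amazon.awssdk.services.sns.SnsClient
import software.amazon.awssdk.services.sns.model.{CreateTopicRequest, PublishRequest}

object ModelNotifications {

  private val snsClient = SnsClient.create()

  /** Create the topic for a model and return its ARN. Calling CreateTopic again
    * with an unchanged name should return the existing topic's ARN, so this is
    * safe to call on model updates as well as on model creation.
    */
  def topicForModel(modelId: java.util.UUID): String =
    snsClient
      .createTopic(CreateTopicRequest.builder().name(s"granary-model-$modelId").build())
      .topicArn()

  /** Publish a model event -- run kicked off, results available, or metadata
    * changed -- serialized to JSON elsewhere.
    */
  def publishEvent(topicArn: String, eventJson: String): String =
    snsClient
      .publish(PublishRequest.builder().topicArn(topicArn).message(eventJson).build())
      .messageId()
}
```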

### Impact of continuous deployment

There are several consequences of the choice to aim for continuous deployment:

- Continuous integration checks passing on a PR _must_ mean that the branch is
safe to deploy. This will require more care around migrations than usual and
a more thorough exercise of the application in tests than we have in Raster
Foundry. As long as this repository is just an API, that probably means
generative testing of workflows modeled through API interaction both for
correctness and performance regressions. I believe that we already have the
tools and expertise on the team to do this.
- The application should also include smoke tests. Smoke tests will be a new
feature of CI for us. We should be able to rely on the small application scope
to ensure that we can define a reasonable standard for "not on fire."
- This repository probably shouldn't follow the `git flow` pattern, since the
relationship between `develop` and `master` would be pretty
confused (`develop = master` at all times). The default branch is currently `develop`
for consistency and to avoid the appearance of having made that choice final before
this ADR has been reviewed.
- To enable the frontend to change without the backend being ready, we will need
to work out a sensible story about feature flags. We'll also need to enable the
backend to change, even in breaking ways, without the frontend needing to know
or care. A likely candidate to solve the latter problem is API versioning (see the
sketch after this list). Later ADRs should address implementation details for both
of these problems.
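
As one possible shape for the API-versioning half -- the prefixes and the
two-version split below are purely illustrative, and a later ADR will make the
real call:

```scala
import cats.effect.IO
import org.http4s.HttpRoutes
import org.http4s.server.Router

object ApiVersioning {

  /** Mount route groups under versioned prefixes so the backend can ship a
    * breaking `v2` while `v1` keeps serving the existing frontend untouched.
    */
  def versioned(v1Routes: HttpRoutes[IO], v2Routes: HttpRoutes[IO]): HttpRoutes[IO] =
    Router(
      "/api/v1" -> v1Routes,
      "/api/v2" -> v2Routes
    )
}
```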

### Impact of keeping infrastructure as light as possible

One example of how this motivation has already played out is the choice of
how to run models. We originally considered two possibilities for how to submit
a model run.

The first option was to demand that consumers deploy some server somewhere
that accepts `POST`s of new imagery, responds with information about where the
output will be written, and eventually notifies the Granary API that it can go
fetch the imagery. The second option is what's written above.

While the option we chose requires more infrastructure -- another AWS service,
the creation of compute environments, job definitions, and roles necessary for
running batch jobs -- this is a case where the "as possible" part of the
motivation does a lot of work. Given that the service is designed to let data
scientists easily run predictions against new imagery, we had to weigh that
increase in infrastructure complexity against the presumed needs of potential
users. Since we believe those users are more likely to be comfortable creating a
docker container for running predictions than deploying a RESTful
model-as-a-service with its own (potentially) asynchronous task runner, we
chose for Granary to own this particular piece of infrastructure complexity.

A second example is in the choice of SNS as the notification service. An
alternative was to include webhooks in the application, where users would
tell Granary where they wanted notifications sent. This would have required
us to understand the availability characteristics of those endpoints as well
as to model in the application the possibility that users might want multiple
services to be notified. Instead, by pushing to SNS, we trade needing to model
and respond appropriately to an unbounded set of external resources for
consuming a small part of another AWS service's API. SNS allows us to achieve
a clear separation of concerns.

# Consequences

Because we chose continuous deployment, we'll need to:

- revert the default branch to `master`
- create a pull request template that expresses the constraints
we've committed to by choosing continuous deployment
- set up CI to deploy on every merge to `master`

We'll also need to do standard startup work for a new Scala backend. Much
of that work was free, thanks to the azavea.g8 template. However, we'll
need to create a datamodel, `DAO`s, and routes. We're good at this :tm:
now, so I don't think this needs much elaboration; most of the details
will come out in implementation.
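
For example, a first doobie `DAO` for `Prediction` might start out like the sketch
below (assuming the doobie-postgres module for the UUID mapping); the
`predictions` table and its columns are made up here and will actually be defined
by migrations, and the `Meta[URI]` instance is an assumption about how we'd map
the `uri` column:

```scala
import java.net.URI
import java.util.UUID

import doobie._
import doobie.implicits._
import doobie.postgres.implicits._

object PredictionDao {

  // Prediction as defined in the summary above; bookkeeping fields still omitted
  final case class Prediction(modelId: UUID, aoiId: UUID, uri: URI)

  // Map the uri column to and from text
  implicit val uriMeta: Meta[URI] =
    Meta[String].timap(URI.create)(_.toString)

  /** List predictions for a model. Table and column names are placeholders. */
  def listForModel(modelId: UUID): ConnectionIO[List[Prediction]] =
    sql"""
      SELECT model_id, aoi_id, uri
      FROM predictions
      WHERE model_id = $modelId
    """.query[Prediction].to[List]
}
```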

We'll also need to make a decision about where deployment code lives. Since this
is supposed to be open source and portable, I think it makes sense for us
to bring deployment into the open source repository. The downside to that is
that our open source tool will come with some vendor lock-in. On the other
hand, if deployment lives separately, our small open source tool will
probably never be deployed by anyone else.
