NERC Arctic Office Projects API

API for NERC Arctic Office projects database.

See the BAS API documentation for how to use this API.

Purpose

This API is used to record details of projects related to the NERC Arctic Office. This API is primarily intended for populating the projects database in the Arctic Office website but is designed for general use where applicable.

Implementation

This API is implemented as a Python Flask application following the JSON API specification. A PostgreSQL database is used for storing information. OAuth is used for controlling access to this information, managed using Microsoft Azure.

Configuration

Application configuration is set within config.py. Options use global or per-environment defaults which can be overridden if needed using environment variables, or a .env file (Dot Env) file.

Options include values for application secrets, feature flags (used to enable to disable features) and connection strings (such as databases).

The application environment (development, production, etc.) is set using the FLASK_ENV environment variable. A sample dot-env file, .env.example, describes how to set any required, recommended or commonly changed options. See config.py for all available options.

Data models

Data for this API is held in a PostgreSQL database. The database structure is managed using alembic migrations, defined in migrations/. SQL Alchemy is used to access the database within the application, using models defined in arctic_office_projects_api/models.py.

Data representations

Marshmallow and Marshmallow JSON API are used to transform resources between a storage (database) and access (API) representation, using schemas defined in arctic_office_projects_api/schemas.py.

Examples of representation transformations include, hiding the database primary key and renaming unintuitive database field names to more useful attribute names.

Schemas in this application should inherit from arctic_office_projects_api.schemas.Schema with a meta property inherited from arctic_office_projects_api.schemas.Schema.Meta. These classes define custom functionality and defaults suitable for generating more complete JSON API responses.

Neutral IDs

Resources in this API are identified using a neutral identifier such as: 01D5M0CFQV4M7JASW7F87SRDYB.

Neutral identifiers are persistent, unique, random and independent of how data is stored or processed, as this may change and introduce breaking limitations/requirements. They are implemented using Universally Unique Lexicographically Sortable Identifiers (ULIDs).

Neutral identifiers are created as part of Data loading.

Data loading

Production data for this API is imported from a variety of sources.

In non-production environments, Database seeding is used to create fake, but realistic, data in non-production environments.

Science categories

Science categories are used to categorise research projects, for example that a project relates to sea-ice.

These categories are defined in well-known schemes to ensure well considered and systematic coverage of general or discipline specific categories. Categories are structured into a hierarchy to allow navigation from general to more specific terms, or inversely, to generalise a term.

The schemes used by this project are:

the Universal Decimal Classification (UDC) - Summary
the NASA Global Change Master Directory (GCMD) - Earth Science keywords
the UK Data Service - Humanities And Social Science Electronic Thesaurus (HASSET)

The UDC Summary scheme is used as a base scheme, covering all aspects of human knowledge. As this scheme is only a summary, it does not include detailed terms for any particular areas. The GCMD Earth Science keywords and UK Data Service HASSET schemes are used to provide additional detail for physical sciences and social sciences respectively, as these are areas that the majority of research projects included in this API lie within.

These schemes and their categories are implemented as RDF graphs that describe properties about each category, such as name, examples and aliases, and the relationships between categories using 'broader than' and 'narrower than' relations.

These graphs are expressed as RDF triples by each scheme authority (i.e. the UDC consortium, NASA and the UK Data Service respectively). A set of additional triples are used to link concepts (categories) between each concept scheme.

Scheme	Linked UDC Concept
GCMD Earth Science keywords	55 Earth Sciences. Geological sciences
UK Data Service HASSET	3 Social Sciences

Note: These linkages are unofficial and currently very course, linking the top concept(s) of the Earth Science and HASSET schemes to a single concept in the UDC.

A series of processing steps are used to load RDF triples/graphs from each scheme, generate linkages between schemes and export a series of categories and category schemes into a file that can be imported into this API using the import categories CLI command.

The categories and category schemes import file is included in this project as resources/science-categories.json and can imported without needing to perform any processing. See the Usage section for more information.

If additional category schemes need to be included, or existing schemes require updating, the processing steps will need to be ran again to generate a replacement import file. See the Development section for more information.

Note: There is currently no support for updating a category scheme in cases where its categories have changed and require re-mapping to project resources.

Organisations

Organisations are used to represent funders of research grants and/or home institutes/organisations of people.

Organisations are added to this API based on need (i.e. for a grant or a person). To avoid duplication, organisations are distinguished by their GRID ID, equivalent to ORCID iDs but for (academic) organisations.

Organisations are imported using a JSON encoded import file, with a structure defined and validated by a JSON Schema, defined in resources/organisations-schema.json, see the Usage section for more information.

Two import files are included in this project:

resources/funder-organisations.json - represents organisations that fund grants, includes UKRI research councils and the EU as a funding body
resources/people-organisations.json - represents the organisations individuals (PIs/CoIs) are members of

Note: These files should be expanded with additional organisations as needed.

Projects and Grants

Projects are used to represent activities, grants are used to represent the funding for these activities. All grants will have a project, however a project may not have a grant (i.e. for unfunded activities).

Note: In the future, grants may fund multiple activities or be part of larger grants (split awards). Projects may in turn be funded by multiple grants (matched or follow-on funding) and be part of larger programmes. See #21 and #22 for more information.

Projects and Grants are added to this API from third-party providers, their functionality and usage varies.

Note: The semantic difference between a grant and project is not clear cut and are used interchangeably by different providers. I.e. a 'project' in one system may represent a 'grant' in the context of this API, or may combine aspects of both together.

Gateway to Research

Gateway to Research (GTR) is a database of all research and innovation funded by UK Research and Innovation, the umbrella organisation for the UK's funding councils, including NERC and it's various Arctic funding programmes and grants.

GTR terms grants as 'projects'. Each project includes properties such as the reference, title, abstract, funding amount and categories. Relationships include people (PIs, CoIs and others), publications and outcomes. It is updated through Researchfish, currently on an annual basis by funders and reporting institutions.

GTR projects are imported into this project through a GTR provided API which represents each project as a series of related resources. A GTR project and its resources are created as resources in this API as below:

GTR Resource	GTR Attribute	API Resource	API Attribute	Notes
GTR Project	Title	Project	Title	Duplicated between Project and Grant
GTR Project	Abstract	Project	Abstract	Duplicated between Project and Grant
GTR Publication	DOI	Project	Publications	Duplicated between Project and Grant
-	-	Project	Access Duration	Set from project duration
GTR Fund	Start and End	Project	Project Duration	Set from grant duration
GTR Project	Identifier	Grant	Reference	-
GTR Project	Title	Grant	Title	Duplicated between Project and Grant
GTR Project	Abstract	Grant	Abstract	Duplicated between Project and Grant
GTR Publication	DOI	Grant	Publications	Duplicated between Project and Grant
GTR Fund	Start and End	Grant	Duration	-
GTR Project	Status	Grant	Status	-
GTR Fund	Amount	Grant	Total Funds	-
GTR Fund	Currency Code	Grant	Total Funds Currency	-
GTR Funder	ID	Grant	Funder	ID requires mapping to GRID ID
-	-	Allocation	Project	Implied
-	-	Allocation	Grant	Implied
GTR Person	First Name	People	First Name	-
GTR Person	Surname	People	Last Name	-
GTR Person	ORCID iD	People	ORCID iD	-
GTR Employer	ID	People	Organisation	ID requires mapping to GRID ID
GTR Person	ORCID iD or ID	Participant	Person	ID requires mapping to ORCID iD
-	-	Participant	Project	Implied
GTR Project	Rel	Participant	Role	Based on Rel value mapping
GTR Project	Research Subject and Research Topic	Categorisations	Category	ID requires mapping to Scheme Identifier
-	-	Categorisations	Project	Implied

Note: API attributes that are not listed in this mapping are not set and will be omitted.

There are automatic mappings used by this provider:

The Rel property between a GTR Project and GTR Person is used as the Participant role:
- PI_PER is mapped to ParticipantRole.InvestigationRole_PrincipleInvestigator
- COI_PER is mapped to ParticipantRole.InvestigationRole_CoInvestigator

There are mandatory, manual, mappings required by this provider:

GTR resources mapped to Organisations (GTR Funder and GTR Employer) do not include GRID IDs, or another identifier that can be mapped to a GRID ID automatically - an internal mapping is therefore used to map GTR IDs to Grid IDs
GTR People are mapped to People but do not always include an ORCID iD, or another identifier that can be mapped to an ORCID iD automatically - an internal mapping is therefore used to map GTR IDs to ORCID iDs GTR is not aware of
GTR Projects include attributes that map to Categories, but the terms used are not in a scheme supported by this API (see Science categories for more information) - an internal mapping is therefore used to map GTR Subject or Topic categories to Categories in this API

Mappings are currently defined in methods in the GTR importer class (arctic_office_projects_api/importers/gtr.py):

GTR Funder/Employer to Organisation mappings are defined in _map_to_grid_id()
GTR People to People mappings are defined in _map_id_to_orcid_ids()
GTR Project categories/topics to Category mappings are defined in _map_gtr_project_category_to_category_term()

In addition, any Organisations related to the grant being imported (funder) or people related to the grant being imported, need to already exist. See Organisations for more information.

See the Usage section on the command used to import a grant.

Finding new GTR projects to add

The repository is here: https://gtr.ukri.org/. Search for 'Arctic' and then use filters on the right
Click the 'csv' button at the top to get the list.

Importing projects/grants

Copy the latest json file here arctic_office_projects_api/bulk_importer/ and add the new projects.

Alter line 35 json_filename = '/usr/src/app/arctic_office_projects_api/bulk_importer/json/projects-2022-04-19.json' so it points to the new *.json file.

Log into the Heroku dashboard & go to the project. Click the 'More' button and click 'Open console'. Run this command:
python arctic_office_projects_api/bulk_importer/import_grants.py

Documentation

Usage and reference documentation for this API is hosted within the BAS API Documentation project. The sources for this documentation are held in this project. Through Continuous Deployment they are uploaded to a relevant version of this service in the API docs project, and it's continuous Deployment process triggered to rebuild the documentation site with any changes.

Documentation Type	Documentation Format	Source
Usage	Jekyll page (Markdown)	`docs/usage/usage.md`
Reference	OpenAPI (Yaml)	`openapi.yml`

Note: Refer to the Documentation forms and types section for more information on how these documentation sources are processed by the BAS API Documentation project.

Errors

Errors returned by this API are formatted according to the JSON API error specification.

API Errors are implemented as application exceptions inherited from arctic_office_projects_api.errors.ApiException. This can return errors directly as Flask responses, or as a Python dictionary or JSON string.

Errors may be returned individually as they occur (such as fatal errors), or as a list of multiple errors at the same time (such as validation errors). See the Returning an API error section for how to return an error.

Error tracking

To ensure the reliability of this API, errors are logged to Sentry for investigation and analysis.

Through Continuous Deployment, commits to the master branch create new staging Sentry releases. Tagged commits create new production releases.

Health checks

Endpoints are available to allow the health of this API to be monitored. This can be used by load balancers to avoid unhealthy instances or monitoring reporting tools to prompt repairs by operators.

[GET|OPTIONS] `/meta/health/canary`

Reports on the overall health of this service as a boolean healthy/unhealthy status.

Returns a 204 - NO CONTENT response when healthy. Any other response should be considered unhealthy.

Request IDs

To aid in debugging, all requests will include a X-Request-ID header with one or more values. This can be used to trace requests through different services such as a load balancer, cache and other layers. Request IDs are managed by the Request ID middleware. The X-Request-ID header is returned to users and other components as a response header.

See the Correlation ID documentation for how the BAS API Load Balancer handles Request IDs.

Reverse proxying

It is assumed this API will be ran behind a reverse proxy / load balancer. This can present problems with generating absolute URLs as the API does not know which protocol, host, port or path it is exposed to clients as.

I.e. using flask.url_for('main.index', _external=True), the API may produce a URL of http://localhost:1234, but clients expect https://api.bas.ac.uk/foo/.

The Reverse Proxy middleware is used to provide this missing context using configuration options and HTTP headers.

Component	Configuration Method	Configuration Key	Implemented by	Example Value
Protocol	Configuration Option	`PREFERRED_URL_SCHEME`	Flask	`https`
Host	HTTP Header	`X-Forwarded-Host`	Reverse Proxy middleware	`api.bas.ac.uk`
Path prefix	Configuration Option	`SERVICE_PREFIX`	Reverse Proxy middleware	`/foo/v1`

Authentication and authorisation

This service is protected by Microsoft Azure's Active Directory OAuth endpoints using the Flask Azure AD OAuth Provider for authentication and authorisation.

This API (as a service), and it's clients are registered as applications within Azure Active Directory. The app representing this service, defines application (rather than delegated) permissions that can be assigned to relevant client applications.

Clients request access tokens from Azure, rather than this API, using the Client Credentials code flow.

Access tokens are structured as JSON Web Tokens (JWTs) and should be specified as a bearer token in the authorization header by clients.

Suitable permissions in either the 'NERC BAS WebApps' or 'NERC' Azure tenancy will be required to register applications and assign permissions.

Environment	Azure Tenancy
Local Development	NERC BAS WebApps
Staging	NERC BAS WebApps
Production	NERC

Available scopes

Scope	Type	Name	Description
-	-	-	-

Registering API clients

See these instructions for how to register client applications.

Note: It is not yet possible to register clients programmatically due to limitations with the Azure CLI and Azure provider for Terraform.

Note: These instructions describe how to register a client of this API, see the Setup section for how to register this API itself as a service.

Assigning scopes to clients

See these instructions for how to assign permissions defined by this API to client applications.

Note: It is not yet possible to assign permissions programmatically due to limitations with the Azure CLI and Azure provider for Terraform.

Usage

This section describes how to manage existing instances of this project in any environment. See the Setup section for how to create instances.

Note: See the BAS API documentation for how to use this API.

For all new instances you will need to:

run Database migrations
import science categories
import organisations
import grants

For development or staging environments you may also need to:

run Database seeding

Flask CLI

Many of the tasks needed to manage instances of this project use the Flask CLI.

To run flask CLI commands in a local development environment:

run docker-compose up to start the application and database containers
in another terminal window, run docker-compose exec app ash to launch a shell within the application container
in this shell, run flask [command] to perform a command

To run flask CLI commands in a staging and production environment:

navigate to the relevant Heroku application from the Heroku dashboard
from the application dashboard, select More -> Run Console from the right hand menu
in the console overlay, enter ash to launch a shell within the application container
in this shell, run flask [command] to perform a command

Note: In any environment, run flask alone to list available commands and view basic usage instructions.

Run database migrations

Database migrations are used to control the structure of the application database for persisting Data models.

The Flask migrate package is used to provide a Flask CLI command for running database migrations:

$ flask db [command]

To view the current (applied) migration:

$ flask db current

To view the latest (possibly un-applied) migration:

$ flask db head

To update an instance to the latest migration:

$ flask db upgrade

To un-apply all migrations (effectively emptying the database):

WARNING: This will drop all tables in the application database, removing any data.

$ flask db downgrade base

Run database seeding

Note: This process only applies to instances in local development or staging environments.

Database seeding is used to populate the application with fake, but realistic data.

A custom Flask CLI command is included for running database seeding:

$ flask seed [command]

To seed predictable, stable, test data for use when Testing:

$ flask seed predictable

To seed 100 random, fake but realistic, projects and related resources for use in non-production environments:

$ flask seed random

Note: You need to have imported the science categories and funder organisations before running this command.

Import data

A custom Flask CLI command is included for importing various resources into the API:

$ flask import [resource] [command]

Importing science categories

To import categories and category schemes from a file:

$ flask import categories [path to import file]

For example:

$ flask import categories resources/science-categories.json

Note: The structure of the import file will be validated against the resources/categories-schema.json JSON Schema before import.

Note: Previously imported categories, identified by their namespace or subject, will be skipped if imported again. Their properties will not be updated.

Importing organisations

To import organisations from a file:

$ flask import organisations [path to import file]

For example:

$ flask import organisations resources/funder-organisations.json
$ flask import organisations resources/people-organisations.json

Note: The structure of the import file will be validated against the resources/organisations-schema.json JSON Schema before import.

Note: Previously imported organisations, identified by their GRID identifier, will be skipped if imported again. Their properties will not be updated.

Importing grants

To import a grant from Gateway to Research (GTR):

$ flask import grant gtr [grant reference]

For example:

$ flask import grant gtr NE/K011820/1

Using the bulk importer - shell into the app container & run:

python arctic_office_projects_api/bulk_importer/import_grants.py

Note: It may be necessary to add to the mappings here: arctic_office_projects_api/importers/gtr.py
Projects will fail to import if they cannot resolve these mappings.

_map_gtr_project_research_topic_to_category_term
_map_id_to_orcid_ids
_ror_dict

Note: It will take a few seconds to import each grant due to the number of GTR API calls needed to collect all relevant information (grant, fund, funder, people, employers, publications, etc.).

Note: Previously imported grants, identified by their Grant reference, will be skipped if imported again. Their properties will not be updated.

Setup

This section describes how to create new instances of this project in a given environment.

$ git clone https://gitlab.data.bas.ac.uk/web-apps/arctic-office-projects-api.git
$ cd arctic-office-projects-api

Terraform remote state

For environments using Terraform, state information is stored remotely as part of BAS Terraform Remote State project.

Remote state storage will be automatically initialised when running terraform init, with any changes automatically saved to the remote (AWS S3) backend, there is no need to push or pull changes.

Remote state authentication

Permission to read and/or write remote state information for this project is restricted to authorised users. Contact the BAS Web & Applications Team to request access.

See the BAS Terraform Remote State project for how these permissions to remote state are enforced.

Local development

Docker and Docker Compose are required to setup a local development environment of this API.

Local development - Docker Compose

If you have access to the BAS GitLab instance, you can pull the application Docker image from the BAS Docker Registry. Otherwise you will need to build the Docker image locally.

# If you have access to gitlab.data.bas.ac.uk
$ docker login docker-registry.data.bas.ac.uk
$ docker-compose pull
# If you don't have access
$ docker-compose build

Copy .env.example to .env and edit the file to set at least any required (uncommented) options.

To run the API using the Flask development server (which reloads automatically if source files are changed) and a local PostgreSQL database:

$ docker-compose up

See the Usage section for instructions on how to configure and use the application instance.

Local development - database

To run application Database migrations and Database seeding, open an additional terminal to run:

# database migrations
$ docker-compose run app flask db upgrade
# database seeding
$ docker-compose run app flask seed --count 3

To connect to the database in a local development environment:

Parameter	Value
Host	`localhost`
Port	`5432`
Database	`app`
Username	`app`
Password	`password`
Schema	`public`

To connect to the database using psql in a local development environment:

$ docker-compose exec app-db ash
$ psql -U app
= SELECT current_database();
> current_database 
> ------------------
> app
= \q
$ exit

Local development - auth

See these instructions for how to register the application as a service.

use BAS NERC Arctic Office Projects API Testing as the application name
choose Accounts in this organizational directory only as the supported account type
do not enter a redirect URL
from the API permissions section of the registered application's permissions page:
- remove the default 'User.Read' permission
from the manifest page of the registered application:
- change the accessTokenAcceptedVersion property from null to 2
- add an item, api://[appId], to the identifierUris array, where [appId] is the value of the appId property
- add these items to the appRoles property [1]

Note: It is not yet possible to register clients programmatically due to limitations with the Azure CLI and Azure provider for Terraform.

Note: This describes how to register this API itself as a service, see the Registering API clients section for how to register a client of this API.

Set the AZURE_OAUTH_TENANCY, AZURE_OAUTH_APPLICATION_ID and AZURE_OAUTH_CLIENT_APPLICATION_IDS options in the local .env file.

For testing the API locally, register and assign all permissions to a testing client:

see the Registering API clients section to register a local testing API client
- named BAS NERC Arctic Office Projects API Client Testing, using accounts in the home tenancy only, with no redirect URL
see the Assigning scopes to clients section to assign all permissions to this client

[1] Application roles for the BAS NERC Arctic Office Projects API:

Note: Replace [uuid] with a UUID.

{
  "appRoles": []
}

Staging

Docker, Docker Compose and Terraform are required to setup the staging environment of this API.

Access to the BAS Web & Applications Heroku account is needed to setup the staging environment of this API.

Note: Make sure the HEROKU_API_KEY and HEROKU_EMAIL environment variables are set within your local shell.

Staging - Heroku

$ cd provisioning/terraform
$ docker-compose run terraform
$ terraform init
$ terraform apply

This will create a Heroku Pipeline, containing staging and production applications with a Heroku PostgreSQL database add-on.

A config var (environment variable) will automatically be added to each application with it's corresponding database connection string. Other non-sensitive config vars should be set using Terraform.

Once running, add the appropriate configuration to the BAS API Load Balancer.

Configure the relevant variables in the GitLab Continuous Deployment configuration to enable the application Docker image to be deployed automatically.

See the Usage section for instructions on how to configure and use the deployed application instance.

Staging - Heroku sensitive config vars

Config vars should be set manually for sensitive settings. Other config vars should be set in Terraform.

Config Var	Config Value	Description
`SENTRY_DSN`	Available from Sentry	Identifier for application in Sentry error tracking

Staging - database

Heroku will automatically run Database migrations as part of a Heroku release phase.

The Docker Container used for this is defined in Dockerfile.heroku-release.

Database seeding needs to be ran manually through the Heroku dashboard:

select More -> Run console from the top-right
enter flask seed --count 3 as the command

To connect to the staging environment database, expand the Database Credentials section of the Heroku database settings.

WARNING!: Heroku databases require SSL connections using a self-signed certificate. Currently SSL validation is disabled to allow connections. This is not ideal and should be used with caution.

If connecting from PyCharm, under the advanced tab for the data source, set the sslfactory parameter to org.postgresql.ssl.NonValidatingFactory.

Staging - documentation

To upload and publish documentation, follow the relevant setup instructions in the BAS API Documentation project.

Staging - auth

Use the same BAS NERC Arctic Office Projects API Testing application registered in the Auth sub-section in the local development section.

Production

Docker, Docker Compose and Terraform are required to setup the production environment of this API.

Access to the BAS Web & Applications Heroku account is needed to setup the staging environment of this API.

Note: Make sure the HEROKU_API_KEY and HEROKU_EMAIL environment variables are set within your local shell.

Production - Heroku

See the Heroku sub-section in the staging section for general instructions.

Production - Heroku sensitive config vars

Config vars should be set manually for sensitive settings. Other config vars should be set in Terraform.

Config Var	Config Value	Description
`SENTRY_DSN`	Available from Sentry	Identifier for application in Sentry error tracking

Production - database

Heroku will automatically run Database migrations as part of a Heroku release phase.

The Docker Container used for this is defined in Dockerfile.heroku-release.

To connect to the production environment database, expand the Database Credentials section of the Heroku database settings.

WARNING!: Heroku databases require SSL connections using a self-signed certificate. Currently SSL validation is disabled to allow connections. This is not ideal and should be used with caution.

If connecting from PyCharm, under the advanced tab for the data source, set the sslfactory parameter to org.postgresql.ssl.NonValidatingFactory.

Production - documentation

To upload and publish documentation, follow the relevant setup instructions in the BAS API Documentation project.

Production - auth

Using the Auth sub-section in the local development section, register an additional Azure application with these differences:

tenancy: NERC
name: BAS NERC Arctic Office Projects API

Development

This API is developed as a Flask application.

Environments and feature flags are used to control which elements of this application are enabled in different situations. For example in the development environment, Sentry error tracking is disabled and Flask's debug mode is on.

New features should be implemented with appropriate Configuration options available. Sensible defaults for each environment, and if needed feature flags, should allow end-users to fine tune which features are enabled.

Ensure .env.example is kept up-to-date if any configuration options are added or changed.

Also ensure:

Integration tests are updated to prevent future regression
End-user documentation is updated
if needed, Database migrations, including reverse migrations, are written for database structure changes
if needed, Database seeding is in place for use in development environments and running tests
all application errors implement, or inherit from, AppException in arctic_office_projects_api/errors.py

Code Style - linting

PEP-8 style and formatting guidelines must be used for this project, with the exception of the 80 character line limit.

Flake8 is used to ensure compliance, and is ran on each commit through Continuous Integration.

To check compliance locally:

$ docker-compose run app poetry run flake8 arctic_office_projects_api --ignore=E501 --exclude migrations

Shell into the container & run:

$ poetry run flake8 arctic_office_projects_api --ignore=E501 --exclude migrations

To assist with linting run Black:

$ poetry run black arctic_office_projects_api

Dependencies

Python dependencies should be defined using Pip through the requirements.txt file. The Docker image is configured to install these dependencies into the application image for consistency across different environments. Dependencies should be periodically reviewed and updated as new versions are released.

To add a new dependency:

$ docker-compose run app ash
$ pip install [dependency]==
# this will display a list of available versions, add the latest to `requirements.txt`
$ exit
$ docker-compose down
$ docker-compose build

If you have access to the BAS GitLab instance, push the rebuilt Docker image to the BAS Docker Registry:

$ docker login docker-registry.data.bas.ac.uk
$ docker-compose push

Dependency vulnerability scanning

To ensure the security of this API, all dependencies are checked against Snyk for vulnerabilities.

Warning: Snyk relies on known vulnerabilities and can't check for issues that are not in it's database. As with all security tools, Snyk is an aid for spotting common mistakes, not a guarantee of secure code.

Some vulnerabilities have been ignored in this project, see .snyk for definitions and the Dependency exceptions section for more information.

Through Continuous Integration, on each commit current dependencies are tested and a snapshot uploaded to Snyk. This snapshot is then monitored for vulnerabilities.

Manually adding a scan:

Install the Snyk CLI tool (See Snyk docs, install using npm?)
Activate a venv and install the dependencies: poerty shell, poetry install
Run snyk test
Run snyk monitor --project-name=arctic-office-projects-api --org=antarctica

Dependency vulnerability exceptions

This project contains known vulnerabilities that have been ignored for a specific reason.

Py-Yaml yaml.load() function allows Arbitrary Code Execution
- currently no known or planned resolution
- indirect dependency, required through the bandit package
- severity is rated high
- risk judged to be low as we don't use the Yaml module in this application
- ignored for 1 year for re-review
SQL Injection vulnerability where group_by accepts user input
- a fix is available, but is currently unreleased
- direct dependency
- severity is high
- risk judged to be low as we don't use group by in any queries
- ignored for 1 month to prompt check for released version containing fix

Static security scanning

To ensure the security of this API, source code is checked against Bandit for issues such as not sanitising user inputs or using weak cryptography.

Warning: Bandit is a static analysis tool and can't check for issues that are only be detectable when running the application. As with all security tools, Bandit is an aid for spotting common mistakes, not a guarantee of secure code.

Through Continuous Integration, each commit is tested.

To check locally:

$ docker-compose run app bandit -r .

Returning an API error

To return an API error, define an exception which inherits from the arctic_office_projects_api.errors.ApiException exception.

For example:

from arctic_office_projects_api.errors import ApiException

class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'

Arbitrary structured/additional data can be included in a meta property. This information can be error or error instance specific.

from arctic_office_projects_api.errors import ApiException

class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'

    # error specific meta information
    meta = {
      'foo': 'bar'
    }

# Error instance specific meta information
error_instance = ApiFooError(meta={'foo': 'baz'})

See the ApiException class for other supported properties.

To return an API error exception as a flask response:

from arctic_office_projects_api import create_app
from arctic_office_projects_api.errors import ApiException

app = create_app('production')

class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'

@app.route('/error')
def error_route():
    """
    Returns an error
    """

    error = ApiFooError()
    return error.response()

Adding a Flask CLI command

Flask CLI commands are used to expose processes and actions that control a Flask application. These commands may be provided by Flask (such as listing all application routes), by third-party modules (such as managing Database Migrations) or custom to this project (such as for Importing data).

Custom/first-party commands are defined in arctic_office_projects_api/commands.py, registered in the create_app() factory method.

Note: Ensure tests are added for any custom commands. See tests/test_commands.py for examples.

Generating category import files

Note: This section is still experimental until it can be formalised as part of #34.

Experiments 6 and 7 of the RDF Experiments project are used to:

generate a series of a RDF triples linking the GCMD Earth Science keywords and UK Data Service HASSET schemes to the UDC Summary scheme (experiment 7)
loading the concepts from the UDC, GCMD and HASSET schemes and producing a JSON file that can be imported into this project (experiment 6)

Logging

In a request context, the default Flask log will include the URL and Request ID of the current request. In other cases, these fields are substituted with NA.

Note: When not running in Flask Debug mode, only messages with a severity of warning of higher will be logged.

Debugging

To debug using PyCharm:

Run -> Edit Configurations
Add New Configuration -> Python

In Configuration tab:

Script path: [absolute path to project]/manage.py
Python interpreter: Project interpreter (app service in project Docker Compose)
Working directory: [absolute path to project]
Path mappings: [absolute path to project]=/usr/src/app

Database migrations

All structural changes to the application database must be made using alembic database migrations, defined in migrations/.

Migrations should be generated from changes to Database models, to prevent differences between the model and the database, using the db migrate command. This will generate a new migration in migrations/versions, which should be reviewed to remove the auto-generated comments and check the correct actions will be carried out.

All migrations must include a reverse/down migration, as these are used to reset the database when Testing.

See the Usage section for instructions on applying database migrations.

Database models

All database access should use SQL Alchemy with models defined in arctic_office_projects_api/models.py. A suitable __repr__() method should be defined to aid in debugging. A suitable seed() method should be defined for seeding each model.

Database seeding

Database seeding is used to populate the application database with either:

predictable, stable, test data for use in Testing
random, fake but realistic, test data for use in development and staging environments

See the Usage section for instructions on running database seeding.

Faker

Faker is a library for generating fake data. It includes a range of providers for coomon attributes such as dates, names, addresses etc. with localisation into various languages and locales (e.g. en-GB). Faker is recommended for creating random, fake, data when seeding.

Custom Faker providers

Where Faker does not provide a required attribute, a custom provider can be created. New providers should follow the conventions established by the main Faker package. Custom providers should be defined in the arctic_office_projects_api.main.faker.providers module. When adding the custom provider to Faker, ensure the providers Provider class is added, rather than the module itself.

For example:

from faker import Faker
from arctic_office_projects_api.main.faker.providers.person import Provider as Person

faker = Faker('en_GB')
faker.add_provider(Person)  # a custom provider

person_gender = faker.male_or_female()  # use of a custom provider

Resource schemas

Marshmallow and Marshmallow JSON API are used to define schemas, in arctic_office_projects_api/schemas.py, that convert data between the form it's stored in (i.e. as a Model instance), and the form it should be displayed within the API (as a resource).

Schemas and models do not necessarily have a 1:1 mapping. A model may be based on a subset of model instances (e.g. only those with a particular set of attributes), or may combine multiple models to give a more useful resource.

Typically, models do not expose fields specific to how data is stored for example, such as primary keys in databases.

Pagination support

Where a schema will return a large number of items, pagination is recommended. The arctic_office_projects_api.schemas.Schema class supports a limited form of pagination whilst it is added to Marshmallow JsonAPI more completely.

Limitations include:

only page based pagination is supported, as opposed to offset/limit and cursor methods
only Flask SQL Alchemy Pagination objects are supported

When enabled this support will:

extract items in the current page to use as input
add links to the first, previous, current, next and last pages in the top-level links object

To use pagination:

set the many and paginate schema options to true
pass a Flask SQL Alchemy Pagination object to the dump() method

For example:

from flask import request, jsonify

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')

@app.route('/people')
def people_list():
    # Determine the pagination page number from the request, or default to page 1
    page = request.args.get('page', type=int)
    if page is None:
        page = 1

    # Get a Pagination object based on the current pagination page number and a fixed page size
    people = Person.query.paginate(page=page, per_page=app.config['APP_PAGE_SIZE'])

    # Enable pagination support on schema
    payload = PersonSchema(many=True, paginate=True).dump(people)

    return jsonify(payload.data)

Related resources support

Relationships between schemas can be expressed using the arctic_office_projects_api.schemas.Relationship class. This is a custom version of the Marshmallow JSON API.

Additions made to the arctic_office_projects_api.schemas.Schema class allow relationship and related resource responses to be returned.

Limitations include:

document and data level meta elements are not currently supported

Relationship responses

A relationship response returns the resource linkage between a resource and one or more other resource type.

For example, a Person resource may be related to one or more Participant resources:

{
  "data": [
    {
      "id": "01D5T4N25RV2062NVVQKZ9NBYX",
      "type": "participants"
    }
  ],
  "links": {
    "related": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/participants",
    "self": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/relationships/participants"
  }
}

To return a relationship response:

set the resource_linkage schema option to the related resource type

For example:

from flask import request, jsonify
from sqlalchemy.orm.exc import NoResultFound, MultipleResultsFound

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')

@app.route('/people/<person_id>/relationships/organisations')
def people_relationship_organisations(person_id: str):
    try:
        person = Person.query.filter_by(id=person_id).one()
        payload = PersonSchema(resource_linkage='organisation').dump(person)
        return jsonify(payload.data)
    except NoResultFound:
        return 'Not found error'
    except MultipleResultsFound:
        return 'Multiple resource conflict error'

Related resource responses

A related resources response returns the resources of a particular type related to a resource.

For example, a Person resource may be related to one or more Participant resources:

{
  "data": [
    {
      "attributes": {
        "foo": "bar"
      },
      "id": "01D5T4N25RV2062NVVQKZ9NBYX",
      "links": {
        "self": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX"
      },
      "relationships": {
        "person": {
          "data": {
            "id": "01D5MHQN3ZPH47YVSVQEVB0DAE",
            "type": "people"
           },
           "links": {
             "related": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX/people",
             "self": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX/relationships/people"
          }
        }
      },
      "type": "participants"
    }
  ],
    "links": {
      "self": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/relationships/participants"
  }
}

To return a related resource response:

set the related_resource schema option to the related resource type
set the many_related schema option to true where there may be multiple related resources (of a given type)

For example:

from flask import request, jsonify
from sqlalchemy.orm.exc import NoResultFound, MultipleResultsFound

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')

@app.route('/people/<person_id>/organisations')
def people_organisations(person_id: str):
    try:
        person = Person.query.filter_by(id=person_id).one()
        payload = PersonSchema(resource_resource='organisation').dump(person)
        return jsonify(payload.data)
    except NoResultFound:
        return 'Not found error'
    except MultipleResultsFound:
        return 'Multiple resource conflict error'

Testing

Integration tests

This project uses integration tests to ensure features work as expected and to guard against regressions and vulnerabilities.

The Python UnitTest library is used for running tests using Flask's test framework. Test cases are defined in files within tests/ and are automatically loaded when using the test Flask CLI command.

Tests are automatically ran on each commit through Continuous Integration.

It may be necesssary to create a test database in the app-db container called app_test


### Pytest testing
- `poetry run pytest tests`


For Coverage reports:
- `poetry run pytest --cov-report=html --cov=arctic_office_projects_api tests`
- Reports are generated in the htmlcov directory

#### Integration testing - auth

Where methods require authentication/authorisation locally issued tokens are used, using a temporary signing key.

### Continuous Integration

All commits will trigger a Continuous Integration process using GitLab's CI/CD platform, configured in `.gitlab-ci.yml`.

This process will run the application [Integration tests](#integration-tests).

Pip dependencies are also [checked and monitored for vulnerabilities](#dependency-vulnerability-scanning).

## Deployment

### Deployment - Local development

In development environments, the API is ran using the Flask development server through the project Docker container.

Code changes will be deployed automatically by Flask reloading the application where a source file changes.

See the [Local development](#local-development) sub-section in the [Setup](#setup) section for more information.

### Deployment - Staging

The staging environment is deployed on [Heroku](https://heroku.com) as an 
[application](https://dashboard.heroku.com/apps/bas-arctic-projects-api-stage) within a 
[pipeline](https://dashboard.heroku.com/pipelines/30f0864a-16e9-41c8-862d-866dd460ba20) in the `webapps@bas.ac.uk` 
shared account.

This Heroku application uses their 
[container hosting](https://devcenter.heroku.com/articles/container-registry-and-runtime) option running a Docker image 
built from the application image (`./Dockerfile`) with the application source included and development related features
disabled. This image (`./Dockerfile.heroku`) is built and pushed to Heroku on each commit to the `master` branch 
through [Continuous Deployment](#continuous-deployment).

An additional Docker image (`./Dockerfile.heroku-release`) is built to act as a 
[Release Phase](https://devcenter.heroku.com/articles/release-phase) for the Heroku application. This image is based on 
the Heroku application image and includes an additional script for running [Database migrations](#database-migrations). 
Heroku will run this image automatically before each deployment of this project.

### Deployment - Production

The production environment is deployed in the same way as the [Staging environment](#deployment-staging), using an
different Heroku [application](https://dashboard.heroku.com/apps/bas-arctic-projects-api-prod) as part of the same 
pipeline.

Deployments are also made through [Continuous Deployment](#continuous-deployment) but only on tagged commits.

### Continuous Deployment

A Continuous Deployment process using GitLab's CI/CD platform is configured in `.gitlab-ci.yml`. This will:

* build a Heroku specific Docker image using a 'Docker In Docker' (DIND/DND) runner and push this image to Heroku
* push [End-user documentation](#documentation) to the 
  [BAS API Documentation project](https://gitlab.data.bas.ac.uk/WSF/api-docs)
* create a Sentry release and associated deployment in the appropriate environment

This process will deploy changes to the *staging* environment on all commits to the *master* branch.

This process will deploy changes to the *production* environment on all tagged commits.

## Release procedure

### At release

For all releases:

1. create a release branch
2. if needed, build & push the Docker image
3. close release in `CHANGELOG.md`
4. push changes, merge the release branch into `master` and tag with version

The application will be automatically deployed into production using [Continuous Deployment](#continuous-deployment).

## Feedback

The maintainer of this project is the BAS Web & Applications Team, they can be contacted at: 
[servicedesk@bas.ac.uk](mailto:servicedesk@bas.ac.uk).

## Issue tracking

This project uses issue tracking, see the 
[Issue tracker](https://gitlab.data.bas.ac.uk/web-apps/arctic-office-projects-api/issues) for more 
information.

**Note:** Read & write access to this issue tracker is restricted. Contact the project maintainer to request access.

## License

© UK Research and Innovation (UKRI), 2019, British Antarctic Survey.

You may use and re-use this software and associated documentation files free of charge in any format or medium, under 
the terms of the Open Government Licence v3.0.

You may obtain a copy of the Open Government Licence at http://www.nationalarchives.gov.uk/doc/open-government-licence/


###

Add `import sqlalchemy_utils` to the migration file E.g: `migrations/versions/83da90ee9d2c_.py`

Files

README.md

Latest commit

History

README.md

File metadata and controls

NERC Arctic Office Projects API

Purpose

Implementation

Configuration

Data models

Data representations

Neutral IDs

Data loading

Science categories

Organisations

Projects and Grants

Gateway to Research

Finding new GTR projects to add

Importing projects/grants

Documentation

Errors

Error tracking

Health checks

[GET|OPTIONS] /meta/health/canary

Request IDs

Reverse proxying

Authentication and authorisation

Available scopes

Registering API clients

Assigning scopes to clients

Usage

Flask CLI

Run database migrations

Run database seeding

Import data

Importing science categories

Importing organisations

Importing grants

Setup

Terraform remote state

Remote state authentication

Local development

Local development - Docker Compose

Local development - database

Local development - auth

Staging

Staging - Heroku

Staging - Heroku sensitive config vars

Staging - database

Staging - documentation

Staging - auth

Production

Production - Heroku

Production - Heroku sensitive config vars

Production - database

Production - documentation

Production - auth

Development

Code Style - linting

Dependencies

Dependency vulnerability scanning

Dependency vulnerability exceptions

Static security scanning

Returning an API error

Adding a Flask CLI command

Generating category import files

Logging

Debugging

Database migrations

Database models

Database seeding

Faker

Custom Faker providers

Resource schemas

Pagination support

Related resources support

Relationship responses

Related resource responses

Testing

[GET|OPTIONS] `/meta/health/canary`