Data Hub API

Data Hub API provides an API into Data Hub for Data Hub clients. Using Data Hub API you can search for entities and manage companies, contacts and interactions.

More guides can be found in the docs folder.

To set up a development environment, follow whichever of the three sets of instructions below fits your use case:

Installation with Docker

This project uses Docker Compose to set up and run all the necessary components. The docker-compose.yml file provided is meant to be used for running tests and development.

Note for Mac users: By default, Docker on Mac will restrict itself to using just 2GB of memory. This should be increased to at least 4GB to avoid running into unexpected problems.

Clone the repository:

git clone https://github.com/uktrade/data-hub-api
cd data-hub-api

Create .env files from sample.env
```
cp sample.env .env
cp config/settings/sample.env config/settings/.env
```
If you're working with data-hub-frontend and mock-sso, DJANGO_SUPERUSER_SSO_EMAIL_USER_ID should be the same as MOCK_SSO_EMAIL_USER_ID in the mock-sso environment definition in data-hub-frontend/docker-compose.frontend.yml.
Build and run the necessary containers for the required environment:
```
docker-compose up
```
or
```
make start-dev
```
- It will take time for the API container to come up - it will run migrations on both DBs, load initial data, sync opensearch etc. Watch along in the api container's logs.
- NOTE: If you are using a Linux system, the OpenSearch container (data-hub-api_es_1) may not come up successfully and might restart perpetually. If the logs for that container mention something like max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144], you will need to run the following on your host machine:
```
sudo sysctl -w vm.max_map_count=262144
```
  and append/modify the vm.max_map_count setting in /etc/sysctl.conf (so that this setting persists after restart):
```
vm.max_map_count=262144
```
  For more information, see the opensearch docs on vm.max_map_count.
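
The check described in the note above can be sketched in Python. This is only an illustrative helper, not part of the project: the function names are made up here, and it assumes the usual Linux location of the setting under /proc.

```python
from pathlib import Path

# Minimum quoted in the OpenSearch error message above.
REQUIRED_MAX_MAP_COUNT = 262144
# Usual Linux location of the sysctl value (assumption: a standard /proc layout).
PROC_PATH = Path("/proc/sys/vm/max_map_count")


def needs_increase(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the current vm.max_map_count is below the required minimum."""
    return int(raw_value.strip()) < required


def check_host() -> str:
    """Describe whether this host needs the sysctl change."""
    if not PROC_PATH.exists():
        return "vm.max_map_count is not exposed here (probably not Linux); nothing to check."
    if needs_increase(PROC_PATH.read_text()):
        return f"Too low: run sudo sysctl -w vm.max_map_count={REQUIRED_MAX_MAP_COUNT}"
    return "vm.max_map_count is high enough."
```

Calling check_host() on the Docker host (not inside a container) reports whether the sysctl command is needed.
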
Optionally, you may want to run a local copy of the Data Hub frontend. By default, you can run both the API and the frontend under one docker-compose project. See the instructions in the frontend readme to set it up. Alternatively, use the make command documented below if you also want to bring up dnb-service.

Installation with docker of data-hub-api, data-hub-frontend and dnb-service on same network

There is now a make command to bring up the three environments on a single Docker network, allowing the services to talk to each other.

Clone the repositories

git clone https://github.com/uktrade/data-hub-api
git clone https://github.com/uktrade/data-hub-frontend
git clone https://github.com/uktrade/dnb-service
cd data-hub-api

Create .env files from sample.env
```
cp sample.env .env
cp config/settings/sample.env config/settings/.env
```
Ensure that DJANGO_SUPERUSER_SSO_EMAIL_USER_ID is the same as MOCK_SSO_EMAIL_USER_ID in the mock-sso environment definition in data-hub-frontend/docker-compose.frontend.yml, and that it is also the same as DJANGO_SUPERUSER_EMAIL in the data-hub-api .env file; otherwise the user may not exist.
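
A minimal sketch of that consistency check, assuming the .env file uses simple KEY=value lines. The helper names are hypothetical, and the mock-sso value is passed in by hand rather than read from the compose file.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def superuser_id_problems(api_env: dict[str, str], mock_sso_email_user_id: str) -> list[str]:
    """List the mismatches that could leave the superuser missing."""
    problems = []
    sso_id = api_env.get("DJANGO_SUPERUSER_SSO_EMAIL_USER_ID")
    if sso_id != mock_sso_email_user_id:
        problems.append("DJANGO_SUPERUSER_SSO_EMAIL_USER_ID differs from MOCK_SSO_EMAIL_USER_ID")
    if sso_id != api_env.get("DJANGO_SUPERUSER_EMAIL"):
        problems.append("DJANGO_SUPERUSER_SSO_EMAIL_USER_ID differs from DJANGO_SUPERUSER_EMAIL")
    return problems
```

An empty list means the three values agree.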

The DNB_SERVICE_BASE_URL should match the dnb-service domain you are trying to access (localhost, staging, uat etc). The DNB_SERVICE_TOKEN is a token generated by dnb-service using django-rest-framework token authentication. For localhost, you will need to generate a token on dnb-service for your user. For staging, uat or prod, the current tokens, which are associated with an existing API user, can be found in Vault. See the dnb-service README for more information on how to find or generate tokens.
Run make command
```
make start-frontend-api-dnb
```
- It will take time for the API container to come up - it will run migrations on both DBs, load initial data, sync opensearch etc. Watch along in the api container's logs.
- NOTE: If you are using a Linux system, the OpenSearch container (data-hub-api_es_1) may not come up successfully and might restart perpetually. If the logs for that container mention something like max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144], you will need to run the following on your host machine:
```
sudo sysctl -w vm.max_map_count=262144
```
and append/modify the vm.max_map_count setting in /etc/sysctl.conf (so that this setting persists after restart):
```
vm.max_map_count=262144
```
For more information, see the opensearch docs on vm.max_map_count.
If you want to stop all the services, run the following make command
```
make stop-frontend-api-dnb
```

Native installation (without Docker)

Dependencies:

Python 3.10.x
PostgreSQL 12
redis 6.x
OpenSearch 1.x

Clone the repository:

git clone https://github.com/uktrade/data-hub-api
cd data-hub-api

Install Python 3.10.

See this guide for detailed instructions for different platforms.

Install system dependencies:

On Ubuntu:

sudo apt install build-essential libpq-dev python3.10-dev python3.10-venv

On macOS:

brew install libpq

Install PostgreSQL, if you have not already, as psycopg2 (one of the requirements installed below) needs it.

On Ubuntu:
```
sudo apt install postgresql postgresql-contrib
```
On macOS:
```
brew install postgresql
```

Create and activate the virtualenv:

python3.10 -m venv env
source env/bin/activate
pip install -U pip
or, to match the pip version used on Cloud Foundry, pin pip to the version shipped with the Python buildpack (see https://github.com/cloudfoundry/python-buildpack/releases), for example 22.1.2:
python -m pip install pip==22.1.2

Install the dependencies:
```
pip install -r requirements-dev.txt
```
Create a .env settings file (it’s gitignored by default):
```
cp config/settings/sample.env config/settings/.env
```
Set DOCKER_DEV=False and LOCAL_DEV=True in .env

Make sure you have OpenSearch running locally. If you don't, you can run one in Docker:

docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" -e "plugins.security.disabled=true" opensearchproject/opensearch:1.2.4
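
Before moving on, it can help to confirm that the local OpenSearch node is answering. The following is a small sketch using only the standard library. It assumes the node listens on http://localhost:9200 with security disabled, as in the command above, and the function names are illustrative.

```python
import json
from urllib.request import urlopen


def cluster_is_usable(health_body: str) -> bool:
    """Return True for a green or yellow cluster (a single dev node is often yellow)."""
    status = json.loads(health_body).get("status")
    return status in {"green", "yellow"}


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Fetch the raw JSON body of the _cluster/health endpoint."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, cluster_is_usable(fetch_health()) should return True once the node has started.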

Make sure you have redis running locally and that the REDIS_BASE_URL in your .env is up-to-date.

Populate the databases and initialise OpenSearch:

./manage.py migrate
./manage.py migrate_search

./manage.py loadinitialmetadata
./manage.py createinitialrevisions

Optionally, you can load some test data:
```
./manage.py loaddata fixtures/test_data.yaml
```
Note that this will queue RQ tasks to index the created records in OpenSearch, so the loaded records won't be returned by search endpoints until RQ is started and the queued tasks have run.
Create a superuser:
```
./manage.py createsuperuser
```
(You can enter any valid email address as the username and SSO email user ID.)
Start the server:
```
./manage.py runserver
```
Start RQ (Redis Queue):
```
python rq/rq-worker.py
```
Note that in production the cron-scheduler:1/1, short-running-worker:4/4 and long-running-worker:4/4 processes are run in separate instances.

API documentation

Automatically-generated API documentation is served at /docs (requires admin site credentials).

Local development

If using Docker, prefix these commands with docker-compose run api.

To run the tests:

./tests.sh

To run the tests in parallel, pass -n <number of processes> to ./tests.sh. For example, for four processes:

./tests.sh -n 4

By default all tests are executed. To skip tests that should not run under certain conditions (i.e. tests that depend on external services), mark them with the excluded marker or with a marker that starts with excluded_.

import pytest

@pytest.mark.excluded
class TestClass:
    def test_1(self):
        ...

@pytest.mark.excluded
def test_1():
    ...

@pytest.mark.excluded_dependent_on_redis
def test_2():
    ...

To exclude such tests from running use one of the options below:

pytest --skip-excluded
pytest -m excluded_dependent_on_redis
pytest -m "excluded_x or excluded_y"
pytest -m "not (excluded_x or excluded_y)"
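
The marker-naming rule described above can be written as a small predicate. This is a sketch of the rule only; the project's actual handling of --skip-excluded lives in its pytest configuration and may differ.

```python
from typing import Iterable


def is_excluded(marker_names: Iterable[str]) -> bool:
    """Return True if any marker is 'excluded' or starts with 'excluded_'."""
    return any(
        name == "excluded" or name.startswith("excluded_")
        for name in marker_names
    )
```

For example, a test marked only with excluded_dependent_on_redis matches the rule, while a test marked only with some unrelated marker does not.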

NOTE: When testing, passing the --reuse-db option can speed up test runs by skipping migrations.

To run the linter:

flake8

There is also a pre-commit hook for Flake8. To install this, run:

make setup-flake8-hook

Obtaining an API access token

You can obtain an access token for local development work in one of two ways:

by running ./manage.py add_access_token <SSO email user ID> with the SSO email user ID of an existing adviser (run ./manage.py add_access_token --help for a list of options)
using the form on http://localhost:8000/admin/add-access-token/

(If you’re using Docker, an access token will be created automatically if certain environment variables are set. See sample.env for more details.)

This access token can be used with most endpoints by setting an Authorization header value of Bearer <access token>.
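
As an illustration, the header can be attached with the standard library as follows. The base URL matches the local server address used above; the endpoint path and token in the usage note are placeholders, not real values.

```python
from urllib.request import Request


def build_authenticated_request(base_url: str, path: str, access_token: str) -> Request:
    """Build a request carrying the access token as a Bearer Authorization header."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

For example, build_authenticated_request("http://localhost:8000", "/example-endpoint/", "my-token") returns a request that can be passed to urllib.request.urlopen.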

Note that machine-to-machine endpoints (such as those under /v4/metadata/) instead use Hawk authentication and request signing.

Granting access to the front end

The internal front end uses single sign-on. You should configure the API as follows to use with the front end:

SSO_ENABLED: True
STAFF_SSO_BASE_URL: URL of a Staff SSO or Mock SSO instance. This should be the same server the front end is configured to use.
STAFF_SSO_AUTH_TOKEN: Access token for Staff SSO.

Granting access to machine-to-machine clients

Pure machine-to-machine clients use Hawk authentication with separate credentials for each client.

There are separate views for such clients as these views don’t expect request.user to be set.

Hawk credentials for each client are defined in the settings listed below (see Deployment), and each client is assigned scopes in config/settings/common.py.

These scopes define which views each client can access.
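
Conceptually, the scope check works like the sketch below. The scope names, client IDs and function are hypothetical and only illustrate the idea; the real definitions are in config/settings/common.py.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical scopes, one per group of machine-to-machine views."""
    METADATA = "metadata"
    ACTIVITY_STREAM = "activity_stream"


# Hypothetical mapping from Hawk access key ID to the scopes granted to it.
CLIENT_SCOPES = {
    "frontend-key-id": {Scope.METADATA},
    "activity-stream-key-id": {Scope.ACTIVITY_STREAM},
}


def client_may_access(client_id: str, required_scope: Scope) -> bool:
    """Return True if the client was granted the scope a view requires."""
    return required_scope in CLIENT_SCOPES.get(client_id, set())
```

A client that is not listed has no scopes, so it is refused by every view.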

Deployment

Data Hub API can run on any Heroku-style platform. Configuration is performed via the following environment variables:

Variable name	Required	Description
`ACTIVITY_STREAM_ACCESS_KEY_ID`	No	A non-secret access key ID, corresponding to `ACTIVITY_STREAM_SECRET_ACCESS_KEY`. The holder of the secret key can access the activity stream endpoint by Hawk authentication.
`ACTIVITY_STREAM_SECRET_ACCESS_KEY`	If `ACTIVITY_STREAM_ACCESS_KEY_ID` is set	A secret key, corresponding to `ACTIVITY_STREAM_ACCESS_KEY_ID`. The holder of this key can access the activity stream endpoint by Hawk authentication.
`ACTIVITY_STREAM_OUTGOING_URL`	No	The URL used to read from activity stream
`ACTIVITY_STREAM_OUTGOING_ACCESS_KEY_ID`	No	A non-secret access key ID, corresponding to `ACTIVITY_STREAM_OUTGOING_SECRET_ACCESS_KEY`. This is used when reading from the activity stream at `ACTIVITY_STREAM_OUTGOING_URL`.
`ACTIVITY_STREAM_OUTGOING_SECRET_ACCESS_KEY`	No	A secret key, corresponding to `ACTIVITY_STREAM_OUTGOING_ACCESS_KEY_ID`. This is used when reading from the activity stream at `ACTIVITY_STREAM_OUTGOING_URL`.
`ADMIN_OAUTH2_ENABLED`	Yes	Enables Django Admin SSO login when set to True.
`ADMIN_OAUTH2_BASE_URL`	If `ADMIN_OAUTH2_ENABLED` is set	A base URL of OAuth provider.
`ADMIN_OAUTH2_TOKEN_FETCH_PATH`	If `ADMIN_OAUTH2_ENABLED` is set	OAuth fetch token path for Django Admin SSO login.
`ADMIN_OAUTH2_USER_PROFILE_PATH`	If `ADMIN_OAUTH2_ENABLED` is set	OAuth user profile path for Django Admin SSO login.
`ADMIN_OAUTH2_AUTH_PATH`	If `ADMIN_OAUTH2_ENABLED` is set	OAuth auth path for Django Admin SSO login.
`ADMIN_OAUTH2_CLIENT_ID`	If `ADMIN_OAUTH2_ENABLED` is set	OAuth client ID for Django Admin SSO login.
`ADMIN_OAUTH2_CLIENT_SECRET`	If `ADMIN_OAUTH2_ENABLED` is set	OAuth client secret for Django Admin SSO login.
`AV_V2_SERVICE_URL`	Yes	URL for ClamAV V2 service. If not configured, virus scanning will fail.
`DEFAULT_BUCKET_AWS_ACCESS_KEY_ID`	No	Used as part of boto3 auto-configuration.
`DEFAULT_BUCKET_AWS_DEFAULT_REGION`	No	Default region used by boto3.
`DEFAULT_BUCKET_AWS_SECRET_ACCESS_KEY`	No	Used as part of boto3 auto-configuration.
`CONSENT_SERVICE_BASE_URL`	No	The base url of the consent service, to post email consent preferences to (default=None).
`CONSENT_SERVICE_HAWK_ID`	No	The hawk id to use when making a request to the consent service (default=None).
`CONSENT_SERVICE_HAWK_KEY`	No	The hawk key to use when making a request to the consent service (default=None).
`CSRF_COOKIE_HTTPONLY`	No	Whether to use HttpOnly flag on the CSRF cookie (default=False).
`CSRF_COOKIE_SECURE`	No	Whether to use a secure cookie for the CSRF cookie (default=False).
`DATA_FLOW_API_ACCESS_KEY_ID`	No	A non-secret access key ID, corresponding to `DATA_FLOW_API_SECRET_ACCESS_KEY`. The holder of the secret key can access the omis-dataset endpoint by Hawk authentication.
`DATA_FLOW_API_SECRET_ACCESS_KEY`	If `DATA_FLOW_API_ACCESS_KEY_ID` is set	A secret key, corresponding to `DATA_FLOW_API_ACCESS_KEY_ID`. The holder of this key can access the omis-dataset endpoint by Hawk authentication.
`DATABASE_CONN_MAX_AGE`	No	Maximum database connection age (in seconds).
`DATABASE_CREDENTIALS`	Yes	PostgreSQL database credentials contained in a JSON structure: `{"username": "application_user", "password": "p4s5w0rd!", "engine": "postgres", "port": 5432, "dbname": "main", "host": "hostname.rds.amazonaws.com", "dbInstanceIdentifier": "db-instance"}`
`DATAHUB_FRONTEND_BASE_URL`	Yes
`DATAHUB_NOTIFICATION_API_KEY`	No	The GOV.UK Notify API key to use for the `datahub.notification` django app.
`DATAHUB_SUPPORT_EMAIL_ADDRESS`	No	Email address for DataHub support team.
`DATA_HUB_FRONTEND_ACCESS_KEY_ID`	No	A non-secret access key ID, corresponding to `DATA_HUB_FRONTEND_SECRET_ACCESS_KEY`. The holder of the secret key can access the metadata endpoints by Hawk authentication.
`DATA_HUB_FRONTEND_SECRET_ACCESS_KEY`	If `DATA_HUB_FRONTEND_ACCESS_KEY_ID` is set	A secret key, corresponding to `DATA_HUB_FRONTEND_ACCESS_KEY_ID`. The holder of this key can access the metadata endpoints by Hawk authentication.
`DEBUG`	Yes	Whether Django's debug mode should be enabled.
`DIT_EMAIL_DOMAIN_*`	No	An allowable DIT email domain for email ingestion along with its allowed email authentication methods. Django-environ dict format e.g. example.com=dmarc:pass implementation either spf:pass or dkim:pass
`DIT_EMAIL_INGEST_BLOCKLIST`	No	A list of emails for which email ingestion is prohibited.
`DJANGO_SECRET_KEY`	Yes
`DJANGO_SENTRY_DSN`	Yes
`DJANGO_SETTINGS_MODULE`	Yes
`DNB_AUTOMATIC_UPDATE_LIMIT`	No	Integer of the maximum number of updates the DNB automatic update task should ingest before exiting. This is unlimited if this setting is not set.
`DNB_SERVICE_BASE_URL`	No	The base URL of the DNB service.
`DNB_SERVICE_TOKEN`	No	Access token provided by dnb-service through its django-rest-framework token authentication.
`DEFAULT_BUCKET`	Yes	S3 bucket for object storage.
`DISABLE_PAAS_IP_CHECK`	No	Disable PaaS IP check for Hawk endpoints (default=False).
`ENABLE_ADMIN_ADD_ACCESS_TOKEN_VIEW`	No	Whether to enable the add access token page for superusers in the admin site (default=True).
`ENABLE_AUTOMATIC_REMINDER_ITA_USER_MIGRATIONS`	No	Whether to enable automatic migration of ITA users to receive notification reminders
`ENABLE_AUTOMATIC_REMINDER_POST_USER_MIGRATIONS`	No	Whether to enable automatic migration of POST users to receive notification reminders
`ENABLE_DAILY_OPENSEARCH_SYNC`	No	Whether to enable the daily OpenSearch sync (default=False).
`ENABLE_INVESTMENT_NOTIFICATION`	No	True or False. Whether or not to activate the RQ task for sending investment notifications
`ENABLE_ESTIMATED_LAND_DATE_REMINDERS`	No	True or False. Whether or not to activate the RQ task for sending investment notifications
`ENABLE_ESTIMATED_LAND_DATE_REMINDERS_EMAIL_DELIVERY_STATUS`	No	True or False. Whether or not to activate the RQ task for checking delivery status
`ENABLE_NEW_EXPORT_INTERACTION_REMINDERS`	No	True or False. Whether or not to activate the RQ task for sending new export interaction notifications
`ENABLE_NEW_EXPORT_INTERACTION_REMINDERS_EMAIL_DELIVERY_STATUS`	No	True or False. Whether or not to activate the RQ task for checking delivery status
`ENABLE_NO_RECENT_EXPORT_INTERACTION_REMINDERS`	No	True or False. Whether or not to activate the RQ task for sending no recent interaction notifications
`ENABLE_NO_RECENT_EXPORT_INTERACTION_REMINDERS_EMAIL_DELIVERY_STATUS`	No	True or False. Whether or not to activate the RQ task for checking delivery status
`ENABLE_NO_RECENT_INTERACTION_REMINDERS`	No	True or False. Whether or not to activate the RQ task for sending investment notifications
`ENABLE_MAILBOX_PROCESSING`	No	True or False. Whether or not to activate the RQ task for mailbox processing
`ENABLE_SLACK_MESSAGING`	No	If present and truthy, enable the transmission of messages to Slack. Necessitates the specification of the other env vars `SLACK_API_TOKEN` and `SLACK_MESSAGE_CHANNEL`
`ENABLE_SPI_REPORT_GENERATION`	No	Whether to enable daily SPI report (default=False).
`ES_APM_ENABLED`	Yes	Enables Elasticsearch APM agent when set to True.
`ES_APM_SERVICE_NAME`	Yes, if ES_APM_ENABLED	A name of the running service. Must match following regexp: ^[a-zA-Z0-9 _-]+$.
`ES_APM_SECRET_TOKEN`	Yes, if ES_APM_ENABLED	A secret token used to authorise requests to the APM server.
`ES_APM_SERVER_TIMEOUT`	No	A timeout for requests to the Elasticsearch APM server in duration format (default=20s).
`ES_APM_SERVER_URL`	Yes, if ES_APM_ENABLED	The URL of the Elasticsearch APM server.
`ES_APM_ENVIRONMENT`	Yes, if ES_APM_ENABLED	A name of the environment the service is running, for example: `develop`.
`EXPORT_WINS_SERVICE_BASE_URL`	No	The base url of the Export Wins API (default=None).
`EXPORT_WINS_HAWK_ID`	No	The hawk id to use when making a request to the Export Wins API (default=None).
`EXPORT_WINS_HAWK_KEY`	No	The hawk key to use when making a request to the Export Wins API (default=None).
`EXTRA_DJANGO_APPS`	Yes	Additional Django apps to load (comma-separated). Can be used to reverse the migrations of a removed third-party app (see comment in config/settings/common.py for more detail).
`INTERACTION_ADMIN_CSV_IMPORT_MAX_SIZE`	No	Maximum file size in bytes for interaction admin CSV uploads (default=2MB).
`INTERACTION_NOTIFICATION_API_KEY`	Yes
`INVESTMENT_DOCUMENT_AWS_ACCESS_KEY_ID`	No	Same use as DEFAULT_BUCKET_AWS_ACCESS_KEY_ID, but for investment project documents.
`INVESTMENT_DOCUMENT_AWS_SECRET_ACCESS_KEY`	No	Same use as DEFAULT_BUCKET_AWS_SECRET_ACCESS_KEY, but for investment project documents.
`INVESTMENT_DOCUMENT_AWS_REGION`	No	Same use as DEFAULT_BUCKET_AWS_DEFAULT_REGION, but for investment project documents.
`INVESTMENT_DOCUMENT_BUCKET`	No	S3 bucket for investment project documents storage.
`INVESTMENT_NOTIFICATION_ADMIN_EMAIL`	Yes
`INVESTMENT_NOTIFICATION_API_KEY`	Yes
`INVESTMENT_NOTIFICATION_ESTIMATED_LAND_DATE_TEMPLATE_ID`	Yes	An ID of Notify Template for Estimated Land Date notifications
`INVESTMENT_NOTIFICATION_ESTIMATED_LAND_DATE_SUMMARY_TEMPLATE_ID`	Yes	An ID of Notify Template for Estimated Land Date summary notifications
`INVESTMENT_NOTIFICATION_NO_RECENT_INTERACTION_TEMPLATE_ID`	Yes	An ID of Notify Template for No Recent Investment Interaction notifications
`EXPORT_NOTIFICATION_NO_RECENT_INTERACTION_TEMPLATE_ID`	Yes	An ID of Notify Template for No Recent Export company Interaction notifications
`EXPORT_NOTIFICATION_NO_INTERACTION_TEMPLATE_ID`	Yes	An ID of Notify Second template for No Recent Export company Interaction notifications
`EXPORT_WIN_NOTIFICATION_API_KEY`	Yes
`EXPORT_WIN_CLIENT_RECEIPT_TEMPLATE_ID`	Yes	An ID of Notify Template for Export Win Client Email Receipt notifications
`EXPORT_WIN_LEAD_OFFICER_APPROVED_TEMPLATE_ID`	Yes	An ID of Notify Template for Export Win Lead Officer Approved notifications
`EXPORT_WIN_LEAD_OFFICER_REJECTED_TEMPLATE_ID`	Yes	An ID of Notify Template for Export Win Lead Officer Rejected notifications
`TASK_REMINDER_EMAIL_TEMPLATE_ID`	No	An ID of Notify Template for the generic Task reminder notifications
`TASK_NOTIFICATION_FROM_OTHERS_TEMPLATE_ID`	Yes	An ID of Notify Template for Task assigned by others notifications
`MAILBOX_INGESTION_CLIENT_ID`	No	An OAuth Client ID for Email Ingestion Exchange Server
`MAILBOX_INGESTION_CLIENT_SECRET`	No	An OAuth Client Secret for Email Ingestion Exchange Server
`MAILBOX_INGESTION_EMAIL`	No	The email address for Email Ingestion Exchange Server
`MAILBOX_INGESTION_GRAPH_URL`	No	Graph API URL for Email Ingestion Exchange Server
`MAILBOX_INGESTION_TENANT_ID`	No	A Tenant ID for Email Ingestion Exchange Server
`MAILBOX_INGESTION_FAILURE_TEMPLATE_ID`	No	An ID of Notify Template for mailbox ingestion failure
`MAILBOX_INGESTION_SUCCESS_TEMPLATE_ID`	No	An ID of Notify Template for mailbox ingestion success
`MARKET_ACCESS_ACCESS_KEY_ID`	No	A non-secret access key ID used by the Market Access service to access Hawk-authenticated public company endpoints.
`MARKET_ACCESS_SECRET_ACCESS_KEY`	If `MARKET_ACCESS_ACCESS_KEY_ID` is set	A secret key used by the Market Access service to access Hawk-authenticated public company endpoints.
`NOTIFICATION_SUMMARY_THRESHOLD`	No	Number of notification items that trigger sending a summary email. (default=5)
`OMIS_PUBLIC_ACCESS_KEY_ID`	No	A non-secret access key ID, corresponding to `OMIS_PUBLIC_SECRET_ACCESS_KEY`. The holder of the secret key can access the OMIS public endpoints by Hawk authentication.
`OMIS_NOTIFICATION_ADMIN_EMAIL`	Yes
`OMIS_NOTIFICATION_API_KEY`	Yes
`OMIS_NOTIFICATION_OVERRIDE_RECIPIENT_EMAIL`	No
`OMIS_PUBLIC_BASE_URL`	Yes
`OMIS_PUBLIC_SECRET_ACCESS_KEY`	If `OMIS_PUBLIC_ACCESS_KEY_ID` is set	A secret key, corresponding to `OMIS_PUBLIC_ACCESS_KEY_ID`. The holder of this key can access the OMIS public endpoints by Hawk authentication.
`OPENSEARCH_INDEX_PREFIX`	Yes	Prefix to use for indices and aliases
`OPENSEARCH_SEARCH_REQUEST_TIMEOUT`	No	Timeout (in seconds) for searches (default=20).
`OPENSEARCH_SEARCH_REQUEST_WARNING_THRESHOLD`	No	Threshold (in seconds) for emitting warnings about slow searches (default=10).
`OPENSEARCH_VERIFY_CERTS`	No
`OPENSEARCH_POOL_MAXSIZE`	No	The OpenSearch Python client max connection pool (default=10).
`PAAS_IP_ALLOWLIST`	No	IP addresses (comma-separated) that can access the Hawk-authenticated endpoints.
`REDIS_BASE_URL`	No	redis base URL without the db
`REDIS_CACHE_DB`	No	redis db for django cache (default 0)
`REPORT_AWS_ACCESS_KEY_ID`	No	Same use as DEFAULT_BUCKET_AWS_ACCESS_KEY_ID, but for reports.
`REPORT_AWS_SECRET_ACCESS_KEY`	No	Same use as DEFAULT_BUCKET_AWS_SECRET_ACCESS_KEY, but for reports.
`REPORT_AWS_REGION`	No	Same use as DEFAULT_BUCKET_AWS_DEFAULT_REGION, but for reports.
`REPORT_BUCKET`	No	S3 bucket for report storage.
`SECTOR_ENVIRONMENT`	No	Set to "production" only in the production environment; otherwise it can be left empty.
`SENTRY_ENVIRONMENT`	Yes	Value for the environment tag in Sentry.
`SKIP_OPENSEARCH_MAPPING_MIGRATIONS`	No	If non-empty, skip applying OpenSearch mapping type migrations on deployment.
`SLACK_API_TOKEN`	No	(Required if `ENABLE_SLACK_MESSAGING` is truthy) Auth token for connection to Slack API for purposes of sending messages through the datahub.core.realtime_messaging module
`SLACK_MESSAGE_CHANNEL`	No	(Required if `ENABLE_SLACK_MESSAGING` is truthy) Name (or preferably ID) of the channel into which datahub.core.realtime_messaging should send messages
`SSO_ENABLED`	Yes	Whether single sign-on via RFC 7662 token introspection is enabled
`STAFF_SSO_AUTH_TOKEN`	If SSO enabled	Access token for the Staff SSO API.
`STAFF_SSO_BASE_URL`	If SSO enabled	The base URL for the Staff SSO API.
`STAFF_SSO_REQUEST_TIMEOUT`	No	Staff SSO API request timeout in seconds (default=5).
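
The "Required" column above can be checked before deployment with a short script. The sketch below covers only a handful of the variables and only the "if X is set" style of conditional requirement; the function and constant names are illustrative, not part of the project.

```python
from typing import Mapping

# A small subset of the variables marked "Yes" in the table above.
ALWAYS_REQUIRED = (
    "DATABASE_CREDENTIALS",
    "DEBUG",
    "DJANGO_SECRET_KEY",
    "DEFAULT_BUCKET",
    "SSO_ENABLED",
)

# Variables that become required once the key on the left is set.
REQUIRED_IF_SET = {
    "ACTIVITY_STREAM_ACCESS_KEY_ID": ("ACTIVITY_STREAM_SECRET_ACCESS_KEY",),
    "DATA_FLOW_API_ACCESS_KEY_ID": ("DATA_FLOW_API_SECRET_ACCESS_KEY",),
    "MARKET_ACCESS_ACCESS_KEY_ID": ("MARKET_ACCESS_SECRET_ACCESS_KEY",),
    "OMIS_PUBLIC_ACCESS_KEY_ID": ("OMIS_PUBLIC_SECRET_ACCESS_KEY",),
}


def missing_variables(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are absent or empty."""
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    for trigger, dependents in REQUIRED_IF_SET.items():
        if env.get(trigger):
            missing.extend(name for name in dependents if not env.get(name))
    return missing
```

Passing os.environ to missing_variables returns an empty list when this subset of the configuration is complete.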

Management commands

If using Docker, remember to run these commands inside your container by prefixing them with docker-compose run api.

Database

Apply migrations

For the default database

./manage.py migrate

Create django-reversion initial revisions

If the database is freshly built or a new versioned model is added run:

./manage.py createinitialrevisions

Load initial metadata

These commands are generally only intended to be used on a blank database.

./manage.py loadinitialmetadata

OpenSearch

Update indexes and mapping types

To create missing OpenSearch indexes and migrate modified mapping types:

./manage.py migrate_search

This will also resync data (using RQ) for any newly-created indexes.

See docs/OpenSearch migrations.md for more detail about how the command works.

Resync all OpenSearch records

To resync all records using RQ:

./manage.py sync_search

To resync all records synchronously (without RQ running):

./manage.py sync_search --foreground

You can resync only specific models by using the --model= argument.

./manage.py sync_search --model=company --model=contact

For more details including all the available choices:

./manage.py sync_search --help

Dependencies

See Managing dependencies for information about installing, adding and upgrading dependencies.

Activity Stream

The /v3/activity-stream/* endpoints are protected by two mechanisms:

IP address allowlisting via the X-Forwarded-For header, with a comma separated list of allowlisted IPs in the environment variable PAAS_IP_ALLOWLIST.
Hawk authentication via the Authorization header, with the credentials in the environment variables ACTIVITY_STREAM_ACCESS_KEY_ID and ACTIVITY_STREAM_SECRET_ACCESS_KEY.

IP address allowlisted

The authentication blocks requests that do not have an allowlisted IP as the second-from-the-end IP in the X-Forwarded-For header. In general, this header cannot be trusted. However, in PaaS it can be, and PaaS is the only production environment. Ideally, this would be done at a lower level than HTTP, but this is not possible with the current architecture.

If making requests to this endpoint locally, you must manually add this header or disable the check using the DISABLE_PAAS_IP_CHECK environment variable.
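
The second-from-the-end rule can be sketched as follows. This is an illustration of the rule as described here, not the project's implementation, and the function name is made up.

```python
def client_ip_is_allowlisted(x_forwarded_for: str, allowlist: str) -> bool:
    """Check the second-from-the-end IP in X-Forwarded-For against a comma-separated allowlist."""
    ips = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    if len(ips) < 2:
        # No second-from-the-end entry, so the request cannot be verified.
        return False
    allowed = {part.strip() for part in allowlist.split(",") if part.strip()}
    return ips[-2] in allowed
```

When testing locally, a header such as "1.2.3.4, 10.0.0.1" passes if 1.2.3.4 is listed in PAAS_IP_ALLOWLIST.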

Hawk authentication

In general, Hawk authentication treats both hashing the HTTP payload and Content-Type header and using a nonce as optional. Here, as with the Activity Stream endpoints in other DIT projects, both are required. Content-Type may be the empty string, and if there is no payload, then it should be treated as the empty string.
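
For reference, a payload hash in the Hawk scheme can be computed roughly as below. This sketch follows my understanding of the Hawk 1 payload normalisation with SHA-256; in practice a Hawk client library would do this for you, and the function name is illustrative.

```python
import base64
import hashlib


def hawk_payload_hash(payload: bytes = b"", content_type: str = "") -> str:
    """Hash the payload and media type as in Hawk's payload validation (SHA-256 variant)."""
    # Only the media type is hashed; parameters such as charset are dropped.
    media_type = content_type.split(";")[0].strip().lower()
    digest = hashlib.sha256()
    digest.update(b"hawk.1.payload\n")
    digest.update(media_type.encode("utf-8") + b"\n")
    digest.update(payload + b"\n")
    return base64.b64encode(digest.digest()).decode("ascii")
```

A request with no body and no Content-Type still produces a hash, using empty strings for both, which matches the requirement above.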

Emails

Email templates are created in GOV.UK Notify. You can create a personal account for testing new templates or template changes locally. To test locally you need to update the API key and template ID environment variables.

Updating emails

Only update the live email templates once your changes are pushed live. Extra parameters passed to an email template are ignored, but omitting a parameter that the template requires will cause the email to fail.
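
One way to catch a missing parameter before a template change goes live is to compare the parameters the code sends with those the template expects. The helper and parameter names below are hypothetical.

```python
from typing import Iterable, Mapping


def missing_template_parameters(required: Iterable[str], personalisation: Mapping[str, object]) -> list[str]:
    """Return required template parameters that the personalisation does not supply."""
    return sorted(set(required) - set(personalisation))
```

Extra keys in the personalisation are not reported, since they are ignored by the template.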

Example of environment variables for updating the interaction notification email template:

INTERACTION_NOTIFICATION_API_KEY=<notify-api-key>
EXPORT_NOTIFICATION_NEW_INTERACTION_TEMPLATE_ID=<template-id>
