A quick and easy install of the Station Demand Forecasting Tool on any modern operating system, using Docker.
It is assumed that you have Docker installed and running on a host computer. This could be a virtual machine in a cloud environment, such as Google Cloud, AWS, Microsoft Azure, or Digital Ocean. For testing purposes or a very simple model run (a single station) you could use your own computer, provided it has sufficient resources available.
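If you are unsure whether the host is ready, a quick sanity check from a terminal (this simply confirms the tools are on the PATH; it assumes the classic `docker-compose` binary rather than the newer `docker compose` plugin):

```shell
# Confirm the Docker CLI and Docker Compose are installed and reachable.
docker --version
docker-compose --version
```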
We have had good results using a DigitalOcean CPU-Optimized virtual machine with 16 or 32 CPUs and using the Docker image created by DigitalOcean that's available in the Marketplace. DigitalOcean virtual machines are charged by the second (whether running or not), and the hourly charge for the CPU-Optimized 32 CPU VM is just under $1.00. For new users, this link will get you a $100 (60-day) credit.
The Docker implementation consists of two containers:
- An instance of RStudio Server with all the required packages and dependencies installed, including the sdft R package.
- An instance of PostgreSQL server with the PostGIS and pgRouting extensions installed, and all the database tables required by the model.
Images for these containers are available via the Docker Hub. There is no need to clone this repository or generate the images yourself.
- Copy the `docker-compose.yml` file from the repository to the host computer (place it in a directory called `sdft-docker`).
- Edit `docker-compose.yml` and replace the two instances of `your_password` with a password of your choice. This will set the postgres user password and the rstudio user password.
- Edit `docker-compose.yml` and amend the entry:

  ```
  volumes:
    - c:/sdft:/home/rstudio
  ```

  You should replace `c:/sdft` with the path to a suitable directory on the host computer. In the example above, the host is a Windows machine and the location is `c:/sdft`. This folder will be used to read the input files for a model job and to write the outputs from a model job. The location needs to be readable and writeable by the 'rstudio' user in the RStudio container. On Linux, you could create a new directory at root level and change its ownership to user id 1000 (this is the id of the 'rstudio' user):

  ```
  mkdir /sdft/
  chown 1000 /sdft/
  ```

  and then amend `docker-compose.yml` as follows:

  ```
  volumes:
    - /sdft:/home/rstudio
  ```
- You may want to check for the most recent version of the images on Docker Hub, then amend the image tags in `docker-compose.yml` as required.
- In a terminal or command prompt, change to your `sdft-docker` folder.
- Run:

  ```
  docker-compose up -d
  ```
- The images will be downloaded from Docker Hub and extracted, if they are not already stored locally.
- On successful completion you should see the following:

  ```
  Creating network "sdft-docker_postgresql" with driver "bridge"
  Creating sdft-db ... done
  Creating sdft-ui ... done
  ```
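Once `docker-compose up -d` has returned, you can confirm that both containers are actually running; a minimal check, assuming the container names (`sdft-db`, `sdft-ui`) defined in the compose file:

```shell
# List the SDFT containers and their current status.
docker ps --filter "name=sdft" --format "{{.Names}}: {{.Status}}"
```

Both containers should be reported with a status of "Up".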
- You can now connect to the RStudio server at http://localhost:8787. Log on with the user `rstudio` and the password you entered in `docker-compose.yml` earlier. If you are installing on a cloud-based VM, you will usually want to configure a tunnel in your SSH client to forward port 8787 on your local computer to port 8787 on your VM. (By default, the DigitalOcean Docker image will allow external access from any Internet host to http://yourdropletipaddress:8787.)
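As a sketch of such a tunnel using the OpenSSH client (the user `root` and host `your_vm_ip` are placeholders; substitute your own VM user and address):

```shell
# Forward local port 8787 to port 8787 on the VM, so RStudio Server is
# reachable at http://localhost:8787 on your own machine.
# -N: do not run a remote command, just hold the port forward open.
ssh -N -L 8787:localhost:8787 root@your_vm_ip
```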
- On the PostgreSQL container, the database will now be created and the required tables and indexes generated. This process may take some time (perhaps 30-60 minutes). Due to the size of the database once fully generated (around 10GB), this is the most practical method of distribution. This process is only carried out the first time the sdft-db container is built; you can stop and start this container and the database will be preserved.
- To check on progress, you can attach to the container by running:

  ```
  docker attach --sig-proxy=false sdft-db
  ```

  from the command prompt or terminal. The `--sig-proxy=false` option allows you to use `ctrl-c` to detach from the container without stopping it. Once the database initialisation is complete, you will see the following (assuming you're still attached to the container):

  ```
  PostgreSQL init process complete; ready for start up.

  2020-09-14 13:44:12.985 UTC [1] LOG:  starting PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
  2020-09-14 13:44:12.985 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
  2020-09-14 13:44:12.985 UTC [1] LOG:  listening on IPv6 address "::", port 5432
  2020-09-14 13:44:12.991 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
  2020-09-14 13:44:13.012 UTC [1457] LOG:  database system was shut down at 2020-09-14 13:44:12 UTC
  2020-09-14 13:44:13.018 UTC [1] LOG:  database system is ready to accept connections
  ```
This section provides brief instructions on running a job in testing mode. Please consult the documentation for further information.
- In your browser, connect to the RStudio server at http://localhost:8787.
- Copy the example job submission CSV files from the sdft R package to `/home/rstudio/input` by running:

  ```r
  dir.create("/home/rstudio/input")
  files <- list.files(file.path(system.file(package = "sdft"), "example_input"), full.names = TRUE)
  file.copy(from = files, to = "/home/rstudio/input")
  ```
- The example files will run a job in testing mode. Testing mode only considers a very small catchment area for a proposed station, to speed up processing.
- Edit `config.csv` and set the number of processor cores to be used for the job. As a minimum, 4 are required. Save the file.
- Provide your password for the postgres user, using `key_set()` from the keyring package:

  ```r
  library(keyring)
  key_set("postgres")
  ```

  You will first be asked to provide a password for the 'system' keyring. This is a new password and you can provide whatever you like here. You will then be prompted for the password to set for the "postgres" service. You should enter the password that you set earlier in the `docker-compose.yml` file.
- You are now ready to submit the test job:

  ```r
  library(sdft)
  sdft_submit(dbhost = "sdft-db", dirpath = "/home/rstudio")
  ```
- A log file, `sdr.log`, will be generated in the `/home/rstudio/output` folder and will be updated while the job runs.
- When the job is complete, several files are created in a subfolder of the output folder. The subfolder takes the name of the job as specified in the `config.csv` file. The set of files, assuming you used the example input as provided, is as follows:
  - `station_forecast.csv` contains the model forecast (remember, this will not be a correct forecast when testing mode is used).
  - `exogenous.csv` contains a summary of the input exogenous data.
  - `helst1_catchment.geojson` contains the postcode centroids for the station's probabilistic catchment, along with the probability of the station being chosen as an attribute of each centroid. An interpolated surface can be generated using QGIS to visualise the catchment.
  - `sdr.log` contains information on the job run. The level of detail will depend on the `loglevel` set in the `config.csv` file. This is set to DEBUG for the example test run.
To report an issue, bug, or to make a feature request, please create an issue in this repository.
Please see the full documentation.