This guide will provide instruction and information on setting up the development environment for the BioData Catalyst (BDC) Data Management Core (DMC) Data Submission Tracker (DST). This will include installing prerequisites, setting up a development environment, building the container, deploying to BioData Catalyst, running tests/linting, and information about contributing to development.
At this point all installation is managed system-wide. I think this is a poor way to manage a development environment and would prefer an environment encapsulation of some sort, such as poetry or pyenv. We will try to move toward some encapsulation once we have an initial working environment.
To contribute to the project, follow the steps outlined in the Setup the Development Environment section to create a local development environment. Once your environment is set up, you can make changes to the codebase and submit pull requests for review. If you encounter any issues during the setup process or while working on the project, please submit an issue describing the steps you are having trouble with.
Setting up the development environment requires satisfying some prerequisites and dependencies. First we will need to satisfy the Prerequisites, including Dependencies, Optional Dependencies, and Provision PostgreSQL. Please follow these instructions to properly set up and verify a functioning development environment.
In order to build and run the Data Submission Tracker, its dependencies will need to be installed and the environment will need to be prepared. First we describe the required dependencies; after these are installed we can set up the appropriate environment to build, run, and test the DST.
The DST dependencies are necessary both for building and deployment as well as for development. The primary development environment is currently Ubuntu 22.04, but other environments will be added as requested. Currently, development of the Data Submission Tracker requires the following software tools to be available or installed on the development system.
- Docker
See Install Prerequisites section for how to install and set up each of these dependencies.
Because the Data Submission Tracker runs in a Docker container and all the relevant code is run within the container these additional dependencies are not strictly necessary for installation on the development system. However, for testing and troubleshooting we recommend also installing the following dependencies.
- Python v3.10.6 or higher
- Django v4.1.4 - higher version not currently recommended
I recommend using pyenv and venv to manage the python version and virtual environment. For detailed instructions on how I set up my Python development see Python Development Environment.
The DST requires an external PostgreSQL database accessed over the network. Previously, this project used an Ansible script to set up the project on an existing Google Cloud Platform Compute instance (for the last commit with this provisioning see #last_gcp). The project now uses Docker to set up the PostgreSQL database. This is a more portable solution and allows for easier testing and development.
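For orientation, a PostgreSQL service provisioned through Docker typically looks something like the sketch below. This is illustrative only; the service name, image version, and credentials here are my own examples, not the project's actual compose file, though the values mirror the environment variables used elsewhere in this guide.

```yaml
# Illustrative sketch of a PostgreSQL service in a docker-compose file
# (names and versions are examples, not the project's exact configuration)
services:
  bdc-dashboard-db:
    image: postgres:14
    environment:
      POSTGRES_DB: tickets
      POSTGRES_USER: bdc_db_user
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

Running the database as a named service lets the application container reach it by hostname on the Docker network, which is why the host setting later in this guide is a service name rather than an IP address.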
With the prerequisites installed, we are now ready to clone and set up the repository for development. Navigate to where you want the Git repository located on your system and clone the repository with the following command.
git clone [email protected]:amc-corey-cox/BDC_Dashboard.git
cd BDC_Dashboard
I recommend setting up pyenv and a virtual environment using venv for development. See Python Development Environment for more information on how I set up my Python development environment. If you are following my recommendations you can set up a local pyenv and virtual environment with the following commands.
pyenv install 3.11.1
pyenv local 3.11.1
python -m venv .venv
source .venv/bin/activate
poetry install
For development purposes a number of environment variables need to be set. In the api folder, create a .env file with the following data.
# Environment variables
# For local development, the api directory should have an .env file with the following:
# Set to True for local dev and False for prod
DEBUG=True
# Set to DEBUG for local dev and INFO for prod
DJANGO_LOG_LEVEL=DEBUG
# The SECRET_KEY generated by Django
SECRET_KEY=
# The Postgres database name
POSTGRES_DB=tickets
# The username of the Postgres User
POSTGRES_USER=bdc_db_user
# A (secure) password for the Postgres User
POSTGRES_PASSWORD=
# The hostname of the Postgres database (the Docker service name)
POSTGRES_HOST=bdc-dashboard-db
# The port for the Postgres Database
POSTGRES_PORT=5432
# The base URL for the Jira API
JIRA_BASE_URL='https://'
# The token for Authorization in the Jira API
JIRA_TOKEN=''
# The ID of the board in Jira where data will be collected
JIRA_BOARD_ID=''
# The project ID for the Jira data
JIRA_PROJECT=''
# The issue type for epic issues in Jira
JIRA_EPIC_ISSUETYPE=10000
You will need to update SECRET_KEY, POSTGRES_PASSWORD, JIRA_BASE_URL, JIRA_TOKEN, JIRA_BOARD_ID, and JIRA_PROJECT with settings appropriate to your configuration. If you don't have a Django SECRET_KEY you can create one with the following command (Django local installation required).
python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
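If Django isn't installed locally, a comparably strong random key can be generated with Python's standard library alone. This is my own alternative, not the project's canonical method; any sufficiently long random string works as a SECRET_KEY.

```shell
# Generate a random secret using only the Python standard library
# (an alternative to the Django helper, not the project's canonical method)
python3 -c 'import secrets; print(secrets.token_urlsafe(50))'
```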
Other settings may need to be adjusted depending on your environment or during deployment.
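Before building the container, it can save a debugging round-trip to confirm none of the required variables are still empty. The loop below is my own sanity-check snippet (the key list mirrors the variables above, and it assumes the .env file lives at api/.env as described):

```shell
# Warn about required .env keys that are empty or missing
# (assumes the file is at api/.env; -s suppresses errors if it is absent)
ENV_FILE="api/.env"
for key in SECRET_KEY POSTGRES_PASSWORD JIRA_BASE_URL JIRA_TOKEN JIRA_BOARD_ID JIRA_PROJECT; do
  if ! grep -Eqs "^${key}=.+" "$ENV_FILE"; then
    echo "WARNING: ${key} is not set in ${ENV_FILE}"
  fi
done
```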
With the PostgreSQL database set up and running and the environment variables set, the repository should be ready for development. To test the development environment, we will build the Docker container and access the application.
First, build the Docker container using docker-compose.
docker-compose up --build -d
To access the application, navigate to http://localhost:8000/ in your browser. You should see a login screen for the application with a button for NIH login. Login will not work at this time; in order to log in to the application we'll need to set the Django superuser. First, enter the local Docker container shell.
docker exec -it bdc-dashboard-app /bin/bash
Then create a Django superuser on the Docker container.
python manage.py createsuperuser
Create a superuser with your desired credentials, generally your e-mail address and a password.
After creating the superuser you can authenticate on the Django app by navigating to http://localhost:8000/admin. Once authenticated, access the app at http://localhost:8000/ to navigate the full application site. If all of these steps are successful, you are now ready to begin development on the DMC Tracker app.
Docker is an application containerization environment: software is built into container images and deployed across different environments, reducing dependency problems and creating a more secure runtime through isolation from the host architecture. The Docker ecosystem provides both tools to create container images and an engine to run those images on a target system. For basic build and development you will only need the image creation tools; however, for proper testing and to allow access to the software in a development environment we will install both the image creation tools and the engine.
Some distributions have unofficial Docker packages installed, or dependencies that Docker will install separately. We need to uninstall these to prevent conflicts.
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt remove -y $pkg; done
I had a stub installation satisfying another package's spurious dependency. Here's how to check whether a docker command still exists.
command -v docker
If the command still exists it is probably a stub. To check, try running it. If there is no output, or a message that it isn't a real Docker installation, open the file to inspect it. If it is a stub file, delete it. Replace ~/.local/bin/docker with the path to your docker stub from the command above.
rm ~/.local/bin/docker
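The check-and-report steps above can be consolidated into a small snippet. This is my own helper, not part of any Docker tooling; it simply reports whether `docker` resolves on your PATH and where (a stub left by another package often lives under ~/.local/bin rather than a system path):

```shell
# Report whether `docker` resolves on PATH and where; a stub often lives
# under ~/.local/bin rather than a system path like /usr/bin
if p="$(command -v docker)"; then
  echo "docker found at: $p"
else
  echo "docker not found on PATH"
fi
```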
Docker requires a 64-bit kernel (common on modern systems), 4 GB of RAM, and ID mapping configured in user namespaces. Docker Desktop additionally requires a systemd init system and a desktop environment.
Docker requires KVM virtualization support and QEMU version 5.2 or newer; the latest version is recommended.
First check KVM support by loading the module with the following command.
modprobe kvm
Then load the module specific to your system's processor.
modprobe kvm_intel # Intel processors
modprobe kvm_amd # AMD processors
If no errors are reported double-check the modules are enabled.
lsmod | grep kvm
Output for this command should look similar to that below.
kvm_amd 167936 0
ccp 126976 1 kvm_amd
kvm 1089536 1 kvm_amd
irqbypass 16384 1 kvm
Next check ownership of the kvm device.
ls -al /dev/kvm
Add your user to the kvm group in order to access the kvm device.
sudo usermod -aG kvm $USER
You can check to make sure your user was added to the kvm group.
grep kvm /etc/group
Docker recommends updating QEMU to the latest version and requires at least version 5.2. Check your current version of QEMU.
/usr/bin/qemu-system-x86_64 --version
If this gives an error (mine did), you need to install QEMU.
sudo apt install -y qemu-system-x86
Unless you experience problems it is probably best to use the version of QEMU that is installed by your distribution. You can check the version.
kvm --version
For Ubuntu, the current version is 6.2. I'm currently using this for development and will update this file if I have any problems or decide to upgrade. The latest version as of this writing is 8.0.2.
I have also installed some other recommended virtualization packages that may be useful or necessary for running and testing VMs locally.
sudo apt install -y qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils virtinst libvirt-daemon
These may not be required and could even create conflicts, but all the information I found on installing QEMU suggested installing them as well. These sources also recommend enabling libvirtd.
sudo systemctl enable --now libvirtd
Sources also recommended installing virt-manager, but I'll be using Docker Desktop to manage VMs so I'm skipping this for now.
You can install the Docker packages manually by downloading them from the Docker Linux Install page. I prefer to manage my installation with apt.
Make sure everything is up-to-date and allow apt to use a repository over HTTPS.
sudo apt update
sudo apt install -y ca-certificates curl gnupg
We need the Docker official GPG public key to use their apt repository.
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
This will set up the Docker Apt Repository allowing ongoing updates of Docker using the system software updater.
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
We need to update the Apt cache in order to install from the Docker repository.
sudo apt update
This should show a line accessing download.docker.com for the system's installed release.
Now we can install Docker Engine, containerd, and Docker Compose. This will install the latest version, which is currently version 24.0.2.
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose docker-compose-plugin
If you need to install a different version, see the Docker Engine Installation page.
Now let's test the docker installation.
sudo docker run hello-world
By default, Docker requires root access to run, which is a security risk we need to fix. The recommended way to do this is to add your user to the docker group, which will allow your user to run docker commands without sudo.
sudo usermod -aG docker ${USER}
You will then need to log in again or run the command below to gain the group permissions.
newgrp docker
Docker Desktop is a GUI for managing Docker containers and VMs. On Ubuntu, Docker Desktop requires gnome-terminal if you are not running a Gnome desktop environment. Also remove any previous Docker Desktop installation before installing.
sudo apt install -y gnome-terminal
sudo apt remove docker-desktop
rm -r $HOME/.docker/desktop
sudo rm /usr/local/bin/com.docker.cli
sudo apt purge docker-desktop
Download the latest version of Docker Desktop from the Docker Desktop page. The latest version as of this writing is 4.1.1.
sudo apt update
sudo apt install -y ./docker-desktop-<version>-<arch>.deb
The DST is written in Python and uses the Django framework. This section will cover setting up a Python development environment for the DST. Because of the complexity of this project, I recommend setting up pyenv, venv, and poetry to manage the Python environment. This will also allow you to install the exact versions of Python and Python packages required for the DST.
pyenv is a Python version manager. It allows you to install and manage multiple versions of Python on the same system, and to install the exact version of Python required for a project, so you can develop with the same version of Python the DST will use in production.
First, we need to install the python build dependencies.
sudo apt update; sudo apt install -y build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Now we can install pyenv. This will install pyenv to the ~/.pyenv directory.
curl https://pyenv.run | bash
Now we need to add pyenv to the PATH. Add the following to the end of your ~/.bashrc file.
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
Now we can install the version of Python required for the DST.
pyenv install 3.11.1
To use pyenv local to your repository, run the following command in the root of your repository (cd /path/to/repository).
pyenv local 3.11.1
venv is a Python module that allows you to create virtual environments for Python. This allows you to install Python packages for a specific project without affecting the system Python installation. This is useful for isolating the Python environment for the DST.
To create a virtual environment for the DST run the following command in the root of your repository.
python -m venv .venv
I create a file named venv_name.txt in the root of the venv directory, containing the name of the virtual environment. My .bashrc reads this name to track which virtual environment is active in $PS1. This is optional but I find it useful. If you would like to know how to do this please reach out and I'll share my .bashrc.
echo "dst" > .venv/venv_name.txt
To activate the virtual environment run the following command in the root of your repository.
source .venv/bin/activate
To deactivate the virtual environment run the following command in the root of your repository.
deactivate
Poetry is a Python dependency manager. It allows you to manage Python dependencies for a project. Poetry will also create a virtual environment for the project and install the dependencies in that environment. This allows you to install the exact versions of dependencies required for the DST.
Install poetry with pip in the virtual environment then initialize with the following commands.
pip install poetry
poetry init
Set up the Poetry environment by reformatting requirements.txt and installing into the Poetry virtual environment with the following commands.
poetry add $(sed -E 's/;.*$//; s/\[.*\]//g' api/requirements.txt)
poetry add --extras "grpc" google-api-core
poetry add --extras "crypto" pyjwt
poetry install
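The sed expression above strips environment markers (everything after a ';') and extras brackets ('[...]') so that bare package specs can be passed to poetry add. A small illustration with sample lines (my own examples, not the project's actual requirements.txt):

```shell
# Demonstrate the transform on sample requirement lines (examples only):
# environment markers and extras are removed, leaving bare package specs
printf '%s\n' \
  'google-api-core[grpc]==2.11.0; python_version >= "3.7"' \
  'Django==4.1.4' |
  sed -E 's/;.*$//; s/\[.*\]//g'
# Output:
#   google-api-core==2.11.0
#   Django==4.1.4
```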