
Add preview workflow #1909

Draft: wants to merge 4 commits into `master`
105 changes: 105 additions & 0 deletions .github/workflows/preview.yml
@@ -0,0 +1,105 @@
on:
  pull_request:
    branches:
      - "*"

jobs:
  changed_files:
    runs-on: ubuntu-latest
    outputs:
      new_study_dirs: ${{ steps.new-dirs.outputs.NEW_DIR_LOCATIONS }}
    steps:
      - name: Get New Directories Added
        id: changed-files-dir-names
        uses: tj-actions/changed-files@v38
        with:
          dir_names: "true"

      - name: Extract Parent Directory Names of New Studies
        id: new-dirs
        shell: bash
        run: |
          echo "NEW_DIR_LOCATIONS=$( echo ${{ steps.changed-files-dir-names.outputs.added_files }} | sed 's/[^ ]*\/case_lists[^ ]*//g' )" >> $GITHUB_OUTPUT

  preview:
    needs: changed_files
    runs-on: ubuntu-latest
    container: docker.io/okteto/okteto:2.19.1
    steps:
      - name: Install Git LFS
        run: apk update && apk add git-lfs

      - name: Split Directories Into New Line Pattern For Sparse Checkout
        id: split
        run: |
          echo "STUDY_DIRS<<EOF" >> $GITHUB_ENV
          echo "$( echo ${{ needs.changed_files.outputs.new_study_dirs }} | tr ' ' '\n')" >> $GITHUB_ENV
          echo "EOF" >> $GITHUB_ENV

      - name: Checkout Datahub Repository
        uses: actions/checkout@v4
        with:
          lfs: true
          sparse-checkout-cone-mode: false
          sparse-checkout: |
            preview/*
            ${{ env.STUDY_DIRS }}

      - name: Copy New Files to Study Directory
        shell: bash
        run: |
          for dir in ${{ needs.changed_files.outputs.new_study_dirs }}; do
            study_name=$( cut -d "/" -f2- <<< ${dir}) # remove 'public/' from string
            cp -v -R ${dir} preview/cbioportal-docker-compose/study/${study_name}
          done

      - name: Context
        uses: okteto/context@latest
        with:
          url: ${{ secrets.OKTETO_URL }}
          token: ${{ secrets.OKTETO_TOKEN }}

      - name: Okteto Build to Import Studies
        working-directory: preview/cbioportal-docker-compose
        run: |
          okteto build --no-cache -t okteto.dev/cbioportal-docker-compose-cbioportal:okteto-with-volume-mounts cbioportal

      - name: Deploy Preview Environment
        uses: okteto/deploy-preview@latest
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          name: pr-${{ github.event.number }}-justinjao
          file: preview/cbioportal-docker-compose/docker-compose.yml
          timeout: 15m

      - name: Wait For Response From Preview Instance
        uses: nev7n/wait_for_response@v1
        with:
          url: 'https://cbioportal-pr-${{ github.event.number }}-justinjao.cloud.okteto.net/'
          responseCode: 200
          timeout: 600000 # 10 minutes
          interval: 30000 # 30 seconds

      - name: Activate Namespace
        uses: okteto/namespace@latest
        with:
          namespace: pr-${{ github.event.number }}-justinjao

      - name: Run Metaimport Script to Import Study Using Kubectl
        id: import-study
        continue-on-error: true
        shell: bash
        run: |
          okteto kubeconfig
          for dir in ${{ needs.changed_files.outputs.new_study_dirs }}; do
            study_name=$( cut -d "/" -f2- <<< ${dir})
            kubectl exec -it deployment/cbioportal -- metaImport.py -u http://localhost:8080 -s study/${study_name} -o
          done

      - name: Add PR Comment if Import Failed
        uses: mainmatter/continue-on-error-comment@v1
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          outcome: ${{ steps.import-study.outcome }}
          test-id: Error code 1
19 changes: 19 additions & 0 deletions .github/workflows/preview_closed.yml
@@ -0,0 +1,19 @@
on:
  pull_request:
    types:
      - closed

jobs:
  destroy-pr-env:
    runs-on: ubuntu-latest
    steps:
      - name: Context
        uses: okteto/context@latest
        with:
          token: ${{ secrets.OKTETO_TOKEN }}
          url: ${{ secrets.OKTETO_URL }}

      - name: Destroy Preview Environment
        uses: okteto/destroy-preview@latest
        with:
          name: pr-${{ github.event.number }}-justinjao
73 changes: 73 additions & 0 deletions docs/Preview_Overview.md
@@ -0,0 +1,73 @@
# Preview Environment Project Documentation

To streamline the quality control process of reviewing external data sources, cBioPortal seeks to automate the deployment of a live staging instance with the new studies already imported. The Preview Environment workflow uses a combination of infrastructure-as-code tooling (`docker-compose`, `GitHub Actions`) and a fully-managed developer environment provisioned by Okteto (which runs via a `GitHub Action`).

The specification for building the containerized web applications is primarily accomplished through the configuration provided by the [`cbioportal-docker-compose`](https://github.com/cBioPortal/cbioportal-docker-compose) repository.

Okteto can parse the `docker-compose.yml` file directly, but deploying a working staging instance of cBioPortal with it requires a few additional pieces of infrastructure:

* configuration files for the cBioPortal instance
* initial files and SQL commands to seed the SQL database
* an accessible pre-built image that already contains these files (for reasons outlined below)

These requirements are met through an initial setup procedure, followed by two workflow files: `preview.yml`, which deploys the staging environment with the new studies imported, and `preview_closed.yml`, which tears down the deployed instance.

The following document outlines each step of the workflow and explains, where necessary, why it was designed this way.

### Initial Setup

![image](images/Intial_Setup.svg)

The [`cbioportal-docker-compose`](https://github.com/cBioPortal/cbioportal-docker-compose) repository contains the files necessary to build a fully running instance of `cbioportal`. However, to get this running, and to allow Okteto to access it in the `Deploy Preview` action, certain configuration files (such as `portal.properties`) and data to seed the SQL database must be present.

The `cbioportal-docker-compose` repository contains a script, `init.sh`, that initializes these files. After checking out the repository, this script can be executed, and an `okteto build` command can be run to push the image.

This step is thus performed for two reasons: to initialize the namespace in the private Okteto registry, and to push an image with the configuration files and DB seed files already included.

For a more detailed breakdown of the steps involved, as well as other prerequisites, see [Preview Setup & Maintenance](Preview_Setup_&_Maintenance.md).

### Workflow Overview

In brief, the proposed workflow pulls the previously described pre-built images of cBioPortal (with the configuration files added) and of the cBioPortal SQL database (with the seed files already added as well). These images are used to launch a container with the new studies included, and this container is re-built and pushed to the same namespace registry. Okteto's `Deploy Preview` action then pulls this image and uses it to deploy the preview environment, after which a `metaImport.py` script imports the studies into the running application instance.

![Preview](images/Preview2.svg)

### Identify and Get New Files (2)
The datahub repo is very large, and most files are stored using [Git LFS](https://git-lfs.com/), which by default does not store the actual file content in the repository (the tracked files are pointers to data stored elsewhere). When importing a study, however, we need the actual file data, and it is not possible to check out the entire repository's content during a GitHub Actions workflow (the runner runs out of space). We therefore need to know exactly which studies are newly added, and check out only those files from the repository. This also makes the review process easier, as only new studies are imported into the staging instance.

The workflow uses a combination of actions to determine which new directories were added, plus some text processing to convert them into the format expected by the `checkout` step. The directories are then checked out using Git's [sparse checkout](https://git-scm.com/docs/git-sparse-checkout) feature.
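The text-processing half of these steps can be reproduced locally. A minimal sketch using the same `sed`/`tr` pipeline as the workflow (the list of changed paths here is hypothetical sample data, not real datahub content):

```shell
# Space-separated list of changed directories, as emitted by
# tj-actions/changed-files with dir_names enabled (hypothetical sample).
added="public/study_a public/study_a/case_lists public/study_b"

# Drop any path containing 'case_lists', mirroring the workflow's sed filter,
# so only top-level study directories remain.
new_dirs=$(echo "$added" | sed 's/[^ ]*\/case_lists[^ ]*//g')

# Convert the space-separated list into one path per line, the format
# expected by actions/checkout's sparse-checkout input.
study_dirs=$(echo $new_dirs | tr ' ' '\n')
echo "$study_dirs"
```

Locally, a list like this could be fed to `git sparse-checkout set --no-cone` before pulling LFS content for just those paths.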

### Rebuilding cbioportal Image and Pulling Images (3)
Within the `datahub` repo, there is a `cbioportal-docker-compose` directory (nested under `preview`) containing the infrastructure needed to deploy to Okteto. Specifically, there is:
* a `docker-compose.yml` file, nearly identical to the one in the `cbioportal-docker-compose` repo, apart from a minor syntax change
* a `.env` file specifying the locations from which images are pulled during the deploy process
* an empty `study` directory into which new studies are copied

Once the new studies have been determined, they are copied into this `study` directory. The `cbioportal` image is then re-built and pushed to the Okteto registry using an `okteto build` command (targeting the exact registry location defined for `cbioportal` in the `.env` file). As part of this process, all files within the `study` directory are mounted into the re-built cbioportal container, which is how the studies are made available to the Okteto instance.


### Deploy Containers to Preview Environment (4)
Okteto's `Deploy Preview` action checks out the datahub repository and builds the PR environment from a clean clone. This is the primary reason the previous image-build steps were necessary (both to determine which studies to import and to avoid cloning every other study).

The built images (the pre-built `cbioportal-database` image from the setup, and the newly built `cbioportal` image), as specified in the `.env` file, are then pulled from the Okteto registry and used by the subsequent `Deploy Preview` action. The `docker-compose.yml` is read by the action, and the deploy process starts.

> :bulb: The GitHub Action for the deploy process actually finishes prematurely: even after the service deploys on Okteto, the database takes a while to finish initializing. Between the time the deploy action finishes and the database finishes initializing, the public URL returns a 502 error. An additional action step, **Wait For Response From Preview Instance**, is therefore added to ping the URL until the service is fully functional before proceeding.
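A shell equivalent of that wait step might look like the following (the function name and probe command are illustrative, not part of the workflow, which uses the `wait_for_response` action instead):

```shell
# Poll a probe command until it succeeds or a deadline passes.
# check_cmd: any single-word command that exits 0 once the service is ready.
wait_for_ready() {
  check_cmd=$1; timeout_s=$2; interval_s=$3
  deadline=$(( $(date +%s) + timeout_s ))
  until $check_cmd; do
    # Give up once the deadline has passed.
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval_s"
  done
  return 0
}

# In the workflow, the probe would be an HTTP check against the preview URL,
# e.g. a script wrapping: curl -sf -o /dev/null "$PREVIEW_URL"
wait_for_ready true 600 30 && echo "preview instance is up"
```

The workflow's action is configured the same way: a 200 response ends the wait, with a 10-minute timeout and 30-second polling interval.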

Once the deploy has fully finished, a message is posted on the PR with the public URL to visit. At this point a live staging instance is deployed, and the studies are present in a `study` directory within the `cbioportal` container; however, the new studies have yet to be imported.

### Import Study (5)

The `kubectl` command is used to run the `metaImport.py` script (which is located within the `cbioportal` container). This imports the data into the database, at which point it becomes visible on the staging instance. The step is set up to import multiple studies.

Note that the setup initialization currently seeds the database with a fixed set of gene panel IDs. If a new study references IDs not already present in the database, the `metaImport.py` script will fail.

![error](images/gene_panel_error.png)

As such, this step is currently set to always pass within the workflow (via `continue-on-error`), until the issue can be addressed in a future update.

> :bulb: To prevent users from missing the error, an additional action step, **Add PR Comment if Import Failed**, is added to notify users when this step fails.


### Teardown of Preview Environment (6 & 7)

Finally, when a PR is closed or merged, the `preview_closed.yml` workflow runs and automatically tears down the deployed environment. This is currently necessary both to save costs and because the free tier limits us to roughly 2-3 open Preview Environments at any point in time.
51 changes: 51 additions & 0 deletions docs/Preview_Setup_&_Maintenance.md
@@ -0,0 +1,51 @@
# Okteto Developer Setup for cBioPortal Maintainers

This documentation outlines the exact work performed to set up the Okteto preview pipeline. If it ever needs to be set up again from scratch, following these steps should replicate the existing infrastructure. For more insight into why things were built this way, refer to [Preview Overview](Preview_Overview.md).

## One-time Setup (if rebuilding from scratch)

1. Okteto Account Creation - Visit [here](https://www.okteto.com/try-free/) and link a GitHub account. Note that the account must be linked to a business email (i.e. a non-Gmail address).
2. Install the Okteto CLI and initialize the context as per instructions [here](https://www.okteto.com/docs/getting-started/#installing-okteto-cli).
3. Generate Okteto Personal Access Token (PAT) as per instructions [here](https://www.okteto.com/docs/cloud/personal-access-tokens/)
4. Clone the `cbioportal-docker-compose` repository and run the initialization script, `./init.sh`, in the root directory.
5. Remove all instances of `:ro` from the `docker-compose.yml` file, as this syntax is not supported by Okteto.
6. Run `okteto context use https://cloud.okteto.com` to initialize the namespace Okteto will use.
7. Run `okteto build` to build and push initial *cbioportal* and *cbioportal-database* images to the Okteto Registry with the initialized configuration present.

Within the Datahub repo:

8. Add a new secret variable, `OKTETO_TOKEN`, using the PAT.
9. Add a new secret variable, `OKTETO_URL` (https://cloud.okteto.com).
10. In the repo settings > Code and automation > Actions > General > Workflow Permissions, enable workflow read-write permissions to allow Okteto to post messages in a PR.
11. Add the new workflow file (`preview.yml`) from the PR to the `.github/workflows` folder of the datahub repository.
12. Add the following files to set up the infrastructure for re-building the image in a PR. Specifically:

a. Run `mkdir -p preview/cbioportal-docker-compose/study` to create the necessary directories

    b. Add a `.gitkeep` file within the `study` directory so that git tracks the otherwise-empty directory

    c. Within the `preview/cbioportal-docker-compose` directory, add the modified `docker-compose.yml` from Step 5 (the file must have all instances of the string `:ro` removed, as Okteto does not support this syntax).

d. Within the `preview/cbioportal-docker-compose` directory, add a `.env` file containing the images to be built, with the link pointing to the Okteto registry:

DOCKER_IMAGE_CBIOPORTAL=registry.cloud.okteto.net/<namespace>/cbioportal-docker-compose-cbioportal:okteto-with-volume-mounts

DOCKER_IMAGE_MYSQL=registry.cloud.okteto.net/<namespace>/cbioportal-docker-compose-cbioportal-database:okteto-with-volume-mounts

Following this setup, subsequent PRs should correctly trigger a staging environment with the new studies imported.

## Maintenance of Current Setup

### Rebuilding the Image to Update cBioPortal Version
Once the workflow has been set up, the private registry has already been initialized. The `.env` file specifies the private registry, so the preview deployment pulls the previously built images. Note, however, that the cbioportal build version is currently locked to whatever version was current during the initial setup, when the image was built and pushed from `cbioportal-docker-compose`. There may therefore come a point when the version of cBioPortal is too out of date and must be updated.

Essentially, steps 4-7 must be rerun to update the build image.

To simplify this process, a `preview_init.sh` script has been created to replicate it, and should be executable provided Okteto and git have been correctly set up on the machine running it.
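A sketch of what such a script might contain follows. This is an illustrative reconstruction of steps 4-7, not the verbatim contents of `preview_init.sh`; it assumes the Okteto CLI is installed and already authenticated against the target context.

```shell
#!/bin/sh
set -e

# Step 4: clone the compose repo and generate config and DB seed files.
git clone https://github.com/cBioPortal/cbioportal-docker-compose.git
cd cbioportal-docker-compose
./init.sh

# Step 5: drop ':ro' (read-only) mount flags, which Okteto does not support.
sed -i 's/:ro//g' docker-compose.yml

# Step 6: point the Okteto context at the cloud instance and namespace.
okteto context use https://cloud.okteto.com

# Step 7: build and push the cbioportal and cbioportal-database images
# to the Okteto registry.
okteto build
```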


### Changing the Namespace
If the Okteto account is ever migrated to a different namespace, it should be sufficient to re-run steps 4-7 and rename all instances of the old namespace (e.g. `justinjao`) to the desired one in the `preview.yml`, `preview_closed.yml`, and `.env` files.

### Note About Directory Structure
The name of the directory matters, as this is how the `okteto build` command determines the image name to push (i.e. if the directory is called `cbioportal-docker-compose`, it pushes to `okteto.dev/cbioportal-docker-compose-cbioportal:okteto-with-volume-mounts`). This is why, although a directory named `cbioportal-docker-compose` may seem out of place within Datahub, it exists nested under `preview`. The directory could have been renamed, which would change the built image's tag name; however, keeping this name makes maintenance easiest, as it allows steps 4-7 to be automated in the `preview_init.sh` script via a clean clone of the `cbioportal-docker-compose` repository, without any other changes.