
Deployment how tos

Sundareswar Pullela edited this page Sep 30, 2024 · 38 revisions

To start our Plover instance when ITRB's Plover is down

(Note: This example is for KG2.10.0, but the steps should be analogous for future KG2 versions.)

Start the kg2cplover.rtx.ai ec2 instance and run the following:

ssh <user>@kg2cplover.rtx.ai
cd PloverDB/
sudo docker start plovercontainer2.10.0

If the above gave some sort of error, instead try this:

sudo docker stop plovercontainer2.10.0
sudo docker rm plovercontainer2.10.0
sudo docker run -d --name plovercontainer2.10.0 -p 9990:80 ploverimage2.10.0
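The start-then-recreate fallback above can be wrapped in one function. This is a minimal sketch, not part of the PloverDB repo: start_plover and the DOCKER override variable are hypothetical names, and the container/image naming simply follows the plovercontainer<version> / ploverimage<version> pattern used on this page.

```shell
#!/usr/bin/env bash
# DOCKER is a hypothetical override (useful for dry runs); it defaults to
# the "sudo docker" invocation used on this page.
DOCKER="${DOCKER:-sudo docker}"

start_plover() {
    local version="$1"                          # e.g. 2.10.0
    local container="plovercontainer${version}"
    local image="ploverimage${version}"

    # First attempt: just start the existing container.
    if $DOCKER start "$container"; then
        return 0
    fi

    echo "docker start failed; recreating ${container}" >&2
    # stop/rm can fail if the container is absent or not running; ignore that.
    $DOCKER stop "$container" || true
    $DOCKER rm "$container" || true
    $DOCKER run -d --name "$container" -p 9990:80 "$image"
}

# Usage (on the Plover EC2 instance):
#   start_plover 2.10.0
```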

Wait about 5 minutes for the indexes to finish loading. You can check the logs with:

sudo docker logs plovercontainer2.10.0

When it's ready the last few lines of the log should look something like this:

2022-03-02 00:25:58,807 INFO: Indexes are fully loaded! Took 5.27 minutes.
WSGI app 0 (mountpoint='') ready in 317 seconds on interpreter 0x pid: 11 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 11)
spawned uWSGI worker 1 (pid: 14, cores: 1)
spawned uWSGI worker 2 (pid: 15, cores: 1)
running "unix_signal:15 gracefully_kill_them_all" (master-start)...
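Rather than re-running the logs command by hand, the check can be polled. The sketch below assumes the readiness message contains the text "Indexes are fully loaded", as in the sample log above; wait_for_indexes, its default poll count and delay, and the DOCKER override are all hypothetical choices.

```shell
#!/usr/bin/env bash
# DOCKER is a hypothetical override; defaults to the invocation used on this page.
DOCKER="${DOCKER:-sudo docker}"

wait_for_indexes() {
    local container="$1"
    local max_tries="${2:-30}"   # assumed default: 30 polls
    local delay="${3:-20}"       # assumed default: 20 seconds between polls
    local i
    for ((i = 1; i <= max_tries; i++)); do
        # docker logs replays container stderr on stderr, so merge both streams.
        if $DOCKER logs "$container" 2>&1 | grep -q "Indexes are fully loaded"; then
            echo "Plover indexes loaded (poll ${i})"
            return 0
        fi
        sleep "$delay"
    done
    echo "Timed out waiting for ${container}" >&2
    return 1
}

# Usage:
#   wait_for_indexes plovercontainer2.10.0
```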

While you're waiting for the indexes to load, you can point the ARAX code to this Plover instead of the ITRB Plover:

  1. Change all three Plover URLs in the "plover" slot of RTX/code/config_kg2c.json to this Plover endpoint (https://kg2cplover.rtx.ai:9990)
  2. Push that change to master
  3. Roll master out to the /kg2 and /kg2beta endpoints on arax.ncats.io

At this point, once Plover has finished loading indexes, /kg2 and /kg2beta should be running normally again.
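Before pushing the config change in step 1, it can help to list the Plover URLs the file currently contains. This is a rough sketch that assumes the URLs appear as quoted strings containing "plover"; list_plover_urls is a hypothetical helper and makes no assumption about the key names inside the "plover" slot.

```shell
#!/usr/bin/env bash
# Print every URL in the given file whose text contains "plover".
list_plover_urls() {
    grep -Eo 'https?://[^"]*plover[^"]*' "$1"
}

# Usage (from the directory containing your RTX checkout):
#   list_plover_urls RTX/code/config_kg2c.json
# After the edit, each line printed should be the self-hosted endpoint.
```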

To build Plover from a new KG2 version

(Note: This example is for KG2.10.0, but the steps should be analogous for future KG2 versions.)

Create a new branch in the PloverDB repo for this KG2 version - we'll name ours kg2.10.0c for this example:

git checkout -b kg2.10.0c

Copy the new KG2c lite JSON file into the webroot directory on kg2webhost.rtx.ai

  1. Log into kg2webhost.rtx.ai using ssh:

(If you have not done this before, someone with ssh access to that instance must first add your ssh public key to the authorized_keys file for the user ubuntu on that system.)

  2. Copy the KG2c file into the webroot directory, renaming it with the new version number in the process:
aws s3 cp s3://rtx-kg2/kg2c_lite.json.gz ./nginx-document-root/kg2c_lite_2.10.0.json.gz

where "2.10.0" represents the new KG2c version number that you are aiming to deploy in PloverDB.

  3. From your laptop, do a test download of KG2c from kg2webhost.rtx.ai:
curl https://kg2webhost.rtx.ai/kg2c_lite_2.10.0.json.gz -o kg2c_lite_2.10.0.json.gz
gunzip --list kg2c_lite_2.10.0.json.gz

If gunzip reports an error, inspect the downloaded file: it may contain the HTML of a 404 error page rather than gzipped data.
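The download check can be scripted so that a bad file shows its first bytes right away. A minimal sketch: check_download is a hypothetical helper, and it uses gzip -t (an integrity test) in place of gunzip --list.

```shell
#!/usr/bin/env bash
# Verify that a downloaded file is valid gzip; if not, show what it starts
# with (an HTML error page is easy to spot this way).
check_download() {
    local file="$1"
    if gzip -t "$file" 2>/dev/null; then
        echo "OK: ${file} is a valid gzip archive"
        return 0
    fi
    echo "ERROR: ${file} is not valid gzip; first bytes:" >&2
    head -c 200 "$file" >&2
    echo >&2
    return 1
}

# Usage:
#   curl https://kg2webhost.rtx.ai/kg2c_lite_2.10.0.json.gz -o kg2c_lite_2.10.0.json.gz
#   check_download kg2c_lite_2.10.0.json.gz
```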

About kg2webhost.rtx.ai

The kg2webhost.rtx.ai system is a t2.micro instance (Ubuntu 20.04 AMI) running in the us-east-1 AWS region (Virginia), with 200 GiB of EBS storage. Currently, Amy, Sundar, and Steve have RSA public keys installed and can ssh in. Nginx is installed, and its document root is /var/www/kg2webhost (owned by user ubuntu). The AWS CLI is installed for user ubuntu at /home/ubuntu/venv/bin/aws; its default region is us-west-2, since that is where our main KG2 S3 bucket is located. Nginx is configured for HTTPS, with the SSL certificate managed by certbot; the cron entry that renews the certificate is in /etc/cron.d/certbot. Currently, this nginx webserver is used only for hosting kg2c_lite_2.X.X.json.gz so that the PloverDB plover.py module can download the file at app start-up.

Update the default KG2 version in our new branch:

  1. In app/config_kg2c.json, change

    • "nodes_file": "https://kg2webhost.rtx.ai/kg2c-2.10.0-v1.0-nodes.jsonl.gz" to "https://kg2webhost.rtx.ai/kg2c-2.10.1-v1.0-nodes.jsonl.gz", or whatever the new KG2c nodes file that you are hosting on kg2webhost.rtx.ai is actually named.
    • "edges_file": "https://kg2webhost.rtx.ai/kg2c-2.10.0-v1.0-edges.jsonl.gz" to "https://kg2webhost.rtx.ai/kg2c-2.10.1-v1.0-edges.jsonl.gz", or whatever the new KG2c edges file that you are hosting on kg2webhost.rtx.ai is actually named.
  2. Commit and push this change to your branch

  3. Make any other Plover code changes that this new KG2 version necessitates in your branch (usually only needed if KG2's core schema changed)
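The version bump in step 1 can be done with sed instead of by hand. This is a sketch under the assumption that the file names follow the kg2c-<version>-v1.0-... pattern shown above; bump_kg2c_version is a hypothetical helper, and the result should still be reviewed with git diff before committing.

```shell
#!/usr/bin/env bash
# Replace "kg2c-<old>-" with "kg2c-<new>-" everywhere in a config file.
bump_kg2c_version() {
    local old="$1" new="$2" file="$3"
    # Escape dots so "2.10.0" is matched literally rather than as a regex.
    local old_re="${old//./\\.}"
    local tmp
    tmp="$(mktemp)"
    # Write to a temp file first, then copy back, so the original file's
    # permissions are kept and the edit works with both GNU and BSD sed.
    if sed "s/kg2c-${old_re}-/kg2c-${new}-/g" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Usage (from the PloverDB repo root):
#   bump_kg2c_version 2.10.0 2.10.1 app/config_kg2c.json
#   git diff app/config_kg2c.json
```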

Then pick an EC2 instance to serve this new Plover from. Generally we use kg2cplover.rtx.ai, but if that instance is already serving a different version of Plover that needs to remain live (i.e., it is being called by one of the RTX-KG2 instances), then you can use kg2cplover2.rtx.ai. Note that you can tell which Plover a given KG2 instance is using by running a query in that KG2 UI and looking at the DEBUG log messages:

[Screenshot: DEBUG log messages in the KG2 UI, taken 2023-06-14]

We usually deploy an updated PloverDB to one of our team's self-hosted PloverDB instances in EC2 first. Once the RTX-KG2 KP and ARAX are fully working against that updated self-hosted PloverDB, we merge the updated PloverDB code into main, which triggers deployment to ITRB CI.

Start the Plover EC2 instance (this example uses kg2cplover.rtx.ai) and run the following (with your branch name/version number subbed in):

ssh <user>@kg2cplover.rtx.ai
cd PloverDB/
git fetch
git checkout kg2.10.0c
screen
bash -x run.sh ploverimage2.10.0 plovercontainer2.10.0 "sudo docker"

The build should take around 50 minutes to finish.
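Since the image and container names both embed the version number, the build command for another version can be generated instead of edited by hand. A small sketch: plover_build_cmd is a hypothetical helper that only prints the command, following the naming pattern used in the example above.

```shell
#!/usr/bin/env bash
# Print the run.sh invocation for a given KG2 version (does not run it).
plover_build_cmd() {
    local version="$1"   # e.g. 2.10.0
    printf 'bash -x run.sh ploverimage%s plovercontainer%s "sudo docker"\n' \
        "$version" "$version"
}

# Usage (inside the screen session, from the PloverDB/ directory):
#   plover_build_cmd 2.10.0
```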

After it's done, verify the new Plover service is working by running the test suite against it. From your own machine (assuming you have cloned the PloverDB repo and done pip install -r requirements.txt):

cd PloverDB/
pytest -v test/test_kg2c.py --endpoint https://kg2cplover.rtx.ai:9990

(NOTE: If you loaded Plover onto the kg2cplover2.rtx.ai instance, use this endpoint URL instead: http://kg2cplover2.rtx.ai:9990)

Note that sometimes tests need to be updated due to changes in the new KG2 version, though the majority of tests should pass. For any failing tests, ensure they're failing due to expected topological changes in the new KG2 version; if so, tweak them to get them passing again (via adjusting pinned curies, predicates, or whatever makes sense).

When we're ready for the ITRB CI Plover instance to be running this new KG2 version, merge your branch into main. This should automatically deploy to the ITRB CI Plover (allow about an hour for it to rebuild). Ping Kanna and/or Pouyan in Slack to update the ITRB Test and Prod Plovers.