This repository contains the instructions to create the ARTIS High Performance Computer (HPC) on Amazon Web Services (AWS) and run the ARTIS model. There are two scenarios for using this repository:
- Setting up a new ARTIS HPC on AWS
  - Likely able to follow the instructions top to bottom
- Running an Existing ARTIS HPC Setup
  - Likely able to skip to Setup Local Python Environment and skip Setting Up New ARTIS HPC on AWS if the previous model run was executed from the same local machine.

All commands will be run in the terminal/command line and are indicated with a "$" before the command or contained within a code block.
- Technologies Used
- Assumptions
- Installations
- Setup Local Python Environment
- AWS CLI Setup
- Update ARTIS Model Scripts and Model Inputs
- Setting Up New ARTIS HPC on AWS
- Running Existing ARTIS HPC Setup
- Combine ARTIS model outputs into CSVs
- Download Results, Clean Up AWS and Docker Environments
- Checks & Troubleshooting
- Create AWS IAM user
- Docker container `artis-image` details
Technologies Used

Terraform
- Creates all the AWS infrastructure needed for the ARTIS HPC.
- Destroys all AWS infrastructure for the ARTIS HPC after the ARTIS model has finished, to save on unnecessary costs.

Docker
- Creates a docker image that our HPC jobs will use to run the ARTIS model code.

Python
- Uses the Docker and AWS Python (boto3) clients to:
  - Push all model inputs to AWS S3
  - Build the docker image needed to run the ARTIS model
  - Push the docker image to AWS ECR
  - Submit jobs to AWS Batch
  - Download model outputs from the AWS S3 bucket

R
- Main basis of the ARTIS model code

AWS (Amazon Web Services)
- IAM (Identity and Access Management): Manages all AWS users and permissions.
- S3 (Simple Storage Service): Stores all model inputs and outputs outside of the docker container run in the VPC (Virtual Private Cloud).
- VPC (Virtual Private Cloud): Isolated section of the AWS cloud where all AWS resources are run. It is a virtual network dedicated to your AWS account.
- ECR (Elastic Container Registry): Stores the docker image `artis-image` that contains the environment, code, and inputs needed to run the ARTIS model.
- EC2 (Elastic Compute Cloud): Cloud-based service that provides scalable virtual servers (referred to as instances) in the cloud. It runs ARTIS on a virtual machine.
- Batch (Batch Computing Service): Manages and schedules the execution of jobs. It handles the provisioning of the necessary compute resources (like EC2 instances), queues jobs, and dispatches them to the appropriate resources for execution.
- CloudWatch: Monitors and logs all AWS resources. Logs are used to troubleshoot failed jobs.
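As a point of reference for the Python tooling listed above, the sketch below shows how the boto3 and docker Python clients cover the upload/build/push tasks. It is a minimal illustration, not the repository's actual scripts; the file paths and the ECR repository URI (including the account ID) are assumptions.

```python
# Minimal sketch (not the repository's scripts) of the boto3/docker tasks listed above.
# The file paths and ECR repository URI are placeholders.
import boto3
import docker

s3 = boto3.client("s3")

# Push a model input file to AWS S3
s3.upload_file("data_s3_upload/model_inputs/example_input.csv",
               "artis-s3-bucket",
               "model_inputs/example_input.csv")

# Build the docker image locally and push it to AWS ECR
# (ECR login is handled separately, e.g. with `aws ecr get-login-password`)
dkr = docker.from_env()
image, _ = dkr.images.build(path=".", tag="artis-image:latest")
ecr_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/artis-image"  # placeholder account ID
image.tag(ecr_uri, tag="latest")
dkr.images.push(ecr_uri, tag="latest")
```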
AWS Batch Based Architecture
Content from AWS Documentation
Basic workflow steps:
- User creates a job container (`artis-image`), uploads the container image to the Amazon Elastic Container Registry, and creates a job definition in AWS Batch.
- User submits jobs to a job queue in AWS Batch.
- AWS Batch pulls the image from the container registry and processes the jobs in the queue.
- Input and output data from each job is stored in an S3 bucket (`artis-s3-bucket`).
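Submitting and monitoring jobs in this workflow can also be done from Python with boto3. The snippet below is a hedged sketch only: the job name, queue name, and job definition name are assumptions, and the repository's `submit_artis_jobs.py` script is the authoritative implementation.

```python
# Sketch of submitting one ARTIS job and polling its status with boto3.
# Job, queue, and job definition names are assumptions.
import time
import boto3

batch = boto3.client("batch")

resp = batch.submit_job(jobName="artis-hs17",
                        jobQueue="artis-job-queue",
                        jobDefinition="artis-job-definition")
job_id = resp["jobId"]

# AWS Batch moves jobs through SUBMITTED -> RUNNABLE -> STARTING -> RUNNING -> SUCCEEDED/FAILED
while True:
    status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
    print(job_id, status)
    if status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(60)
```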
Assumptions

- An AWS root user was created (to create an AWS root user, see Create AWS IAM user below).
- The AWS root user has created an admin user group with "AdministratorAccess" permissions.
- The AWS root user has created IAM users.
- The AWS root user has added the IAM users to the admin group.
- The AWS IAM users have their `AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY`.

To create an AWS IAM user follow the instructions here: Create AWS IAM user

Note: If you have created ANY AWS RESOURCES for ARTIS manually, not including ROOT and IAM users, please delete these before continuing.
Installations

- Homebrew
- AWS CLI
- Terraform CLI
- Python installation
- Python packages:
  - docker
  - boto3
Homebrew

Note: If you already have Homebrew installed please still confirm by following step 3 below. Both instructions should run without an error message.

1. Install homebrew, run `$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
2. Close the existing terminal window where the installation command was run and open a new terminal window.
3. Confirm homebrew has been installed: run `$ brew --version`. No error message should appear.

If after the homebrew installation you get a message stating `brew: command not found`:

1. Edit the zsh config file, run `$ vim ~/.zshrc`
2. Type `i` to enter edit mode
3. Copy & paste this line into the file you opened: `export PATH=/opt/homebrew/bin:$PATH`
4. Press `Shift` and `:`
5. Type `wq`
6. Press `Enter`
7. Source the new config file, run `$ source ~/.zshrc`
AWS CLI

Following the instructions from AWS.

Note: If you already have the AWS CLI installed please still confirm by following step 3 below. Both instructions should run without an error message.

The following instructions are for MacOS users:

1. Run `$ curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"`
2. Run `$ sudo installer -pkg AWSCLIV2.pkg -target /`
3. Confirm the AWS CLI has been installed:
   - Run `$ which aws`
   - Run `$ aws --version`
Terraform CLI

Note: If you already have homebrew installed please confirm by running `$ brew --version`; no error message should occur.

To install terraform on MacOS we will be using homebrew. If you do not have homebrew installed on your computer please follow the installation instructions here before continuing.

Based on the Terraform CLI installation instructions provided here.

1. Run `$ brew tap hashicorp/tap`
2. Run `$ brew install hashicorp/tap/terraform`
3. Run `$ brew update`
4. Run `$ brew upgrade hashicorp/tap/terraform`

If this has been unsuccessful you might need to install the xcode command line tools, try:

- Run the terminal command `$ sudo xcode-select --install`
Python Installation

1. Install python3 on MacOS: run `$ brew install python3`
2. Check python3 has been installed: run `$ python3 --version`
3. Install pip (the package installer for python): run `$ sudo easy_install pip` (note that Homebrew's python3 already includes pip3; you can confirm with `$ pip3 --version`)
Setup Local Python Environment

To prepare for running the ARTIS model on AWS we need to create a virtual environment to run the python scripts in. Note: Please make sure that your terminal is currently in the correct working directory for this project (it should end in `.../artis-hpc`).

1. Run `$ pwd` to confirm you are in the correct working directory
2. Run `$ python3 -m venv venv` to create a virtual environment
3. Run `$ source venv/bin/activate` to activate the virtual environment
4. Run `$ pip3 install -r requirements.txt` to install all required python modules
5. Run `$ pip3 list` to check that all python modules have been installed. Check that all modules in the `requirements.txt` file are included.

If an error occurs please follow these instructions:

1. Upgrade your version of pip, run `$ pip install --upgrade pip`
2. Install all required python modules, run `$ pip3 install -r requirements.txt`
3. If errors still occur, install each python package in the `requirements.txt` file individually, run `$ pip3 install [PACKAGE NAME]`, i.e. `$ pip3 install urllib3`.
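As an optional extra check that the virtual environment has what the later steps need, you can confirm that the two key packages from `requirements.txt` import cleanly (this assumes `docker` and `boto3` are among the packages listed there, as described in the Installations section):

```python
# Run inside the activated venv, e.g. saved as check_env.py (file name is arbitrary).
import boto3
import docker

print("boto3 version:", boto3.__version__)
print("docker SDK version:", docker.__version__)
```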
AWS CLI Setup

1. Run `$ export AWS_ACCESS_KEY=[YOUR_AWS_ACCESS_KEY]` - sets a terminal environment variable. Replace `[YOUR_AWS_ACCESS_KEY]` with your value.
2. Run `$ export AWS_SECRET_ACCESS_KEY=[YOUR_AWS_SECRET_ACCESS_KEY]` - sets a terminal environment variable. Replace `[YOUR_AWS_SECRET_ACCESS_KEY]` with your value.
3. Run `$ export AWS_REGION=us-east-1` - sets a terminal environment variable.
4. Run `$ aws configure set aws_access_key_id $AWS_ACCESS_KEY` - writes the value to the AWS credentials file (`~/.aws/credentials`).
5. Run `$ aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY` - writes the value to the AWS credentials file (`~/.aws/credentials`).
6. Run `$ aws configure set region $AWS_REGION` - writes the value to the AWS config file (`~/.aws/config`).

To check the set values:

- Run `$ echo $AWS_ACCESS_KEY` to display the local environment variable value set with the `export` command. Replace the variable name to check other values.
- Likewise, run `$ aws configure get aws_access_key_id` to print the values stored in the AWS credentials file. Replace the variable name to check other values.
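Another optional way to confirm that the credentials and region above are picked up correctly is to inspect them through boto3 from the virtual environment; this only reads local configuration and creates no AWS resources.

```python
# Optional check that boto3 resolves the credentials and region configured above.
import boto3

session = boto3.session.Session()
print("region:", session.region_name)                      # expect us-east-1
print("credentials found:", session.get_credentials() is not None)
```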
Update ARTIS Model Scripts and Model Inputs

You will need to transfer the ARTIS code and input data (likely from the Seafood-Globalization-Lab/artis-model repo) to your local `artis-hpc` project directory (see the sketch at the end of this section for one way to script the copy steps):

1. Copy the `00-aws-hpc-setup.R`, `02-artis-pipeline.R`, and `03-combine-tables.R` scripts to `artis-hpc/data_s3_upload/ARTIS_model_code/`
2. Run `$ export HS_VERSIONS="[HS VERSIONS YOU ARE RUNNING, NO SPACES]"` to specify which HS versions to run
   - i.e. `$ export HS_VERSIONS="02,07,12,17,96"` or `$ export HS_VERSIONS="17"`
3. Run `$ ./create_pipeline_versions.sh` to create a new version of `02-artis-pipeline.R` and `00-aws-hpc-setup.R` in `artis-hpc/data_s3_upload/ARTIS_model_code/` for every HS version specified in `HS_VERSIONS`
4. Copy the most up-to-date set of `model_inputs` to the `artis-hpc/data_s3_upload/` directory. Retain the folder name `model_inputs`.
5. Copy the most up-to-date ARTIS `R/` package folder to `artis-hpc/data_s3_upload/ARTIS_model_code/`
6. Copy the most up-to-date ARTIS R package `NAMESPACE` file to `artis-hpc/data_s3_upload/ARTIS_model_code/`
7. Copy the most up-to-date ARTIS R package `DESCRIPTION` file to `artis-hpc/data_s3_upload/ARTIS_model_code/`
8. Copy the most up-to-date `.Renviron` file to `artis-hpc/data_s3_upload/ARTIS_model_code/` (-AM is this needed?)

If running on a new Apple chip (arm64):

1. Copy the `arm64_venv_requirements.txt` file from the root directory to `artis-hpc/docker_image_files_original/`
2. Rename the file `artis-hpc/docker_image_files_original/arm64_venv_requirements.txt` to `artis-hpc/docker_image_files_original/requirements.txt`
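The copy and rename steps above can also be scripted. The sketch below is illustrative only: it assumes the artis-model repository is checked out next to `artis-hpc` and that the listed files sit at its top level, which may not match your local layout.

```python
# Illustrative sketch of the copy steps above (source paths are assumptions).
import shutil
from pathlib import Path

src = Path("../artis-model")                  # hypothetical location of the artis-model repo
dst = Path("data_s3_upload/ARTIS_model_code")

for name in ["00-aws-hpc-setup.R", "02-artis-pipeline.R", "03-combine-tables.R",
             "NAMESPACE", "DESCRIPTION", ".Renviron"]:
    shutil.copy(src / name, dst / name)

shutil.copytree(src / "R", dst / "R", dirs_exist_ok=True)
shutil.copytree(src / "model_inputs", Path("data_s3_upload/model_inputs"),
                dirs_exist_ok=True)
```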
Setting Up New ARTIS HPC on AWS

The `initial_setup.py` script automates the setup process for running the ARTIS model on AWS. It begins by configuring the environment based on the specified chip architecture, copying the appropriate `Dockerfile` to the project root, and embedding AWS credentials. The script then creates the necessary AWS infrastructure using Terraform, uploads model input files to the specified S3 bucket, and builds a Docker image using files from the `docker_image_files_original/` directory. This image is uploaded to the AWS ECR repository. Finally, the script submits jobs to AWS Batch for model execution. In case of an error, the script automatically cleans up all created AWS resources.
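The cleanup-on-error behaviour described above follows a simple pattern; a minimal sketch of that pattern (not the actual `initial_setup.py`) is shown below, with the docker build and AWS Batch submission steps elided.

```python
# Sketch of the orchestration pattern: run each stage, tear everything down on failure.
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

try:
    run(["terraform", "init"])
    run(["terraform", "apply", "-auto-approve"])   # create the AWS infrastructure
    run(["python3", "s3_upload.py"])               # upload model inputs to S3
    # ... build and push the docker image, then submit AWS Batch jobs ...
except subprocess.CalledProcessError:
    # Destroy everything that was created so no idle resources keep billing.
    run(["terraform", "destroy", "-auto-approve"])
    raise
```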
Any time the only changes are edits to the ARTIS model codebase, there is no need to recreate the docker image; skip to Running Existing ARTIS HPC Setup.

1. Open Docker Desktop.
2. Take note of any existing docker images and containers relating to other projects, and:
   - Delete all docker containers relating to ARTIS
   - Delete all docker images relating to ARTIS
3. Create the AWS infrastructure, upload model inputs, and create a new ARTIS docker image. Run:

`python3 initial_setup.py -chip [YOUR CHIP INFRASTRUCTURE] -aws_access_key [YOUR AWS KEY] -aws_secret_key [YOUR AWS SECRET KEY] -s3 artis-s3-bucket -ecr artis-image`
Details:

- If you are using an Apple Silicon chip (M1, M2, M3, etc.) your chip will be `arm64`; otherwise, for Intel chips it will be `x86`.
- If you have an existing docker image you would like to use, include the `-di [existing docker image name]` flag.
- Recommendation: the default options will create a docker image called `artis-image`, so if you want to use the previously created default docker image you would include `-di artis-image`.
- Note: The AWS docker image repository and the docker image created with default options both have the name `artis-image`; however, they are two different resources.

Example command:

- Using the credentials stored in the local environment variables set above
- Using the existing docker image `artis-image` with the `latest` tag

`python3 initial_setup.py -chip arm64 -aws_access_key $AWS_ACCESS_KEY -aws_secret_key $AWS_SECRET_ACCESS_KEY -s3 artis-s3-bucket -ecr artis-image -di artis-image:latest`
Troubleshooting tip: If terraform states that it created all resources, but when you log into the AWS console to confirm you cannot see them, they have most likely been created under another account. Run `terraform destroy -auto-approve` on the command line, and confirm you have followed the AWS CLI setup instructions with the correct set of keys (AWS access key and AWS secret access key).
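Related to the troubleshooting tip above, you can confirm which AWS account your configured credentials actually belong to before re-running terraform, either with the CLI command `aws sts get-caller-identity` or from Python:

```python
# Print the account and user behind the currently configured credentials.
import boto3

identity = boto3.client("sts").get_caller_identity()
print("Account:", identity["Account"])
print("User ARN:", identity["Arn"])
```

The account ID printed here should match the account you are logged into in the AWS console.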
A successful and complete model run will then proceed to the next step, Combine ARTIS model outputs into CSVs.
Running Existing ARTIS HPC Setup

Note: These instructions are only applicable if:

- all AWS infrastructure is created,
- the docker image `artis-image` is built, and
- the only changes are to files within `model_inputs/*` or `ARTIS_model_code/*`.

1. Log onto AWS to check the S3 bucket `artis-s3-bucket` for contents that need to be saved. Then permanently delete all contents of the `artis-s3-bucket` bucket.
2. Make sure to put all new scripts or model inputs into the relevant `artis-hpc/data_s3_upload/` folders locally.
3. Run `$ source venv/bin/activate` to open the python environment (make sure the proper requirements were installed here).
4. Run `$ python3 s3_upload.py` to upload the local contents of `data_s3_upload/` to the AWS S3 bucket `artis-s3-bucket` (see the sketch after this list for what this step does conceptually).
5. Run `$ python3 submit_artis_jobs.py` to submit the batch jobs on AWS.
   - This loops through the designated `HS_VERSIONS` and runs the corresponding shell scripts, which source `docker_image_artis_pkg_download.R` and `02-artis-pipeline_[hs version].R`.
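Conceptually, the upload step mirrors the local `data_s3_upload/` folder into the S3 bucket. The sketch below illustrates this; the repository's `s3_upload.py` is the authoritative version and may differ in details such as key prefixes.

```python
# Conceptual sketch of the upload step: mirror data_s3_upload/ into artis-s3-bucket.
import boto3
from pathlib import Path

s3 = boto3.client("s3")
root = Path("data_s3_upload")

for path in root.rglob("*"):
    if path.is_file():
        key = path.relative_to(root).as_posix()    # e.g. "model_inputs/example.csv"
        s3.upload_file(str(path), "artis-s3-bucket", key)
        print("uploaded", key)
```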
A successful and complete model run will then proceed to the next step, Combine ARTIS model outputs into CSVs.
Combine ARTIS model outputs into CSVs

1. Run `$ python3 submit_combine_tables_job.py`

Download Results, Clean Up AWS and Docker Environments

1. Run `$ python3 s3_download.py` to download the `artis-s3-bucket` contents to the local `artis-hpc/outputs_[RUN YYYY-MM-DD]` directory (see the sketch after this list for what this step does conceptually).
2. Open the Docker Desktop app.
3. Delete all containers created.
4. Optionally delete the ARTIS images - you can retain them if planning to run the model again.
5. Run `$ terraform destroy` to destroy all AWS resources and dependencies created.
   - This is important so you are not charged for maintaining idle resources and storing GBs of data in S3 buckets.
6. Run `$ deactivate` to close the python environment.
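For reference, the download step copies every object in `artis-s3-bucket` into a dated local folder. The sketch below illustrates this; the repository's `s3_download.py` is the authoritative version, and the output folder naming here is an assumption.

```python
# Conceptual sketch of the download step (the actual script is s3_download.py).
import boto3
from datetime import date
from pathlib import Path

s3 = boto3.client("s3")
dest = Path(f"outputs_{date.today().isoformat()}")   # e.g. outputs_2024-05-01

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="artis-s3-bucket"):
    for obj in page.get("Contents", []):
        target = dest / obj["Key"]
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file("artis-s3-bucket", obj["Key"], str(target))
```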
Checks & Troubleshooting

1. Navigate to AWS in your browser and log in to your IAM account.
2. Use the search bar at the top of the page to search for "batch" and click on the Batch service result.
3. Under "Job queue overview" you will be able to see job statuses; click on a number to open the details.
4. Investigate individual job statuses and details through filters (be sure to click "Search"):
   - Set "Filter type" to "Status" and "Filter value" to "FAILED" in the AWS Batch > Jobs window above, then click the "Search" button.
   - Identify and open the relevant failed job by clicking on the job name.
   - Inspect the "Details" for the failed job; "Status reason" is particularly helpful.
   - Click on the "Log stream name" to open the CloudWatch logs for the specific job. This displays the code output and error messages.
5. Note: The "AWS Batch > Jobs > your-job-name" image below shows a common error message, `ResourceInitializationError: unable to pull secrets or registry auth: [...]`, which appears when there is an issue initializing the resources required by the AWS Batch job. This is most likely a temporary network issue and can be resolved by re-running the specific job (HS version).

Note: The image above shows a common error message when the model code is unable to find the correct file path.

To inspect the logs directly in CloudWatch:

1. Search for "cloudwatch" in the search bar and click on the CloudWatch service.
2. In the left-hand nav bar click on "Logs", then "Log groups", and then "/aws/batch/job".
3. Inspect the "Log streams" (likely sorting by "Last event time") to identify and open the correct log.
4. Inspect the messages, output, and errors from running the model code.
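If you prefer to pull logs programmatically rather than through the console, the same `/aws/batch/job` log group can be read with boto3; this optional sketch simply fetches the most recent log stream.

```python
# Fetch the most recent /aws/batch/job log stream and print its last messages.
import boto3

logs = boto3.client("logs")
group = "/aws/batch/job"

streams = logs.describe_log_streams(logGroupName=group,
                                    orderBy="LastEventTime",
                                    descending=True,
                                    limit=1)["logStreams"]
if streams:
    events = logs.get_log_events(logGroupName=group,
                                 logStreamName=streams[0]["logStreamName"],
                                 limit=50)["events"]
    for event in events:
        print(event["message"])
```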
To confirm the ARTIS model outputs in S3:

1. Navigate to the `artis-s3-bucket` in AWS S3.
2. Confirm that all expected outputs are present for the submitted ARTIS model jobs.
   - The `outputs` folder should contain a `snet/` subfolder that has each HS version specified in the `HS_VERSIONS` variable.
   - Each HS version folder should contain the applicable years.
The expected directory structure is as follows:
```
aws/artis-s3-bucket/
└── outputs/
    ├── cvxopt_snet/
    │   ├── HS[VERSION]/
    │   │   ├── [YEAR]/
    │   │   │   ├── [RUN YYYY-MM-DD]_all-country-est_[YEAR]_HS[VERSION].RDS
    │   │   │   ├── [RUN YYYY-MM-DD]_all-data-prior-to-solve-country_[YEAR]_HS[VERSION].RData # Might be the only file for some year folders, depending on whether all countries were solved by quadprog
    │   │   │   ├── [RUN YYYY-MM-DD]_analysis-documentation_countries-with-no-solve-qp-solution_[YEAR]_HS[VERSION].txt
    │   │   │   ├── [RUN YYYY-MM-DD]_country-est_[COUNTRY ISO3C]_[YEAR]_HS[VERSION].RDS
    │   │   │   └── ...
    │   │   └── [YEAR]/
    │   │       └── ...
    │   └── HS[VERSION]/
    │       └── ...
    ├── quadprog_snet/
    │   ├── HS[VERSION]/
    │   │   ├── [YEAR]/
    │   │   │   ├── [RUN YYYY-MM-DD]_all-country-est_[YEAR]_HS[VERSION].RDS
    │   │   │   ├── [RUN YYYY-MM-DD]_all-data-prior-to-solve-country_[YEAR]_HS[VERSION].RData
    │   │   │   ├── [RUN YYYY-MM-DD]_analysis-documentation_countries-with-no-solve-qp-solution_[YEAR]_HS[VERSION].txt
    │   │   │   ├── [RUN YYYY-MM-DD]_country-est_[COUNTRY ISO3C]_[YEAR]_HS[VERSION].RDS
    │   │   │   └── ...
    │   │   ├── [YEAR]/
    │   │   │   └── ...
    │   │   └── no_solve_countries.csv # Key file to check
    │   └── HS[VERSION]/
    │       └── ...
    └── snet/
        └── HS[VERSION]/
            ├── [YEAR]/
            │   ├── [RUN YYYY-MM-DD]_S-net_raw_midpoint_[YEAR]_HS[VERSION].csv
            │   ├── [RUN YYYY-MM-DD]_all-country-est_[YEAR]_HS[VERSION].RDS
            │   ├── [RUN YYYY-MM-DD]_consumption_[YEAR]_HS[VERSION].csv
            │   ├── W_long_[YEAR]_HS[VERSION].csv
            │   ├── X_long.csv
            │   ├── first_dom_exp_midpoint.csv
            │   ├── first_error_exp_midpoint.csv
            │   ├── first_foreign_exp_midpoint.csv
            │   ├── first_unresolved_foreign_exp_midpoint.csv
            │   ├── hs_clade_match.csv
            │   ├── reweight_W_long_[YEAR]_HS[VERSION].csv
            │   ├── reweight_X_long_[YEAR]_HS[VERSION].csv
            │   ├── second_dom_exp_midpoint.csv
            │   ├── second_error_exp_midpoint.csv
            │   ├── second_foreign_exp_midpoint.csv
            │   └── second_unresolved_foreign_exp_midpoint.csv
            ├── [YEAR]/
            │   └── ...
            ├── [YEAR]/
            │   └── ...
            ├── V1_long_HS[VERSION].csv
            └── V2_long_HS[VERSION].csv
```
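A quick programmatic spot check of the structure above is to list the HS version folders under `outputs/snet/` and compare them against `HS_VERSIONS`; the prefix used here is inferred from the directory tree and may need adjusting.

```python
# Spot check: list HS version folders under outputs/snet/ in artis-s3-bucket.
import os
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="artis-s3-bucket",
                          Prefix="outputs/snet/",
                          Delimiter="/")
found = [p["Prefix"] for p in resp.get("CommonPrefixes", [])]
print("HS version folders found:", found)

expected = [v for v in os.environ.get("HS_VERSIONS", "").split(",") if v]
for hs in expected:
    print(f"HS{hs} present:", any(f"HS{hs}/" in p for p in found))
```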
After submitting `submit_combine_tables_job.py` to AWS Batch, the `artis-s3-bucket` should contain the following additional files:

```
aws/artis-s3-bucket/
└── outputs/
    ├── cvxopt_snet/
    │   └── ...
    ├── quadprog_snet/
    │   └── ...
    ├── snet/
    │   └── ...
    └── artis_outputs/
        ├── consumption_midpoint_all_hs_all_years.csv
        └── snet_midpoint_all_hs_all_years.csv
        # "midpoint" is the estimate type and will change if a different estimate type is used
```
Create AWS IAM user

FIXIT: include screenshots for creating an IAM user with the correct admin permissions.

1. Create an AWS root user
2. Create an IAM user group
3. Create an IAM user
4. Sign in as the IAM user
5. Get the IAM user access key

Save the access key and secret key in a secure location (i.e. a password manager).
Docker container artis-image details

Once the docker image `artis-image` has been uploaded to AWS ECR, the docker container `artis-image` will need to import all R scripts and model inputs from the `artis-s3-bucket` on AWS. Once `$ python3 submit_artis_jobs.py` is run, a new AWS Batch job will run ARTIS on a new instance of the docker container for each HS version specified. Each docker instance imports only the scripts and model inputs for the HS version and years it is running from `artis-s3-bucket` (this occurs when `docker_image_artis_pkg_download.R` is sourced from `job_shell_scripts/`).

Example directory structure within `artis-image`:
```
/home/ec2-user/artis/
│
├── clean_fao_prod.csv
├── clean_fao_taxa.csv
├── clean_sau_prod.csv
├── clean_sau_taxa.csv
├── clean_taxa_combined.csv
├── code_max_resolved.csv
├── fao_annual_pop.csv
├── hs-hs-match_HS[VERSION].csv (one file per HS version)
├── hs-taxa-CF_strict-match_HS[VERSION].csv
├── hs-taxa-match_HS[VERSION].csv
├── standardized_baci_seafood_hs[VERSION]_y[YEAR]_including_value.csv (one file per HS version/year combination)
├── standardized_baci_seafood_hs[VERSION]_y[YEAR].csv (one file per HS version/year combination)
├── standardized_combined_prod.csv
├── standardized_fao_prod.csv
├── standardized_sau_taxa.csv
│
│ (Files pulled from `ARTIS_model_code/` in `artis-s3-bucket`. Folder not retained)
├── 00-aws-hpc-setup_hs[VERSION].R
├── 02-artis-pipeline_hs[VERSION].R
├── 03-combine-tables.R
├── NAMESPACE
├── DESCRIPTION
└── R/
    ├── build_artis_data.R
    ├── calculate_consumption.R
    ├── categorize_hs_to_taxa.R
    ├── classify_prod_dat.R
    ├── clean_fb_slb_synonyms.R
    ├── clean_hs.R
    ├── collect_data.R
    ├── compile_cf.R
    ├── create_export_source_weights.R
    ├── create_reweight_W_long.R
    ├── create_reweight_X_long.R
    ├── create_snet.R
    └── (Add all files)
```