[ORCA-318] Add instructions on setting up env on EC2 #42

Merged: 1 commit, Apr 19, 2024
6 changes: 3 additions & 3 deletions README.md
@@ -13,7 +13,7 @@ This Python package provides the components to connect various third-party servi

## Demonstration Script

- This repository includes a demonstration script called [`demo.py`](demo.py), which showcases how you can use `py-orca` to launch and monitor your workflows on Nextflow Tower. Specifically, it illustrates how to process an RNA-seq dataset using a series of workflow runs, namely `nf-synapse/synstage`, `nf-core/rnaseq`, and `nf-synindex`. `py-orca` can be used with any Python-compatible workflow management system to orchestrate each step (_e.g._ Airflow, Prefect, Dagster). The demonstration script uses [Metaflow](https://metaflow.org/) because it's easy to run locally and has an intuitive syntax.
+ This repository includes a demonstration script called [`demo.py`](demo.py), which showcases how you can use `py-orca` to launch and monitor your workflows on Nextflow Tower. Specifically, it illustrates how to process an RNA-seq dataset using a series of workflow runs, namely `nf-synapse/synstage`, `nf-core/rnaseq`, and `nf-synapse/synindex`. `py-orca` can be used with any Python-compatible workflow management system to orchestrate each step (_e.g._ Airflow, Prefect, Dagster). The demonstration script uses [Metaflow](https://metaflow.org/) because it's easy to run locally and has an intuitive syntax.

The script assumes that the following environment variables are set. Before setting them up, ensure that you have an AWS profile configured for a role that has access to the dev/ops tower workspace you plan to launch your workflows from. You can set these environment variables using whatever method you prefer (_e.g._ using an `.env` file, sourcing a shell script, etc).
Refer to [`.env.example`](.env.example) for the format of their values as well as examples.
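
As a minimal sketch of how the environment check described above might be automated, the helper below fails early when a variable is unset or empty. The function name `require_env` is hypothetical and not part of `py-orca`; the actual variable names are the ones listed in `.env.example`, which this sketch does not assume.

```bash
#!/usr/bin/env bash
# Sketch: report every required environment variable that is unset or empty.
# `require_env` is a hypothetical helper name, not something py-orca provides.
require_env() {
  local missing=0
  local name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or an empty string if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "Missing environment variable: ${name}" >&2
      missing=1
    fi
  done
  # Returns 0 when all variables are set, 1 otherwise.
  return "${missing}"
}
```

Calling `require_env` with the names from `.env.example` before launching `demo.py` would surface a missing variable before any workflow is submitted.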
@@ -26,7 +26,7 @@ Once your environment variables are set, you can create a virtual environment, i

### Creating and setting up your `py-orca` virtual environment and executing `demo.py`

- Below are the instructions for creating and setting up your virtual environment and executing the `demo.py`. If you would like to set up a developer environment with the relevant dependencies, you can execute the shell script [dev_setup](https://github.com/Sage-Bionetworks-Workflows/py-orca/blob/main/dev_setup.sh) in a clone of this repository stored on your machine.
+ Below are the instructions for creating and setting up your virtual environment and executing the `demo.py`. You can also check the tutorial [here](https://sagebionetworks.jira.com/wiki/spaces/IBC/pages/3018489902/py-orca+Getting+Started). If you would like to set up a developer environment with the relevant dependencies, you can execute the shell script [dev_setup](https://github.com/Sage-Bionetworks-Workflows/py-orca/blob/main/dev_setup.sh) in a clone of this repository stored on your machine. You can run it either on your local machine or on an EC2 instance. Setting up a development environment on an EC2 instance can run into issues. You might need to install Python build dependencies before using [pyenv](https://github.com/pyenv/pyenv/wiki#suggested-build-environment) to manage Python versions. You can refer to this [doc](https://github.com/pyenv/pyenv/wiki#suggested-build-environment:~:text=devel%20xz%2Ddevel-,Amazon%20Linux%202%3A,-yum%20install%20gcc) to resolve the dependency issue. The `openssl11-devel` package is not available on `EC2: Linux Docker v1.3.9`, so you can install `openssl-devel` instead. If you run into a missing GCC error, you can install GCC using `sudo yum install gcc`.
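
The EC2 dependency steps described above could be sketched roughly as follows. The package list is an assumption: it is adapted from the Amazon Linux 2 suggestion on the pyenv wiki, with `openssl11-devel` swapped for `openssl-devel` as noted, and should be checked against the wiki before use. The names `PYTHON_BUILD_DEPS` and `install_python_build_deps` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: Python build dependencies for pyenv on an Amazon Linux 2 based EC2 instance.
# Assumption: list adapted from the pyenv wiki, with openssl-devel substituted
# for openssl11-devel. Verify against the wiki for your image.
PYTHON_BUILD_DEPS=(
  gcc make patch zlib-devel bzip2 bzip2-devel readline-devel
  sqlite sqlite-devel openssl-devel tk-devel libffi-devel xz-devel
)

# Hypothetical helper: installs the packages above with yum (requires sudo).
install_python_build_deps() {
  sudo yum install -y "${PYTHON_BUILD_DEPS[@]}"
}
```

Defining the list separately from the install function keeps the script safe to source; nothing is installed until `install_python_build_deps` is called explicitly.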
```bash
# Create and activate a Python virtual environment (tested with Python 3.10)
python3 -m venv venv/
@@ -38,7 +38,7 @@ python3 -m pip install 'py-orca[all]' 'metaflow' 'pyyaml' 's3fs'

Before running the example below, ensure that the `s3_prefix` points to an S3 bucket your Nextflow `dev`
or `prod` tower workspace has access to. In the example below, we will point to the `example-dev-project-tower-scratch` S3 bucket because we will be launching our workflows within the
- `example-dev-project` workspace in `tower-dev`.
+ `example-dev-project` workspace in `tower-dev`. In this case, you can use either of the `workflows-nextflow-dev` profiles to access the S3 bucket.
```bash
# Run the script using an example dataset
python3 demo.py run --dataset_id 'syn51514585' --s3_prefix 's3://example-dev-project-tower-scratch/work'