flepimop.org documentation updates for information pertaining to new users #460

Merged
merged 11 commits into from
Feb 7, 2025
2 changes: 1 addition & 1 deletion batch/hpc_init.sh
@@ -112,7 +112,7 @@ read FLEPI_RUN_INDEX
cat << EOM
> The HPC init script has successfully finished.

If you are testing if this worked, say installing for the first time, you can use the inference example from the \`flepimop_sample\` repository:
If you are testing if this worked, say installing for the first time, you can use the inference example from the \`flepiMoP/examples/tutorials\` directory:
\`\`\`bash
cd \$PROJECT_PATH
flepimop-inference-main -c \$CONFIG_PATH -j 1 -n 1 -k 1
2 changes: 1 addition & 1 deletion documentation/gitbook/gempyor/output-files.md
@@ -12,7 +12,7 @@ These files contain the values of the variables for both the infection and (if i

Within the `model_output` directory in the project's directory, the files will be organized into folders named for the file types: `seir`, `spar`, `snpi`, `hpar`, `hnpi`, `seed`, `init`, or `llik` (see descriptions below). Within each file type folder, files will further be organized by the simulation name (`setup_name` in config), the modifier scenario names - if scenarios exist for either `seir` or `outcome` parameters (specified with `seir_modifiers::scenarios` and `outcome_modifiers::scenarios` in config), and the `run_id` (the date and time of the simulation, by default). For example:

<pre><code><strong>flepimop_sample
<pre><code><strong>flepiMoP/examples/tutorials
</strong>├── model_output
│   ├── {setup_name}_{seir_modifier_scenario}_{outcome_modifier_scenario}
│   │   └── run_id
@@ -6,28 +6,23 @@ description: Short tutorial on running locally using an "Anaconda" environment.

### Access model files

As is the case for any run, first see the [Before any run](../before-any-run.md) section to ensure you have access to the correct files needed to run. On your local machine, determine the file paths to:
Follow all the steps in the [Before any run](before-any-run.md) section to ensure you have access to the correct files needed to run your model with flepiMoP.

* the directory containing the flepimop code (likely the folder you cloned from Github), which we'll call `FLEPI_PATH`
* the directory containing your project code including input configuration file and population structure (again likely from Github), which we'll call `DATA_PATH`
Take note of the location of the directory on your local computer where you cloned the flepiMoP model code (which we'll call `FLEPI_PATH`).

{% hint style="info" %}
For example, if you clone your Github repositories into a local folder called Github and are using the flepimop\_sample as a project repository, your directory names could be\
For example, if you cloned your Github repositories into a local folder called `Github` and are using `flepiMoP/examples/tutorials` as a project repository, your directory names could be\
\
_**On Mac:**_

\<dir1> = /Users/YourName/Github/flepiMoP
/Users/YourName/Github/flepiMoP

\<dir2> = /Users/YourName/Github/flepimop\_sample\
/Users/YourName/Github/flepiMoP/examples/tutorials
\
_**On Windows:**_\
\<dir1> = C:\Users\YourName\Github\flepiMoP
C:\Users\YourName\Github\flepiMoP

\<dir2> = C:\Users\YourName\Github\flepimop\_sample\\

(hint: if you navigate to a directory like `C:\Users\YourName\Github` using `cd C:\Users\YourName\Github`, modify the above `<dir1>` paths to be `.\flepiMoP` and `.\flepimop_sample)`

:warning: Note again that these are best cloned **flat.**
C:\Users\YourName\Github\flepiMoP\examples\tutorials
{% endhint %}

## 🧱 Setup (do this once)
@@ -80,62 +75,45 @@ In this `conda` environment, commands with R and python will uses this environme

### Define environment variables

First, you'll need to fill in some variables that are used by the model. This can be done in a script (an example is provided at the end of this page). For your first time, it's better to run each command individually to be sure it exits successfully.
Since you'll be navigating frequently between the folder that contains your project code and the folder that contains the core flepiMoP model code, it's helpful to define shortcuts for these file paths. You can do this by creating environmental variables that you can then quickly call instead of writing out the whole file path.

First, in `myparentfolder` populate the folder name variables for the paths to the flepimop code folder and the project folder:
If you're on a **Mac** or Linux/Unix based operating system, define the FLEPI\_PATH and PROJECT\_PATH environmental variables to be your directory locations, for example

```bash
export FLEPI_PATH=$(pwd)/flepiMoP
export DATA_PATH=$(pwd)/flepimop_sample
export FLEPI_PATH=/Users/YourName/Github/flepiMoP
export PROJECT_PATH=/Users/YourName/Github/flepiMoP/examples/tutorials
```

Go into the code directory (making sure it is up to date on your favorite branch) and do the installation required of the repository:
or, if you have already navigated to your flepiMoP directory

```bash
cd $FLEPI_PATH # move to the flepimop directory
Rscript build/local_install.R # Install R packages
pip install --no-deps -e flepimop/gempyor_pkg/ # Install Python package gempyor
export FLEPI_PATH=$(pwd)
export PROJECT_PATH=$(pwd)/examples/tutorials
```

Each installation step may take a few minutes to run.

{% hint style="info" %}
Note: These installations take place in your conda environment and not the local operating system. They must be made once while in your environment and need not be done for every time you run a model, provided they have been installed once. You will need an active internet connection for installing the R packages (since some are hosted online), but not for other steps of running the model.
{% endhint %}

<details>

<summary>Help! I have errors in installation</summary>

If you get an error because no cran mirror is selected, just create in your home directory a `.Rprofile` file:
You can check that the variables have been set by either typing `env` to see all defined environmental variables, or typing `echo $FLEPI_PATH` to see the value of `FLEPI_PATH`.
Contributor

probably shouldn't suggest people env - going to spam them confusingly.


{% code title="~/.Rprofile" lineNumbers="true" %}
```r
local({r <- getOption("repos")
r["CRAN"] <- "http://cran.r-project.org"
options(repos=r)
})
```
{% endcode %}
If you're on a **Windows** machine

Perhaps this should be added to the top of the local\_install.R script #todo
<pre class="language-bash"><code class="lang-bash"><strong>set FLEPI_PATH=C:\Users\YourName\Github\flepiMoP
</strong>set PROJECT_PATH=C:\Users\YourName\Github\flepiMoP\examples\tutorials
</code></pre>

When running `local_install.R` the first time, you may get an error:
or, if you have already navigated to your flepiMoP directory

<pre><code><strong>ERROR: dependency ‘report.generation’ is not available for package ‘inference’
</strong><strong>[...]
</strong><strong>installation of package ‘./R/pkgs//inference’ had non-zero exit status
</strong></code></pre>
<pre class="language-bash"><code class="lang-bash"><strong>set FLEPI_PATH=%CD%
</strong>set PROJECT_PATH=%CD%\examples\tutorials
</code></pre>

and the second time it'll finish successfully (no non-zero exit status at the end). That's because there is a circular dependency in this file (inference requires report.generation which is built after) and will hopefully get fixed.
You can check that the variables have been set by either typing `set` to see all defined environmental variables, or typing `echo %FLEPI_PATH%` to see the value of `FLEPI_PATH`.

For subsequent runs, once is enough because the package is already installed once.

</details>
{% hint style="info" %}
If you choose not to define environment variables, remember to use the full or relative path names for navigating to the right files or folders in future steps.
{% endhint %}
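
As a quick sanity check before moving on, the lookup can be wrapped in a small shell function. This is only a sketch: `check_path_var` is a hypothetical helper, not something flepiMoP ships, and it assumes a POSIX-style shell (Mac or Linux) rather than the Windows command prompt.

```bash
# Hypothetical helper (not part of flepiMoP): report whether the named
# environment variable is set and points to an existing directory.
check_path_var() {
  name="$1"
  # Look up the variable whose name is stored in $name
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  elif [ ! -d "$value" ]; then
    echo "$name=$value is not a directory"
    return 1
  fi
  echo "$name=$value"
}

# Check both shortcuts used in this guide
for var in FLEPI_PATH PROJECT_PATH; do
  check_path_var "$var" || echo "-> define $var before continuing"
done
```

Unlike `env`, this prints only the two variables of interest, so the result is easy to read.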

Other environmental variables can be set at any point in process of setting up your model run. These options are listed in ... ADD ENVAR PAGE
Other environmental variables can be set at any point in the process of setting up your model run. These options are listed in ... **ADD ENVAR PAGE**
Contributor

I don't mind a live TODO, but it should be in a comment, rather than actively displayed. Probably a better version is to convert this to an item on an explicit issue, e.g. the new environmental variables one. Basically an explicit note "in file XYZ, section ABC, ..."


For example, some frequently used environmental variables which we recommend setting are:
For example, some frequently used environmental variables we recommend setting are:

{% code overflow="wrap" %}
```bash
@@ -153,19 +131,19 @@ The next step depends on what sort of simulation you want to run: One that inclu
In either case, navigate to the project folder and make sure to delete any old model output files that are there.

```bash
cd $DATA_PATH # goes to your project repository
cd $PROJECT_PATH # goes to your project repository
rm -r model_output/ # delete the outputs of past run if there are
```
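
If the variable is unset or there is no previous output, the plain `rm -r` either fails or runs in the wrong directory. A slightly more defensive variant might look like the following sketch; `clean_model_output` is a hypothetical helper name, and it assumes the outputs live in a `model_output` folder directly under the project directory, as described above.

```bash
# Hypothetical helper (not part of flepiMoP): remove old outputs from a
# project directory, refusing to run if that directory does not exist.
clean_model_output() {
  project_dir="${1:-${PROJECT_PATH:-}}"
  if [ -z "$project_dir" ] || [ ! -d "$project_dir" ]; then
    echo "project directory not found: '$project_dir'" >&2
    return 1
  fi
  # -f makes a missing model_output folder a no-op instead of an error
  rm -rf -- "$project_dir/model_output"
  echo "cleaned $project_dir/model_output"
}

# Usage: clean_model_output "$PROJECT_PATH"
```

Passing the directory explicitly avoids depending on the current working directory.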

#### Inference run

An inference run requires a configuration file that has an `inference` section. Stay in the `$DATA_PATH` folder, and run the inference script, providing the name of the configuration file you want to run (ex. `config.yml`). In the example data folder (flepimop\_sample), try out the example config XXX.
An inference run requires a configuration file that has an `inference` section. Stay in the `$PROJECT_PATH` folder, and run the inference script, providing the name of the configuration file you want to run (ex. `config.yml`).

```bash
flepimop-inference-main.R -c config.yml
```

This will run the model and create [a lot of output files](../../gempyor/output-files.md) in `$DATA_PATH/model_output/`.
This will run the model and create [a lot of output files](../../gempyor/output-files.md) in `$PROJECT_PATH/model_output/`.
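
To get a rough overview of what a run produced, the files can be counted per output type. The sketch below assumes only that each file type (`seir`, `spar`, `snpi`, `hpar`, `hnpi`, `seed`, `init`, `llik`) appears as a folder name somewhere under `model_output`; `summarize_model_output` is a hypothetical helper, not a flepiMoP command.

```bash
# Hypothetical helper (not part of flepiMoP): count the files found under
# each output-type folder inside a model_output directory.
summarize_model_output() {
  out_dir="${1:-model_output}"
  for kind in seir spar snpi hpar hnpi seed init llik; do
    # Match any file that has a folder named $kind somewhere in its path
    n=$(find "$out_dir" -type f -path "*/$kind/*" 2>/dev/null | wc -l | tr -d ' ')
    echo "$kind: $n"
  done
}

# Usage: summarize_model_output "$PROJECT_PATH/model_output"
```

A type that shows zero files may simply not apply to your config (for example, the `h*` types when no outcomes are modeled), so treat the counts as a hint and not a pass/fail check.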

The last few lines visible on the command prompt should be:

@@ -191,7 +169,7 @@ where:

#### Non-inference run

Stay in the `$DATA_PATH` folder, and run a simulation directly from forward-simulation Python package `gempyor`. To do this, call `flepimop simulate` providing the name of the configuration file you want to run (ex. `config.yml`). An example config is provided in `flepimop_sample/config_sample_2pop_interventions.yml.`
Stay in the `$PROJECT_PATH` folder, and run a simulation directly from the forward-simulation Python package `gempyor`. To do this, call `flepimop simulate`, providing the name of the configuration file you want to run (e.g. `config.yml`). An example config is provided in `$PROJECT_PATH/config_sample_2pop_interventions.yml`.

```
flepimop simulate config.yml
@@ -203,23 +181,4 @@ It is currently required that all configuration files have an `interventions` se

You can also try to knit the Rmd file in `flepiMoP/flepimop/gempyor_pkg/docs` which will show you how to analyze these files.

### Do it all with a script

The following script does all the above commands in an easy script. Save it in `myparentfolder` as `quick_setup.sh`. Then, just go to `myparentfolder` and type `source quick_setup_flu.sh` and it'll do everything for you!

{% code title="quick_setup_flu.sh" lineNumbers="true" %}
```bash
export FLEPI_PATH=$(pwd)/flepiMoP
export DATA_PATH=$(pwd)/flepimop_sample

cd $FLEPI_PATH
Rscript build/local_install.R
pip install --no-deps -e gempyor_pkg/ # before: python setup.py develop --no-deps

cd $DATA_PATH
rm -rf model_output
export CONFIG_PATH=config.yml # set your configuration file path

flepimop-inference-main -j 1 -n 1 -k 1
```
{% endcode %}
@@ -56,8 +56,7 @@ $ ./flepiMoP/build/hpc_install_or_update.sh <cluster-name>

These steps to initialize the environment need to run on a per run or as needed basis.

Change directory to where a full clone of the `flepiMoP` repository was placed (it will state the location in the output of the script above). And then run the `hpc_init.sh` script, substituting `<cluster-name>` with either `rockfish` or `longleaf`. This script will assume the same defaults as the script before for where the `flepiMoP` clone is and the name of the conda environment. This script will also ask about a project directory and config, if this is your first time initializing `flepiMoP` it might be helpful to clone [the `flepimop_sample` GitHub repository](https://github.com/HopkinsIDD/flepimop\_sample) to the same directory to use as a test.

Change directory to where a full clone of the `flepiMoP` repository was placed (the output of the script above states the location). Then run the `hpc_init.sh` script, substituting `<cluster-name>` with either `rockfish` or `longleaf`. This script assumes the same defaults as the previous script for the location of the `flepiMoP` clone and the name of the conda environment. It will also ask about a project directory and config; if this is your first time initializing `flepiMoP`, it might be helpful to use configs from the `flepiMoP/examples/tutorials` directory as a test.
```
$ source batch/hpc_init.sh <cluster-name>
```
@@ -82,7 +81,7 @@ If you'd like to have more control, you can specify the arguments manually:
$ python $FLEPI_PATH/batch/inference_job_launcher.py --slurm \
-c $CONFIG_PATH \
-p $FLEPI_PATH \
--data-path $DATA_PATH \
--data-path $PROJECT_PATH \
--upload-to-s3 True \
--id $FLEPI_RUN_INDEX \
--fs-folder /scratch4/primary-user/flepimop-runs \
@@ -110,25 +110,25 @@ export FLEPI_MEM_PROFILE=TRUE
export FLEPI_MEM_PROF_ITERS=50
```

Then prepare the pipeline directory (if you have already done that and the pipeline hasn't been updated (`git pull` says it's up to date). You need to set $DATA\_PATH to your data folder. For a COVID-19 run, do:
Then prepare the pipeline directory (if you have already done that and the pipeline hasn't been updated (`git pull` says it's up to date). You need to set $PROJECT\_PATH to your data folder. For a COVID-19 run, do:

```bash
cd ~/drp
export DATA_PATH=$(pwd)/COVID19_USA
export PROJECT_PATH=$(pwd)/COVID19_USA
export GT_DATA_SOURCE="csse_case, fluview_death, hhs_hosp"
```

for Flu do:

```bash
cd ~/drp
export DATA_PATH=$(pwd)/Flu_USA
export PROJECT_PATH=$(pwd)/Flu_USA
```
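
The two cases above could also be folded into one function so the choice is made with a single argument. This is a sketch, not part of the documented workflow: `set_project_path` is a hypothetical name, and it assumes the project repositories sit under `~/drp` exactly as in the commands above.

```bash
# Hypothetical helper (not part of flepiMoP): export PROJECT_PATH for the
# chosen pathogen, mirroring the COVID-19 and Flu commands above.
set_project_path() {
  case "$1" in
    covid)
      export PROJECT_PATH="$HOME/drp/COVID19_USA"
      export GT_DATA_SOURCE="csse_case, fluview_death, hhs_hosp"
      ;;
    flu)
      export PROJECT_PATH="$HOME/drp/Flu_USA"
      ;;
    *)
      echo "usage: set_project_path covid|flu" >&2
      return 1
      ;;
  esac
}

# Usage: set_project_path flu
```

Because the variables are exported from a function, it needs to be defined in (or sourced into) the shell you will launch the run from.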

Now for any type of run:

```bash
cd $DATA_PATH
cd $PROJECT_PATH
export FLEPI_PATH=$(pwd)/flepiMoP
cd $FLEPI_PATH
git checkout main
@@ -153,12 +153,12 @@ For now, just in case: update the `arrow` package from 8.0.0 in the docker to 11
Now flepiMoP is ready 🎉 ;

```bash
cd $DATA_PATH
cd $PROJECT_PATH
git pull
git checkout main
```

Do some clean-up before your run. The fast way is to restore the `$DATA_PATH` git repository to its blank states (⚠️ removes everything that does not come from git):
Do some clean-up before your run. The fast way is to restore the `$PROJECT_PATH` git repository to its blank states (⚠️ removes everything that does not come from git):

<pre class="language-bash"><code class="lang-bash"><strong>git reset --hard &#x26;&#x26; git clean -f -d # this deletes everything that is not on github in this repo !!!
</strong></code></pre>
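
Since this command is destructive, it can help to preview what would be lost first. The sketch below uses git's own dry-run mode (`git clean -n`); `preview_cleanup` is a hypothetical wrapper name, and the repository path is passed in as an argument.

```bash
# Hypothetical helper (not part of flepiMoP): show what a hard reset and
# clean would discard, without changing anything in the repository.
preview_cleanup() {
  repo_dir="${1:-.}"
  # Modified and untracked files, in short form
  git -C "$repo_dir" status --short
  # Dry run: lists untracked files and directories that would be deleted
  git -C "$repo_dir" clean -n -d
}

# Usage: preview_cleanup "$PROJECT_PATH"
```

Lines beginning with `Would remove` are the untracked items that `git clean -f -d` would delete; files ignored via `.gitignore` are not listed and are not removed by that command.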
@@ -178,7 +178,7 @@ rm -rf model_output data/us_data.csv data-truth &&
rm -rf data/seeding_territories_Level5.csv data/seeding_territories_Level67.csv

# don't delete model_output if you have another run in //
rm -rf $DATA_PATH/model_output
rm -rf $PROJECT_PATH/model_output
```

</details>
@@ -240,7 +240,7 @@ If you'd like to have more control, you can specify the arguments manually:
<pre class="language-bash"><code class="lang-bash"><strong>python $FLEPI_PATH/batch/inference_job_launcher.py --aws \ ## FIX THIS TO REFLECT AWS OPTIONS
</strong><strong> -c $CONFIG_PATH \
</strong><strong> -p $FLEPI_PATH \
</strong><strong> --data-path $DATA_PATH \
</strong><strong> --data-path $PROJECT_PATH \
</strong><strong> --upload-to-s3 True \
</strong><strong> --id $FLEPI_RUN_INDEX \
</strong><strong> --restart-from-location $RESUME_LOCATION
@@ -250,7 +250,7 @@ We allow for a number of different jobs, with different setups, e.g., you may _n

{% tabs %}
{% tab title="Standard" %}
<pre class="language-bash" data-overflow="wrap"><code class="lang-bash"><strong>cd $DATA_PATH
<pre class="language-bash" data-overflow="wrap"><code class="lang-bash"><strong>cd $PROJECT_PATH
</strong><strong>
</strong>$FLEPI_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QUEUE --non-stochastic
</code></pre>
@@ -259,7 +259,7 @@ We allow for a number of different jobs, with different setups, e.g., you may _n
{% tab title="Non-inference" %}
{% code overflow="wrap" %}
```bash
cd $DATA_PATH
cd $PROJECT_PATH

$FLEPI_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QUEUE --non-stochastic -j 1 -k 1
```
@@ -271,7 +271,7 @@ $FLEPI_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QU

**Carrying seeding** (_do this to use seeding fits from resumed run_):

<pre class="language-bash" data-overflow="wrap"><code class="lang-bash"><strong>cd $DATA_PATH
<pre class="language-bash" data-overflow="wrap"><code class="lang-bash"><strong>cd $PROJECT_PATH
</strong><strong>
</strong>$FLEPI_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QUEUE --non-stochastic --resume-carry-seeding --restart-from-location $RESUME_LOCATION
</code></pre>
@@ -280,7 +280,7 @@ $FLEPI_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QU

{% code overflow="wrap" %}
```bash
cd $DATA_PATH
cd $PROJECT_PATH

$COVID_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QUEUE --non-stochastic --resume-discard-seeding --restart-from-location $RESUME_LOCATION
```
@@ -290,7 +290,7 @@ $COVID_PATH/batch/inference_job_launcher.py --aws -c $CONFIG_PATH -q $COMPUTE_QU

{% code overflow="wrap" %}
```bash
cd $DATA_PATH
cd $PROJECT_PATH

$COVID_PATH/batch/inference_job_launcher.py -c $CONFIG_PATH -q $COMPUTE_QUEUE --non-stochastic --resume-carry-seeding --restart-from-location $RESUME_LOCATION
```