
Commit

Merge pull request #16 from boettiger-lab/preprint-preparation
Preprint preparation
cboettig authored Dec 17, 2024
2 parents 3c4f98a + 61981df commit 7e1150b
Showing 10 changed files with 134 additions and 17 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -159,4 +159,6 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.DS_Store

saved_agents/
113 changes: 96 additions & 17 deletions README.md
@@ -1,32 +1,111 @@
# rl4fisheries

Models:

- `asm_env.py`: provides `AsmEnv()`. This encodes our population dynamics model, coupled with an observation process, and a harvest process with a corresponding utility model. These processes can all be modified using the `config` argument. Their defaults are defined in `asm_fns.py`. By default, observations are stock biomass and mean weight.
- `asm_esc.py`: provides `AsmEscEnv()`, which inherits from `AsmEnv` and differs from it in one respect: actions in `AsmEscEnv()` represent escapement levels rather than fishing intensities.
- `ams_cr_like.py`: provides `AsmCRLike()`. In this environment, mean weight is observed and the action is to set parameters `(x1, x2, y2)` for a biomass-based harvest control rule of the type `CautionaryRule` (specified below).

Strategies evaluated with Bayesian Optimization:

- `agents.cautionary_rule.CautionaryRule`: piece-wise linear harvest-control rule specified by three parameters `(x1, x2, y2)`. Example plot (TBD).
- `agents.msy.Msy`: constant mortality harvest control rule. Specified by one parameter `mortality`.
- `agents.const_esc.ConstEsc`: constant escapement harvest control rule. Specified by one parameter `escapement`.
RL and Bayesian optimization methodologies for harvest control rule optimization in fisheries.
Includes:
- A gymnasium environment for a Walleye population dynamics model
- Policy functions for different commonly-tested policies (including those in the paper)
- Scripts to optimize RL policies and non-RL policies
- Notebooks to reproduce paper figures
- Templates to train new RL policies on our Walleye environment

## Installation

Clone this repo, then:
To install this source code, you need to have git, Python, and pip installed.
To quickly check whether these are installed, open a terminal and run the following commands:
```bash
git version
pip --version
python -V
```
If any of these commands is not recognized by the terminal, refer to
[here](https://github.com/git-guides/install-git)
for git installation instructions,
[here](https://realpython.com/installing-python/)
for Python installation instructions, and/or
[here](https://pip.pypa.io/en/stable/installation/)
for pip installation instructions.

To install this source code, run
```bash
git clone https://github.com/boettiger-lab/rl4fisheries.git
cd rl4fisheries
pip install .
pip install -e .
```
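
As a quick check that the installation worked, the following minimal sketch runs a short random rollout of the Walleye environment (it assumes `AsmEnv` is importable from `rl4fisheries.envs.asm_env`; adjust the import if the package exposes it elsewhere):
```python
# Minimal smoke test: a short random rollout of the Walleye environment.
# The import path below is an assumption -- adjust it if AsmEnv is exposed elsewhere.
from rl4fisheries.envs.asm_env import AsmEnv

env = AsmEnv()                       # default config: observe biomass and mean weight
obs, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(10):
    action = env.action_space.sample()               # random fishing intensity
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break

print("cumulative utility over 10 random steps:", total_reward)
```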

## RL training:
## Optimized policies

The optimized policies presented in the paper---both RL policies and non-RL policies such as the precautionary policy---are saved in a public Hugging Face
[repository](https://huggingface.co/boettiger-lab/rl4eco/tree/main/sb3/rl4fisheries/results).
RL policies are saved as zip files named `PPO-AsmEnv-(...)-UMx-(...).zip` since the RL algorithm PPO was used to optimize them.
Here *UM* stands for *utility model* and `x=1, 2, or 3` designates which utility model the policy was optimized for.
Precautionary policies are named `cr-UMx.pkl` (CR stands for "cautionary rule", an acronym we used during the research phase of this collaboration).
Similarly, constant escapement policies are saved as `esc-UMx.pkl` and FMSY policies are saved as `msy-UMx.pkl`.
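
As a hedged sketch of how one of these saved policies could be loaded, the snippet below downloads a zip from the Hugging Face repo and loads it with Stable-Baselines3; the filename is a placeholder, not an actual file name:
```python
# Sketch: download a saved RL policy from the Hugging Face repo and load it with SB3.
# The filename below is a placeholder -- substitute one of the actual
# PPO-AsmEnv-(...)-UMx-(...).zip files listed in the results folder.
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

path = hf_hub_download(
    repo_id="boettiger-lab/rl4eco",
    filename="sb3/rl4fisheries/results/PPO-AsmEnv-placeholder.zip",  # placeholder name
)
model = PPO.load(path)

# model.predict(obs) then maps an observation to a harvest action.
```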

## Reproducing paper figures

The Jupyter notebooks found at `rl4fisheries/notebooks/for_results` may be used to recreate the figures found in the paper.
Note that the data for the plots is re-generated each time a notebook is run, so, e.g., the time-series plots will differ from run to run.

Simply run
To reproduce these figures on your own machine you need to have Jupyter installed. Alternatively, you can navigate to
```https://github.com/boettiger-lab/rl4fisheries```
and click on `Code > Codespaces > Create codespace on main` to open the notebooks in a GitHub codespace.

## Optimizing RL policies

To optimize an RL policy from scratch, use the command
```bash
python scripts/train.py -f path/to/config/file.yml
```
The trained model is automatically pushed to Hugging Face (requires an HF token).
The config files used for our results are found in `hyperpars/for_results/`.
You can use the following template config file:
```bash
python scripts/train.py -f hyperpars/RL-template.yml
```

The config files we used for the policies in our paper are found at `hyperpars/for_results/`.
For example,
[this](https://github.com/boettiger-lab/rl4fisheries/blob/main/hyperpars/for_results/ppo_biomass_UM1.yml)
config file was used to train 1-Obs. RL in Scenario 1 (utility = total harvest).
The trained model is automatically pushed to Hugging Face if a Hugging Face token is provided.
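
For reference, a rough programmatic equivalent of such a training run is sketched below; it is illustrative only (not `scripts/train.py`), the hyperparameters are placeholders, and the `AsmEnv` import path is an assumption:
```python
# Illustrative sketch of training PPO on AsmEnv with Stable-Baselines3.
# Not scripts/train.py: hyperparameters and the import path are placeholders.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from rl4fisheries.envs.asm_env import AsmEnv  # assumed import path

vec_env = make_vec_env(AsmEnv, n_envs=4)      # parallel copies of the environment
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=100_000)          # the paper's runs use many more steps
model.save("ppo_asm_env_demo")
```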

## Source code structure

```
rl4fisheries
|
|-- hyperpars
| |
| |-- configuration yaml files
|
|-- notebooks
| |
| |-- Jupyter notebooks
|
|-- src/rl4fisheries
| |
| |-- agents
| | |
| | |-- interfaces for policies such as Precautionary Policy, FMSY, Constant Escapement
| |
| |-- envs
| | |
| | |-- Gymnasium environments used in our study.
| | (specifically, asm_env.py is used for our paper).
| |
| |-- utils
| |
| |-- ray.py: RL training within Ray framework (not used in paper)
| |
| |-- sb3.py: RL training within Stable Baselines framework (used in paper)
| |
| |-- simulation.py: helper functions to simulate the system dynamics using a policy
|
|-- tests
| |
| |-- continuous integration tests to ensure code quality in pull requests
|
|-- noxfile.py: file used to run continuous integration tests
|
|-- pyproject.toml: file used in the installation of this source code
|
|-- README.md
```
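
The policy-function agents listed under `agents` can also be rolled out directly against the environment. The sketch below is illustrative only: the constructor argument follows the one-parameter description of the FMSY rule earlier in this diff, and the SB3-style `predict(observation)` method is an assumption to verify against `src/rl4fisheries/agents`:
```python
# Illustrative rollout of AsmEnv under a fixed constant-mortality (FMSY-style) policy.
# Both the Msy(mortality=...) constructor and the predict() signature are assumptions;
# check the actual interfaces in src/rl4fisheries/agents before relying on this.
from rl4fisheries.envs.asm_env import AsmEnv
from rl4fisheries.agents.msy import Msy

env = AsmEnv()
agent = Msy(mortality=0.1)          # constant-mortality harvest control rule

obs, info = env.reset(seed=0)
rewards = []
for _ in range(100):
    action, _ = agent.predict(obs)  # assumed SB3-style signature
    obs, reward, terminated, truncated, info = env.step(action)
    rewards.append(reward)
    if terminated or truncated:
        break

print("episode utility:", sum(rewards))
```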
36 changes: 36 additions & 0 deletions hyperpars/RL-template.yml
@@ -0,0 +1,36 @@
algo: "PPO"
total_timesteps: 6000000
algo_config:
tensorboard_log: "../../../logs"
#
# use a feedforward neural net with three layers of 64, 32, and 16 neurons
policy: 'MlpPolicy'
use_sde: True
policy_kwargs: "dict(net_arch=[64, 32, 16])"
#
# you can add hyperparameter values here, e.g. by uncommenting the following row:
# learning_rate: 0.00015

# The environment simulating the population dynamics of Walleye
env_id: "AsmEnv"
config:
# configurations that specify the specifics of the environment:
#
# use one observation (vulnerable biomass)
observation_fn_id: 'observe_1o'
n_observs: 1
#
# use the "default" utility function:
harvest_fn_name: "default"
upow: 1

# helps paralellize training:
n_envs: 12

# save and upload models to hugging-face (needs hugging-face token)
repo: "boettiger-lab/rl4eco"
save_path: "../from-template/"
id: "from-template"

# misc, needed to use custom network structures (as in algo_config: policy_kwargs).
additional_imports: ["torch"]
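
To illustrate how fields like these map onto a Stable-Baselines3 run, here is a rough, hypothetical reader for such a config; it is not `scripts/train.py`, and the `AsmEnv` import path is an assumption:
```python
# Hypothetical reader for a config like hyperpars/RL-template.yml.
# Shows one way the yaml fields could map onto an SB3 PPO run; not the actual train script.
import yaml
import torch  # so that eval() of policy_kwargs could reference torch if a config needs it
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from rl4fisheries.envs.asm_env import AsmEnv  # assumed import path

with open("hyperpars/RL-template.yml") as f:
    cfg = yaml.safe_load(f)

algo_cfg = dict(cfg.get("algo_config", {}))
if isinstance(algo_cfg.get("policy_kwargs"), str):
    # e.g. "dict(net_arch=[64, 32, 16])" -> {'net_arch': [64, 32, 16]}
    algo_cfg["policy_kwargs"] = eval(algo_cfg["policy_kwargs"])

vec_env = make_vec_env(
    AsmEnv,
    n_envs=cfg.get("n_envs", 1),
    env_kwargs={"config": cfg.get("config", {})},
)
policy = algo_cfg.pop("policy", "MlpPolicy")

model = PPO(policy, vec_env, **algo_cfg)
model.learn(total_timesteps=cfg["total_timesteps"])
```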
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
