diff --git a/.gitignore b/.gitignore
index dd0f825..64be628 100644
--- a/.gitignore
+++ b/.gitignore
@@ -159,4 +159,6 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.DS_Store
+
 
 saved_agents/
diff --git a/README.md b/README.md
index 27841a3..c922f78 100644
--- a/README.md
+++ b/README.md
@@ -1,32 +1,111 @@
 # rl4fisheries
-Models:
-
-- `asm_env.py`: provides `AsmEnv()`. This encodes our population dynamics model, coupled with an observation process, and a harvest process with a corresponding utility model. These processes can all be modified using the `config` argument. Their defaults are defined in `asm_fns.py`. By default, observations are stock biomass and mean weight.
-- `asm_esc.py`: provides `AsmEscEnv()` which inherits from `AsmEnv` and has one difference to it: actions in `AsmEscEnv()` represent escapement levels rather than fishing intensities.
-- `ams_cr_like.py`: provides `AsmCRLike()`. In this environment, mean weight is observed and the action is to set parameters `(x1, x2, y2)` for a biomass-based harvest control rule of the type `CautionaryRule` (specified below).
-
-Strategies evaluated with Bayesian Optimization:
-
-- `agents.cautionary_rule.CautionaryRule`: piece-wise linear harvest-control rule specified by three parameters `(x1, x2, y2)`. Example plot (TBD).
-- `agents.msy.Msy`: constant mortality harvest control rule. Specified by one parameter `mortality`.
-- `agents.const_esc.ConstEsc`: constant escapement harvest control rule. Specified by one parameter `escapement`.
+RL and Bayesian optimization methodologies for optimizing harvest control rules in fisheries.
+Includes:
+- A Gymnasium environment for a Walleye population dynamics model
+- Policy functions for several commonly tested policies (including those in the paper)
+- Scripts to optimize RL policies and non-RL policies
+- Notebooks to reproduce paper figures
+- Templates to train new RL policies on our Walleye environment
 ## Installation
-Clone this repo, then:
+To install this source code, you need to have git, Python and pip installed.
+To quickly check whether these are installed, you can open a terminal and run the following commands:
+```bash
+git version
+pip --version
+python -V
+```
+If the commands are not recognized by the terminal, refer to
+[here](https://github.com/git-guides/install-git)
+for git installation instructions,
+[here](https://realpython.com/installing-python/)
+for Python installation instructions and/or
+[here](https://pip.pypa.io/en/stable/installation/)
+for pip installation instructions.
+Then, to install this source code, run
 ```bash
 git clone https://github.com/boettiger-lab/rl4fisheries.git
 cd rl4fisheries
-pip install .
+pip install -e .
 ```
-## RL training:
+## Optimized policies
+
+The optimized policies presented in the paper---both RL policies and non-RL policies such as the precautionary policy---are saved in a public Hugging Face
+[repository](https://huggingface.co/boettiger-lab/rl4eco/tree/main/sb3/rl4fisheries/results).
+RL policies are saved as zip files named `PPO-AsmEnv-(...)-UMx-(...).zip` since the RL algorithm PPO was used to optimize them.
+Here *UM* stands for *utility model* and `x=1, 2, or 3` designates which utility model the policy was optimized for.
+Precautionary policies are named `cr-UMx.pkl` (CR stands for "cautionary rule", an acronym we used during the research phase of this collaboration).
+Similarly, constant escapement policies are saved as `esc-UMx.pkl` and FMSY policies are saved as `msy-UMx.pkl`.
+
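+For example, these files can be downloaded and loaded in Python along the following lines (a rough sketch rather than part of the paper pipeline: it assumes the `huggingface_hub` and `stable-baselines3` packages are installed, and the exact file names should be taken from the repository listing above):
+
+```python
+import pickle
+
+from huggingface_hub import hf_hub_download
+from stable_baselines3 import PPO
+
+# download a precautionary ("cautionary rule") policy optimized for utility model 1;
+# file names follow the cr-UMx.pkl pattern described above
+cr_path = hf_hub_download(
+    repo_id="boettiger-lab/rl4eco",
+    filename="sb3/rl4fisheries/results/cr-UM1.pkl",
+)
+# unpickling the policy object requires this package (rl4fisheries) to be installed
+with open(cr_path, "rb") as f:
+    cr_policy = pickle.load(f)
+
+# download an RL policy; replace the placeholder with an actual
+# PPO-AsmEnv-(...)-UMx-(...).zip file name from the repository listing
+rl_path = hf_hub_download(
+    repo_id="boettiger-lab/rl4eco",
+    filename="sb3/rl4fisheries/results/<an-actual-PPO-policy-file>.zip",
+)
+rl_agent = PPO.load(rl_path)
+```
+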
+## Reproducing paper figures
+
+The Jupyter notebooks found at `rl4fisheries/notebooks/for_results` may be used to recreate the figures presented in the paper.
+Notice that the data for the plots is re-generated each time a notebook is run, so plots such as the time series will look somewhat different from those in the paper.
+To reproduce these figures on your own machine you need to have Jupyter installed.
+Alternatively, you can navigate to
+[https://github.com/boettiger-lab/rl4fisheries](https://github.com/boettiger-lab/rl4fisheries)
+and click on `Code > Codespaces > Create codespace on main` to open the notebooks in a GitHub codespace.
+
+## Optimizing RL policies
+
+To optimize an RL policy from scratch, use the command
 ```bash
 python scripts/train.py -f path/to/config/file.yml
 ```
-The trained model is automatically pushed to Huggingface (requires a HF token).
-The config files used for our results are found in `hyperpars/for_results/`
+You can use the following template config file:
+```bash
+python scripts/train.py -f hyperpars/RL-template.yml
+```
+The config files we used for the policies in our paper are found at `hyperpars/for_results/`.
+For example,
+[this](https://github.com/boettiger-lab/rl4fisheries/blob/main/hyperpars/for_results/ppo_biomass_UM1.yml)
+config file was used to train 1-Obs. RL in Scenario 1 (utility = total harvest).
+The trained model is automatically pushed to Hugging Face if a Hugging Face token is provided.
+
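+Once training finishes, you can sanity-check the saved policy by rolling it out in the environment. The snippet below is a rough sketch under a few assumptions: `AsmEnv` is importable from the top-level `rl4fisheries` package (otherwise import it from `rl4fisheries.envs`), the default environment configuration is used, and `path/to/saved/model.zip` is a placeholder for the location implied by the `save_path` and `id` options of your config file:
+
+```python
+from stable_baselines3 import PPO
+from rl4fisheries import AsmEnv  # assumption: AsmEnv is exposed at the package top level
+
+# placeholder path: use the location implied by save_path / id in your config
+model = PPO.load("path/to/saved/model.zip")
+
+env = AsmEnv()  # default configuration; a config={...} argument can modify it
+obs, info = env.reset()
+episode_utility, done = 0.0, False
+while not done:
+    action, _ = model.predict(obs, deterministic=True)
+    obs, reward, terminated, truncated, info = env.step(action)
+    episode_utility += reward
+    done = terminated or truncated
+print(f"episode utility: {episode_utility:.3f}")
+```
+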
+## Source code structure
+
+```
+rl4fisheries
+|
+|-- hyperpars
+|   |
+|   |-- configuration yaml files
+|
+|-- notebooks
+|   |
+|   |-- Jupyter notebooks
+|
+|-- src/rl4fisheries
+|   |
+|   |-- agents
+|   |   |
+|   |   |-- interfaces for policies such as Precautionary Policy, FMSY, Constant Escapement
+|   |
+|   |-- envs
+|   |   |
+|   |   |-- Gymnasium environments used in our study.
+|   |       (specifically, asm_env.py is used for our paper).
+|   |
+|   |-- utils
+|       |
+|       |-- ray.py: RL training within Ray framework (not used in paper)
+|       |
+|       |-- sb3.py: RL training within Stable Baselines framework (used in paper)
+|       |
+|       |-- simulation.py: helper functions to simulate the system dynamics using a policy
+|
+|-- tests
+|   |
+|   |-- continuous integration tests to ensure code quality in pull requests
+|
+|-- noxfile.py: file used to run continuous integration tests
+|
+|-- pyproject.toml: file used in the installation of this source code
+|
+|-- README.md
+```
\ No newline at end of file
diff --git a/hyperpars/RL-template.yml b/hyperpars/RL-template.yml
new file mode 100644
index 0000000..ed7c51f
--- /dev/null
+++ b/hyperpars/RL-template.yml
@@ -0,0 +1,36 @@
+algo: "PPO"
+total_timesteps: 6000000
+algo_config:
+  tensorboard_log: "../../../logs"
+  #
+  # use a feedforward neural net with three layers of 64, 32, and 16 neurons
+  policy: 'MlpPolicy'
+  use_sde: True
+  policy_kwargs: "dict(net_arch=[64, 32, 16])"
+  #
+  # you can add hyperparameter values here, e.g. by uncommenting the following line:
+  # learning_rate: 0.00015
+
+# The environment simulating the population dynamics of Walleye
+env_id: "AsmEnv"
+config:
+  # options that configure the specifics of the environment:
+  #
+  # use one observation (vulnerable biomass)
+  observation_fn_id: 'observe_1o'
+  n_observs: 1
+  #
+  # use the "default" utility function:
+  harvest_fn_name: "default"
+  upow: 1
+
+# helps parallelize training:
+n_envs: 12
+
+# save and upload models to Hugging Face (needs a Hugging Face token)
+repo: "boettiger-lab/rl4eco"
+save_path: "../from-template/"
+id: "from-template"
+
+# misc, needed to use custom network structures (as in algo_config: policy_kwargs).
+additional_imports: ["torch"]
\ No newline at end of file
diff --git a/notebooks/SystemDynamics.ipynb b/notebooks/old/SystemDynamics.ipynb
similarity index 100%
rename from notebooks/SystemDynamics.ipynb
rename to notebooks/old/SystemDynamics.ipynb
diff --git a/notebooks/compare-solutions.ipynb b/notebooks/old/compare-solutions.ipynb
similarity index 100%
rename from notebooks/compare-solutions.ipynb
rename to notebooks/old/compare-solutions.ipynb
diff --git a/notebooks/explore-optima.ipynb b/notebooks/old/explore-optima.ipynb
similarity index 100%
rename from notebooks/explore-optima.ipynb
rename to notebooks/old/explore-optima.ipynb
diff --git a/notebooks/optimal-fixed-policy-cases-results.ipynb b/notebooks/old/optimal-fixed-policy-cases-results.ipynb
similarity index 100%
rename from notebooks/optimal-fixed-policy-cases-results.ipynb
rename to notebooks/old/optimal-fixed-policy-cases-results.ipynb
diff --git a/notebooks/optimal-fixed-policy.ipynb b/notebooks/old/optimal-fixed-policy.ipynb
similarity index 100%
rename from notebooks/optimal-fixed-policy.ipynb
rename to notebooks/old/optimal-fixed-policy.ipynb
diff --git a/notebooks/popdyn_tests.ipynb b/notebooks/old/popdyn_tests.ipynb
similarity index 100%
rename from notebooks/popdyn_tests.ipynb
rename to notebooks/old/popdyn_tests.ipynb
diff --git a/notebooks/result_plots.ipynb b/notebooks/old/result_plots.ipynb
similarity index 100%
rename from notebooks/result_plots.ipynb
rename to notebooks/old/result_plots.ipynb