Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noise Benchmarking, LSTM sb3_zoo hyperpar, Issue #1 #9

Merged
merged 24 commits into from
Mar 7, 2024
Merged

Conversation

felimomo
Copy link
Collaborator

@felimomo felimomo commented Mar 5, 2024

In this branch:

  • I added the ability to manually tune the system noise (Asm().sigma) with a script input, added a bash script to train agents for several values of noise simultaneously.
  • I added RecurrentPPO hyperparams from sb3_zoo, and added the fn sb3_train_v2 which is tailored to reading these hyperparams from yaml.
  • I fixed Issue#1
  • I updated the observation and harvest processes to be relative to vulnerable populations rather than to the entire population.

Next steps: sb3_train_v2 is more general than sb3_train, however its syntax is a bit more involved and might need refactoring in order to reach sb3_train's level of clarity. I'll leave this for the next pull request -- the code is fully functional right now, although in a 'quick-n-dirty' state in some parts.

@felimomo felimomo requested a review from cboettig March 5, 2024 19:49
@felimomo felimomo marked this pull request as draft March 5, 2024 23:41
@felimomo
Copy link
Collaborator Author

felimomo commented Mar 5, 2024

Converted to draft to solve Issue #1

@felimomo felimomo marked this pull request as ready for review March 5, 2024 23:51
@felimomo felimomo changed the title Noise Benchmarking, adding LSTM training with sb3_zoo hyperparams Noise Benchmarking, LSTM sb3_zoo hyperpar, Issue #1 Mar 5, 2024
@felimomo felimomo marked this pull request as draft March 6, 2024 21:01
@felimomo
Copy link
Collaborator Author

felimomo commented Mar 6, 2024

Updated based on our meeting! Also, side note: now self.vulb is updated at the observation using the new (post-harvest, post-growth) system state.

Copy link
Member

@cboettig cboettig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felimomo fantastic work here! A few comments in line. Basically I think let's just add the harvest_vul in now as well, since it looks odd to have something explicitly called survey_vul being used inside the harvest function, and then merge this.

I think we're due for a refactor of the Asm code, so that it's easier to read and easier to customize. (most importantly, asm_2o should re-use all the code except for an override observation method, and similarly for escapement with an override step method I think). but let's tackle that in a separate PR.

hyperpars/ppo-asm-v0-1.yml Outdated Show resolved Hide resolved
hyperpars/ppo-asm2o-v0-1.yml Outdated Show resolved Hide resolved
hyperpars/ppo-asm2o-v0-1.yml Outdated Show resolved Hide resolved
hyperpars/rppo-asm2o.yml Show resolved Hide resolved
scripts/benchmark_noise.sh Show resolved Hide resolved
@@ -174,7 +173,7 @@ def initialize_population(self):

# leading array calculations to get vul-at-age, wt-at-age, etc.
for a in range(0, p["n_age"], 1):
vul[a] = 1 / (1 + np.exp(-p["asl"] * (p["ages"][a] - p["ahv"])))
survey_vul[a] = 1 / (1 + np.exp(-p["asl"] * (p["ages"][a] - p["ahv"])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we have a harvest_vul, it's going to need to be initialized here too. Presumably it's identical but has different values for p["asl"] and p["ahv"]. (We'll need @CarlJwalters or @ChrisFishCahill to figure out some reasonable choices there)

src/rl4fisheries/envs/asm.py Outdated Show resolved Hide resolved
self.abar = sum(p["vul"] * np.array(p["ages"]) * n) / sum(n)
self.wbar = sum(p["vul"] * n * p["wt"]) / sum(n * p["wt"])
self.abar = sum(p["survey_vul"] * np.array(p["ages"]) * n) / sum(n)
self.wbar = sum(p["survey_vul"] * n * p["wt"]) / sum(n * p["wt"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, eventually this should be harvest_vul

src/rl4fisheries/envs/asm_2o.py Show resolved Hide resolved
src/rl4fisheries/utils/sb3.py Show resolved Hide resolved
@cboettig cboettig marked this pull request as ready for review March 7, 2024 00:19
@cboettig cboettig merged commit bc0a45d into main Mar 7, 2024
2 checks passed
@felimomo felimomo deleted the benchmarking branch March 18, 2024 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants