[Idea] Implement success rate in logging #175

Closed
Ivan-267 opened this issue Feb 17, 2024 · 4 comments
Labels: enhancement (New feature or request)

Comments

@Ivan-267
Collaborator

Ivan-267 commented Feb 17, 2024

Proposal:

In addition to reward, an overview of the current success rate can be useful in many envs. It can be a very important metric in envs that have a clear goal (e.g. successfully landed for the 3DLander env, successfully parked for the 3DCarParking env, etc.).

It seems we could support this with SB3 by implementing the following:

https://stable-baselines3.readthedocs.io/en/master/common/logger.html#rollout

success_rate: Mean success rate during training (averaged over stats_window_size episodes, 100 by default), you must pass an extra argument to the Monitor wrapper to log that value (info_keywords=("is_success",)) and provide info["is_success"]=True/False on the final step of the episode
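
To make the quoted description concrete, here is a minimal sketch of what that metric amounts to: a rolling mean of `is_success` over the last N finished episodes. This is not SB3's actual code. `SuccessRateTracker` and `record_step` are hypothetical names used only for illustration. On the SB3 side, the docs quoted above say the value is logged by passing `info_keywords=("is_success",)` to the Monitor wrapper.

```python
from collections import deque


class SuccessRateTracker:
    """Rolling mean of `is_success` over the most recent finished episodes.

    Hypothetical helper for illustration only, not part of SB3.
    """

    def __init__(self, window_size: int = 100) -> None:
        # 100 mirrors the documented default for stats_window_size.
        self._results = deque(maxlen=window_size)

    def record_step(self, done: bool, info: dict) -> None:
        # Only the final step of an episode counts, and only if the env
        # actually reported the flag.
        if done and "is_success" in info:
            self._results.append(bool(info["is_success"]))

    @property
    def success_rate(self):
        # None until at least one finished episode has reported the flag.
        if not self._results:
            return None
        return sum(self._results) / len(self._results)


tracker = SuccessRateTracker(window_size=4)
tracker.record_step(done=False, info={})                    # mid-episode, ignored
tracker.record_step(done=True, info={"is_success": True})   # episode 1: success
tracker.record_step(done=True, info={"is_success": False})  # episode 2: failure
print(tracker.success_rate)  # 0.5
```

Envs that never send the flag simply contribute nothing here, which is one way the statistic could stay hidden for older envs.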

Needs to be considered:

It would be great if this can be added in a way that doesn't affect previous envs (e.g. they either always report true, always report false, or don't show this statistic at all).

  1. Info sending/receiving: Some modifications would be needed to the plugin and Python env code to send / receive info, optimally preserving compatibility with older envs that don't send info. Once we enable info sending, we can later also set the truncated/terminated flags.

  2. Usage / plugin side changes:

func end_episode(final_reward = 0, success = true):
	reward += final_reward
	done = true
	needs_reset = true
	episode_successful = success

(Just a potential usage example. The end_episode method is implemented in the env code, not in the plugin. We could consider simplifying the process with something like edbeeching/godot_rl_agents_plugin#20, but that would break compatibility with existing envs.)

For compatibility, possibly the simplest way would be to always report episode success as true by default, unless set by the user.
Optionally, we could also add a boolean arg to the sb3 example script that sets the monitor to report this stat or not.
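
A rough sketch of how the Python side could handle both compatibility points above. Everything here is an assumption for illustration: the shape of the info received from Godot, the helper names `final_info` and `monitor_info_keywords`, and the `--log_success_rate` flag, which is not an existing option of the example script.

```python
import argparse


def final_info(raw_info, default_success=True):
    """Build the info dict for the last step of an episode.

    raw_info is whatever the Godot side sent; older envs that send no
    info are represented here as None or {} (assumed message shape).
    """
    info = dict(raw_info or {})
    # Older envs never set the flag, so fall back to the default.
    info.setdefault("is_success", default_success)
    info["is_success"] = bool(info["is_success"])
    return info


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, off by default so existing runs are unchanged.
    parser.add_argument("--log_success_rate", action="store_true")
    return parser.parse_args(argv)


def monitor_info_keywords(log_success_rate):
    # Value that would be passed as Monitor(..., info_keywords=...).
    return ("is_success",) if log_success_rate else ()


print(final_info(None))                    # {'is_success': True}
print(final_info({"is_success": False}))   # {'is_success': False}
args = parse_args(["--log_success_rate"])
print(monitor_info_keywords(args.log_success_rate))  # ('is_success',)
```

Defaulting to true keeps older envs working without changes, but it also means their reported success rate would always be 1.0, so gating the stat behind an opt-in flag may be the less confusing option.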

Ivan-267 added the enhancement label Feb 27, 2024
@GianiStatie
Collaborator

GianiStatie commented Oct 16, 2024

I can help implement this, since I also wanted to expose the info dictionary to Python, such that we can later populate it with more useful info.

If I understood the documentation correctly (the environment info dict must contain an is_success key to compute that value), we just need to add the "is_success" key to the Godot info. I think we can set it based on whether the episode ended before or after the X simulation steps you set in the environment.

@Ivan-267
Collaborator Author

> I can help implement this, since I also wanted to expose the info dictionary to Python, such that we can later populate it with more useful info.
>
> If I understood the documentation correctly (the environment info dict must contain an is_success key to compute that value), we just need to add the "is_success" key to the Godot info. I think we can set it based on whether the episode ended before or after the X simulation steps you set in the environment.

Thanks, we can work on implementing this together. For now, a few things to consider are that some of these will be SB3 specific (the success flag), but we can highlight that with a comment somewhere in the plugin and make it optional, and other stuff like custom info should be framework agnostic (no issues there). There's an additional thing I wanted to implement at some point, and that's truncation. These are separate issues, but of course make sense to consider when we implement the changes.

For success: I think we should let the user decide the criteria for success. An episode that reaches the restart timer is not necessarily unsuccessful (e.g. one env's goal could be to collect as many items as possible before the env restarts, there's not necessarily a clear success here, but the user could define an arbitrary success threshold for tracking if wanted). What do you think?
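
As a sketch of what a user-defined criterion could look like for the item-collecting example, written in Python for illustration (in practice this logic would live in the env code): the names `EpisodeSummary`, `make_threshold_criterion` and the threshold of 10 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EpisodeSummary:
    items_collected: int
    timed_out: bool  # True when the restart timer ended the episode


def make_threshold_criterion(min_items: int) -> Callable[[EpisodeSummary], bool]:
    """Return a success check based on an arbitrary, user-chosen threshold."""

    def criterion(ep: EpisodeSummary) -> bool:
        # How the episode ended is deliberately ignored: reaching the
        # restart timer does not by itself make an episode unsuccessful.
        return ep.items_collected >= min_items

    return criterion


is_success = make_threshold_criterion(min_items=10)
print(is_success(EpisodeSummary(items_collected=12, timed_out=True)))  # True
print(is_success(EpisodeSummary(items_collected=3, timed_out=True)))   # False
```

Both episodes end on the timer, but only the first counts as a success, which is the distinction a purely step-count-based rule would miss.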

@GianiStatie
Collaborator

GianiStatie commented Oct 16, 2024

Sounds good. I'll start working on a draft sometime this week and I'll ping you so we can ping-pong some ideas.

Regarding the other idea:
When you say truncation, you mean terminating all agents once one of them is done?

@GianiStatie
Collaborator

I've created 2 MRs for the first iteration:

I've tested it locally with a 2D environment I'm working on, and so far so good.

GianiStatie self-assigned this Oct 21, 2024