[Bug]: Rendering with EvalCallback does not render the initial or final state #1692

kylesayrs · 2023-09-23T23:49:53Z

🐛 Bug

EvalCallback does not render initial state of the episode. This is an issue for environments which have dynamic/random initial states, as it makes it unclear to the user what the state looked like before the agent performed its first step.
EvalCallback does not render the last step of the episode, instead rendering the reset state. This robs the user of seeing the last (often crucial) step and is confusing for new users.

This behavior is because the environment reset is performed within the DummyVecEnv.step function. I have a local implementation that fixes this issue and clearly renders both initial and final states, but I'd first like to confirm that this behavior should indeed be treated as a bug worthy of a feature/fix.

To Reproduce

from typing import Optional

import numpy as np

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.envs import BitFlippingEnv


# A user should not be required to subclass their environment to debug the EvalCallback
class ClearerRenderEnv(BitFlippingEnv):
    def step(self, *args, **kwargs):
        pre_state = self.state.copy()
        rets = super().step(*args, **kwargs)
        print(f"step    : {pre_state} -> {self.state} | {self.desired_goal}")

        return rets
    
    def reset(self, *args, **kwargs):
        if not hasattr(self, "state"):
            return super().reset(*args, **kwargs)
        
        pre_state = self.state.copy()
        rets = super().reset(*args, **kwargs)
        print(f"reset   : {pre_state} -> {self.state} | {self.desired_goal} ")

        return rets

    def render(self) -> Optional[np.ndarray]:
        if self.render_mode == "rgb_array":
            return self.state.copy()
        print(f"rendered:          {self.state} | {self.desired_goal}")


if __name__ == "__main__":
    eval_callback = EvalCallback(
        Monitor(ClearerRenderEnv(n_bits=2)),
        n_eval_episodes=1,
        eval_freq=10_000,
        render=True,
    )

    environment = make_vec_env(
        BitFlippingEnv,
        env_kwargs={"n_bits": 2},
        n_envs=1
    )

    model = DQN(
        "MultiInputPolicy",
        environment,
        learning_starts=0,
        learning_rate=0.1,
        verbose=2
    )
    
    model.learn(
        total_timesteps=10_000,
        log_interval=None,
        callback=eval_callback,
        progress_bar=True,
    )

Relevant log output / Error message

Notice the first state [0, 0] is never rendered and neither is the last step, only the reset frame (in this case, also [0, 0])

step    : [0 0] -> [0 1] | [1 1]
rendered:          [0 1] | [1 1]
step    : [0 1] -> [1 1] | [1 1]
reset   : [1 1] -> [0 0] | [1 1] 
rendered:          [0 0] | [1 1]
Eval num_timesteps=10000, episode_reward=-1.00 +/- 0.00
Episode length: 2.00 +/- 0.00
Success rate: 100.00%

For a new user only looking at the rendered messages, this would be very confusing. Here's another example where it seems like the agent changed two values at once.

step    : [1 1] -> [1 0] | [1 1]
rendered:          [1 0] | [1 1]
step    : [1 0] -> [1 1] | [1 1]
reset   : [1 1] -> [0 1] | [1 1] 
rendered:          [0 1] | [1 1]
Eval num_timesteps=10, episode_reward=-1.00 +/- 0.00
Episode length: 2.00 +/- 0.00
Success rate: 100.00%

System Info

No response

Checklist

My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal and working example to reproduce the bug
I've used the markdown code blocks for both code and stack traces.

The text was updated successfully, but these errors were encountered:

araffin · 2023-09-25T09:24:55Z

Hello,
I guess the issue comes from evaluate_policy() where should call render() before the call to predict().
For the final, I'm not sure as with VecEnv, they are reset automatically.

kylesayrs · 2023-09-26T02:55:06Z

The evalcallback already specially constructs an environment for evaluation, and this environment should act slightly differently than typical VecEnvs (for example, stop stepping after reaching a terminal state and do not reset) . Therefore my current plan to fix is to subclass VecEnv and overload the step_wait function to step to not reset automatically.

I'll have to create a check in the step function to make sure that step is never called with a done environment

kylesayrs · 2023-09-26T02:56:50Z

Side note, there's another discussion to be had about parallelizing the eval environment, aka passing a n_envs argument to evaluate_policy

araffin · 2023-09-26T09:21:28Z

I'll have to create a check in the step function to make sure that step is never called with a done environment
Side note, there's another discussion to be had about parallelizing the eval environment, aka passing a n_envs argument to evaluate_policy

The two are related, evaluate_policy() already supports multiple envs, and that's because of VecEnv precisely: #402

and this environment should act slightly differently than typical VecEnvs

if you want that behavior, you should use a gym.Env instead.
As I don't see rendering the last step as critical (rendering is usually for debug only), I would call render() before predict but add a note in the doc about the fact that the last timestep is not.

kylesayrs · 2023-09-27T04:31:39Z

The two are related, evaluate_policy() already supports multiple envs, and that's because of VecEnv precisely.

evaluate_policy has the capacity to support multiple envs, but currently hard codes just one.

The two methods that should be overridden by the EvalVecEnv subclass are VecEnv.reset and VecEnv.step.
If we pass both of these methods the render argument, reset will ensure that the first frame is rendered and step will ensure that render() is called after the predicted step is taken in all environments but before any of the environments reset.

Pseudocode

def reset(render=false):
    rets = super().reset()
    if render:
        self.render()

    return rets

def step(render=false):
    self.step_async()
    for env_idx in range(self.num_envs):
        ...
        self.envs[env_idx].step(...)
        ...
    
    if render:
        self.render()
    
    for env_idx in range(self.num_envs):
        ...
        self.envs[env_idx].reset()
        ...

araffin · 2023-09-27T08:16:25Z

evaluate_policy has the capacity to support multiple envs, but currently hard codes just one.

What makes you think that?

Pseudocode

so if you want to do something like that, gym wrapper will help you achieve what you want, no need for a new VecEnv.

kylesayrs · 2023-09-27T17:24:46Z

What makes you think that?

common/evaluation.py:62 env = DummyVecEnv([lambda: env])
common/envs/dummy_vec_env.py:42 super().__init__(len(env_fns), env.observation_space, env.action_space)
common/envs/base_vec_env.py:63 self.num_envs = num_envs

Therefore env.num_envs is always 1. Is it possible I'm misreading something?

kylesayrs · 2023-09-27T17:32:29Z

Ah, I guess a user could pass a VecEnv that is already instantiated with multiple environments. I missed this in the documentation.

If a vector env is passed in, this divides the episodes to evaluate onto the different elements of the vector env

so if you want to do something like that, gym wrapper will help you achieve what you want, no need for a new VecEnv.

Are you proposing a EvalGymWrapper after the environment has been vectorized?

if not isinstance(env, VecEnv):
        env = DummyVecEnv([lambda: env])  # type: ignore[list-item, return-value]
env = EvalWrapper(env)

araffin · 2023-09-27T18:00:03Z

Ah, I guess a user could pass a VecEnv that is already instantiated with multiple environments. I missed this in the documentation.

yes

Are you proposing a EvalGymWrapper after the environment has been vectorized?

No, I'm just saying that if a user wants to render the very last step, he/she can wrap the Gym env before creating a VecEnv with a Gym wrapper that calls render before and after the step, in order to render the final step.

Rendering the first step will be solved by moving the call to render() in the evaluation function.

kylesayrs · 2023-09-28T03:48:29Z

Shouldn't rendering all the steps and states be the default behavior of the EvalCallback? I personally find a lot of value in rendering all the frames both debugging the env and the agent's learning.

kylesayrs · 2023-09-28T05:48:44Z

I see how my previous implementation wouldn't work if the function is passed a VecEnv, rather than the base gym environment.

One change could to be to enforce that the caller pass a base Gym environment and a n_envs, then the function constructs a EvalVecEnv which inherits from VecEnv and implements correct rendering with the specified number of environments.

We can continue to support VecEnvs, but display a warning that rendering won't behave properly.

I believe this implementation would allow first and last frames to be rendered, preserve multi-environment evaluation, and reduce complexity for the user since they not longer have to construct a vec_env themselves. What do you think?

kylesayrs · 2023-10-02T08:16:27Z

This might be better done with a wrapper. Still working on a PR...

jonomon · 2023-10-07T16:51:41Z

I was missing the last render.
As a workaround, I added a "terminal_render" value in self.buf_infos of the DummyVecEnv

# https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L69
def step_wait(self) -> VecEnvStepReturn:
  ...
  if self.buf_dones[env_idx]:
      # save final observation where user can get it, then reset
      self.buf_infos[env_idx]["terminal_observation"] = obs
      self.buf_infos[env_idx]["terminal_render"] = self.envs[env_idx].render() #<---- THIS LINE
      obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()

In the eval callback, I included a conditional render:

if done:
    screen = self._eval_env.buf_infos[0]["terminal_render"]
else:
    screen = self._eval_env.render()

kylesayrs added the bug Something isn't working label Sep 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Rendering with EvalCallback does not render the initial or final state #1692

[Bug]: Rendering with EvalCallback does not render the initial or final state #1692

kylesayrs commented Sep 23, 2023 •

edited

Loading

araffin commented Sep 25, 2023

kylesayrs commented Sep 26, 2023

kylesayrs commented Sep 26, 2023

araffin commented Sep 26, 2023

kylesayrs commented Sep 27, 2023

araffin commented Sep 27, 2023

kylesayrs commented Sep 27, 2023

kylesayrs commented Sep 27, 2023

araffin commented Sep 27, 2023

kylesayrs commented Sep 28, 2023 •

edited

Loading

kylesayrs commented Sep 28, 2023 •

edited

Loading

kylesayrs commented Oct 2, 2023

jonomon commented Oct 7, 2023

[Bug]: Rendering with EvalCallback does not render the initial or final state #1692

[Bug]: Rendering with EvalCallback does not render the initial or final state #1692

Comments

kylesayrs commented Sep 23, 2023 • edited Loading

🐛 Bug

To Reproduce

Relevant log output / Error message

System Info

Checklist

araffin commented Sep 25, 2023

kylesayrs commented Sep 26, 2023

kylesayrs commented Sep 26, 2023

araffin commented Sep 26, 2023

kylesayrs commented Sep 27, 2023

araffin commented Sep 27, 2023

kylesayrs commented Sep 27, 2023

kylesayrs commented Sep 27, 2023

araffin commented Sep 27, 2023

kylesayrs commented Sep 28, 2023 • edited Loading

kylesayrs commented Sep 28, 2023 • edited Loading

kylesayrs commented Oct 2, 2023

jonomon commented Oct 7, 2023

kylesayrs commented Sep 23, 2023 •

edited

Loading

kylesayrs commented Sep 28, 2023 •

edited

Loading

kylesayrs commented Sep 28, 2023 •

edited

Loading