Add gym make support for Meta-World envs #498

Open
wants to merge 33 commits into
base: master

Conversation

reginald-mclean
Collaborator

@reginald-mclean reginald-mclean commented Aug 29, 2024

This PR is built on top of #499 to add gym.make support for the environments in Meta-World. They are organized as follows:

  • gym.make('Meta-World/env-name'): This creates a single environment with a single goal.
  • gym.make('Meta-World/MT1-env-name'): This creates a version of the env-name environment with multiple goals, but no testing goals. This is typically used to test how an RL algorithm can acquire skills in a goal-conditioned fashion.
  • gym.make('Meta-World/ML1-train-env-name') or gym.make('Meta-World/ML1-test-env-name'): These commands make the training environment or testing environment for Meta-Learning problems with a single environment.
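
The three naming patterns above can be sketched as a small helper that builds the ID string. The helper `metaworld_env_id` and its `kind` argument are hypothetical and only illustrate the scheme; `reach-v3` is an env name mentioned later in this thread.

```python
def metaworld_env_id(env_name, kind="single"):
    """Build an ID following the naming scheme described in this PR.

    Hypothetical helper for illustration; not part of Meta-World.
    """
    prefixes = {
        "single": "",               # one environment, one goal
        "mt1": "MT1-",              # multiple goals, no testing goals
        "ml1-train": "ML1-train-",  # meta-learning, training environment
        "ml1-test": "ML1-test-",    # meta-learning, testing environment
    }
    if kind not in prefixes:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"Meta-World/{prefixes[kind]}{env_name}"


print(metaworld_env_id("reach-v3"))
print(metaworld_env_id("reach-v3", kind="ml1-test"))

# Assuming the environments from this PR are registered, the ID would be
# passed to gym.make, e.g.:
#   import gymnasium as gym
#   env = gym.make(metaworld_env_id("reach-v3", kind="mt1"))
```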

And there are also gym.make_vec commands that return multiple environments wrapped in a sync or async wrapper:

  • gym.make_vec('Meta-World/MT10-sync') or gym.make_vec('Meta-World/MT10-async'): This returns the MT10 set of environments either in sync or async mode.
  • gym.make_vec('Meta-World/ML10-train-sync') or gym.make_vec('Meta-World/ML10-train-async'): This returns the ML10 set of training environments either in sync or async mode.
  • gym.make_vec('Meta-World/ML10-test-sync') or gym.make_vec('Meta-World/ML10-test-async'): This returns the ML10 set of testing environments either in sync or async mode.
  • gym.make_vec('Meta-World/ML45-train-sync') or gym.make_vec('Meta-World/ML45-train-async'): This returns the ML45 set of training environments either in sync or async mode.
  • gym.make_vec('Meta-World/ML45-test-sync') or gym.make_vec('Meta-World/ML45-test-async'): This returns the ML45 set of testing environments either in sync or async mode.
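
As a quick reference, the vector IDs can be enumerated programmatically. This is a sketch that assumes every ID follows the benchmark-split-mode pattern (so `ML10-*` IDs return ML10 sets and `ML45-*` IDs return ML45 sets); `vector_env_ids` is a hypothetical helper, not part of the PR.

```python
from itertools import product


def vector_env_ids():
    """List the make_vec IDs described in this PR (illustrative helper)."""
    benchmarks = ["MT10", "ML10-train", "ML10-test", "ML45-train", "ML45-test"]
    modes = ["sync", "async"]
    return [f"Meta-World/{bench}-{mode}" for bench, mode in product(benchmarks, modes)]


for env_id in vector_env_ids():
    print(env_id)

# Assuming the environments from this PR are registered:
#   import gymnasium as gym
#   envs = gym.make_vec("Meta-World/MT10-sync")
```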

@rainx0r
Contributor

rainx0r commented Aug 30, 2024

A few things I noticed:

  • It's not entirely clear to me why _make_single_env() has ML1-related code when _make_single_ml() exists.
  • For MT envs, they should also have an AutoTerminateOnSuccess wrapper but it should be toggled to the initial state specified by the terminate_on_success flag as done here.
  • use_one_hot is set to False for all registered envs but it should be True for MT10 and MT50.
  • The way seed is added to the TaskSelect wrappers could cause subtle issues, as it would reinitialise the global numpy rng state multiple times during env instantiation. In general it's not necessary, so it should probably be removed: the wrappers use the underlying env's np_random anyway, so they don't need to be seeded. Maybe the seed passed into init_each_env should just be used on the env directly:
    env = env_cls()
    if seed is not None:  # a plain `if seed:` would skip seed=0
        env.seed(seed)
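
A minimal sketch of this suggestion, using a toy environment class with its own `random.Random` instance as a stand-in for an env's `np_random`. `ToyEnv` and `init_env` are hypothetical names. The env is seeded directly, the global RNG is left untouched, and the check is `seed is not None` so that a seed of 0 is not silently skipped.

```python
import random


class ToyEnv:
    """Stand-in for an env class; it owns its RNG (like np_random)."""

    def __init__(self):
        self.rng = random.Random()
        self.seeded = False

    def seed(self, seed):
        self.rng.seed(seed)
        self.seeded = True

    def sample_goal(self):
        return self.rng.random()


def init_env(env_cls, seed=None):
    """Create an env and seed it directly, without touching global state."""
    env = env_cls()
    if seed is not None:  # seed=0 is a valid seed
        env.seed(seed)
    return env


a = init_env(ToyEnv, seed=0)
b = init_env(ToyEnv, seed=0)
print(a.seeded, a.sample_goal() == b.sample_goal())
```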

Also I think for simplicity it should be possible to just have a single definition of init_each_env that is used for both MT and ML. It would take env_cls, tasks, task_select_method, maybe seed, maybe max_episode_steps, and maybe use_one_hot / env_id / num_tasks, with branching logic for OneHotWrapper and the task select method. MT envs just provide those one-hot parameters and use all tasks, while ML envs handle the task splitting a bit differently but otherwise use the same wrappers and logic.
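
One way the unified function could look is sketched below. It is not the PR's implementation: `ToyEnv` only records which wrappers would be applied, the wrapper labels, their order, and the `"random"` / `"pseudorandom"` values for `task_select_method` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyEnv:
    """Stand-in for an env class; records what would be applied to it."""

    seed_value: Optional[int] = None
    wrappers: list = field(default_factory=list)

    def seed(self, seed):
        self.seed_value = seed


def init_each_env(env_cls, tasks, task_select_method, seed=None,
                  max_episode_steps=None, use_one_hot=False,
                  env_id=None, num_tasks=None):
    """Single init path shared by MT and ML (hypothetical sketch)."""
    env = env_cls()
    if seed is not None:
        env.seed(seed)
    if max_episode_steps is not None:
        env.wrappers.append(("TimeLimit", max_episode_steps))
    if use_one_hot:
        # Only MT-style callers are expected to pass these parameters.
        if env_id is None or num_tasks is None:
            raise ValueError("use_one_hot requires env_id and num_tasks")
        env.wrappers.append(("OneHot", env_id, num_tasks))
    if task_select_method not in ("random", "pseudorandom"):
        raise ValueError(f"unknown task_select_method: {task_select_method!r}")
    env.wrappers.append((f"TaskSelect[{task_select_method}]", len(tasks)))
    return env


# MT-style call: all tasks, one-hot parameters supplied.
mt_env = init_each_env(ToyEnv, tasks=list(range(50)), task_select_method="random",
                       seed=0, use_one_hot=True, env_id=3, num_tasks=10)
# ML-style call: a subset of tasks, no one-hot wrapper.
ml_env = init_each_env(ToyEnv, tasks=list(range(40)),
                       task_select_method="pseudorandom")
print(mt_env.wrappers)
print(ml_env.wrappers)
```

The only MT/ML difference in this sketch is which arguments the caller passes, which is the point of the proposal; whether that stays readable with the real wrappers is the maintenance question raised in the reply.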

@reginald-mclean
Collaborator Author

reginald-mclean commented Aug 30, 2024

@rainx0r

It's not entirely clear to me why _make_single_env() has ML1-related code when _make_single_ml() exists.

Good catch, remnant of previous attempt at creating ML envs.

For MT envs, they should also have an AutoTerminateOnSuccess wrapper but it should be toggled to the initial state specified by the terminate_on_success flag as done

Will update

use_one_hot is set to False for all registered envs but it should be True for MT10 and MT50.

I don't know if I agree with this. We can include the wrapper for completeness and show examples of enabling it, but I think we shouldn't influence users to use the wrapper by default.

Maybe the seed passed into init_each_env should just be used on the env directly

Agreed

Also I think for simplicity it should be possible to just have a single definition of init_each_env

It would be possible but I think it might become a bit of a convoluted function to write/maintain. Unless there's a clean way of merging them, I think keeping them separate makes the most sense for maintenance reasons.

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Sep 2, 2024

From experience, having tonnes of registered environments can make life easier, but it also means many more environments to maintain. For future-proofing, an alternative approach is:

We can keep MetaWorld/env-name; this is good.
Then for MT1, an alternative is gym.make("MetaWorld/MT1", env_name="env_name", mode="train/test"). This allows flexibility with env-name (to add more) and a mode to easily specify whether to train or test.

Similarly for MT10, we can use env_names=[...]

Is there any reason for 1 and 10 only? Could we make it MultiTask with env_names that is flexible to any number?
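
The idea can be illustrated with a toy registry standing in for Gymnasium's (in Gymnasium, `gym.make` forwards extra keyword arguments to the environment's entry point). Everything here is a hypothetical sketch: `register`, `make`, the factory functions, and the `MetaWorld/MultiTask` ID are illustrative, and `env_name2-v3` is a placeholder name.

```python
REGISTRY = {}


def register(env_id, entry_point):
    """Toy stand-in for an environment registry."""
    REGISTRY[env_id] = entry_point


def make(env_id, **kwargs):
    """Look up the entry point and forward keyword arguments to it."""
    return REGISTRY[env_id](**kwargs)


def make_mt1(env_name, mode="train"):
    if mode not in ("train", "test"):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    # A real factory would build and wrap the env; a dict is returned here.
    return {"kind": "MT1", "env_name": env_name, "mode": mode}


def make_multi_task(env_names):
    return {"kind": "MultiTask", "env_names": list(env_names)}


# Two registered IDs cover every env name and any number of tasks.
register("MetaWorld/MT1", make_mt1)
register("MetaWorld/MultiTask", make_multi_task)

train_env = make("MetaWorld/MT1", env_name="reach-v3", mode="train")
custom = make("MetaWorld/MultiTask", env_names=["reach-v3", "env_name2-v3"])
print(len(REGISTRY), train_env, custom)
```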

I'm purely spitballing ideas, you don't need to take any of them

@reginald-mclean
Collaborator Author

@pseudo-rnd-thoughts
I actually implemented something along the lines of what you suggested but forgot to include it. In addition to the above gym.make_vec commands there is also:

  • gym.make_vec('Meta-World/custom-mt-envs-sync', envs_list=['env_name1-v3', 'env_name2-v3', ...]) # or custom-mt-envs-async
  • gym.make_vec('Meta-World/custom-ml-envs-sync', envs_list=['env_name1-v3', 'env_name2-v3', ...]) # or custom-ml-envs-async

Both of the above commands give the user control over the environments that they want to use in a multi-task or meta-RL setting, instead of the predefined ones.

I agree, it is a LOT of environments to add, but there are also lots of different use cases of MW environments. Some of them are single environments (i.e. 'Meta-World/reach-v3'), some of them are the smaller MT/ML environments (i.e. 'Meta-World/ML1-train-reach-v3'), and some of them are the pre-defined environment sets (MT10, MT50, ML10, ML45).

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Sep 18, 2024

@reginald-mclean In my opinion, I would work to keep the number of environments to a minimum.
Personally, I would only have gym.make for the individual environments, i.e., gym.make("Meta-World/reach-v3"),
alongside the generic MultiTask and MetaTask single and vector environments.
Then finally keep the original benchmark sets, e.g. gym.make("MetaWorld/MT50"), for MT10, MT50, ML10, ML45.

This reduces the mess you need to maintain and provides more options to the users on what they do.
If a user wants a custom multi-task setup with env x, y, z, then they can make it, without a close-but-not-equivalent version existing among the 100s of environments that could be registered.

Environment parameters are your friend here, minimising what you need to maintain while adding flexibility to users

I say this as I remove over 800 environments from Atari; currently there are over 1000 environments registered for only 100 games. For ALE, there are 14 environments registered for each game, which in my opinion is crazy, and very few people actually use the extra / special registered environments, which can be accessed through parameters.
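
A rough count shows the difference for Meta-World. This is a back-of-the-envelope sketch under stated assumptions: 50 env names (as in MT50), four per-name IDs (single, MT1, ML1-train, ML1-test), and for the parameterised layout two generic IDs plus the four benchmark sets. Vector sync/async variants are left out of both counts.

```python
def per_name_id_count(num_envs, variants_per_env):
    """IDs needed when every variant of every env gets its own ID."""
    return num_envs * variants_per_env


def parameterised_id_count(num_envs, generic_ids, benchmark_ids):
    """IDs needed when variants are selected through parameters."""
    return num_envs + generic_ids + benchmark_ids


NUM_ENVS = 50  # assumption: one per Meta-World task, as in MT50

# single, MT1, ML1-train, ML1-test for each env name
per_name = per_name_id_count(NUM_ENVS, variants_per_env=4)

# 50 single-env IDs + generic MultiTask/MetaTask + MT10, MT50, ML10, ML45
parameterised = parameterised_id_count(NUM_ENVS, generic_ids=2, benchmark_ids=4)

print(per_name, parameterised)  # 200 56
```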

3 participants