Roadmap #388

Closed · pseudo-rnd-thoughts opened this issue Jan 7, 2023 · 4 comments

pseudo-rnd-thoughts (Member) commented Jan 7, 2023

I think the easiest solution is to consider these changes to be for a 1.0.0 release:

  • Change all environments to use mujoco (and the gymnasium MujocoEnv)
  • Change to use Gymnasium rather than Gym
  • Add PyPI releases
  • Fix the CI testing
  • Add environments to gymnasium.make

Proposed idea for environments

These ideas might be wrong, but I would be interested in feedback on them.

For the multi-task environments, we can consider there to be two "types" of environments.

  1. A task environment, where the agent has to solve the single task associated with that environment. On reset, only that task is reset.
  2. A *multi-task* environment, which contains many task environments and where the agent has to solve any of the sub-tasks. On reset, the multi-task environment randomly selects one of the task environments and uses that task for the episode until the next reset.

I would propose that users create the multi-task environment through make, i.e., gymnasium.make("metaworld/multi-task-50"). Additionally, I think we should include the one-hot vector info either as an option inside the multi-task environment or as a wrapper within metaworld, so that users don't need to copy the previous implementation (this would also allow the wrapper to be updated as the API evolves).
This multi-task environment should be a very generic environment, such that custom task environments can be used with it; a rough sketch follows.
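
As a very rough illustration of the idea (not existing metaworld or gymnasium API; the class names are hypothetical and Box observation spaces are assumed):

from typing import Callable, Sequence

import numpy as np
import gymnasium as gym


class MultiTaskEnv(gym.Env):
    """Hypothetical generic multi-task environment: samples a task env on every reset."""

    def __init__(self, env_fns: Sequence[Callable[[], gym.Env]]):
        self.envs = [fn() for fn in env_fns]
        self.active_task = 0
        # Assumes all task environments share the same observation/action spaces.
        self.observation_space = self.envs[0].observation_space
        self.action_space = self.envs[0].action_space

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Pick the task environment used for the whole upcoming episode.
        self.active_task = int(self.np_random.integers(len(self.envs)))
        obs, info = self.envs[self.active_task].reset(seed=seed, options=options)
        info["task_index"] = self.active_task
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.envs[self.active_task].step(action)
        info["task_index"] = self.active_task
        return obs, reward, terminated, truncated, info


class OneHotTaskWrapper(gym.Wrapper):
    """Hypothetical optional wrapper that appends a one-hot task indicator to the observation."""

    def __init__(self, env: MultiTaskEnv):
        super().__init__(env)
        self.num_tasks = len(env.envs)
        # Assumes a Box observation space, as in Meta-World.
        low = np.concatenate([env.observation_space.low, np.zeros(self.num_tasks)])
        high = np.concatenate([env.observation_space.high, np.ones(self.num_tasks)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)

    def _augment(self, obs, info):
        one_hot = np.zeros(self.num_tasks)
        one_hot[info["task_index"]] = 1.0
        return np.concatenate([obs, one_hot])

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._augment(obs, info), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._augment(obs, info), reward, terminated, truncated, info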

For meta-environments, we can consider there to be two "modes" of environments, or even two different environments.

  1. In "training" mode, the environment has variance in the goal space, but it always has the same goal type.
  2. In "testing" mode, the environment has a new goal type that is different from the one seen in training.

Similar to multi-task, I think we can use make again with a parameter to specify the environment's mode, e.g., gymnasium.make("metaworld/meta-1", mode="training"). This environment can then be passed to any training library, which returns the trained agent. We could then have an evaluate_meta_agent function that takes the environment and a function that evaluates the agent's policy given an observation and info; a sketch of this is below.
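
A minimal sketch of that proposed evaluation helper (none of this exists yet; the environment id, the mode argument, and evaluate_meta_agent are all part of the proposal):

def evaluate_meta_agent(env, policy_fn, num_episodes=50):
    """Roll out policy_fn(observation, info) -> action and report the mean episode return."""
    returns = []
    for _ in range(num_episodes):
        obs, info = env.reset()
        episode_return, done = 0.0, False
        while not done:
            action = policy_fn(obs, info)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)


# Proposed usage (the environment id and mode argument do not exist yet):
# train_env = gymnasium.make("metaworld/meta-1", mode="training")
# ...train on train_env with any library, obtaining trained_policy...
# test_env = gymnasium.make("metaworld/meta-1", mode="testing")
# score = evaluate_meta_agent(test_env, trained_policy)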

krzentner (Contributor) commented:

There's a lot to unpack in that roadmap, and I definitely think we should take things step by step.

  1. Fixing the CI should be relatively easy. It was working before it was moved to Farama, it's just broken now because it depends on some secrets that aren't set up yet.
  2. Moving to the mujoco package would be a massive improvement for users, and something I've already been looking at doing.
  3. PyPI releases would be good to have as well. There's already a package, but it's never been updated.
  4. Currently, Meta-World doesn't really use that much of gym either (the environments are not present in gym.make). It makes sense to move it to gymnasium. However, we should probably make sure that garage has gymnasium support before we do that change, otherwise no one doing Meta-RL or multi-task RL will want to use the new version.
  5. For a set of environments to add to gymnasium.make, I think it would be best to add all of the environments listed in metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE and metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_HIDDEN (e.g. gymnasium.make('metaworld/pick-place-v2-goal-hidden')). We should probably set the seeded_rand_vec flag by default, which samples new goals on reset.
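
A rough sketch of what that registration could look like (the id scheme, the assumption that the dict keys already carry the -goal-observable / -goal-hidden suffix, and passing seeded_rand_vec as a constructor kwarg are all illustrative, not settled API):

import gymnasium
from metaworld.envs import (
    ALL_V2_ENVIRONMENTS_GOAL_HIDDEN,
    ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE,
)

for registry in (ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE, ALL_V2_ENVIRONMENTS_GOAL_HIDDEN):
    for name, env_cls in registry.items():
        gymnasium.register(
            id=f"metaworld/{name}",            # keys assumed to already include the suffix
            entry_point=env_cls,
            kwargs={"seeded_rand_vec": True},  # assumption: sample new goals on each reset
        )

# env = gymnasium.make("metaworld/pick-place-v2-goal-hidden")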

The rest of the proposed roadmap I'm not so confident in. For example, some multi-task RL algorithms (e.g. task embeddings) require the one-hot vector to not be present, so adding it by default seems unwise. There's also the complication that the meaning of "task" in MT1 and the meaning of "task" in MT50 are quite different, and I don't understand how you're proposing to handle those differences.

I very much think that it's a bit too early to be proposing entirely new ways of using the Meta-World benchmark, as the last part of this roadmap describes. There's already an evaluation protocol for Meta-World, and proposing a new one without clarity on how it improves on the previous one is likely to cause confusion.

pseudo-rnd-thoughts (Member, Author) commented:

Thanks for the feedback.

At least in my head, I would have thought we could create a superclass for Meta and Multi-task envs.
This might not be possible, but I think it would be a very helpful thing for the field, encouraging other environment implementations and training algorithms that work for these environments.
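
A very rough sketch of what such a shared superclass might look like (all names here are hypothetical):

from typing import Any, List

import gymnasium as gym


class TaskSettableEnv(gym.Env):
    """Hypothetical shared base class for multi-task and meta-RL environments."""

    def sample_tasks(self, n_tasks: int) -> List[Any]:
        """Return n_tasks task specifications (goal positions, object layouts, ...)."""
        raise NotImplementedError

    def set_task(self, task: Any) -> None:
        """Fix the environment to the given task until it is changed again."""
        raise NotImplementedError

A multi-task benchmark could then sample from its training tasks on every reset, while a meta-RL benchmark could expose separate training and testing task sets through the same interface.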

reginald-mclean (Collaborator) commented:

KR, could you provide us with some insight into the correct evaluation protocol for Meta-World? For multi-task learning algorithms the evaluation should be on the training tasks, and for meta-learning algorithms the evaluation is on the held-out test tasks? And both of these evaluations are repeated a number of times to generate a success rate?

Could you also clarify the distinction between a task in MT1 vs a task in MT50? The way that I understood it was that each environment is a task, and each environment has 50 different initializations to simulate the initial state distribution of a single environment. Therefore MT1 is a single task, and MT50 contains 50 tasks, each with 50 different initializations.

I agree that enabling the seeded_rand_vec flag by default is a good idea. Better to have 1 environment per task where different initializations can be sampled by doing env.reset() rather than 50 environments per task where an initialization must be sampled from somewhere else.
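
For concreteness, the two usage patterns being contrasted look roughly like this (class keys and constructor arguments follow the current Meta-World code, but treat the details as illustrative):

import random

import metaworld
from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

# Current benchmark API: 50 pre-sampled initializations per task, fixed via set_task.
mt1 = metaworld.MT1("pick-place-v2")
env = mt1.train_classes["pick-place-v2"]()
env.set_task(random.choice(mt1.train_tasks))  # one of the 50 fixed initializations
obs = env.reset()

# With seeded_rand_vec behaviour: one environment per task, a fresh initialization per reset.
env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE["pick-place-v2-goal-observable"](seed=0)
obs = env.reset()  # samples a new goal/initialization
obs = env.reset()  # and another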

krzentner (Contributor) commented Mar 22, 2023

I wrote a description of tasks, and how Meta-RL and Multi-Task RL differ, in #393 that is hopefully clearer than prior explanations. Feel free to ask any follow-up questions you have.

The Meta-RL evaluation procedure is basically the following:

for task in test_tasks:
    # Start from the meta-learned policy for every test task.
    policy = meta_algo.get_policy()
    for _ in range(n_adaptation_steps):
        # Collect data on the task and let the algorithm adapt the policy to it.
        data = rollout(policy, task)
        policy = meta_algo.adapt_policy(policy, data)
    # Measure post-adaptation performance on the same task.
    task_performance = evaluate(policy, task)

The implementation used in the original Meta-World paper is here. However, it only supports one adaptation step, which is fine for RL^2 and PEARL, but means that MAML only runs one gradient step.
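
To connect this to the success-rate question above, a slightly fuller sketch might look like the following (rollout, meta_algo, and reading success from the per-step info dict are placeholders/assumptions, not a reference implementation):

def evaluate_meta_algorithm(meta_algo, test_tasks, n_adaptation_steps, n_eval_episodes=10):
    """Adapt to each test task, then report the fraction of successful evaluation episodes."""
    successes = []
    for task in test_tasks:
        policy = meta_algo.get_policy()
        # Adaptation phase: the algorithm updates the policy using on-task data.
        for _ in range(n_adaptation_steps):
            data = rollout(policy, task)
            policy = meta_algo.adapt_policy(policy, data)
        # Evaluation phase: Meta-World reports success via the per-step info dict.
        for _ in range(n_eval_episodes):
            infos = rollout(policy, task)  # assumed here to return the per-step info dicts
            successes.append(any(info.get("success", 0) > 0 for info in infos))
    return sum(successes) / len(successes)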
