More Sample Efficient MAL #41
Yannick's comments:
TODO try:
For completeness, here's pseudocode of what I'm proposing.

```
// follower pretraining
// leader training
```

FAQ:
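As a rough illustration of the two phases named in the pseudocode comments above (follower pretraining, then leader training), here is a minimal sketch. All names and interfaces (`FollowerAgent`/`LeaderAgent`-style objects, `env.rollout`, `best_response`, etc.) are hypothetical placeholders, not the project's actual API.

```python
# Hedged sketch of a two-phase loop: pretrain the follower, then train the leader
# against the frozen follower. All objects below are assumed interfaces.

def follower_pretraining(env, follower, leader_policies, n_iters):
    """Phase 1: pretrain the follower to (approximately) best-respond
    against leader behaviour sampled from some policy space."""
    for _ in range(n_iters):
        leader_policy = leader_policies.sample()            # random leader behaviour
        traj = env.rollout(leader_policy, follower.policy)   # joint episode
        follower.update(traj)                                # RL update toward best response
    return follower


def leader_training(env, leader, follower, n_iters):
    """Phase 2: train the leader against the fixed, pretrained follower."""
    for _ in range(n_iters):
        traj = env.rollout(leader.policy, follower.best_response(leader.policy))
        leader.update(traj)                                  # leader optimises its own return
    return leader
```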
- Our improved MAL solutions currently require a lot of samples, but implementing model-learning as RL with a complex NN describing the agent's "policy" might not be necessary: essentially all we want is to find a query-/context-vector that is close to the actual environment and yields a policy that performs well in the real environment.
- Because our S and A are discrete, we could simply keep a model of the environment that is continually extended (i.e., S×A→S is just the mean of all transitions we observed, similar to how it's done in Dyna-Q [see comparison to literature]), which might be more sample efficient; see the sketch below.
- Other approaches are also feasible.
- It's always worth paying attention to which properties of the Gerstgrasser approach we'd give up (e.g., the requirement that the leader sees the query responses but gets zero reward for them will most likely be violated, so there are no guarantees of converging to a Stackelberg equilibrium (SE) anymore).
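To make the tabular-model idea above concrete, here is a minimal sketch of an environment model over discrete S and A that is extended with every observed transition and can then be queried for planning, in the spirit of Dyna-Q. Class and method names are illustrative assumptions, not the repo's actual code.

```python
import random
from collections import defaultdict

class EmpiricalModel:
    """Tabular model over discrete S and A, continually extended with
    observed transitions (Dyna-Q style). Illustrative sketch only."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                      # (s, a) -> summed reward
        self.visits = defaultdict(int)                            # (s, a) -> visit count

    def update(self, s, a, r, s_next):
        """Extend the model with one transition observed in the real environment."""
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def simulate(self, s, a):
        """Sample a next state from the empirical transition distribution and
        return it together with the mean observed reward for (s, a)."""
        nexts = self.next_counts[(s, a)]
        states = list(nexts)
        weights = [nexts[sp] for sp in states]
        s_next = random.choices(states, weights=weights, k=1)[0]
        r_mean = self.reward_sum[(s, a)] / self.visits[(s, a)]
        return r_mean, s_next

# Usage: feed real transitions into the model, then plan on simulated ones.
model = EmpiricalModel()
model.update(s=0, a=1, r=1.0, s_next=2)
model.update(s=0, a=1, r=0.0, s_next=2)
print(model.simulate(0, 1))  # -> (0.5, 2)
```

Planning against such a model (instead of running RL on a learned NN "policy" over models) is where the hoped-for sample-efficiency gain would come from, under the assumption that the discrete state/action spaces stay small enough for a tabular representation.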