Hi, this is a nice project for a hybrid action space, and I see you mention PDQN/HPPO in README.md. Do you have any experimental results for these algorithms in this environment? If not, we would like to invite you to implement the related algorithms and benchmarks together with us in our repo DI-engine; we will offer the corresponding support. Would you be willing to help build a hybrid action space RL benchmark? Other comments are also welcome.
Thank you very much for your feedback!
Unfortunately, these days I am very busy and cannot take care of it.
I did implement P-QLearning in my q-learning-algorithms repo in the past, but I do not remember whether it converged or what score it reached.
Note: algorithms now use architectures that need to know which parameters are related to which action (e.g. MP-DQN). I think it may be better to change the way the action space is handled. I am not completely sure yet what the best way to do it is. Even though it would definitely future-proof the repository, it would also break any agent that used this env... gym-platform uses one tuple of spaces per parameter-action pair; I did not test how inconvenient it is to have an empty tuple (e.g. for the break action).
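For reference, here is a minimal sketch of that gym-platform-style layout (one parameter space per discrete action) using gym's spaces. The action names, bounds, and the empty-tuple slot for the parameterless break action are illustrative assumptions rather than the environment's actual definition, and the empty `Tuple` is exactly the untested case mentioned above:

```python
import numpy as np
from gym import spaces

# Hypothetical hybrid action space in the gym-platform style:
# a Discrete choice over actions, paired with one parameter space
# per action. Names and bounds are placeholders, not the real env.
action_space = spaces.Tuple((
    spaces.Discrete(3),  # 0: accelerate, 1: turn, 2: break
    spaces.Tuple((
        spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),   # accelerate amount
        spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32),  # turn angle
        spaces.Tuple(()),  # break takes no parameter: the untested "empty tuple" case
    )),
))

# A sampled action would then look like
# (action_id, (accelerate_param, turn_param, ())),
# which is the structure downstream agents would have to unpack.
sample = action_space.sample()
```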