Minimal training and inference code for making a humanoid robot stand up.
If you are in a headless environment, run `export DISPLAY=:0` first, then run `train.py`.
- Implement a simple MJX environment using the Unitree G1 simulation artifacts, similar to this
- Implement a simple PPO policy to try to make the robot stand up
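A rough sketch of the interface `environment.py` might expose. The state layout, constants, and reward shape here are assumptions, and a stub linear dynamics stands in for `mjx.step` so the sketch runs on its own:

```python
import jax
import jax.numpy as jnp

STAND_HEIGHT = 0.7  # assumed target pelvis height in metres (not from the source)

def reset(rng):
    """Return a slightly randomized initial state vector."""
    noise = 0.01 * jax.random.normal(rng, (10,))
    return jnp.zeros(10).at[2].set(STAND_HEIGHT) + noise

def step(state, action):
    """One control step; a real implementation would call mjx.step here."""
    next_state = state + 0.02 * jnp.tanh(action)  # placeholder dynamics
    height = next_state[2]
    is_healthy = height > 0.5  # assumed healthy-height threshold
    reward = height + jnp.where(is_healthy, 1.0, 0.0)
    return next_state, reward, ~is_healthy  # done when unhealthy

rng = jax.random.PRNGKey(0)
state = reset(rng)
next_state, reward, done = step(state, jnp.zeros(10))
```

Because `reset` and `step` are pure functions of arrays, batching them over many parallel environments is just a `jax.vmap` away.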
- Parallelize using JAX
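The PPO core plus the JAX parallelization can be sketched together: the clipped surrogate loss is written for a single sample, then `jax.vmap` lifts it over a batch gathered from many environments at once. Function and variable names here are illustrative, not taken from the repo:

```python
import jax
import jax.numpy as jnp

def ppo_clip_loss(log_prob, old_log_prob, advantage, eps=0.2):
    """PPO clipped surrogate loss for a single sample."""
    ratio = jnp.exp(log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = jnp.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -jnp.minimum(unclipped, clipped)

# vmap over axis 0 of each array: one entry per parallel rollout sample.
batched_loss = jax.vmap(ppo_clip_loss)

log_prob = jnp.array([-1.0, -0.5, -2.0])
old_log_prob = jnp.array([-1.1, -0.6, -1.0])
advantage = jnp.array([1.0, -0.5, 2.0])
loss = batched_loss(log_prob, old_log_prob, advantage).mean()
```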
- A low action standard deviation for an "overfitting test" does not work well for PPO, because the policy needs to explore a little to converge on an actual solution. With that in mind, the model seems to be getting too comfortable gaming the reward: it loses as fast as possible so it does not have to keep paying the penalty.
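The exploration point can be made concrete with the entropy of a diagonal Gaussian policy: as the standard deviation shrinks, the per-dimension differential entropy drops sharply, so the policy barely samples anything new. The specific std values below are made up for illustration:

```python
import jax.numpy as jnp

def gaussian_entropy(std):
    """Differential entropy per action dimension of a Gaussian policy."""
    return 0.5 * jnp.log(2.0 * jnp.pi * jnp.e * std ** 2)

low = gaussian_entropy(jnp.array(0.01))  # near-deterministic policy
high = gaussian_entropy(jnp.array(0.5))  # noisier, more exploratory policy
```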
- The theory that the model is just trying to lose (falling as quickly as possible so the episode resets, while retaining the `is_healthy` reward and not losing much height along the way) can be tested by adding a "wait" period and removing the mask. This takes away the escape hatch that reset provides: while the model is stuck in the failed state, it remains unhealthy and therefore keeps losing out on reward.
- The goal for this repository is to provide a super minimal implementation of a PPO policy for making a humanoid robot stand up, with only three files:
  - `environment.py` defines a class for interacting with the environment
  - `train.py` defines the core training loop
  - `infer.py` generates a video clip of a trained model controlling a robot
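The rollout loop `infer.py` needs can be sketched with stubs standing in for the trained policy and the MuJoCo renderer (the real file would load learned parameters, call the MJX environment, and encode the frames into a video clip). Everything here is a placeholder:

```python
import numpy as np

def stub_policy(obs):
    """Placeholder for the trained PPO policy."""
    return np.zeros(10)

def stub_render(state):
    """Placeholder for MuJoCo rendering; returns a blank RGB frame."""
    return np.zeros((240, 320, 3), dtype=np.uint8)

def rollout_frames(n_steps=50):
    state = np.zeros(10)
    frames = []
    for _ in range(n_steps):
        action = stub_policy(state)
        state = state + 0.02 * np.tanh(action)  # stands in for env.step
        frames.append(stub_render(state))
    return frames  # the real infer.py would write these out as a video

frames = rollout_frames()
```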