VLA training #10
base: master
Conversation
…t VLA inference on never-seen-before backgrounds work (when it didn't before)
oops sorry
No worries! I'm starting to eliminate my competitive programming habits and making sure that when I contribute to repos, I'm being more mindful.
Also - I think this VLA stuff is independent of which sim we choose to use, and of whether we want to adapt it to real life in the future; it only needs access to an image and the next best move. Maybe not putting it in stompy_live is better? I also got rid of the env_norm stuff; that can go in a separate repo too.
In this pull request, we fine-tune the open-source Vision-Language-Action model OpenVLA to give Stompy the ability to find the optimal next move based on a current camera image.
OpenVLA is not zero-shot: it needs to be fine-tuned for each new (environment, task) pair.
The data processing scripts take in a directory of JSON files, each containing the optimal steps for a single episode, and collate all (current_image, optimal_next_step) tuples into a .h5 dataset. Right now this data is generated via PPO, which has access to all variables in the sim.
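A minimal sketch of that collation step, assuming a hypothetical JSON schema (the function name, file layout, and field names here are illustrative, not the actual script):

```python
import glob
import json

import h5py
import numpy as np

def collate_episodes(json_dir: str, out_path: str) -> None:
    """Collate per-episode JSON files into a single .h5 dataset.

    Assumed (hypothetical) JSON schema: each file holds {"steps": [...]},
    where each step has "image" (H x W x 3 uint8, saved as nested lists)
    and "action" (the optimal next step for the 7 DoF).
    """
    images, actions = [], []
    for path in sorted(glob.glob(f"{json_dir}/*.json")):
        with open(path) as f:
            episode = json.load(f)
        for step in episode["steps"]:
            images.append(np.asarray(step["image"], dtype=np.uint8))
            actions.append(np.asarray(step["action"], dtype=np.float32))
    with h5py.File(out_path, "w") as h5:
        h5.create_dataset("images", data=np.stack(images))
        h5.create_dataset("actions", data=np.stack(actions))

collate_episodes("episodes/", "pushcube.h5")
```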
datasets.py loads the .h5 file into a custom PyTorch dataset, which is wrapped in a DataLoader and served with a custom batch size, image transformations, tokenizer, etc. in finetune.py.
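Roughly what that Dataset/DataLoader pairing could look like (class name and .h5 keys are assumptions carried over from the sketch above):

```python
import h5py
import torch
from torch.utils.data import DataLoader, Dataset

class PushCubeDataset(Dataset):
    """Serves (current_image, optimal_next_step) pairs from the .h5 file."""

    def __init__(self, h5_path: str, transform=None):
        self.h5_path = h5_path
        self.transform = transform
        with h5py.File(h5_path, "r") as h5:
            self.length = len(h5["actions"])
        self._file = None  # opened lazily so each worker gets its own handle

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.h5_path, "r")
        image = torch.from_numpy(self._file["images"][idx])
        action = torch.from_numpy(self._file["actions"][idx])
        if self.transform is not None:
            image = self.transform(image)
        return image, action

loader = DataLoader(PushCubeDataset("pushcube.h5"), batch_size=16, shuffle=True)
```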
finetune.py also exposes configuration options for: learning_rate, use_lora, LoRA configs, num_epochs, pretrained_model_path, ....
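Those options suggest a config along these lines (a sketch only; field names and defaults are assumptions modeled on how similar fine-tuning scripts expose them, not finetune.py's actual values):

```python
from dataclasses import dataclass, field

@dataclass
class FinetuneConfig:
    # Assumed defaults for illustration; the real values live in finetune.py.
    pretrained_model_path: str = "openvla/openvla-7b"
    learning_rate: float = 2e-5
    num_epochs: int = 10
    batch_size: int = 16
    use_lora: bool = True
    lora_rank: int = 32
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: list[str] = field(
        default_factory=lambda: ["q_proj", "v_proj"]
    )
```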
Training Observations and Logs:
Our first approach didn't work because we trained from only a single camera angle. We realized we needed to capture a larger distribution for the model to actually generalize, so we randomized the camera angle and the cube and target locations.
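That randomization amounts to resampling the scene at every episode reset; a generic sketch of the idea (the ranges and helper name are made up for illustration and don't come from the sim code):

```python
import numpy as np

rng = np.random.default_rng()

def randomize_episode():
    # Sample a camera pose on a shell around the workspace instead of
    # rendering from one fixed viewpoint.
    camera_yaw = rng.uniform(-np.pi, np.pi)
    camera_pitch = rng.uniform(np.deg2rad(20), np.deg2rad(60))
    camera_dist = rng.uniform(0.6, 1.2)

    # Randomize cube and target positions on the table plane.
    cube_xy = rng.uniform(low=[-0.15, -0.15], high=[0.15, 0.15])
    target_xy = rng.uniform(low=[-0.15, -0.15], high=[0.15, 0.15])
    return camera_yaw, camera_pitch, camera_dist, cube_xy, target_xy
```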
Overfit run - gets around 30% action accuracy on the new, randomized push-cube task. After a lot of tweaking and tricks we got it to 30% action accuracy on validation, which is decent because the model has to pick the right action out of (# discrete tokens) ^ (# degrees of freedom) = 256^7 ≈ 7.2 × 10^16 options for it to be counted correct.
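Concretely, that accuracy criterion can be computed like this (a sketch: each of the 7 DoF is discretized into one of 256 bins, and a prediction only counts when every bin matches; the function name is made up):

```python
import torch

def action_accuracy(pred_tokens: torch.Tensor, true_tokens: torch.Tensor) -> float:
    """Fraction of actions where all 7 DoF tokens match exactly.

    pred_tokens, true_tokens: (batch, 7) integer tensors with values in [0, 256).
    """
    exact = (pred_tokens == true_tokens).all(dim=-1)  # all 7 bins must agree
    return exact.float().mean().item()
```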
But when we tested it, it still doesn't work - it works at the start, but the moment the arm "tweaks" / makes a non-optimal move, it goes out of distribution, because the training data only ever contains optimal trajectories.
So now we're training on more data to get the arm to learn error correction.