VLAGen: Automated Data Collection for Generalizing Robotic Policies

This repository contains the code for VLAGen, a simulation-based data generation and filtering pipeline designed to autonomously generalize Vision-Language-Action (VLA) models to new objects.

VLAGen addresses the limitations of existing VLA models by generating diverse, high-quality training trajectories all in simulation (by using high temperature during inference). Then, the trajectories are evaluated with a vision-language model (GPT-4V) and filtered by removing low-action time steps to mitigate "catastrophic idling." The generated data is then used to fine-tune or preference-optimize OpenVLA.

This repository is a fork of OpenVLA. This project incorporates code from SimplerEnv-OpenVLA, available at: https://github.com/DelinQu/SimplerEnv-OpenVLA

Features

Automated Data Generation: Generates robotic manipulation trajectories in SIMPLER/SAPIEN simulation using OpenVLA (model deployed with high temperature settings to generate diverse trajectories).
GPT-4V Trajectory Scoring: Employs GPT-4V to automatically score trajectories based on their success in completing the task.

(Above)Data pipeline generates and ranks the trajectories for picking a Fanta can (out-of-distribution) with distractors in the background.

Magnitude-Based Filtering: Filters out low-action trajectories to mitigate catastrophic idling behavior.
Fine-tuning and Preference Optimization: Supports both fine-tuning and preference optimization (using KTO) of OpenVLA using the generated data. KTO for reference: https://arxiv.org/abs/2402.01306
SIMPLER Environment Integration: Leverages the SIMPLER benchmark for real-to-sim evaluation and data generation.
Scalable and Efficient: Provides a scalable and efficient solution for training robotic models without relying on extensive human-collected datasets.

Usage

This project uses bash scripts for data generation and evaluation located in the scripts_run_eval directory. These scripts interact with the simpler_env environment and the openvla model. Specific scripts are provided for various tasks and variations (e.g., openvla_drawer_variant_agg.sh, openvla_move_near_visual_matching.sh). Refer to the individual script descriptions for detailed usage instructions and parameters.

Data Generation: To generate training data, use the bash scripts passing in --policy-model openvla_generate_data as an argument. This will leverage OpenVLA to generate trajectories that are then scored by GPT-4V.

Model Fine-tuning: The vla-scripts directory contains Python scripts for fine-tuning and training OpenVLA models (finetune.py, finetune_KTO.py, train.py). These scripts can be used to fine-tune OpenVLA models using the data generated by the bash scripts.

Model Evaluation: The scripts_run_eval directory contains bash scripts for evaluating the performance of OpenVLA models on various manipulation tasks. These scripts control the SIMPLER environment, run the OpenVLA policy, and log the results.

Installation

Clone the repository:

git clone <repository_url>
cd <repository_name>

Install ManiSkill2 real-to-sim environments:
```
cd ManiSkill2_real2sim
pip install -e .
```
Install OpenVLA requirements: Refer to the README.md for complete installation instructions (this might involve installing PyTorch, transformers, and other dependencies). Note that specific version constraints might be necessary to ensure compatibility.
Obtain an OpenAI API key: An OpenAI key is required for GPT-4V scoring.

Technologies Used

Python: The primary programming language for the project.
PyTorch: Deep learning framework for model training and inference.
Transformers (Hugging Face): Library for loading and utilizing pre-trained OpenVLA models.
PEFT (Hugging Face): Library for parameter-efficient fine-tuning, enabling LoRA (Low-Rank Adaptation).
BitsAndBytes: Enables 4-bit quantization of the OpenVLA model for memory-efficient fine-tuning.
SIMPLER: A simulation benchmark used for generating and evaluating robotic manipulation policies.
OpenVLA: A pre-trained vision-language-action model. https://openvla.github.io/
GPT-4V: A vision-language model used to score the generated trajectories.
Draccus: A Python library for configuration management and structured data handling.
Make: Used for streamlining common development tasks.
Bash: Used to orchestrate the data generation and evaluation processes.

Dependencies

The project's dependencies are specified in pyproject.toml. Use pip install -e . to install all required packages from this file.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 84 Commits
ManiSkill2_real2sim		ManiSkill2_real2sim
experiments/robot		experiments/robot
prismatic		prismatic
scripts		scripts
scripts_run_eval		scripts_run_eval
simpler_env		simpler_env
tools		tools
vla-scripts		vla-scripts
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
finetune.py		finetune.py
finetune_KTO.py		finetune_KTO.py
labeldiagram.png		labeldiagram.png
pyproject.toml		pyproject.toml
requirements-min.txt		requirements-min.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VLAGen: Automated Data Collection for Generalizing Robotic Policies

Features

Usage

Installation

Technologies Used

Dependencies

Contributing

License

About

Releases

Packages

Languages

License

pl909/VLAGen

Folders and files

Latest commit

History

Repository files navigation

VLAGen: Automated Data Collection for Generalizing Robotic Policies

Features

Usage

Installation

Technologies Used

Dependencies

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages