VLAGen: Automated Data Collection for Generalizing Robotic Policies

This repository contains the code for VLAGen, a simulation-based data generation and filtering pipeline designed to autonomously generalize Vision-Language-Action (VLA) models to new objects.

VLAGen addresses the limited generalization of existing VLA models by generating diverse, high-quality training trajectories entirely in simulation, sampling actions from OpenVLA at a high temperature during inference. Each trajectory is then scored by a vision-language model (GPT-4V) and filtered by removing low-action time steps to mitigate "catastrophic idling." The resulting data is used to fine-tune or preference-optimize OpenVLA.
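
As a concrete illustration, below is a minimal sketch of the high-temperature rollout step using the Hugging Face OpenVLA interface. The model id, prompt, unnorm_key, and temperature value are illustrative assumptions, as is the premise that predict_action forwards sampling arguments to generate():

    # Minimal sketch: sampling diverse actions from OpenVLA at high temperature.
    import torch
    from PIL import Image
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
    vla = AutoModelForVision2Seq.from_pretrained(
        "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to("cuda:0")

    image = Image.open("observation.png")  # current camera frame from the simulator
    prompt = "In: What action should the robot take to pick up the fanta can?\nOut:"
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

    # do_sample=True with temperature > 1.0 yields the diverse rollouts described above.
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig",
                                do_sample=True, temperature=1.2)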

This repository is a fork of OpenVLA and incorporates code from SimplerEnv-OpenVLA (https://github.com/DelinQu/SimplerEnv-OpenVLA), licensed under the MIT License (c) 2024 simpler-env.

Features

  • Automated Data Generation: Generates robotic manipulation trajectories in SIMPLER/SAPIEN simulation using OpenVLA deployed with a high sampling temperature to produce diverse rollouts.
  • GPT-4V Trajectory Scoring: Employs GPT-4V to automatically score each trajectory by how successfully it completes the task (see the scoring sketch after this list).

Figure: The data pipeline generates and ranks trajectories for picking a Fanta can (an out-of-distribution object) with distractors in the background.

  • Magnitude-Based Filtering: Filters out low-action time steps to mitigate catastrophic idling behavior (see the filtering sketch after this list).
  • Fine-tuning and Preference Optimization: Supports both fine-tuning and KTO-based preference optimization (https://arxiv.org/abs/2402.01306) of OpenVLA on the generated data.
  • SIMPLER Environment Integration: Leverages the SIMPLER benchmark for real-to-sim evaluation and data generation.
  • Scalable and Efficient: Trains robotic policies without relying on extensive human-collected datasets.
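
As referenced above, here is a minimal sketch of GPT-4V trajectory scoring using the OpenAI Python SDK. The prompt, the model name, and the choice to score only the final frame are assumptions for illustration; the repository's actual scoring logic may differ.

    # Minimal sketch of GPT-4V trajectory scoring (requires OPENAI_API_KEY).
    import base64

    from openai import OpenAI

    client = OpenAI()

    def score_trajectory(final_frame_path: str, instruction: str) -> str:
        """Ask a GPT-4-class vision model to rate task success from 0 to 10."""
        with open(final_frame_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",  # any GPT-4-class vision model; the pipeline uses GPT-4V
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"The robot was instructed to: '{instruction}'. "
                             "Based on this final frame, rate how successfully "
                             "the task was completed on a scale of 0-10. "
                             "Reply with a single number."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content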

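And a minimal sketch of the magnitude-based filter: time steps whose action magnitude falls below a threshold are dropped so that near-idle steps do not dominate the training data. The threshold value and the use of an L2 norm are assumptions.

    # Minimal sketch of magnitude-based filtering of low-action time steps.
    import numpy as np

    def filter_idle_steps(actions, observations, threshold=1e-3):
        """actions: (T, action_dim) array; observations: length-T array, aligned per step."""
        actions = np.asarray(actions)
        magnitudes = np.linalg.norm(actions, axis=-1)  # per-step action magnitude
        keep = magnitudes >= threshold                 # drop near-idle steps
        return actions[keep], np.asarray(observations)[keep]
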
Usage

Bash scripts for data generation and evaluation are located in the scripts_run_eval directory. They drive the simpler_env environment and the openvla model, with specific scripts for the various tasks and variations (e.g., openvla_drawer_variant_agg.sh, openvla_move_near_visual_matching.sh). Refer to the individual script descriptions for detailed usage instructions and parameters.

Data Generation: To generate training data, run any of these scripts with --policy-model openvla_generate_data (e.g., bash scripts_run_eval/openvla_move_near_visual_matching.sh --policy-model openvla_generate_data). This uses OpenVLA to generate trajectories, which are then scored by GPT-4V.

Model Fine-tuning: The vla-scripts directory contains Python scripts for fine-tuning and training OpenVLA models (finetune.py, finetune_KTO.py, train.py) on the data generated by the scripts above.
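
For orientation, the sketch below shows the kind of LoRA + 4-bit quantization setup (via PEFT and BitsAndBytes) that such fine-tuning builds on. Every hyperparameter here is an illustrative assumption, not necessarily the repository's default.

    # Minimal sketch: load OpenVLA in 4-bit and wrap it with LoRA adapters.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # 4-bit weights for memory-efficient tuning
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4",
    )
    vla = AutoModelForVision2Seq.from_pretrained(
        "openvla/openvla-7b", quantization_config=bnb_config, trust_remote_code=True
    )
    lora_config = LoraConfig(r=32, lora_alpha=16, lora_dropout=0.05,
                             target_modules="all-linear")
    vla = get_peft_model(vla, lora_config)      # only the LoRA adapters are trained
    vla.print_trainable_parameters()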

Model Evaluation: The scripts_run_eval directory contains bash scripts for evaluating the performance of OpenVLA models on various manipulation tasks. These scripts control the SIMPLER environment, run the OpenVLA policy, and log the results.

Installation

  1. Clone the repository:
    git clone <repository_url>
    cd <repository_name>
  2. Install ManiSkill2 real-to-sim environments:
    cd ManiSkill2_real2sim
    pip install -e .
  3. Install OpenVLA requirements: Follow the OpenVLA installation instructions (PyTorch, transformers, and other dependencies). Specific version constraints may be necessary to ensure compatibility.
  4. Obtain an OpenAI API key: required for GPT-4V trajectory scoring.
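
After installation, a quick smoke test can confirm that the simulation stack works, assuming the standard SimplerEnv API (the task name and random-action loop are illustrative):

    # Minimal smoke test: roll out random actions in a SIMPLER task.
    import simpler_env

    env = simpler_env.make("google_robot_pick_coke_can")
    obs, reset_info = env.reset()
    print("Task:", env.get_language_instruction())

    done, truncated = False, False
    while not (done or truncated):
        action = env.action_space.sample()  # replace with OpenVLA-predicted actions
        obs, reward, done, truncated, info = env.step(action)
    print("Episode stats:", info.get("episode_stats", {}))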

Technologies Used

  • Python: The primary programming language for the project.
  • PyTorch: Deep learning framework for model training and inference.
  • Transformers (Hugging Face): Library for loading and utilizing pre-trained OpenVLA models.
  • PEFT (Hugging Face): Library for parameter-efficient fine-tuning, enabling LoRA (Low-Rank Adaptation).
  • BitsAndBytes: Enables 4-bit quantization of the OpenVLA model for memory-efficient fine-tuning.
  • SIMPLER: A simulation benchmark used for generating and evaluating robotic manipulation policies.
  • OpenVLA: A pre-trained vision-language-action model. https://openvla.github.io/
  • GPT-4V: A vision-language model used to score the generated trajectories.
  • Draccus: A Python library for configuration management and structured data handling.
  • Make: Used for streamlining common development tasks.
  • Bash: Used to orchestrate the data generation and evaluation processes.

Dependencies

The project's dependencies are specified in pyproject.toml; run pip install -e . to install them.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

MIT License. See the LICENSE file for details.
