Merge branch 'main' into hlin/deps
eric-haibin-lin authored Dec 17, 2024
2 parents 91cce97 + d60f843 commit 78181a1
Showing 101 changed files with 4,383 additions and 276 deletions.
@@ -18,15 +18,19 @@ on:

jobs:
ray:
runs-on: [self-hosted, gpu] # test if the environment is ready
runs-on: [self-hosted, gpu]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip install -e .
- name: Running some ray test that only need 2 GPUs
pip install -e .[test]
- name: Running dataset tests
run: |
[ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data
pytest -s -x tests/verl
- name: Running ray tests that need 2 GPUs
run: |
cd tests/ray
pytest -s -x test_rvdz.py test_driverfunc_to_worker.py test_data_transfer.py test_colocated_workers.py test_check_worker_alive.py
2 changes: 1 addition & 1 deletion .github/workflows/yapf_format.yml
@@ -42,4 +42,4 @@ jobs:
pip install toml==0.10.2
- name: Running yapf
run: |
yapf -r -vv -d --style=./.style.yapf verl tests single_controller examples
yapf -r -vv -d --style=./.style.yapf verl tests examples
116 changes: 79 additions & 37 deletions README.md
@@ -1,7 +1,3 @@
<div align=center>
<img src="docs/_static/logo.png" width = "20%" height = "20%" />
</div>

<h1 style="text-align: center;">veRL: Volcano Engine Reinforcement Learning for LLM</h1>

veRL (HybridFlow) is a flexible, efficient and industrial-level RL(HF) training framework designed for large language models (LLMs). veRL is the open-source version of the [HybridFlow](https://arxiv.org/abs/2409.19256v2) paper.
@@ -29,66 +25,106 @@ veRL is fast with:
<!-- <a href=""><b>Slides</b></a> | -->
</p>

## News

- [2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024.
- [Slides](https://github.com/eric-haibin-lin/verl-data/tree/neurips), [notebooks](https://lightning.ai/eric-haibin-lin/studios/verl-neurips~01je0d1benfjb9grmfjxqahvkn?view=public&section=featured), and [video](https://neurips.cc/Expo/Conferences/2024/workshop/100677) available.
- [2024/08] HybridFlow (verl) is accepted to EuroSys 2025.

## Installation Guide

Below are the steps to install veRL in your environment.

### Requirements
- **Python**: Version >= 3.9
- **CUDA**: Version >= 12.1

veRL supports various backends. Currently, the following configurations are available:
- **FSDP** and **Megatron-LM** for training.
- **vLLM** for rollout generation.

**Training backends**

We recommend using the **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for the FSDP backend can be found in [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)

For users who pursue better scalability, we recommend the **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 and have fixed some internal issues in it; the additional installation steps are below. The guide for the Megatron-LM backend can be found in [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/workers/megatron_workers.html)

### Installation Options

## Installation
#### 1. From Docker Image

For installing the latest version of veRL, the best way is to clone and install it from source. Then you can modify our code to customize your own post-training jobs.
We provide pre-built Docker images for quick setup.

Image and tag: `verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3`

1. Launch the desired Docker image:

```bash
# install verl together with some lightweight dependencies in setup.py
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN <image:tag>
```

2. Inside the container, install veRL:

```bash
# install the nightly version
git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
# or install from pypi via `pip3 install verl`
```
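After either install path, a quick import check confirms the package is visible on the current interpreter's path (a minimal sketch; `verl` here simply assumes the install step above succeeded):

```python
# Check whether a top-level module can be resolved without importing it fully.
import importlib.util

def is_installed(name: str) -> bool:
    """Return True if a module can be found on the current sys.path."""
    return importlib.util.find_spec(name) is not None

# Expected to print True once `pip3 install -e .` (or `pip3 install verl`) has run.
print(is_installed("verl"))
```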

You can also install veRL using `pip3 install`
3. Set up Megatron (optional)

If you want to enable training with Megatron, the Megatron code must be added to `PYTHONPATH`:

```bash
# directly install from pypi
pip3 install verl
cd ..
git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
cp verl/patches/megatron_v4.patch Megatron-LM/
cd Megatron-LM && git apply megatron_v4.patch
pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
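Note that `export PYTHONPATH=$PYTHONPATH:$(pwd)` only affects the current shell and its children. The same effect can be mimicked from inside a running Python process, as the hypothetical helper below sketches (this is not part of veRL):

```python
# In-process analogue of `export PYTHONPATH=$PYTHONPATH:$(pwd)`.
import os
import sys

def add_to_pythonpath(path: str) -> None:
    """Prepend a directory to this process's module search path and
    record it in PYTHONPATH so subprocesses inherit it too."""
    if path not in sys.path:
        sys.path.insert(0, path)
    existing = os.environ.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if path not in parts:
        parts.append(path)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)
```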

### Dependencies
You can also get the Megatron code after verl's patch via
```bash
git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM
```

#### 2. From Custom Environments

veRL requires Python >= 3.9 and CUDA >= 12.1.
<details><summary>If you prefer setting up veRL in your custom environment, expand this section and follow the steps below.</summary>

veRL supports various backends; we currently release FSDP and Megatron-LM for actor training and vLLM for rollout generation.
Using **conda** is recommended for managing dependencies.

To install the dependencies, we recommend using conda:
1. Create a conda environment:

```bash
conda create -n verl python==3.9
conda activate verl
```

The following dependencies are required for all backends.
2. Install common dependencies (required for all backends)

```bash
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# install vllm
pip3 install vllm==0.5.4
pip3 install ray==2.10 # other versions may have bugs
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# flash attention 2
pip3 install flash-attn --no-build-isolation
```
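The version pins above matter: mismatched torch/vllm/ray versions are a common source of install failures. A small checker along these lines can verify them up front (package names and pins here are illustrative, taken from the commands above):

```python
# Compare installed package versions against the pins recommended above.
from importlib import metadata

PINS = {"torch": "2.4.0", "vllm": "0.6.3"}

def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    results = {}
    for pkg, wanted in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        results[pkg] = (have, have == wanted)
    return results
```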

**FSDP**

We recommend using the FSDP backend to investigate, research and prototype different models, datasets and RL algorithms.

The pros, cons and extension guide for using FSDP backend can be found in [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
3. Install veRL

**Megatron-LM**

For users who pursue better scalability, we recommend using the Megatron-LM backend. Please install the above dependencies first.

Currently, we support Megatron-LM@core_v0.4.0 and have fixed some internal issues in it; the additional installation steps follow.
```bash
# install the nightly version
git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
# or install from pypi via `pip3 install verl`
```

The pros, cons and extension guide for using Megatron-LM backend can be found in [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/workers/megatron_workers.html)
4. Setup Megatron (optional)

```bash
# FOR Megatron-LM Backend
@@ -103,13 +139,14 @@ pip3 install git+https://github.com/NVIDIA/[email protected]
# megatron core v0.4.0
cd ..
git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
cp ../verl/patches/megatron_v4.patch .
git apply megatron_v4.patch
cp verl/patches/megatron_v4.patch Megatron-LM/
cd Megatron-LM && git apply megatron_v4.patch
pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

</details>

## Getting Started
Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to learn more.

@@ -135,15 +172,20 @@ Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to learn more.
- [Add models to Megatron-LM backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)


## Contribution
## Community and Contribution

### Communication channel

[Join us](https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA) for discussions on slack!

### Code formatting
We use yapf (Google style) to enforce strict code formatting when reviewing MRs. To reformat your code locally, make sure you have installed `yapf`:
```bash
pip3 install yapf
```
Then, make sure you are at the top level of the verl repo and run:
```bash
yapf -ir -vv --style ./.style.yapf verl single_controller examples
yapf -ir -vv --style ./.style.yapf verl examples
```


31 changes: 31 additions & 0 deletions docker/Dockerfile.ngc.vllm
@@ -0,0 +1,31 @@
FROM nvcr.io/nvidia/pytorch:24.05-py3

# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y

RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# make sure torch version is kept
RUN pip3 install --no-cache-dir \
"torch==2.4.0" \
accelerate \
codetiming \
datasets \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers<=4.46.0"

# ray is installed via vllm
RUN pip3 install --no-cache-dir vllm==0.6.3

# we choose flash-attn v2.7.0 or v2.7.2 which contain pre-built wheels
RUN pip3 install --no-cache-dir --no-build-isolation flash-attn==2.7.0.post2
41 changes: 41 additions & 0 deletions docker/Dockerfile.vemlp.vllm.te
@@ -0,0 +1,41 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .

# the image on docker.io is an alias for the one in veturbo
# FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN pip3 config unset global.index-url

# transformers 4.47.0 contains the following bug:
# AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
RUN pip3 install --no-cache-dir \
torch==2.4.0 \
accelerate \
codetiming \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers <= 4.46.0"

RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

# vllm depends on ray, and veRL does not support ray > 2.37
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

# install apex
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex

# install Transformer Engine
# - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/[email protected] to relax version req
# - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
# - cudnn is required by TransformerEngine
# RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
# pip3 install git+https://github.com/eric-haibin-lin/[email protected]
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
14 changes: 7 additions & 7 deletions docs/advance/dpo_extension.rst
@@ -47,8 +47,8 @@ Implementation details:

.. code:: python
from single_controller.base import Worker
from single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
from verl.single_controller.base import Worker
from verl.single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
import ray
@ray.remote
@@ -75,7 +75,7 @@ API: compute reference log probability

.. code:: python
from single_controller.base import Worker
from verl.single_controller.base import Worker
import ray
@ray.remote
@@ -93,7 +93,7 @@ API: Update actor model parameters

.. code:: python
from single_controller.base import Worker
from verl.single_controller.base import Worker
import ray
@ray.remote
@@ -184,7 +184,7 @@ registered into the worker_group**

.. code:: python
from single_controller.base.decorator import register
from verl.single_controller.base.decorator import register
def dispatch_data(worker_group, data):
return data.chunk(worker_group.world_size)
@@ -214,11 +214,11 @@ computation, and data collection.

Furthermore, the model parallelism size of each model is usually fixed,
including dp, tp, pp. So for these common distributed scenarios, we have
pre-implemented specific dispatch and collect methods, in `decorator.py <https://github.com/volcengine/verl/blob/main/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.
pre-implemented specific dispatch and collect methods, in `decorator.py <https://github.com/volcengine/verl/blob/main/verl/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.

.. code:: python
from single_controller.base.decorator import register, Dispatch
from verl.single_controller.base.decorator import register, Dispatch
@register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
def generate_sequences(self, data: DataProto) -> DataProto:
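The dispatch/collect pattern that `@register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)` wires up can be sketched in plain Python. The function names below are illustrative, not the verl API; they only mirror the chunk-then-concatenate behavior described above:

```python
# Split a batch across workers, then merge per-worker outputs back together.
def dispatch_data(data, world_size):
    """Split data into world_size near-equal chunks, one per worker."""
    base, extra = divmod(len(data), world_size)
    chunks, start = [], 0
    for rank in range(world_size):
        end = start + base + (1 if rank < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def collect_data(outputs):
    """Concatenate per-worker outputs back into a single list."""
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged
```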
5 changes: 2 additions & 3 deletions docs/examples/config.rst
@@ -307,7 +307,7 @@ Trainer
total_epochs: 30
project_name: verl_examples
experiment_name: gsm8k
logger: ['console', 'tracking']
logger: ['console', 'wandb']
nnodes: 1
n_gpus_per_node: 8
save_freq: -1
@@ -319,8 +319,7 @@ Trainer
- ``trainer.total_epochs``: Number of epochs in training.
- ``trainer.project_name``: For wandb
- ``trainer.experiment_name``: For wandb
- ``trainer.logger``: Support console and tracking. For tracking, we
will initialize a wandb
- ``trainer.logger``: Support console and wandb
- ``trainer.nnodes``: Number of nodes used in the training.
- ``trainer.n_gpus_per_node``: Number of GPUs per node.
- ``trainer.save_freq``: The frequency (by iteration) to save checkpoint
4 changes: 2 additions & 2 deletions docs/examples/gsm8k_example.rst
@@ -91,7 +91,7 @@ We also provide various training scripts for SFT on GSM8K dataset in `gsm8k sft
trainer.project_name=gsm8k-sft \
trainer.experiment_name=gsm8k-sft-deepseek-coder-6.7b-instruct \
trainer.total_epochs=4 \
trainer.logger=['console','tracking']
trainer.logger=['console','wandb']
Step 4: Perform PPO training with your model on GSM8K Dataset
-------------------------------------------------------------
@@ -156,7 +156,7 @@ The script of run_deepseek7b_llm.sh
critic.model.fsdp_config.optimizer_offload=False \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
trainer.logger=['console','tracking'] \
trainer.logger=['console','wandb'] \
trainer.project_name='verl_example_gsm8k' \
trainer.experiment_name='deepseek_llm_7b_function_rm' \
trainer.n_gpus_per_node=8 \
4 changes: 2 additions & 2 deletions docs/examples/ppo_code_architecture.rst
@@ -49,13 +49,13 @@ Define worker classes
if config.actor_rollout_ref.actor.strategy == 'fsdp': # for FSDP backend
assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
from verl.trainer.ppo.workers.fsdp_workers import ActorRolloutRefWorker, CriticWorker
from single_controller.ray import RayWorkerGroup
from verl.single_controller.ray import RayWorkerGroup
ray_worker_group_cls = RayWorkerGroup
elif config.actor_rollout_ref.actor.strategy == 'megatron': # for Megatron backend
assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
from verl.trainer.ppo.workers.megatron_workers import ActorRolloutRefWorker, CriticWorker
from single_controller.ray.megatron import NVMegatronRayWorkerGroup
from verl.single_controller.ray.megatron import NVMegatronRayWorkerGroup
ray_worker_group_cls = NVMegatronRayWorkerGroup # Ray worker class for Megatron-LM
else:
