Merge branch 'main' into hlin/deps
eric-haibin-lin authored Dec 17, 2024
2 parents 91cce97 + d60f843 commit 78181a1
Showing 101 changed files with 4,383 additions and 276 deletions.
@@ -18,15 +18,19 @@ on:

jobs:
ray:
runs-on: [self-hosted, gpu] # test if the environment is ready
runs-on: [self-hosted, gpu]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip install -e .
- name: Running some ray test that only need 2 GPUs
pip install -e .[test]
- name: Running dataset tests
run: |
[ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data
pytest -s -x tests/verl
- name: Running ray tests that need 2 GPUs
run: |
cd tests/ray
pytest -s -x test_rvdz.py test_driverfunc_to_worker.py test_data_transfer.py test_colocated_workers.py test_check_worker_alive.py
2 changes: 1 addition & 1 deletion .github/workflows/yapf_format.yml
@@ -42,4 +42,4 @@ jobs:
pip install toml==0.10.2
- name: Running yapf
run: |
yapf -r -vv -d --style=./.style.yapf verl tests single_controller examples
yapf -r -vv -d --style=./.style.yapf verl tests examples
116 changes: 79 additions & 37 deletions README.md
@@ -1,7 +1,3 @@
<div align=center>
<img src="docs/_static/logo.png" width = "20%" height = "20%" />
</div>

<h1 style="text-align: center;">veRL: Volcano Engine Reinforcement Learning for LLM</h1>

veRL (HybridFlow) is a flexible, efficient and industrial-level RL(HF) training framework designed for large language models (LLMs). veRL is the open-source version of the [HybridFlow](https://arxiv.org/abs/2409.19256v2) paper.
@@ -29,66 +25,106 @@ veRL is fast with:
<!-- <a href=""><b>Slides</b></a> | -->
</p>

## News

- [2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024.
- [Slides](https://github.com/eric-haibin-lin/verl-data/tree/neurips), [notebooks](https://lightning.ai/eric-haibin-lin/studios/verl-neurips~01je0d1benfjb9grmfjxqahvkn?view=public&section=featured), and [video](https://neurips.cc/Expo/Conferences/2024/workshop/100677) available.
- [2024/08] HybridFlow (verl) is accepted to EuroSys 2025.

## Installation Guide

Below are the steps to install veRL in your environment.

### Requirements
- **Python**: Version >= 3.9
- **CUDA**: Version >= 12.1

veRL supports various backends. Currently, the following configurations are available:
- **FSDP** and **Megatron-LM** for training.
- **vLLM** for rollout generation.

**Training backends**

We recommend using the **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for the FSDP backend can be found in [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)

For users who pursue better scalability, we recommend the **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 and have fixed some internal issues in it; the additional installation steps are below. The guide for the Megatron-LM backend can be found in [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/workers/megatron_workers.html)

### Installation Options

## Installation
#### 1. From Docker Image

For installing the latest version of veRL, the best way is to clone and install it from source. Then you can modify our code to customize your own post-training jobs.
We provide pre-built Docker images for quick setup.

Image and tag: `verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3`

1. Launch the desired Docker image:

```bash
# install verl together with some lightweight dependencies in setup.py
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN <image:tag>
```

2. Inside the container, install veRL:

```bash
# install the nightly version
git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
# or install from pypi via `pip3 install verl`
```
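After either install path, a quick import check confirms the package is visible on the current interpreter's path (a minimal sketch; `verl` here simply assumes the install step above succeeded):

```python
# Check whether a top-level module can be resolved without importing it fully.
import importlib.util

def is_installed(name: str) -> bool:
    """Return True if a module can be found on the current sys.path."""
    return importlib.util.find_spec(name) is not None

# Expected to print True once `pip3 install -e .` (or `pip3 install verl`) has run.
print(is_installed("verl"))
```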

You can also install veRL using `pip3 install`
3. Set up Megatron (optional)

If you want to enable training with Megatron, the Megatron code must be added to `PYTHONPATH`:

```bash
# directly install from pypi
pip3 install verl
cd ..
git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
cp verl/patches/megatron_v4.patch Megatron-LM/
cd Megatron-LM && git apply megatron_v4.patch
pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
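Note that `export PYTHONPATH=$PYTHONPATH:$(pwd)` only affects the current shell and its children. The same effect can be mimicked from inside a running Python process, as the hypothetical helper below sketches (this is not part of veRL):

```python
# In-process analogue of `export PYTHONPATH=$PYTHONPATH:$(pwd)`.
import os
import sys

def add_to_pythonpath(path: str) -> None:
    """Prepend a directory to this process's module search path and
    record it in PYTHONPATH so subprocesses inherit it too."""
    if path not in sys.path:
        sys.path.insert(0, path)
    existing = os.environ.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if path not in parts:
        parts.append(path)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)
```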

### Dependencies
You can also get the Megatron code after verl's patch via
```bash
git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM
```

#### 2. From Custom Environments

veRL requires Python >= 3.9 and CUDA >= 12.1.
<details><summary>If you prefer setting up veRL in your custom environment, expand this section and follow the steps below.</summary>

veRL supports various backends; we currently release FSDP and Megatron-LM for actor training and vLLM for rollout generation.
Using **conda** is recommended for managing dependencies.

To install the dependencies, we recommend using conda:
1. Create a conda environment:

```bash
conda create -n verl python==3.9
conda activate verl
```

The following dependencies are required for all backends.
2. Install common dependencies (required for all backends)

```bash
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# install vllm
pip3 install vllm==0.5.4
pip3 install ray==2.10 # other versions may have bugs
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# flash attention 2
pip3 install flash-attn --no-build-isolation
```
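The version pins above matter: mismatched torch/vllm/ray versions are a common source of install failures. A small checker along these lines can verify them up front (package names and pins here are illustrative, taken from the commands above):

```python
# Compare installed package versions against the pins recommended above.
from importlib import metadata

PINS = {"torch": "2.4.0", "vllm": "0.6.3"}

def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    results = {}
    for pkg, wanted in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        results[pkg] = (have, have == wanted)
    return results
```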

**FSDP**

We recommend using the FSDP backend to investigate, research and prototype different models, datasets and RL algorithms.

The pros, cons and extension guide for using FSDP backend can be found in [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
3. Install veRL

**Megatron-LM**

For users who pursue better scalability, we recommend using the Megatron-LM backend. Please install the above dependencies first.

Currently, we support Megatron-LM@core_v0.4.0 and have fixed some internal issues in it; the additional installation steps follow.
```bash
# install the nightly version
git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
# or install from pypi via `pip3 install verl`
```

The pros, cons and extension guide for using Megatron-LM backend can be found in [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/workers/megatron_workers.html)
4. Setup Megatron (optional)

```bash
# FOR Megatron-LM Backend
@@ -103,13 +139,14 @@ pip3 install git+https://github.com/NVIDIA/[email protected]
# megatron core v0.4.0
cd ..
git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
cp ../verl/patches/megatron_v4.patch .
git apply megatron_v4.patch
cp verl/patches/megatron_v4.patch Megatron-LM/
cd Megatron-LM && git apply megatron_v4.patch
pip3 install -e .
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

</details>

## Getting Started
Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to learn more.

@@ -135,15 +172,20 @@ Visit our [documentation](https://verl.readthedocs.io/en/latest/index.html) to learn more.
- [Add models to Megatron-LM backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)


## Contribution
## Community and Contribution

### Communication channel

[Join us](https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA) for discussions on slack!

### Code formatting
We use yapf (Google style) to enforce strict code formatting when reviewing MRs. To reformat your code locally, make sure you have installed `yapf`:
```bash
pip3 install yapf
```
Then, make sure you are at the top level of the verl repo and run:
```bash
yapf -ir -vv --style ./.style.yapf verl single_controller examples
yapf -ir -vv --style ./.style.yapf verl examples
```


31 changes: 31 additions & 0 deletions docker/Dockerfile.ngc.vllm
@@ -0,0 +1,31 @@
FROM nvcr.io/nvidia/pytorch:24.05-py3

# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y

RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# make sure torch version is kept
RUN pip3 install --no-cache-dir \
"torch==2.4.0" \
accelerate \
codetiming \
datasets \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers<=4.46.0"

# ray is installed via vllm
RUN pip3 install --no-cache-dir vllm==0.6.3

# we choose flash-attn v2.7.0 or v2.7.2 which contain pre-built wheels
RUN pip3 install --no-cache-dir --no-build-isolation flash-attn==2.7.0.post2
41 changes: 41 additions & 0 deletions docker/Dockerfile.vemlp.vllm.te
@@ -0,0 +1,41 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .

# the image on docker.io is an alias for the one in veturbo
# FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN pip3 config unset global.index-url

# transformers 4.47.0 contains the following bug:
# AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
RUN pip3 install --no-cache-dir \
torch==2.4.0 \
accelerate \
codetiming \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers <= 4.46.0"

RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

# vllm depends on ray, and veRL does not support ray > 2.37
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

# install apex
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex

# install Transformer Engine
# - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/[email protected] to relax version req
# - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
# - cudnn is required by TransformerEngine
# RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
# pip3 install git+https://github.com/eric-haibin-lin/[email protected]
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7
14 changes: 7 additions & 7 deletions docs/advance/dpo_extension.rst
@@ -47,8 +47,8 @@ Implementation details:

.. code:: python
from single_controller.base import Worker
from single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
from verl.single_controller.base import Worker
from verl.single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
import ray
@ray.remote
@@ -75,7 +75,7 @@ API: compute reference log probability

.. code:: python
from single_controller.base import Worker
from verl.single_controller.base import Worker
import ray
@ray.remote
@@ -93,7 +93,7 @@ API: Update actor model parameters

.. code:: python
from single_controller.base import Worker
from verl.single_controller.base import Worker
import ray
@ray.remote
@@ -184,7 +184,7 @@ registered into the worker_group**

.. code:: python
from single_controller.base.decorator import register
from verl.single_controller.base.decorator import register
def dispatch_data(worker_group, data):
return data.chunk(worker_group.world_size)
@@ -214,11 +214,11 @@ computation, and data collection.

Furthermore, the model parallelism size of each model is usually fixed,
including dp, tp, pp. So for these common distributed scenarios, we have
pre-implemented specific dispatch and collect methods, in `decorator.py <https://github.com/volcengine/verl/blob/main/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.
pre-implemented specific dispatch and collect methods, in `decorator.py <https://github.com/volcengine/verl/blob/main/verl/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.

.. code:: python
from single_controller.base.decorator import register, Dispatch
from verl.single_controller.base.decorator import register, Dispatch
@register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
def generate_sequences(self, data: DataProto) -> DataProto:
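The dispatch/collect pattern that `@register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)` wires up can be sketched in plain Python. The function names below are illustrative, not the verl API; they only mirror the chunk-then-concatenate behavior described above:

```python
# Split a batch across workers, then merge per-worker outputs back together.
def dispatch_data(data, world_size):
    """Split data into world_size near-equal chunks, one per worker."""
    base, extra = divmod(len(data), world_size)
    chunks, start = [], 0
    for rank in range(world_size):
        end = start + base + (1 if rank < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def collect_data(outputs):
    """Concatenate per-worker outputs back into a single list."""
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged
```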
5 changes: 2 additions & 3 deletions docs/examples/config.rst
@@ -307,7 +307,7 @@ Trainer
total_epochs: 30
project_name: verl_examples
experiment_name: gsm8k
logger: ['console', 'tracking']
logger: ['console', 'wandb']
nnodes: 1
n_gpus_per_node: 8
save_freq: -1
@@ -319,8 +319,7 @@ Trainer
- ``trainer.total_epochs``: Number of epochs in training.
- ``trainer.project_name``: For wandb
- ``trainer.experiment_name``: For wandb
- ``trainer.logger``: Support console and tracking. For tracking, we
will initialize a wandb
- ``trainer.logger``: Support console and wandb
- ``trainer.nnodes``: Number of nodes used in the training.
- ``trainer.n_gpus_per_node``: Number of GPUs per node.
- ``trainer.save_freq``: The frequency (by iteration) to save checkpoint
4 changes: 2 additions & 2 deletions docs/examples/gsm8k_example.rst
@@ -91,7 +91,7 @@ We also provide various training scripts for SFT on GSM8K dataset in `gsm8k sft
trainer.project_name=gsm8k-sft \
trainer.experiment_name=gsm8k-sft-deepseek-coder-6.7b-instruct \
trainer.total_epochs=4 \
trainer.logger=['console','tracking']
trainer.logger=['console','wandb']
Step 4: Perform PPO training with your model on GSM8K Dataset
-------------------------------------------------------------
@@ -156,7 +156,7 @@ The script of run_deepseek7b_llm.sh
critic.model.fsdp_config.optimizer_offload=False \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
trainer.logger=['console','tracking'] \
trainer.logger=['console','wandb'] \
trainer.project_name='verl_example_gsm8k' \
trainer.experiment_name='deepseek_llm_7b_function_rm' \
trainer.n_gpus_per_node=8 \
4 changes: 2 additions & 2 deletions docs/examples/ppo_code_architecture.rst
@@ -49,13 +49,13 @@ Define worker classes
if config.actor_rollout_ref.actor.strategy == 'fsdp': # for FSDP backend
assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
from verl.trainer.ppo.workers.fsdp_workers import ActorRolloutRefWorker, CriticWorker
from single_controller.ray import RayWorkerGroup
from verl.single_controller.ray import RayWorkerGroup
ray_worker_group_cls = RayWorkerGroup
elif config.actor_rollout_ref.actor.strategy == 'megatron': # for Megatron backend
assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
from verl.trainer.ppo.workers.megatron_workers import ActorRolloutRefWorker, CriticWorker
from single_controller.ray.megatron import NVMegatronRayWorkerGroup
from verl.single_controller.ray.megatron import NVMegatronRayWorkerGroup
ray_worker_group_cls = NVMegatronRayWorkerGroup # Ray worker class for Megatron-LM
else:
