[doc] fix document deprecated link (#235)
- As titled
PeterSH6 authored Feb 9, 2025
1 parent 577a341 commit 610c20c
Showing 4 changed files with 10 additions and 10 deletions.
4 changes: 2 additions & 2 deletions docs/examples/ppo_code_architecture.rst
@@ -109,8 +109,8 @@ Step 2: Define the worker class corresponding to this role
 ``Critic``, ``Reward Model`` and ``Reference model`` on two different
 backend: PyTorch FSDP
 and Megatron-LM.
-See `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py>`_
-and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/megatron_workers.py>`_
+See `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py>`_
+and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/workers/megatron_workers.py>`_
 for more information.

Step 3: Define resource pool id and resource pool spec
10 changes: 4 additions & 6 deletions docs/hybrid_flow.rst
@@ -4,11 +4,9 @@ HybridFlow Programming Guide
 
 .. _vermouth: https://github.com/vermouth1992
 
-Author: `Chi Zhang <vermouth>`_
+Author: `Chi Zhang <https://github.com/vermouth1992>`_
 
-.. _hybridflow: https://arxiv.org/pdf/2409.19256
-
-verl is an open source implementation of the paper `HybridFlow <hybridflow>`_ [1]_. In this section, we will introduce the basic concepts of HybridFlow, the motivation and how to program with verl APIs.
+verl is an open source implementation of the paper `HybridFlow <https://arxiv.org/abs/2409.19256v2>`_ [1]_. In this section, we will introduce the basic concepts of HybridFlow, the motivation and how to program with verl APIs.
 
 Motivation and Design
 ------------------------
@@ -83,7 +81,7 @@ Overall Execution Diagram
 
 Below is a simplified diagram denoting the execution of a reinforcement learning job. In the diagram, the controller runs on a single process, while the generator/actor workers, critic workers run on multiple processes, placed with specific resource groups. For rollout, the controller passes the data to the generator to perform sample generation. When the rollout is done, the data is passed back to controller for the next step of the algorithm. Similar execution is done for other workers. With the hybrid controller design, the data flow and computation is decoupled to provide both efficiency in computation and flexiblity in defining algorithm training loops.
 
-.. image:: https://github.com/eric-haibin-lin/verl-community/blob/main/docs/driver_worker.png?raw=true
+.. figure:: https://github.com/eric-haibin-lin/verl-community/blob/main/docs/driver_worker.png?raw=true
    :alt: The execution diagram
 
 Codebase walkthrough (PPO)
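The single-controller pattern in the paragraph above — one driver process dispatching batches to worker groups and collecting the results — can be illustrated with a toy sketch. This is purely illustrative: real verl places Ray remote workers on resource groups, whereas here plain thread-pool futures and all function names (`rollout_worker`, `critic_worker`, `controller_step`) are stand-ins invented for this example.

```python
# Toy sketch of the hybrid single-controller pattern: the controller
# drives the data flow while the (stand-in) workers do the computation.
from concurrent.futures import ThreadPoolExecutor

def rollout_worker(prompts):
    # Stand-in for sample generation on a generator/actor worker group.
    return [p + " -> response" for p in prompts]

def critic_worker(samples):
    # Stand-in for value estimation on a critic worker group.
    return [len(s) for s in samples]

def controller_step(prompts, pool):
    # The controller passes data to workers and collects results back;
    # each worker call could be a remote task placed on its own resources.
    samples = pool.submit(rollout_worker, prompts).result()
    values = pool.submit(critic_worker, samples).result()
    return samples, values

with ThreadPoolExecutor(max_workers=2) as pool:
    samples, values = controller_step(["q1", "q2"], pool)
print(samples)  # -> ['q1 -> response', 'q2 -> response']
```

Decoupling the loop (controller) from the computation (workers) this way is what lets the algorithm's training loop be rewritten without touching the distributed execution backend.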
@@ -93,7 +91,7 @@ Entry function
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Code: https://github.com/volcengine/verl/blob/main/verl/trainer/main_ppo.py
 
-In this file, we define a remote function `main_task` that serves as the controller process as shown in Figure~\ref{}. We also define a ``RewardManager``, where users can customize their reward function based on the data source in the dataset. Note that `RewardManager` should return the final token-level reward that is optimized by RL algorithms. Note that users can combine model-based rewards and rule-based rewards.
+In this file, we define a remote function `main_task` that serves as the controller (driver) process as shown in the above figure. We also define a ``RewardManager``, where users can customize their reward function based on the data source in the dataset. Note that `RewardManager` should return the final token-level reward that is optimized by RL algorithms. Note that users can combine model-based rewards and rule-based rewards.
 The ``main_task`` constructs a RayPPOTrainer instance and launch the fit. Note that ``main_task`` **runs as a single process**.
 
 We highly recommend that the ``main_task`` is NOT schduled on the head of the ray cluster because ``main_task`` will consume a lot of memory but the head usually contains very few resources.
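The ``RewardManager`` contract described in that hunk — return a token-level reward, optionally mixing rule-based and model-based signals — can be sketched minimally. This is a hypothetical example, not verl's actual ``RewardManager``: the helper names, the weighted-mix scheme, and placing the scalar reward on the final response token are all assumptions made for illustration.

```python
# Hypothetical reward sketch: combine a rule-based score (exact match
# against a ground truth) with a model-based score, then emit a
# token-level reward that is zero everywhere except the last token.

def rule_based_reward(response: str, ground_truth: str) -> float:
    """1.0 if the response matches the ground truth exactly, else 0.0."""
    return 1.0 if response.strip() == ground_truth.strip() else 0.0

def combine_rewards(rule_score: float, model_score: float,
                    alpha: float = 0.5) -> float:
    """Weighted mix of rule-based and model-based rewards (alpha assumed)."""
    return alpha * rule_score + (1.0 - alpha) * model_score

def token_level_reward(num_response_tokens: int,
                       scalar_reward: float) -> list:
    """Place the scalar reward on the final response token."""
    rewards = [0.0] * num_response_tokens
    if num_response_tokens > 0:
        rewards[-1] = scalar_reward
    return rewards

# Example: a 4-token response matching the ground truth, model score 0.8.
scalar = combine_rewards(rule_based_reward("42", "42"), 0.8)
print(token_level_reward(4, scalar))  # -> [0.0, 0.0, 0.0, 0.9]
```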
4 changes: 3 additions & 1 deletion docs/perf/perf_tuning.rst
@@ -1,5 +1,7 @@
 Performance Tuning Guide
-=========================
+==============================
 
+Author: `Guangming Sheng <https://github.com/PeterSH6>`_
+
 In this section, we will discuss how to tune the performance of all the stages in verl, including:
 
2 changes: 1 addition & 1 deletion docs/workers/fsdp_workers.rst
@@ -59,7 +59,7 @@ highlighted below:
 3. ``FSDPVLLMShardingManager`` a context manager to perform actual
    resharding between actor and rollout.
 
-See `source code <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py#L42>`_. for more information.
+See `source code <https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py>`_. for more information.
 
 1. Generate sequence and recompute log prob
 
