From c70cb2451d6696cbaf836c253a4139f019bc3aeb Mon Sep 17 00:00:00 2001
From: Haibin Lin
Date: Wed, 18 Dec 2024 13:57:53 -0800
Subject: [PATCH] fix quickstart syntax

---
 docs/index.rst            |  2 +-
 docs/start/quickstart.rst | 21 ++++++++++-----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index ce72cd69..756e4aee 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -89,7 +89,7 @@ Code formatting
 ^^^^^^^^^^^^^^^^^^^^^^^^
 We use yapf (Google style) to enforce strict code formatting when reviewing MRs. Run yapf at the top level of verl repo:
 
-.. bash::
+.. code-block:: bash
 
     pip3 install yapf
     yapf -ir -vv --style ./.style.yapf verl examples tests
diff --git a/docs/start/quickstart.rst b/docs/start/quickstart.rst
index eb7cb935..8422c470 100644
--- a/docs/start/quickstart.rst
+++ b/docs/start/quickstart.rst
@@ -12,7 +12,7 @@ Introduction
 
 .. _hf_dataset_gsm8k: https://huggingface.co/datasets/gsm8k
 
-In this example, we train an LLM to tackle the `GSM8k `_ task with function-based rewards[1]_.
+In this example, we train an LLM to tackle the `GSM8k `_ task with function-based rewards. [1]_
 
 Prerequisite:
 
@@ -45,7 +45,7 @@ Step 1: Prepare the dataset
 
 We preprocess the dataset in parquet format so that (1) it contains necessary fields for computing RL rewards and (2) is faster to read.
 
-.. code:: bash
+.. code-block:: bash
 
    python3 examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k
 
@@ -56,7 +56,7 @@
 Usually we recommend starting with an "instruct" model variant so that the model
 
 If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory `_ and `SFT Trainer `_ for further details.
 
-.. code:: bash
+.. code-block:: bash
 
    python3 -c "import transformers; transformers.pipeline('text-generation', model='Qwen/Qwen2.5-0.5B-Instruct')"
@@ -75,12 +75,12 @@ For mode details, please refer to `verl/utils/reward_score/gsm8k.py `_
-.. [2] More training script examples for FSDP and Megatron-LM backend are stored in `examples/ppo_trainer `_ directory.
\ No newline at end of file
+.. [2] More training script examples for FSDP and Megatron-LM backend are stored in `examples/ppo_trainer `_ directory.
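Note on the function-based reward the quickstart refers to: the actual scorer lives in `verl/utils/reward_score/gsm8k.py` in the repo. As a rough illustration of the idea only (not verl's implementation; function names here are made up), a rule-based GSM8k reward can extract the final ``#### <answer>`` marker that GSM8k solutions end with and compare it against the ground-truth answer:

```python
import re

def extract_answer(solution):
    """Pull the final answer out of a GSM8k-style solution string.

    GSM8k reference solutions terminate with a line like '#### 18'.
    Returns the answer with thousands separators stripped, or None
    if no '####' marker is present.
    """
    match = re.search(r"####\s*(-?[\d,\.]+)", solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")

def gsm8k_reward(response, ground_truth):
    """Rule-based 0/1 reward: 1.0 iff the extracted answer matches."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth else 0.0
```

The real scorer is more forgiving about answer formats; this sketch collapses everything to a hard 0/1 score.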