From d5feb2321878aefb242c5b11fa74be6fe39d100b Mon Sep 17 00:00:00 2001 From: NikV Date: Tue, 14 Jan 2025 15:15:26 -0500 Subject: [PATCH 1/7] Improve readability of the quick tour. --- docs/source/quicktour.mdx | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx index b2190245c..74e9a7e61 100644 --- a/docs/source/quicktour.mdx +++ b/docs/source/quicktour.mdx @@ -32,18 +32,33 @@ lighteval accelerate \ "leaderboard|truthfulqa:mc|0|0" ``` -Here, `--tasks` refers to either a comma-separated list of supported tasks from -the [tasks_list](available-tasks) in the format: +Here, the first argument specifies which model(s) to run, and the second argument specifies how to evaluate them. + +Multiple models can be evaluated at the same time by using a comma-separated list. For example: ```bash -{suite}|{task}|{num_few_shot}|{0 or 1 to automatically reduce `num_few_shot` if prompt is too long} +lighteval accelerate \ + "pretrained=gpt2,pretrained=HuggingFaceTB/SmolLM2-135M-Instruct" \ + "leaderboard|truthfulqa:mc|0|0" ``` -or a file path like -[examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt) -which specifies multiple task configurations. +Similarly, multiple evalutions can be run as well, either with a comma-separated list of supported tasks, or by specifing +a file path, like from [examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt). +For example: + +```bash +lighteval accelerate \ + "pretrained=gpt2 \ + ./path/to/lighteval/examples/tasks/recommended_set.txt +``` + +The task specification might be a bit hard to grasp as first. 
The format is as follows: + +```bash +{suite}|{task}|{num_few_shot}|{0 or 1 to automatically reduce `num_few_shot` if prompt is too long} +``` -Tasks details can be found in the +All supported tasks can be found at the [tasks_list](available-tasks). For more details, you can have a look at the [file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py) implementing them. From b3d878c8d033d9af548eb6de9609299af23555f5 Mon Sep 17 00:00:00 2001 From: Nik Date: Tue, 21 Jan 2025 11:44:14 -0500 Subject: [PATCH 2/7] update based on feedback --- docs/source/quicktour.mdx | 47 +++++++++++++++++++++------------------ 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx index 74e9a7e61..8f4b47d37 100644 --- a/docs/source/quicktour.mdx +++ b/docs/source/quicktour.mdx @@ -24,7 +24,7 @@ Lighteval can be used with a few different commands. ### Evaluate a model on a GPU -To evaluate `GPT-2` on the Truthful QA benchmark, run: +To evaluate `GPT-2` on the Truthful QA benchmark run: ```bash lighteval accelerate \ @@ -32,36 +32,39 @@ lighteval accelerate \ "leaderboard|truthfulqa:mc|0|0" ``` -Here, the first argument specifies which model(s) to run, and the second argument specifies how to evaluate them. +Here, the first argument specifies the model to run, and the second argument specifies which tasks to run. -Multiple models can be evaluated at the same time by using a comma-separated list. For example: +The syntax for the model arguments is `key1=value1,key2=value2,etc`. +The keys correspond with the backend configuration (accelerate, vllm), and are detailed [below](#Model Arguments). -```bash -lighteval accelerate \ - "pretrained=gpt2,pretrained=HuggingFaceTB/SmolLM2-135M-Instruct" \ - "leaderboard|truthfulqa:mc|0|0" +The syntax for the task specification might be a bit hard to grasp as first. 
The format is as follows:
+
+```txt
+{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a reduction}
 ```
 
-Similarly, multiple evalutions can be run as well, either with a comma-separated list of supported tasks, or by specifing
-a file path, like from [examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt).
-For example:
+If the fourth value is set to 1, lighteval will check if the prompt (including the few-shot examples) is too long for the context size of the task or the model.
+If so, the number of few shot examples is automatically reduced.
 
-```bash
-lighteval accelerate \
-    "pretrained=gpt2 \
-    ./path/to/lighteval/examples/tasks/recommended_set.txt
-```
+All officially supported tasks can be found at the [tasks_list](available-tasks).
+Moreover, community-provided tasks can be found in the
+[extended folder](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks/extended) and the
+[community](https://github.com/huggingface/lighteval/tree/main/community_tasks) folder.
+For more details on the implementation of the tasks, such as how prompts are constructed, or which metrics are used, you can have a look at the
+[file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py)
+implementing them.
 
-The task specification might be a bit hard to grasp as first. The format is as follows:
+Running multiple tasks is supported, either with a comma-separated list, or by specifying a file path.
+The file should be structured like [examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt).
+When specifying a path to a file, it should start with `./`. 

 ```bash
-{suite}|{task}|{num_few_shot}|{0 or 1 to automatically reduce `num_few_shot` if prompt is too long}
+lighteval accelerate \
+    "pretrained=gpt2" \
+    ./path/to/lighteval/examples/tasks/recommended_set.txt
+# or, e.g., "leaderboard|truthfulqa:mc|0|0,leaderboard|gsm8k|3|1"
 ```
 
-All supported tasks can be found at the [tasks_list](available-tasks). For more details, you can have a look at the
-[file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py)
-implementing them.
-
 ### Evaluate a model on one or more GPUs
 
 #### Data parallelism
@@ -90,7 +93,7 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:
 
 ```bash
 lighteval accelerate \
-    "pretrained=gpt2,model_parallel=True" \
+    "pretrained=gpt2,model_parallel=True,dtype=float16" \
     "leaderboard|truthfulqa:mc|0|0"
 ```
 
From 6df1aad5de7bda1b1219590823502db5ca18d0e0 Mon Sep 17 00:00:00 2001
From: Nik
Date: Tue, 21 Jan 2025 11:51:26 -0500
Subject: [PATCH 3/7] delete superfluous edit of float16

---
 docs/source/quicktour.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx
index 8f4b47d37..e165f82f4 100644
--- a/docs/source/quicktour.mdx
+++ b/docs/source/quicktour.mdx
@@ -93,7 +93,7 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:
 
 ```bash
 lighteval accelerate \
-    "pretrained=gpt2,model_parallel=True,dtype=float16" \
+    "pretrained=gpt2,model_parallel=True" \
     "leaderboard|truthfulqa:mc|0|0"
 ```
 
From dfd2a40a2d1984b4982e0c32c90e28da88f10dd7 Mon Sep 17 00:00:00 2001
From: Nik
Date: Tue, 21 Jan 2025 11:53:54 -0500
Subject: [PATCH 4/7] deleted , for no reason

---
 docs/source/quicktour.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx
index e165f82f4..19fcb7104 100644
--- a/docs/source/quicktour.mdx
+++ b/docs/source/quicktour.mdx
@@ -24,7 +24,7 @@ Lighteval can be used with a few different commands. 
### Evaluate a model on a GPU
 
-To evaluate `GPT-2` on the Truthful QA benchmark run:
+To evaluate `GPT-2` on the Truthful QA benchmark, run:
 
 ```bash
 lighteval accelerate \

From 02b3e304d1cae8d455979dc65d28d459cfbf23df Mon Sep 17 00:00:00 2001
From: Nik
Date: Tue, 21 Jan 2025 13:13:23 -0500
Subject: [PATCH 5/7] reorganize headers

---
 docs/source/quicktour.mdx | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx
index 19fcb7104..cd76eefdc 100644
--- a/docs/source/quicktour.mdx
+++ b/docs/source/quicktour.mdx
@@ -20,11 +20,10 @@ Lighteval can be used with a few different commands.
 - `tgi`: evaluate models on one or more GPUs using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index)
 - `openai`: evaluate models on one or more GPUs using [🔗 OpenAI API](https://platform.openai.com/)
 
-## Accelerate
+## Basic usage
 
-### Evaluate a model on a GPU
-
-To evaluate `GPT-2` on the Truthful QA benchmark, run:
+To evaluate `GPT-2` on the Truthful QA benchmark with [🤗
+  Accelerate](https://github.com/huggingface/accelerate), run:
 
 ```bash
 lighteval accelerate \
@@ -32,12 +31,12 @@ lighteval accelerate \
 "leaderboard|truthfulqa:mc|0|0"
 ```
 
-Here, the first argument specifies the model to run, and the second argument specifies which tasks to run.
+Here, we first choose a backend (either `accelerate`, `nanotron`, or `vllm`), and then specify the model and task(s) to run.
 
 The syntax for the model arguments is `key1=value1,key2=value2,etc`.
-The keys correspond with the backend configuration (accelerate, vllm), and are detailed [below](#Model Arguments).
+Valid key-value pairs correspond with the backend configuration, and are detailed [below](#backend-configuration).
 
-The syntax for the task specification might be a bit hard to grasp as first. The format is as follows:
+The syntax for the task specification might be a bit hard to grasp at first. 
The format is as follows: ```txt {suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a reduction} @@ -65,7 +64,7 @@ lighteval accelerate \ # or, e.g., "leaderboard|truthfulqa:mc|0|0|,leaderboard|gsm8k|3|1" ``` -### Evaluate a model on one or more GPUs +## Evaluate a model on one or more GPUs #### Data parallelism @@ -104,13 +103,13 @@ This will automatically use accelerate to distribute the model across the GPUs. > `model_parallel=True` and using accelerate to distribute the data across the GPUs. -### Model Arguments +## Backend configuration The `model-args` argument takes a string representing a list of model argument. The arguments allowed vary depending on the backend you use (vllm or accelerate). -#### Accelerate +### Accelerate - **pretrained** (str): HuggingFace Hub model ID name or the path to a pre-trained @@ -146,7 +145,7 @@ accelerate). - **trust_remote_code** (bool): Whether to trust remote code during model loading. -#### VLLM +### VLLM - **pretrained** (str): HuggingFace Hub model ID name or the path to a pre-trained model to load. - **gpu_memory_utilisation** (float): The fraction of GPU memory to use. From 6590cdbc19cab6a4098f535498c06025bc2fd12e Mon Sep 17 00:00:00 2001 From: Nik Date: Thu, 23 Jan 2025 09:27:36 -0500 Subject: [PATCH 6/7] fix nit --- docs/source/quicktour.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx index cd76eefdc..c2e71abcd 100644 --- a/docs/source/quicktour.mdx +++ b/docs/source/quicktour.mdx @@ -39,15 +39,15 @@ Valid key-value pairs correspond with the backend configuration, and are detaile The syntax for the task specification might be a bit hard to grasp at first. 
The format is as follows: ```txt -{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a reduction} +{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a truncation if context size is too small ``` If the fourth value is set to 1, lighteval will check if the prompt (including the few-shot examples) is too long for the context size of the task or the model. If so, the number of few shot examples is automatically reduced. -All officially supported tasks can be found at the [tasks_list](available-tasks). +All officially supported tasks can be found at the [tasks_list](available-tasks) and in the +[extended folder](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks/extended). Moreover, community-provided tasks can be found in the -[extended folder](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks/extended) and the [community](https://github.com/huggingface/lighteval/tree/main/community_tasks) folder. For more details on the implementation of the tasks, such as how prompts are constructed, or which metrics are used, you can have a look at the [file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py) From 381beb25c8c5803af6e8ce8cba6b6ff8b4da0177 Mon Sep 17 00:00:00 2001 From: Nik Date: Thu, 23 Jan 2025 12:59:35 -0500 Subject: [PATCH 7/7] closing bracket --- docs/source/quicktour.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx index c2e71abcd..89c4656be 100644 --- a/docs/source/quicktour.mdx +++ b/docs/source/quicktour.mdx @@ -39,7 +39,7 @@ Valid key-value pairs correspond with the backend configuration, and are detaile The syntax for the task specification might be a bit hard to grasp at first. 
The format is as follows: ```txt -{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a truncation if context size is too small +{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a truncation if context size is too small} ``` If the fourth value is set to 1, lighteval will check if the prompt (including the few-shot examples) is too long for the context size of the task or the model.
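The task-specification format that this series converges on is plain `|`-delimited text, so it can be decomposed with ordinary field splitting. The snippet below is an illustrative sketch only (not part of lighteval); the variable names are hypothetical:

```bash
# Illustration: split a lighteval-style task spec
# {suite}|{task}|{num_few_shot}|{truncate_few_shots} into its four fields.
spec="leaderboard|truthfulqa:mc|0|0"

old_ifs=$IFS
IFS='|'
set -- $spec          # word-split the spec on '|'
IFS=$old_ifs

suite=$1; task=$2; num_few_shot=$3; truncate_few_shots=$4
echo "suite=$suite task=$task few_shot=$num_few_shot truncate=$truncate_few_shots"
# prints: suite=leaderboard task=truthfulqa:mc few_shot=0 truncate=0
```

Per the wording the patches settle on, the fourth field is `0` for a strict `num_few_shots` and `1` to let lighteval reduce the number of few-shot examples when the prompt would exceed the context size.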