docs: 730 docs add an index to the guide overview (#731)
* Add index page to how-to guides

* Apply suggestions from code review

Co-authored-by: burtenshaw <[email protected]>

---------

Co-authored-by: burtenshaw <[email protected]>
davidberenstein1957 and burtenshaw authored Jun 13, 2024
1 parent 9d63f4a commit 806fd57
Showing 6 changed files with 116 additions and 93 deletions.
89 changes: 1 addition & 88 deletions docs/index.md
@@ -40,8 +40,6 @@

If you just want to get started, we recommend you check the [documentation](http://distilabel.argilla.io/). Curious, and want to know more? Keep reading!

<!-- ![overview](https://github.com/argilla-io/distilabel/assets/36760800/360110da-809d-4e24-a29b-1a1a8bc4f9b7) -->

## Why use Distilabel?

Whether you are working on **a predictive model** that computes semantic similarity or the next **generative model** that is going to beat the LLM benchmarks, our framework ensures that the **hard data work pays off**. Distilabel is the missing piece that helps you **synthesize data** and provide **AI feedback**.
@@ -64,89 +62,4 @@

- The [1M OpenHermesPreference](https://huggingface.co/datasets/argilla/OpenHermesPreferences) is a dataset of ~1 million AI preferences derived from teknium/OpenHermes-2.5. It shows how we can use Distilabel to **synthesize data on an immense scale**.
- Our [distilabeled Intel Orca DPO dataset](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) and the [improved OpenHermes model](https://huggingface.co/argilla/distilabeled-OpenHermes-2.5-Mistral-7B) show how we **improve model performance by filtering out 50%** of the original dataset through **AI feedback**.
- The [haiku DPO data](https://github.com/davanstrien/haiku-dpo) outlines how anyone can create a **dataset for a specific task**, applying **the latest research papers** to improve its quality.

## 👨🏽‍💻 Installation

```sh
pip install distilabel --upgrade
```

Requires Python 3.8+

In addition, the following extras are available:

- `anthropic`: for using models available in the [Anthropic API](https://www.anthropic.com/api) via the `AnthropicLLM` integration.
- `cohere`: for using models available in [Cohere](https://cohere.ai/) via the `CohereLLM` integration.
- `argilla`: for exporting the generated datasets to [Argilla](https://argilla.io/).
- `groq`: for using models available in [Groq](https://groq.com/) via the [`groq`](https://github.com/groq/groq-python) Python client and the `GroqLLM` integration.
- `hf-inference-endpoints`: for using the [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) via the `InferenceEndpointsLLM` integration.
- `hf-transformers`: for using models available in the [transformers](https://github.com/huggingface/transformers) package via the `TransformersLLM` integration.
- `litellm`: for using [`LiteLLM`](https://github.com/BerriAI/litellm) to call any LLM using the OpenAI format via the `LiteLLM` integration.
- `llama-cpp`: for using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) Python bindings for `llama.cpp` via the `LlamaCppLLM` integration.
- `mistralai`: for using models available in the [Mistral AI API](https://mistral.ai/news/la-plateforme/) via the `MistralAILLM` integration.
- `ollama`: for using [Ollama](https://ollama.com/) and its available models via the `OllamaLLM` integration.
- `openai`: for using [OpenAI API](https://openai.com/blog/openai-api) models via the `OpenAILLM` integration, as well as the other integrations that rely on the OpenAI client, such as `AnyscaleLLM`, `AzureOpenAILLM`, and `TogetherLLM`.
- `vertexai`: for using [Google Vertex AI](https://cloud.google.com/vertex-ai) proprietary models via the `VertexAILLM` integration.
- `vllm`: for using the [vllm](https://github.com/vllm-project/vllm) serving engine via the `vLLM` integration.

### Example

To run the following example you must install `distilabel` with the `openai` extra:

```sh
pip install "distilabel[openai]" --upgrade
```

Then run:

```python
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub
from distilabel.steps.tasks import TextGeneration

with Pipeline(
    name="simple-text-generation-pipeline",
    description="A simple text generation pipeline",
) as pipeline:
    # Load prompts from the Hub and rename the column to the task's expected input
    load_dataset = LoadDataFromHub(output_mappings={"prompt": "instruction"})

    # Generate a completion for each instruction using an OpenAI model
    generate_with_openai = TextGeneration(llm=OpenAILLM(model="gpt-3.5-turbo"))

    # Connect the steps so the loaded rows flow into the generation task
    load_dataset.connect(generate_with_openai)

if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            load_dataset.name: {
                "repo_id": "distilabel-internal-testing/instruction-dataset-mini",
                "split": "test",
            },
            generate_with_openai.name: {
                "llm": {
                    "generation_kwargs": {
                        "temperature": 0.7,
                        "max_new_tokens": 512,
                    }
                }
            },
        },
    )
```
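
The `run` call returns a `Distiset` holding the generated data. The snippet below is a minimal sketch of what you might do with it next; the `"default"` configuration name, the `"train"` split, and the target repository id are illustrative assumptions, so check the Distiset guide for the exact layout of your run.

```python
# Minimal sketch (config/split names and repo id are assumptions -- adjust to your run):
# a Distiset behaves like a dict mapping configurations to dataset splits.
print(distiset)  # overview of the configurations and splits that were produced

rows = distiset["default"]["train"]  # assumed: a single leaf step under the default config
print(rows[0]["instruction"], rows[0]["generation"])  # inspect one generated pair

# Optionally publish the synthetic dataset to the Hugging Face Hub
distiset.push_to_hub("my-username/instruction-dataset-mini-with-generations")
```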

## Badges

If you build something cool with `distilabel`, consider adding one of these badges to your dataset or model card.

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

## Contribute

To contribute directly to `distilabel`, check our [good first issues](https://github.com/argilla-io/distilabel/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or [open a new one](https://github.com/argilla-io/distilabel/issues/new/choose).
18 changes: 17 additions & 1 deletion docs/sections/community/index.md
@@ -41,4 +41,20 @@

[:octicons-arrow-right-24: Roadmap ↗](https://github.com/orgs/argilla-io/projects/15)

</div>

## Badges

If you build something cool with `distilabel`, consider adding one of these badges to your dataset or model card.

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

## Contribute

To contribute directly to `distilabel`, check our [good first issues](https://github.com/argilla-io/distilabel/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or [open a new one](https://github.com/argilla-io/distilabel/issues/new/choose).
2 changes: 1 addition & 1 deletion docs/sections/how_to_guides/basic/llm/index.md
@@ -1,4 +1,4 @@
# Define LLMs as local models or remote APIs
# Define LLMs as local or remote models

## Working with LLMs

2 changes: 1 addition & 1 deletion docs/sections/how_to_guides/basic/task/index.md
@@ -1,4 +1,4 @@
# Define Tasks as Steps that rely on LLMs
# Define Tasks that rely on LLMs

## Working with Tasks

93 changes: 93 additions & 0 deletions docs/sections/how_to_guides/index.md
@@ -0,0 +1,93 @@
# How-to guides

Welcome to the how-to guides section! Here you will find a collection of guides to help you work with Distilabel, divided into two categories: basic and advanced. The basic guides cover the core concepts of Distilabel, while the advanced guides explore more specialized features.

## Basic

<div class="grid cards" markdown>

- __Define Steps for your Pipeline__

---

Steps are the building blocks of your pipeline. They can be used to generate data, evaluate models, manipulate data, or perform any other general task.

[:octicons-arrow-right-24: Define Steps](basic/step/index.md)

- __Define Tasks that rely on LLMs__

---

Tasks are a specific type of step that relies on Large Language Models (LLMs) to generate data.

[:octicons-arrow-right-24: Define Tasks](basic/task/index.md)

- __Define LLMs as local or remote models__

---

LLMs are the core of your tasks. They are how you integrate local models or remote APIs into your pipeline.

[:octicons-arrow-right-24: Define LLMs](basic/llm/index.md)

- __Execute Steps and Tasks in a Pipeline__

---

A Pipeline is where you put all your steps and tasks together to create a workflow, as shown in the sketch below.

[:octicons-arrow-right-24: Execute Pipeline](basic/pipeline/index.md)

</div>
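
To give a feel for how these basic building blocks fit together, the snippet below is a condensed sketch of a minimal pipeline: a step loads prompts, a task wraps an LLM, and the pipeline wires them together. The model name is illustrative, and each guide above digs into one of these pieces in more depth.

```python
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="how-to-guides-sketch") as pipeline:
    # Step: load rows from a Hub dataset, mapping "prompt" to the task's "instruction" input
    load_dataset = LoadDataFromHub(output_mappings={"prompt": "instruction"})

    # Task: a step whose work is delegated to an LLM (here an illustrative OpenAI model)
    generate = TextGeneration(llm=OpenAILLM(model="gpt-3.5-turbo"))

    # Pipeline: connect the steps so the loaded rows flow into the generation task
    load_dataset.connect(generate)
```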

## Advanced

<div class="grid cards" markdown>

- __Using the Distiset dataset object__

---

Distiset is a dataset object based on the Hugging Face `datasets` library that can be used to store and manipulate data.

[:octicons-arrow-right-24: Distiset](advanced/distiset.md)

- __Export data to Argilla__

---

Argilla is a platform that can be used to store, search, and apply feedback to datasets.

[:octicons-arrow-right-24: Argilla](advanced/argilla.md)

- __Using a file system to pass data of batches between steps__

---

A file system can be used to pass data between steps in a pipeline.

[:octicons-arrow-right-24: File System](advanced/fs_to_pass_data.md)

- __Using CLI to explore and re-run existing Pipelines__

---

The CLI can be used to explore and re-run existing pipelines from the command line.

[:octicons-arrow-right-24: CLI](advanced/cli/index.md)

- __Cache and recover pipeline executions__

---

Caching can be used to recover pipeline executions, avoiding the loss of data and precious LLM calls (see the sketch below).

[:octicons-arrow-right-24: Caching](advanced/caching.md)

- __Structured data generation__

---

Structured data generation can be used to generate data with a specific structure like JSON, function calls, etc.

[:octicons-arrow-right-24: Structured Generation](advanced/structured_generation.md)

</div>
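
As a small taste of the advanced guides, the sketch below re-runs the pipeline from the basic sketch above with caching enabled and persists the resulting `Distiset`. The `use_cache` flag and the `save_to_disk` call are assumptions drawn from the caching and Distiset guides; check those guides for the exact APIs.

```python
# Continuing from the basic sketch above (same `pipeline`, `load_dataset`, `generate`).
# `use_cache` and `save_to_disk` are assumptions -- see the caching and Distiset guides.
distiset = pipeline.run(
    parameters={
        load_dataset.name: {
            "repo_id": "distilabel-internal-testing/instruction-dataset-mini",
            "split": "test",
        },
    },
    use_cache=True,  # reuse previously computed batches instead of repeating LLM calls
)

distiset.save_to_disk("my-synthetic-dataset")  # write the generated data to a local folder
```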
5 changes: 3 additions & 2 deletions mkdocs.yml
@@ -142,15 +142,16 @@ nav:
- Quickstart: "sections/getting_started/quickstart.md"
- FAQ: "sections/getting_started/faq.md"
- How-to guides:
- "sections/how_to_guides/index.md"
- Basic:
- Define Steps for your Pipeline:
- "sections/how_to_guides/basic/step/index.md"
- GeneratorStep: "sections/how_to_guides/basic/step/generator_step.md"
- GlobalStep: "sections/how_to_guides/basic/step/global_step.md"
- Define Tasks as Steps that rely on LLMs:
- Define Tasks that rely on LLMs:
- "sections/how_to_guides/basic/task/index.md"
- GeneratorTask: "sections/how_to_guides/basic/task/generator_task.md"
- Define LLMs as local models or remote APIs: "sections/how_to_guides/basic/llm/index.md"
- Define LLMs as local or remote models: "sections/how_to_guides/basic/llm/index.md"
- Execute Steps and Tasks in a Pipeline: "sections/how_to_guides/basic/pipeline/index.md"
- Advanced:
- Using the Distiset dataset object: "sections/how_to_guides/advanced/distiset.md"
