From 93c433c81f781f64454f8d3450ce7d52dc5d35a3 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Fri, 31 Jan 2025 16:26:42 +0100
Subject: [PATCH] Add ToC and CLI docs (#455)

---------

Co-authored-by: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
---
 README.md | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6da9d4d2..25051222 100644
--- a/README.md
+++ b/README.md
@@ -46,6 +46,16 @@ Full documentation can be found [here](https://huggingface.co/docs/smolagents/in
 > [!NOTE]
 > Check the our [launch blog post](https://huggingface.co/blog/smolagents) to learn more about `smolagents`!
 
+## Table of Contents
+- [Introduction](#introduction)
+- [Quick Demo](#quick-demo)
+- [Command Line Interface](#command-line-interface)
+- [Code Agents](#code-agents)
+- [How Smol is it Really?](#how-smol-is-it-really)
+- [How Strong are Open Models for Agentic Workflows?](#how-strong-are-open-models-for-agentic-workflows)
+- [Contributing](#contributing)
+- [Citing smolagents](#citing-smolagents)
+
 ## Quick demo
 
 First install the package.
@@ -63,6 +73,33 @@ agent.run("How many seconds would it take for a leopard at full speed to run thr
 
 https://github.com/user-attachments/assets/cd0226e2-7479-4102-aea0-57c22ca47884
 
+
+## Command Line Interface
+
+You can accomplish multi-step agentic tasks using two commands: `smolagent` and `webagent`. `smolagent` is a more generalist command to run a multi-step CodeAgent that can be equipped with various tools, meanwhile `webagent` is an agent equipped with web browsing tools using [helium](https://github.com/helium).
+
+**Web Browser in CLI**
+
+`webagent` allows users to automate web browsing tasks. It uses the Helium library to interact with web pages and uses defined tools to browse the web. Read more about it [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).
+
+Run the following command to get started:
+```bash
+webagent {YOUR_PROMPT_HERE} --model "LiteLLMModel" --model-id "gpt-4o"
+```
+
+A good example command to get started is `$ webagent --prompt "go to xyz.com/women, get to sale section, click the first clothing item you see. Get the product details, and the price, return them. note that I'm shopping from France"`. We redacted the website here, modify it with website of your choice.
+
+**Tool Calling Agent in CLI**
+
+You can run `smolagent` command to run a multi-step agent with [tools](https://huggingface.co/docs/smolagents/en/reference/tools). It uses web search tool by default.
+You can easily get started with `$ smolagent {YOUR_PROMPT_HERE}`. A more custom version of this one-liner is following, see more details [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/cli.py).
+
+```bash
+smolagent {YOUR_PROMPT_HERE} --model-type "HfApiModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" --imports "pandas numpy" --tools "web_search translation"
+```
+
+A good example command to get started is `$ smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7. Allocate time according to number of public attraction in each, and optimize for distance and travel time. Bring all the public transportation options."`.
+
 ## Code agents?
 
 In our `CodeAgent`, the LLM engine writes its actions in code. This approach is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to calls: [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls)
@@ -89,7 +126,8 @@ We've created [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/age
 benchmark of different models on agentic workflows

-This comparison shows that open source models can now take on the best closed models!
+This comparison shows that open-source models can now take on the best closed models!
+
 ## Contributing