diff --git a/docs/core_docs/docs/concepts.mdx b/docs/core_docs/docs/concepts.mdx
index 16fb4b0a1b51..f2159d6b5f0e 100644
--- a/docs/core_docs/docs/concepts.mdx
+++ b/docs/core_docs/docs/concepts.mdx
@@ -156,6 +156,18 @@ Chat Models also accept other parameters that are specific to that integration.
 
 For specifics on how to use chat models, see the [relevant how-to guides here](/docs/how_to/#chat-models).
 
+### Multimodality
+
+Some chat models are multimodal, accepting images, audio and even video as inputs.
+These are still less common, meaning model providers haven't standardized on the "best" way to define the API.
+Multimodal outputs are even less common. As such, we've kept our multimodal abstractions fairly light weight
+and plan to further solidify the multimodal APIs and interaction patterns as the field matures.
+
+In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format.
+So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
+
+For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).
+
 ### LLMs
 
 
@@ -579,15 +591,28 @@ If you are still using AgentExecutor, do not fear: we still have a guide on [how
 It is recommended, however, that you start to transition to [LangGraph](https://github.com/langchain-ai/langgraphjs).
 In order to assist in this we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
 
-### Multimodal
+#### ReAct agents
 
-Some models are multimodal, accepting images, audio and even video as inputs. These are still less common, meaning model providers haven't standardized on the "best" way to define the API.
-Multimodal **outputs** are even less common. As such, we've kept our multimodal abstractions fairly light weight and plan to further solidify the multimodal APIs and interaction patterns as the field matures.
+
 
-In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format.
-So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
+One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
+ReAct combines reasoning and acting in an iterative process - in fact the name "ReAct" stands for "Reason" and "Act".
 
-For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).
+The general flow looks like this:
+
+- The model will "think" about what step to take in response to an input and any previous observations.
+- The model will then choose an action from available tools (or choose to respond to the user).
+- The model will generate arguments to that tool.
+- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
+- The executor will return the results of the tool call back to the model as an observation.
+- This process repeats until the agent chooses to respond.
+
+There are general prompting based implementations that do not require any model-specific features, but the most
+reliable implementations use features like [tool calling](/docs/how_to/tool_calling/) to reliably format outputs
+and reduce variance.
+
+Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
+or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.
 
 ### Callbacks
 
diff --git a/docs/core_docs/docs/how_to/migrate_agent.ipynb b/docs/core_docs/docs/how_to/migrate_agent.ipynb
index eacda605931a..cdebd6c7ce1a 100644
--- a/docs/core_docs/docs/how_to/migrate_agent.ipynb
+++ b/docs/core_docs/docs/how_to/migrate_agent.ipynb
@@ -1,5 +1,19 @@
 {
  "cells": [
+  {
+   "cell_type": "raw",
+   "id": "8f21bf6b",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "---\n",
+    "keywords: [create_react_agent, create_react_agent()]\n",
+    "---"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "579c24a2",
@@ -12,7 +26,7 @@
     "[`AgentExecutor`](https://api.js.langchain.com/classes/langchain_agents.AgentExecutor.html)\n",
     "in particular) have multiple configuration parameters. In this notebook we will\n",
     "show how those parameters map to the LangGraph\n",
-    "[react agent executor](https://langchain-ai.github.io/langgraphjs/reference/functions/prebuilt.createReactAgent.html).\n",
+    "react agent executor using the [create_react_agent](https://langchain-ai.github.io/langgraphjs/reference/functions/prebuilt.createReactAgent.html) prebuilt helper method.\n",
     "\n",
     "For more information on how to build agentic workflows in LangGraph, check out\n",
     "the [docs here](https://langchain-ai.github.io/langgraphjs/how-tos/).\n",
diff --git a/docs/core_docs/docs/how_to/tool_calling.ipynb b/docs/core_docs/docs/how_to/tool_calling.ipynb
index 6c9ab2415ca1..00822e7c03e8 100644
--- a/docs/core_docs/docs/how_to/tool_calling.ipynb
+++ b/docs/core_docs/docs/how_to/tool_calling.ipynb
@@ -9,7 +9,7 @@
    },
    "source": [
     "---\n",
-    "keywords: [function, function calling, tool, tool calling]\n",
+    "keywords: [function, function calling, tool, tool call, tool calling]\n",
     "---"
    ]
   },
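For readers unfamiliar with the OpenAI content blocks format that the new Multimodality section refers to, a minimal LangChain.js sketch might look like the following. This is illustrative only and assumes `@langchain/openai` and `@langchain/core` are installed; the image path and prompt are placeholders.

```typescript
import * as fs from "node:fs/promises";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// Read and base64-encode a local image (placeholder path).
const imageData = await fs.readFile("./example.jpg");
const base64Image = imageData.toString("base64");

const model = new ChatOpenAI({ model: "gpt-4o" });

// OpenAI-style content blocks: an array mixing text and image_url entries.
const message = new HumanMessage({
  content: [
    { type: "text", text: "What is shown in this image?" },
    {
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${base64Image}` },
    },
  ],
});

const response = await model.invoke([message]);
console.log(response.content);
```

Similarly, the `createReactAgent` prebuilt referenced in the migrate_agent notebook wires up the ReAct loop described in the concepts section: the model either calls a tool or responds, the executor runs the tool, and the observation is fed back until the model answers. A minimal sketch, assuming recent versions of `@langchain/langgraph`, `@langchain/openai`, `@langchain/core`, and `zod`; the `get_weather` tool is a made-up placeholder.

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { z } from "zod";

// A toy tool; the name, schema, and behavior are placeholders.
const getWeather = tool(
  async ({ city }: { city: string }) => `It is always sunny in ${city}.`,
  {
    name: "get_weather",
    description: "Look up the current weather for a city.",
    schema: z.object({ city: z.string() }),
  }
);

const llm = new ChatOpenAI({ model: "gpt-4o" });

// The prebuilt helper runs the ReAct loop: the model picks a tool (or answers),
// the runtime executes it, and the result is appended as an observation
// until the model chooses to respond.
const agent = createReactAgent({ llm, tools: [getWeather] });

const result = await agent.invoke({
  messages: [new HumanMessage("What's the weather in Paris?")],
});

console.log(result.messages[result.messages.length - 1].content);
```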