Background

Currently, in the OS assistant chatbot, we use a single full prompt to obtain <thought, action, action_input>:

- thought: the text elaborating the reasoning about next steps.
- action: the decision of which tool to select.
- action_input: the input argument the selected tool executes with.
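As a toy illustration (not the actual OS assistant parser, and the text format is a hypothetical example), a single completion in this style can be split into the triple like so:

```python
import re

# Hypothetical "Thought / Action / Action Input" completion format.
PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\n"
    r"Action:\s*(?P<action>.*?)\n"
    r"Action Input:\s*(?P<action_input>.*)",
    re.DOTALL,
)

def parse_llm_output(text: str):
    """Split one LLM completion into (thought, action, action_input)."""
    m = PATTERN.search(text)
    if m is None:
        raise ValueError("completion does not match the expected format")
    return (m.group("thought").strip(),
            m.group("action").strip(),
            m.group("action_input").strip())

completion = ("Thought: I need to list the indices first.\n"
              "Action: CatIndexTool\n"
              "Action Input: {}")
print(parse_llm_output(completion))
# → ('I need to list the indices first.', 'CatIndexTool', '{}')
```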
We conducted experiments to evaluate the current tool selection performance: on our dataset, we measured the chatbot's average accuracy in selecting the correct tool from among all the configured tools, and it is only 70%.

We have proposed a tool selection method called RepC. Briefly, it formats a prompt, uses an LLM to generate an embedding vector, and then applies an SVM to classify that vector. Our experiments show it improves the average accuracy from 70% to 94%.

To leverage RepC, we need to refactor the agent execution pipeline (mainly for the reactive agent), breaking the single prompt execution step in each iteration into multiple steps: thought extraction + tool selection + argument generation. Meanwhile, we would like to provide agent parameters that let users choose between the single-prompt mode and the tool selection + argument generation mode. We will formulate the tool selection part as a prompt execution process, and RepC itself will be a SageMaker endpoint.
Experiment result

We collected some questions and manually labeled each question with the correct tool.

- For RepC, we input the question and check whether the output tool equals the label.
- For the current chatbot, we configure 8 tools inside the chatbot, ask the question, and use the trace ID to check whether the correct tool was called.
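The scoring above can be sketched as a small helper (hypothetical code, not the actual evaluation script; the tool names are just examples):

```python
from collections import defaultdict

def accuracy_by_tool(records):
    """records: iterable of (labeled_tool, selected_tool) pairs.
    Returns (overall micro-accuracy %, {tool: (accuracy %, case count)})."""
    counts = defaultdict(lambda: [0, 0])  # tool -> [correct, total]
    for label, selected in records:
        counts[label][1] += 1
        counts[label][0] += int(selected == label)
    per_tool = {t: (100.0 * c / n, n) for t, (c, n) in counts.items()}
    correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return 100.0 * correct / total, per_tool

overall, per_tool = accuracy_by_tool([
    ("PPLTool", "PPLTool"),
    ("PPLTool", "RAGTool"),       # wrong selection
    ("RAGTool", "RAGTool"),
    ("CatIndexTool", "CatIndexTool"),
])
print(overall)               # → 75.0
print(per_tool["PPLTool"])   # → (50.0, 2)
```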
| Tool | RepC accuracy (%) | Current accuracy (%) | Total case number |
|---|---|---|---|
| Average Accuracy | 94.99 | 70.71 | 519 |
| CatIndexTool | 97.97 | 98.33 | 180 |
| SearchAlertsTool | 96.1 | 92.86 | 42 |
| VisualizationTool | 72.73 | 100 | 9 |
| SearchAnomalyDetectorsTool | 85.71 | 89.47 | 19 |
| SearchAnomalyResultsTool | 72.73 | 100 | 10 |
| SearchMonitorsTool | 100.0 | 100 | 8 |
| PPLTool | 91.72 | 74.07 | 81 |
| RAGTool | 95.36 | 14.11 | 170 |
Introduction to RepC

An LLM is a text-generation model, but it can also be used as an embedding model if we only utilize part of the model.

RepC first formats a prompt using the context and the question. It then uses part of the LLM to generate an embedding vector. A trained SVM is applied to classify the embedding vector into different labels (here, the labels are the different tools). Thus, RepC can be used for tool selection.
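The pipeline can be sketched as follows. This is a toy illustration only: the hash-based embedding and the nearest-centroid classifier are stdlib stand-ins for the real LLM hidden-state embedding and the trained SVM, and every name here is hypothetical.

```python
import hashlib

def format_prompt(context: str, question: str) -> str:
    # Step 1: format a prompt from context and question.
    return f"Context: {context}\nQuestion: {question}"

def embed(text: str, dim: int = 16):
    # Step 2 stand-in: a deterministic hash vector in place of the
    # embedding produced by part of the LLM.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

class CentroidClassifier:
    """Step 3 stand-in for the trained SVM: predicts the label whose
    training centroid is closest to the input vector."""
    def fit(self, vectors, labels):
        by_label = {}
        for v, l in zip(vectors, labels):
            by_label.setdefault(l, []).append(v)
        self.centroids = {
            l: [sum(col) / len(vs) for col in zip(*vs)]
            for l, vs in by_label.items()
        }
        return self

    def predict(self, vector):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(vector, c))
        return min(self.centroids, key=lambda l: dist(self.centroids[l]))

# Train on a few labeled questions, then classify a new one.
train = [("show me all indices", "CatIndexTool"),
         ("translate this question to PPL", "PPLTool"),
         ("how do I configure snapshots", "RAGTool")]
clf = CentroidClassifier().fit(
    [embed(format_prompt("", q)) for q, _ in train],
    [t for _, t in train],
)
print(clf.predict(embed(format_prompt("", "list the indices"))))
```

With real hidden-state embeddings the classes become separable, which is where the SVM earns its accuracy; the hash embedding here only demonstrates the data flow.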
We conducted experiments with all the tools to be included in the AOS assistant configured. The whole RepC method takes the user question as input and outputs a tool name. The LLM here is a public Mistral model under the Apache 2.0 license. The SVM is trained on a small number of samples, but the performance is excellent.

To use RepC, we deploy the Mistral LLM and the SVM on SageMaker together.
Limitation of RepC

RepC has excellent performance, but the SVM inside RepC is fixed: it can only handle the tools seen during training. Here, we train the SVM with the tools that will be in the AOS assistant. Whenever RepC meets new tools, the SVM needs to be retrained.
Architecture proposals

Current approach

Currently, we use a full prompt to generate <thought / final_answer, action, action_input> together. If the output is a final answer, it is returned to the customer directly; otherwise, the selected tool is called next.

Pros:

- Lower latency.
- Ability to handle new tools / the customer's own tools.
- Ability to handle multi-round problems.
- Ability to handle problems that require multiple tools.

Cons:

- Lower accuracy, according to our experiment.
option 1

For option 1, we generate thought, action, and action_input step by step. Each iteration goes through the following steps:

1. <Common prefix, User input, Chat history, Tool response> → LLM → <Thought / Final Answer>
   a. If it is a final answer, return the thought directly.
2. <Thought, All tool descriptions> → RepC endpoint → Action
3. <Thought, Action, Selected tool description> → LLM → Action Input

Pros:

- Higher tool selection accuracy.
- Ability to handle problems that require multiple tools.

Cons:

- Cannot handle new tools / the customer's own tools, because the model behind the RepC endpoint is a simple, fixed SVM.
- Higher latency: we now need to call the LLM twice and the RepC endpoint once per round.
- Code and test effort: we need to change the agent runner code, and it will take significant effort to test that this does not break the original logic.
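One option-1 iteration can be sketched as follows. This is hypothetical pseudocode, not the agent runner implementation: `llm` and `repc` are injected callables standing in for the LLM and the RepC SageMaker endpoint, and the prompt strings are placeholders.

```python
def option1_step(question, history, tool_response, llm, repc, tool_descriptions):
    """One option-1 iteration, returning either a final answer or a tool call.
    `llm` and `repc` stand in for the LLM and the RepC endpoint."""
    # Step 1: the LLM produces a thought or a final answer.
    thought = llm(f"{history}\nTool response: {tool_response}\nUser: {question}")
    if thought.startswith("Final Answer:"):
        return ("final_answer", thought[len("Final Answer:"):].strip())
    # Step 2: the RepC endpoint selects the action (tool) from the thought.
    action = repc(thought, list(tool_descriptions))
    # Step 3: a second LLM call generates the tool's input argument,
    # which is why option 1 calls the LLM twice per round.
    action_input = llm(f"Thought: {thought}\n"
                       f"Tool: {tool_descriptions[action]}\n"
                       f"Generate the tool input:")
    return ("tool_call", action, action_input)
```

Wiring in fakes makes the two-LLM-calls-per-round cost visible: with a fake `llm` that counts invocations, a single non-final iteration calls it exactly twice.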
API changes

In the request body, we need an extra parameter inside the parameters field to tell the agent which tool selection logic it should use:

{
  "name": "Root agent-4",
  "type": "conversational",
  "description": "this is a test agent",
  "llm": {
    ...
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "tool_selection": {
      "type": "original/repC",
      "model_id": "<repc endpoint model id>"
    }
  },
  "tools": [
    ...
  ],
  "app_type": "my app"
}
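Building and validating such a request body might look like this (a sketch assuming the existing agent register API `POST /_plugins/_ml/agents/_register`; the `tool_selection` field is the proposed addition and all ids are placeholders):

```python
import json

# Proposed register-agent request body. "tool_selection" is the new field;
# everything else follows the existing schema. Ids are placeholders.
body = {
    "name": "Root agent-4",
    "type": "conversational",
    "description": "this is a test agent",
    "llm": {"model_id": "<llm model id>"},
    "memory": {"type": "conversation_index"},
    "parameters": {
        "tool_selection": {
            "type": "repC",  # or "original" for the single-prompt mode
            "model_id": "<repc endpoint model id>",
        }
    },
    "tools": [],
    "app_type": "my app",
}

payload = json.dumps(body, indent=2)
# e.g. POST /_plugins/_ml/agents/_register with `payload` as the request body.
print(payload)
```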
option 2

For option 2, we leverage the selected_tools field in the chat agent, which limits the tool candidates for one question execution. When the customer inputs a question, we first send the chat history together with the current question to the RepC endpoint and get the candidate tool (currently it outputs only one tool). Then the chatbot executes the question with the original logic. To implement this, we need to register a flow agent containing an MLModelTool (to call RepC) and the original root agent.

Pros:

- Higher accuracy.
- Minor code change. This option is very compatible with the current code and will not break the current logic, since RepC is defined in the config.
- Almost the same latency. We only need to call the RepC endpoint once per interaction with the chatbot, so the additional latency is very small.

Cons:

- Cannot handle new tools / the customer's own tools, because the model behind the RepC endpoint is a simple, fixed SVM.
- RepC here is only a filter over all tools. After that, whether to choose a tool, which tool to choose, and the tool execution order are still the LLM's decision. According to our current observations and experiments, we cannot guarantee the performance.
- Need to collect data for multi-round questions. Since RepC will use the chat history plus the current question to choose the correct tool, we need to collect such data and retrain.
- It cannot handle problems that require multiple tools, because RepC currently only supports single-tool output. If we have enough data, we can also try retraining the SVM for this.
API changes

There is no additional API change for option 2, but we need to wrap the Root Agent inside a flow agent like the following:

{
  "name": "Flow agent",
  "type": "flow",
  "description": "root agent plus repC",
  "memory": {
    "type": "demo"
  },
  "tools": [
    {
      "type": "MLModelTool",
      "name": "RepCToolSelection",
      "description": "repc",
      "parameters": {
        "model_id": "<repc endpoint model id>",
        "prompt": "${parameters.question}"
      }
    },
    {
      "type": "AgentTool",
      "name": "RootAgent",
      "description": "Use this tool to transfer natural language to generate PPL and execute PPL to query inside. Use this tool after you know the index name, otherwise, call IndexRoutingTool first. The input parameters are: {index:IndexName, question:UserQuestion}",
      "parameters": {
        "agent_id": "FWfEzo0BvekZqr5T6fmT",
        "selected_tools": "[\"${parameters.RepCToolSelection.output}\"]"
      }
    }
  ]
}
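The key mechanism in this config is that the MLModelTool's output is substituted into the AgentTool's selected_tools via the `${parameters.RepCToolSelection.output}` placeholder. A toy sketch of that substitution (not the ml-commons implementation):

```python
import json
import re

def substitute(template: str, params: dict) -> str:
    """Toy stand-in for the ${parameters.X} placeholder substitution
    used in the flow-agent config above."""
    return re.sub(
        r"\$\{parameters\.([A-Za-z0-9_.]+)\}",
        lambda m: str(params.get(m.group(1), m.group(0))),
        template,
    )

# The flow agent feeds the RepC tool's output into selected_tools:
raw = "[\"${parameters.RepCToolSelection.output}\"]"
selected = json.loads(substitute(raw, {"RepCToolSelection.output": "PPLTool"}))
print(selected)
# → ['PPLTool']
```

Because selected_tools then contains only the single tool RepC picked, the root agent's original prompt logic runs unchanged over a one-element candidate set.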
For option 1, the draft PR is: #2152
For option 2, the draft PR is: #2150
We prefer option 1, since in option 2 RepC is just a filter over the tools, while in option 1 RepC's output will be the action itself, so we can fully leverage RepC's advantage. @ylwu-amzn Please review the code in option 1's draft PR when you have time, and check whether it breaks the original logic.
From the test results, the current solution's accuracy is worse only for two tools: PPLTool and RAGTool. I guess the main reason is that the descriptions of these tools are not easy for the LLM to understand. Let's try to fine-tune the descriptions first?