
[RFC] RepC and Tool selection in chatbot #2151

Closed · xinyual opened this issue Feb 22, 2024 · 3 comments
Labels: enhancement (New feature or request)

xinyual (Collaborator) commented Feb 22, 2024

Background

Currently, in the OS Assistant chatbot, we use a full prompt to get <thought, action, action_input>:

  1. thought: the text elaborating the reasoning about next steps.
  2. action: the decision of which tool to select.
  3. action_input: the input argument for the selected tool to execute with.

We conducted experiments to evaluate the current tool selection performance: on our dataset, we measured the chatbot's average accuracy in selecting the correct tool from among all available tools. The average accuracy is only 70%.

We have proposed a tool selection method called RepC. Briefly, it formats a prompt, uses the LLM to generate an embedding vector, and then applies an SVM to classify the result. Our experiments show it improves the average accuracy from 70% to 94%.

To leverage RepC, we need to refactor the agent execution pipeline (mainly for the ReAct agent), breaking the single tool prompt execution step in each iteration into multiple steps, which may consist of thought extraction + tool selection + argument generation. Meanwhile, we would like to provide an agent parameter that lets users choose between single-prompt mode and tool selection + argument generation mode. We will formulate the tool selection part as a prompt execution process, and RepC itself would be an endpoint in SageMaker.

Experiment results

We collected some questions and manually labeled each question with the correct tool.
For RepC, we input the question and check whether the output tool equals the label.
For the current chatbot, we configure 8 tools inside the chatbot and then ask the question, using the trace ID to check whether it calls the correct tool.
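For reference, a minimal sketch of the accuracy computation over such labeled records; the (labeled tool, predicted tool) record format is an assumption, not the actual evaluation harness:

from collections import defaultdict

def accuracy_by_tool(records):
    """records: iterable of (labeled_tool, predicted_tool) pairs."""
    correct, total = defaultdict(int), defaultdict(int)
    for label, predicted in records:
        total[label] += 1
        if predicted == label:
            correct[label] += 1
    per_tool = {t: 100.0 * correct[t] / total[t] for t in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_tool, overall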

| Tool                       | RepC (%) | Current (%) | Total case number |
|----------------------------|----------|-------------|-------------------|
| Average Accuracy           | 94.99    | 70.71       | 519               |
| CatIndexTool               | 97.97    | 98.33       | 180               |
| SearchAlertsTool           | 96.1     | 92.86       | 42                |
| VisualizationTool          | 72.73    | 100         | 9                 |
| SearchAnomalyDetectorsTool | 85.71    | 89.47       | 19                |
| SearchAnomalyResultsTool   | 72.73    | 100         | 10                |
| SearchMonitorsTool         | 100.0    | 100         | 8                 |
| PPLTool                    | 91.72    | 74.07       | 81                |
| RAGTool                    | 95.36    | 14.11       | 170               |

Introduction to RepC

An LLM is a text-generation model, but it can also be used as an embedding model if we only utilize part of the model.
RepC first formats a prompt from the context and the question. Then it uses part of the LLM to generate an embedding vector. A trained SVM is applied after embedding to classify the embedding vector into different labels (here, the labels are the different tools). Thus, RepC can be used for tool selection.
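A minimal sketch of this inference path, assuming mean pooling over the last hidden layer as the embedding; the checkpoint name, prompt template, layer, and pooling are assumptions, since the RFC does not specify them:

import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.svm import SVC

# Assumed checkpoint; the RFC only says "a public mistral model"
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
encoder = AutoModel.from_pretrained("mistralai/Mistral-7B-v0.1")

def embed(prompt: str):
    # Use part of the LLM: run the transformer and pool its hidden states
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()       # mean-pool to (dim,)

def select_tool(svm: SVC, question: str, tool_descriptions: str) -> str:
    # Assumed prompt template; RepC's real template is not given here
    prompt = f"Tools:\n{tool_descriptions}\nQuestion: {question}"
    return svm.predict([embed(prompt)])[0]             # SVM label = tool name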

We conducted experiments with all tools configured in the AOS Assistant. The full RepC method takes the user question as input and outputs the tool name. The LLM here is a public Mistral model under the Apache 2.0 license. The SVM is trained on a small number of samples, but the performance is excellent.

To use RepC, we deploy the Mistral LLM and the SVM on SageMaker together.
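A hypothetical inference handler for such an endpoint, following the SageMaker model_fn/predict_fn convention; the artifact layout and file names below are assumptions, not the actual deployment in the draft PRs:

# inference.py (hypothetical): serve the Mistral encoder and the SVM together
import os
import joblib
import torch
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # Assumed artifact layout: an LLM checkpoint plus a pickled SVM
    tokenizer = AutoTokenizer.from_pretrained(os.path.join(model_dir, "mistral"))
    encoder = AutoModel.from_pretrained(os.path.join(model_dir, "mistral"))
    svm = joblib.load(os.path.join(model_dir, "svm.joblib"))
    return tokenizer, encoder, svm

def predict_fn(data, model):
    tokenizer, encoder, svm = model
    inputs = tokenizer(data["prompt"], return_tensors="pt")
    with torch.no_grad():
        emb = encoder(**inputs).last_hidden_state.mean(dim=1).numpy()
    return {"tool": svm.predict(emb)[0]}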

Limitation of RepC

RepC has excellent performance, but the SVM inside RepC is fixed. That means it can only handle the tools seen during training. Here, we train the SVM with the tools that will be in the AOS Assistant. When RepC meets new tools, it needs to be retrained.
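Retraining might look like the following sketch, reusing the embed() helper above and assuming a small labeled set of (prompt, tool) pairs that covers every tool, including the new ones:

from sklearn.svm import SVC

def retrain(labeled_data):
    # labeled_data: list of (prompt, tool_name) pairs covering all tools
    X = [embed(prompt) for prompt, _ in labeled_data]
    y = [tool for _, tool in labeled_data]
    svm = SVC(kernel="linear")   # kernel choice is an assumption
    svm.fit(X, y)
    return svm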

Architecture proposals

Currently, we use a full prompt to generate <thought / final_answer, action, action_input> together. If it is a final answer, it is returned to the customer directly; otherwise, the selected tool is called next.

(figure: current single-prompt flow)

Pros:

  1. Lower latency.
  2. Ability to handle new tools / customers' own tools.
  3. Ability to handle multi-round problems.
  4. Ability to handle problems which require multiple tools.

Cons:

  1. Lower accuracy according to our experiment.

Option 1

(figure: option 1 flow)

For option 1, we generate thought, action, and action_input step by step. It goes through the following steps (a Python sketch follows the list):

  1. <Common prefix, User input, Chat history, Tool response> → LLM → <Thought / Final Answer>
    a. If it is a final answer, directly return the thought
  2. <Thought, All tool descriptions> → RepC endpoint → Action
  3. <Argument generation prefix, Thought, Single tool description> → LLM → tool argument
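A hypothetical sketch of one option-1 iteration; call_llm and repc_endpoint are placeholders for the LLM connector and the RepC SageMaker endpoint, and the prefixes stand in for the prompt fragments named above:

FINAL_ANSWER = "Final Answer:"
COMMON_PREFIX = "<common prefix>"               # prompt prefix from step 1
ARG_PREFIX = "<argument generation prefix>"     # prompt prefix from step 3

def call_llm(prompt: str) -> str:               # placeholder LLM call
    raise NotImplementedError

def repc_endpoint(prompt: str) -> str:          # placeholder endpoint call
    raise NotImplementedError

def run_iteration(user_input, chat_history, tool_responses, tools):
    # Step 1: thought extraction (or final answer)
    thought = call_llm(f"{COMMON_PREFIX}\n{user_input}\n{chat_history}\n{tool_responses}")
    if thought.startswith(FINAL_ANSWER):
        return thought                           # step 1a: return directly
    # Step 2: tool selection via the RepC endpoint
    descriptions = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    action = repc_endpoint(f"{thought}\n{descriptions}")
    # Step 3: argument generation with only the selected tool's description
    action_input = call_llm(f"{ARG_PREFIX}\n{thought}\n{tools[action]}")
    return action, action_input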

Pros:

  1. Higher accuracy.
  2. Ability to handle multi-round problems.
  3. Ability to handle problems which require multiple tools.

Cons:

  1. Cannot handle new tools / customers' own tools, because the RepC endpoint's model is a simple, fixed SVM.
  2. Higher latency. We now need to call the LLM twice and the RepC endpoint once in each round.
  3. Code and test risk. We need to change the agent runner code, and it will take significant effort to test whether it breaks the original logic.

API changes

In the request body, we need an extra parameter inside the parameters field to tell the agent which tool selection logic it should use.

{
  "name": "Root agent-4",
  "type": "conversational",
  "description": "this is a test agent",
  "llm": {
    ...
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "tool_selection": {
      "type": "original/repC",
      "model_id": <>
    }
  },
  "tools": [
     ...
  ],
  "app_type": "my app"
}

Option 2

(figure: option 2 flow)

For option 2, we leverage the selected_tools field of the chat agent, which limits the tool candidates for one question execution. When the customer inputs a question, we first send the chat history plus the current question to the RepC endpoint and get the candidate tool (currently it outputs only one tool). Then the chatbot executes the question with the original logic. To achieve this, we need to register a flow agent that includes an MLModelTool (to call RepC) followed by the original root agent.

Pros:

  1. Higher accuracy.
  2. Minor code change. This option is very compatible with the current code and will not break the current logic, since RepC is defined in the agent config.
  3. Almost the same latency. We only need to call the RepC endpoint once per interaction with the chatbot, so the additional latency is very small.

Cons:

  1. Cannot handle new tools / customers' own tools, because the RepC endpoint's model is a simple, fixed SVM.
  2. RepC here is only a filter over all tools. After that, whether to choose a tool, which tool to choose, and the tool execution order are still the LLM's decision. Based on current observations and experiments, we cannot guarantee the performance.
  3. Need to collect data for multi-round questions. RepC will use the chat history plus the current question to choose the correct tool, so we need to collect such data and retrain.
  4. Cannot handle problems which require multiple tools, because RepC currently only supports one-tool output. But if we have enough data, we can also try retraining the SVM for this.

API changes

There is no additional API change for option 2, but we need to wrap the root agent inside a flow agent like this:

{
  "name": "Flow agent",
  "type": "flow",
  "description": "root agent plus repC",
  "memory": {
    "type": "demo"
  },
  "tools": [
    {
      "type": "MLModelTool",
      "name": "RepCToolSelection",
      "description": "repc",
      "parameters": {
        "model_id": "<repc endpoint model id>",
        "prompt": "${parameters.question}"
      }
    },
    {
      "type": "AgentTool",
      "name": "RootAgent",
      "description": "Use this tool to transfer natural language to generate PPL and execute PPL to query inside. Use this tool after you know the index name, otherwise, call IndexRoutingTool first. The input parameters are: {index:IndexName, question:UserQuestion}",
      "parameters": {
        "agent_id": "FWfEzo0BvekZqr5T6fmT",
        "selected_tools": "[\"${parameters.RepCToolSelection.output}\"]"
      }
    }
  ]
}
xinyual (Collaborator, Author) commented Feb 22, 2024

For option 1, the draft PR is #2152.
For option 2, the draft PR is #2150.
We prefer option 1, since in option 2 RepC is just a filter over the tools, whereas in option 1 RepC's output is the action itself, so we can fully leverage the advantage of RepC.
@ylwu-amzn Please review the code in option 1's draft PR when you have time, and check whether it breaks the original logic.

ylwu-amzn (Collaborator) commented:

I would suggest closing this for now, as some parts are not ready for open source.

ylwu-amzn (Collaborator) commented Feb 28, 2024

From the test results, the current solution's accuracy is worse only for these two tools: PPLTool and RAGTool. I guess the main reason is that the descriptions of these tools are not easy for the LLM to understand. Let's try to fine-tune the descriptions first?

| Tool    | RepC (%) | Current (%) |
|---------|----------|-------------|
| PPLTool | 91.72    | 74.07       |
| RAGTool | 95.36    | 14.11       |
