
[RFC] RepC and Tool selection in chatbot #2151

Closed · xinyual opened this issue Feb 22, 2024 · 3 comments
Labels: enhancement (New feature or request)

xinyual (Collaborator) commented Feb 22, 2024

Background

Currently, in the OS Assistant chatbot, we use a full prompt to get <thought, action, action_input>:

  1. thought: the text elaborating the reasoning about next steps.
  2. action: the decision of which tool to select.
  3. action_input: the input argument for the selected tool to execute with.

We conducted experiments to evaluate the current tool selection performance: on our dataset, we measured the chatbot's average accuracy in selecting the correct tool from among all available tools. The average accuracy is only 70%.

We have proposed a tool selection method called RepC. Briefly, it formats a prompt, uses the LLM to generate an embedding vector, and then applies an SVM to classify the result. Our experiments show it improves the average accuracy from 70% to 94%.

To leverage RepC, we need to refactor the agent execution pipeline (mainly for the ReAct agent), breaking the single tool prompt execution step in each iteration into multiple steps, which may consist of thought extraction + tool selection + argument generation. Meanwhile, we would like to provide an agent parameter that lets users choose between single-prompt mode and tool selection + argument generation mode. We will formulate the tool selection part as a prompt execution process, and RepC itself would be an endpoint in SageMaker.

Experiment results

We collected some questions and manually labeled each question with the correct tool.
For RepC, we input the question and check whether the output tool equals the label.
For the current chatbot, we configure 8 tools inside the chatbot and then ask the question, using the trace ID to check whether it calls the correct tool.
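For reference, a minimal sketch of the accuracy computation over such labeled records; the (labeled tool, predicted tool) record format is an assumption, not the actual evaluation harness:

from collections import defaultdict

def accuracy_by_tool(records):
    """records: iterable of (labeled_tool, predicted_tool) pairs."""
    correct, total = defaultdict(int), defaultdict(int)
    for label, predicted in records:
        total[label] += 1
        if predicted == label:
            correct[label] += 1
    per_tool = {t: 100.0 * correct[t] / total[t] for t in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_tool, overall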

| Tool                       | RepC (%) | Current (%) | Total case number |
|----------------------------|----------|-------------|-------------------|
| Average Accuracy           | 94.99    | 70.71       | 519               |
| CatIndexTool               | 97.97    | 98.33       | 180               |
| SearchAlertsTool           | 96.1     | 92.86       | 42                |
| VisualizationTool          | 72.73    | 100         | 9                 |
| SearchAnomalyDetectorsTool | 85.71    | 89.47       | 19                |
| SearchAnomalyResultsTool   | 72.73    | 100         | 10                |
| SearchMonitorsTool         | 100.0    | 100         | 8                 |
| PPLTool                    | 91.72    | 74.07       | 81                |
| RAGTool                    | 95.36    | 14.11       | 170               |

Introduction to RepC

An LLM is a text-generation model, but it can also be used as an embedding model if we only utilize part of the model.
RepC first formats a prompt from the context and the question. Then it uses part of the LLM to generate an embedding vector. A trained SVM is applied after embedding to classify the embedding vector into different labels (here, the labels are the different tools). Thus, RepC can be used for tool selection.
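A minimal sketch of this inference path, assuming mean pooling over the last hidden layer as the embedding; the checkpoint name, prompt template, layer, and pooling are assumptions, since the RFC does not specify them:

import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.svm import SVC

# Assumed checkpoint; the RFC only says "a public mistral model"
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
encoder = AutoModel.from_pretrained("mistralai/Mistral-7B-v0.1")

def embed(prompt: str):
    # Use part of the LLM: run the transformer and pool its hidden states
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()       # mean-pool to (dim,)

def select_tool(svm: SVC, question: str, tool_descriptions: str) -> str:
    # Assumed prompt template; RepC's real template is not given here
    prompt = f"Tools:\n{tool_descriptions}\nQuestion: {question}"
    return svm.predict([embed(prompt)])[0]             # SVM label = tool name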

We conducted experiments with all tools configured in the AOS Assistant. The full RepC method takes the user question as input and outputs the tool name. The LLM here is a public Mistral model under the Apache 2.0 license. The SVM is trained on a small number of samples, but the performance is excellent.

To use RepC, we deploy the Mistral LLM and the SVM on SageMaker together.
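A hypothetical inference handler for such an endpoint, following the SageMaker model_fn/predict_fn convention; the artifact layout and file names below are assumptions, not the actual deployment in the draft PRs:

# inference.py (hypothetical): serve the Mistral encoder and the SVM together
import os
import joblib
import torch
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # Assumed artifact layout: an LLM checkpoint plus a pickled SVM
    tokenizer = AutoTokenizer.from_pretrained(os.path.join(model_dir, "mistral"))
    encoder = AutoModel.from_pretrained(os.path.join(model_dir, "mistral"))
    svm = joblib.load(os.path.join(model_dir, "svm.joblib"))
    return tokenizer, encoder, svm

def predict_fn(data, model):
    tokenizer, encoder, svm = model
    inputs = tokenizer(data["prompt"], return_tensors="pt")
    with torch.no_grad():
        emb = encoder(**inputs).last_hidden_state.mean(dim=1).numpy()
    return {"tool": svm.predict(emb)[0]}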

Limitation of RepC

RepC has excellent performance, but the SVM inside RepC is fixed. That means it can only handle the tools seen during training. Here, we train the SVM with the tools that will be in the AOS Assistant. When RepC meets new tools, it needs to be retrained.
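Retraining might look like the following sketch, reusing the embed() helper above and assuming a small labeled set of (prompt, tool) pairs that covers every tool, including the new ones:

from sklearn.svm import SVC

def retrain(labeled_data):
    # labeled_data: list of (prompt, tool_name) pairs covering all tools
    X = [embed(prompt) for prompt, _ in labeled_data]
    y = [tool for _, tool in labeled_data]
    svm = SVC(kernel="linear")   # kernel choice is an assumption
    svm.fit(X, y)
    return svm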

Architecture proposals

Currently, we use a full prompt to generate <thought / final_answer, action, action_input> together. If it is a final answer, it is returned to the customer directly; otherwise, the selected tool is called next.

(figure: current single-prompt flow)

Pros:

  1. Lower latency.
  2. Ability to handle new tools / customers' own tools.
  3. Ability to handle multi-round problems.
  4. Ability to handle problems which require multiple tools.

Cons:

  1. Lower accuracy according to our experiment.

Option 1

(figure: option 1 flow)

For option 1, we generate thought, action, and action_input step by step. It goes through the following steps (a Python sketch follows the list):

  1. <Common prefix, User input, Chat history, Tool response> → LLM → <Thought / Final Answer>
    a. If it is a final answer, directly return the thought
  2. <Thought, All tool descriptions> → RepC endpoint → Action
  3. <Argument generation prefix, Thought, Single tool description> → LLM → tool argument
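A hypothetical sketch of one option-1 iteration; call_llm and repc_endpoint are placeholders for the LLM connector and the RepC SageMaker endpoint, and the prefixes stand in for the prompt fragments named above:

FINAL_ANSWER = "Final Answer:"
COMMON_PREFIX = "<common prefix>"               # prompt prefix from step 1
ARG_PREFIX = "<argument generation prefix>"     # prompt prefix from step 3

def call_llm(prompt: str) -> str:               # placeholder LLM call
    raise NotImplementedError

def repc_endpoint(prompt: str) -> str:          # placeholder endpoint call
    raise NotImplementedError

def run_iteration(user_input, chat_history, tool_responses, tools):
    # Step 1: thought extraction (or final answer)
    thought = call_llm(f"{COMMON_PREFIX}\n{user_input}\n{chat_history}\n{tool_responses}")
    if thought.startswith(FINAL_ANSWER):
        return thought                           # step 1a: return directly
    # Step 2: tool selection via the RepC endpoint
    descriptions = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    action = repc_endpoint(f"{thought}\n{descriptions}")
    # Step 3: argument generation with only the selected tool's description
    action_input = call_llm(f"{ARG_PREFIX}\n{thought}\n{tools[action]}")
    return action, action_input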

Pros:

  1. Higher accuracy.
  2. Ability to handle multi-round problems.
  3. Ability to handle problems which require multiple tools.

Cons:

  1. Cannot handle new tools / customers' own tools, because the RepC endpoint's model is a simple, fixed SVM.
  2. Higher latency. We now need to call the LLM twice and the RepC endpoint once in each round.
  3. Code and test risk. We need to change the agent runner code, and it will take significant effort to test whether it breaks the original logic.

API changes

In the request body, we need an extra parameter inside the parameters field to tell the agent which tool selection logic it should use.

{
  "name": "Root agent-4",
  "type": "conversational",
  "description": "this is a test agent",
  "llm": {
    ...
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "tool_selection": {
      "type": "original/repC",
      "model_id": <>
    }
  },
  "tools": [
     ...
  ],
  "app_type": "my app"
}

Option 2

(figure: option 2 flow)

For option 2, we leverage the selected_tools field of the chat agent, which limits the tool candidates for one question execution. When the customer inputs a question, we first send the chat history plus the current question to the RepC endpoint and get the candidate tool (currently it outputs only one tool). Then the chatbot executes the question with the original logic. To achieve this, we need to register a flow agent that includes an MLModelTool (to call RepC) followed by the original root agent.

Pros:

  1. Higher accuracy.
  2. Minor code change. This option is very compatible with the current code and will not break the current logic, since RepC is defined in the agent config.
  3. Almost the same latency. We only need to call the RepC endpoint once per interaction with the chatbot, so the additional latency is very small.

Cons:

  1. Cannot handle new tools / customers' own tools, because the RepC endpoint's model is a simple, fixed SVM.
  2. RepC here is only a filter over all tools. After that, whether to choose a tool, which tool to choose, and the tool execution order are still the LLM's decision. Based on current observations and experiments, we cannot guarantee the performance.
  3. Need to collect data for multi-round questions. RepC will use the chat history plus the current question to choose the correct tool, so we need to collect such data and retrain.
  4. Cannot handle problems which require multiple tools, because RepC currently only supports one-tool output. But if we have enough data, we can also try retraining the SVM for this.

API changes

There is no additional API change for option 2, but we need to wrap the root agent inside a flow agent like this:

{
  "name": "Flow agent",
  "type": "flow",
  "description": "root agent plus repC",
  "memory": {
    "type": "demo"
  },
  "tools": [
    {
      "type": "MLModelTool",
      "name": "RepCToolSelection",
      "description": "repc",
      "parameters": {
        "model_id": "<repc endpoint model id>",
        "prompt": "${parameters.question}"
      }
    },
    {
      "type": "AgentTool",
      "name": "RootAgent",
      "description": "Use this tool to transfer natural language to generate PPL and execute PPL to query inside. Use this tool after you know the index name, otherwise, call IndexRoutingTool first. The input parameters are: {index:IndexName, question:UserQuestion}",
      "parameters": {
        "agent_id": "FWfEzo0BvekZqr5T6fmT",
        "selected_tools": "[\"${parameters.RepCToolSelection.output}\"]"
      }
    }
  ]
}
xinyual (Collaborator, Author) commented Feb 22, 2024

For option 1, the draft PR is #2152.
For option 2, the draft PR is #2150.
We prefer option 1, since in option 2 RepC is just a filter over the tools, whereas in option 1 RepC's output is the action itself, so we can fully leverage the advantage of RepC.
@ylwu-amzn Please review the code in option 1's draft PR when you have time, and check whether it breaks the original logic.

ylwu-amzn (Collaborator) commented:

I would suggest closing this for now, as some parts are not ready for open source.

ylwu-amzn (Collaborator) commented Feb 28, 2024

From the test results, the current solution's accuracy is worse only for these two tools: PPLTool and RAGTool. I guess the main reason is that the descriptions of these tools are not easy for the LLM to understand. Let's try to fine-tune the descriptions first?

| Tool    | RepC (%) | Current (%) |
|---------|----------|-------------|
| PPLTool | 91.72    | 74.07       |
| RAGTool | 95.36    | 14.11       |
