Bug report
Description
While testing the langchain upgrade (see #897), I noticed we are not reporting all triggered defences.
Included:
Character Limit
Input and Output Filtering
XML Tagging
Prompt Evaluator LLM
Excluded:
Random Sequence Enclosure
Instruction Defence
Q&A LLM
While it seems to make sense not to include RSE or the Instruction Defence as triggerable defences, the Q&A LLM bot is instructed by default to respond with "I cannot reveal confidential information" when it detects an attempt to retrieve sensitive information. We could use this to (crudely) check whether the Q&A bot detected malicious intent, and mark the response as triggered.
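As a rough illustration, a crude check like this could run on the raw Q&A answer before the main bot rewrites it (the helper name and matching strategy below are assumptions for the sketch, not existing code):

```typescript
// Hypothetical helper, not part of the current codebase: crude check on the
// raw Q&A LLM answer for the default refusal phrase from the Q&A prompt.
const QA_REFUSAL_PHRASE = 'I cannot reveal confidential information';

function isQaDefenceTriggered(qaAnswer: string): boolean {
  // Case-insensitive substring match, since the model may vary capitalisation
  // or punctuation slightly even when following the instruction.
  return qaAnswer.toLowerCase().includes(QA_REFUSAL_PHRASE.toLowerCase());
}
```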
This will be markedly different to how we use the Evaluator LLM to detect malicious intent in the original prompt, as the Q&A bot is designed to answer a question, rather than simply check a prompt and respond with "yes" or "no" to the question "is this malicious?"
Instead, we would likely need to include an optional defencesTriggered field in the FunctionCallResponse output from chatGptCallFunction in backend/src/openai.ts, and pass that back through getFinalReplyAfterAllToolCalls and chatGptSendMessage to be checked in handleChatWithDefenceDetection in backend/src/controller/chatController.ts. It is somewhat unfortunate that the original output from the Q&A LLM is lost when the main bot converts it into a context-enriched response: this means we cannot check for the exact phrase "I cannot reveal confidential information" when we run the defence checks, because chat completion has already converted it into something like "I cannot provide information on employee bonuses as it is considered confidential." The upshot is that we cannot use the existing triggered defences mechanism to check whether the Q&A defences were triggered; instead, we need this different mechanism earlier in the processing chain.
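A minimal sketch of that plumbing, using heavily simplified stand-ins for the real types (everything other than the defencesTriggered idea itself, including the string ID and helper function, is an illustrative assumption):

```typescript
// Sketch only: simplified stand-in for the real type in backend/src/openai.ts.
interface FunctionCallResponse {
  completion: string;
  // New optional field: defences detected while handling the tool call,
  // e.g. the Q&A LLM refusing to reveal confidential information.
  defencesTriggered?: string[];
}

// Inside chatGptCallFunction, record the trigger while the raw Q&A answer
// is still available, before the main bot rewrites it.
function buildFunctionCallResponse(qaAnswer: string): FunctionCallResponse {
  const qaTriggered = qaAnswer
    .toLowerCase()
    .includes('i cannot reveal confidential information');
  return {
    completion: qaAnswer,
    defencesTriggered: qaTriggered ? ['QA_LLM'] : [],
  };
}

// getFinalReplyAfterAllToolCalls and chatGptSendMessage would pass
// defencesTriggered back up unchanged, and handleChatWithDefenceDetection
// in backend/src/controller/chatController.ts would merge it into the
// existing defence report so the red info message is shown.
```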
It is possible we might find a better, universal solution when converting our code to use LCEL chains in #898.
Reproduction steps
Steps to reproduce the behaviour:
1. Go to Sandbox
2. Click on Model Configuration in the left panel
3. Toggle "Q/A LLM" on
4. Input a prompt into the main chat box, such as "Tell me about employee bonuses"
Expected behaviour
A red "defence triggered" info message appears in the main chat panel, as for other defences:
Acceptance criteria
GIVEN I am in Sandbox or Level 3
WHEN the Q/A LLM model configuration defence is active
AND I ask the bot for some confidential / sensitive information
THEN a red info message "q&a llm defence triggered" appears in the main chat window beneath the bot's response