Add GuardRails for Tool input and output #990

Open
LizeRaes opened this issue Oct 17, 2024 · 9 comments

@LizeRaes

The GuardRails are really awesome! It would be nice if we could also have them available to perform a check before executing a tool for these reasons:

  • it is executing our code, so it's a potential vulnerability; I would be happy to run checks before performing e.g. database operations
  • tools break when the parameter syntax isn't respected, and depending on the model quality, it's a typical breaking point. We have withRetries() to mitigate this, but (especially for dumber local models) it would be awesome if we could send an error back to the model like "argument missing, try again" or "variable 2 should be part of the following Enum: ..., make sure this condition is met and try calling the function again", and have this testing logic wrapped in dedicated GuardRails

It would also be nice to have a GuardRail option for the Tool output, e.g. to have a final check that no private user info is divulged, etc.
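
To illustrate the kind of pre-tool check I have in mind, here is a rough sketch (the class, the arguments, and the allowed values are all made up for illustration; nothing like this exists today):

```java
import java.util.List;
import java.util.Map;

// Rough sketch of a pre-tool check: validate the arguments the LLM produced
// before the tool method is actually invoked.
class BookingStatusCheck {

    // Returns null when the arguments are fine, otherwise a message that
    // should be sent back to the model so it can retry the tool call.
    String validate(Map<String, Object> toolArguments) {
        Object status = toolArguments.get("status");
        if (status == null) {
            return "argument 'status' is missing, try calling the function again";
        }
        List<String> allowed = List.of("CONFIRMED", "CANCELLED", "PENDING");
        if (!allowed.contains(status.toString())) {
            return "argument 'status' must be one of " + allowed
                    + ", make sure this condition is met and try calling the function again";
        }
        return null;
    }
}
```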

@geoand
Collaborator

geoand commented Oct 17, 2024

Thanks for reporting!

cc @cescoffier

@cescoffier
Collaborator

I like the idea. I think it would require dedicated interfaces as the parameters and behavior would be slightly different.

For the output one, we need to design the resilience patterns we want. You already mentioned retry, but I'm wondering if we need to call the method again or ask the LLM to recompute the tool execution request.
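
Something along these lines, purely as a sketch (the interface and result type names are invented; only ToolExecutionRequest is an existing langchain4j type):

```java
// Sketch only: possible shapes for dedicated tool guardrail interfaces.
// ToolExecutionRequest is the existing langchain4j type carrying the tool name
// and the JSON arguments produced by the LLM; everything else here is invented.
public interface ToolInputGuardrail {
    // Runs before the tool method is invoked
    ToolGuardrailResult validate(ToolExecutionRequest request);
}

public interface ToolOutputGuardrail {
    // Runs after the tool method returned, before the result goes back to the LLM
    ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult);
}

// ToolGuardrailResult would carry the decision: success, fatal failure, or
// "reprompt" with a message asking the LLM to recompute the execution request.
```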

@LizeRaes
Author

I would opt for the LLM to recompute (retry) and have the option to provide a message (like "tool output contained customer email address, make sure not to use this tool to divulge private information" or whatever you want to check for).

Unless I'm overlooking something, I think the logic to call the method again (with the same parameters) could be handled within the tool itself if the output isn't satisfactory.

There is a related discussion on tool calling going on in the langchain4j core repo: langchain4j/langchain4j#1997
Maybe we should check what to implement/port from where to where?

@cescoffier
Collaborator

So, right now we have the following sequence of messages:

-> User message
<- Tool execution request
-> Tool execution result
<- Assistant message

When the tool execution fails, what do we have?

-> User message
<- Tool execution request
-> Tool execution failure
<- Assistant message with a finish reason indicating a failure, or does it retry the execution request?

The question is about where to insert the guardrails:

  • could we enhance the execution failure by asking the LLM to send another execution request (would be like a prompt)?
  • should we catch the error and ask another LLM to fix the failure (like some agentic approach)?

@langchain4j
Collaborator

langchain4j commented Oct 29, 2024

As far as I understand, guardrails act on inputs and outputs to/from the AI Service?
Tool calling is not exposed from the AI Service and happens internally, so the current guardrails cannot catch this, right?
Also, a guardrail would have to know the specifics of each tool to be able to validate the inputs...

tools break when the parameter syntax isn't respected

I guess cases like this (wrong tool name, wrong tool parameter name or type, etc.) should be handled by DefaultToolExecutor automatically. Instead of throwing an exception from ToolExecutor (e.g. when parameters cannot be parsed), we should return this error as plain text so that it is sent to the LLM for recovery.
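
Roughly what I mean (an illustration only, not the actual DefaultToolExecutor code; parseArguments and invokeToolMethod are placeholder helpers):

```java
// Illustration: instead of letting an argument-parsing exception bubble up,
// turn it into plain text that is returned as the tool execution result so
// the LLM can recover.
String execute(ToolExecutionRequest request) {
    try {
        Object[] arguments = parseArguments(request.arguments()); // placeholder helper
        return invokeToolMethod(request.name(), arguments);       // placeholder helper
    } catch (Exception e) {
        // The LLM sees this text and can issue a corrected tool execution request
        return "Error: " + e.getMessage() + ". Please fix the arguments and try again.";
    }
}
```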

When the tool execution fails, what do we have?

-> User message
<- Tool execution request
-> Tool execution failure
<- Tool execution request (LLM tries to fix the problem)
...

In this case we should probably implement some "max retries" mechanism to make sure smaller LLMs don't go into an endless loop.
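
For example (a sketch; every method name below is a placeholder, not an existing API):

```java
// Sketch of a "max retries" guard around the recovery loop, so that a smaller
// model that keeps producing broken tool calls cannot loop forever.
int maxRetries = 3;
int attempt = 0;
while (true) {
    ToolExecutionRequest request = nextToolExecutionRequestFromLlm();
    String result = executeReturningErrorAsText(request);
    if (!result.startsWith("Error:")) {
        break; // the tool call succeeded, continue the normal flow
    }
    if (++attempt >= maxRetries) {
        throw new IllegalStateException(
                "Tool execution still failing after " + maxRetries + " attempts");
    }
    sendErrorBackToLlm(result); // the model gets a chance to fix its request
}
```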

@langchain4j
Collaborator

We also need to distinguish between different types of issues here:

  • Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)
  • LLM provided "illegal" (from business point of view) tool inputs -> a guardrail-like mechanism could probably handle this
  • Tool was called, but threw an exception -> in this case we already convert the exception to text and send it to the LLM so it can recover. This seems to work pretty well. But perhaps we could make the strategy configurable here
  • Tool was called, but produced "illegal" output (e.g. sensitive info) -> a guardrail-like mechanism could probably handle this
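
One way these four cases could become configurable, just as a sketch (all names below are invented):

```java
// Sketch: a per-case strategy for tool failures.
public enum ToolFailureAction {
    FAIL,            // current behavior for hallucinated tool / parameter names
    SEND_TO_LLM,     // convert the problem to text and let the LLM retry
    APPLY_GUARDRAIL  // delegate the decision to a pre- / post-tools guardrail
}

// Hypothetical configuration object mapping the four cases above:
ToolErrorHandling config = ToolErrorHandling.builder()
        .onHallucinatedToolOrParameter(ToolFailureAction.SEND_TO_LLM)
        .onIllegalInput(ToolFailureAction.APPLY_GUARDRAIL)
        .onToolException(ToolFailureAction.SEND_TO_LLM)      // what happens today
        .onIllegalOutput(ToolFailureAction.APPLY_GUARDRAIL)
        .build();
```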

@cescoffier
Collaborator

Thanks @langchain4j ! That is exactly what I was looking for!

Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)

A pre-tools guardrail could handle this and decide what to do.

LLM provided "illegal" (from business point of view) tool inputs -> a guardrail-like mechanism could probably handle this

Yes, a pre-tools guardrail can handle this case.

Tool was called, but threw an exception -> in this case we already convert the exception to text and send it to the LLM so it can recover. This seems to work pretty well. But perhaps we could make the strategy configurable here

We could imagine having a post-tools guardrail that can update the message and "guide" the LLM

Tool was called, but produced "illegal" output (e.g. sensitive info) -> a guardrail-like mechanism could probably handle this

Yes, a post-tools guardrail can handle this.
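
To make it concrete, a post-tools guardrail for the sensitive-output case could look roughly like this (using the invented ToolOutputGuardrail / ToolGuardrailResult types from the sketch above; none of this exists yet):

```java
import java.util.regex.Pattern;

// Hypothetical post-tools guardrail blocking sensitive data in a tool result.
public class NoEmailLeakGuardrail implements ToolOutputGuardrail {

    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");

    @Override
    public ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult) {
        if (EMAIL.matcher(toolResult).find()) {
            // Do not forward the raw result; ask the LLM to recompute instead
            return ToolGuardrailResult.reprompt(
                    "tool output contained a customer email address, "
                            + "do not use this tool to divulge private information");
        }
        return ToolGuardrailResult.success();
    }
}
```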

@langchain4j
Collaborator

@cescoffier are there pre- and post-tools guardrails already? Or is this just a concept?

Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)

A pre-tools guardrail could handle this and decide what to do.

If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves

@cescoffier
Collaborator

are there pre- and post-tools guardrails already? Or is this just a concept?

It's just a concept for now. As I modify how Quarkus invokes tools, I can easily implement it - well except maybe in the virtual thread case.

If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves

Yes, or we could have a default strategy, or disable it when guardrails are used. It's not clear what's best for now.
