Add GuardRails for Tool input and output #990

Open
LizeRaes opened this issue Oct 17, 2024 · 9 comments

@LizeRaes

The GuardRails are really awesome! It would be nice if we could also have them available to perform a check before executing a tool for these reasons:

  • it is executing our code, so it's a potential vulnerability; I would be happy to run checks before performing e.g. database operations
  • tools break when the parameter syntax isn't respected, and depending on the model quality, it's a typical breaking point. We have withRetries() to mitigate this, but (especially for dumber local models) it would be awesome if we could send an error back to the model like "argument missing, try again" or "variable 2 should be part of the following Enum: ..., make sure this condition is met and try calling the function again", and have this testing logic wrapped in dedicated GuardRails

It would also be nice to have a GuardRail option for the Tool output, e.g. to have a final check that no private user info is divulged, etc.
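
To illustrate the kind of pre-tool check I have in mind, here is a rough sketch (the class, the arguments, and the allowed values are all made up for illustration; nothing like this exists today):

```java
import java.util.List;
import java.util.Map;

// Rough sketch of a pre-tool check: validate the arguments the LLM produced
// before the tool method is actually invoked.
class BookingStatusCheck {

    // Returns null when the arguments are fine, otherwise a message that
    // should be sent back to the model so it can retry the tool call.
    String validate(Map<String, Object> toolArguments) {
        Object status = toolArguments.get("status");
        if (status == null) {
            return "argument 'status' is missing, try calling the function again";
        }
        List<String> allowed = List.of("CONFIRMED", "CANCELLED", "PENDING");
        if (!allowed.contains(status.toString())) {
            return "argument 'status' must be one of " + allowed
                    + ", make sure this condition is met and try calling the function again";
        }
        return null;
    }
}
```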

@geoand
Collaborator

geoand commented Oct 17, 2024

Thanks for reporting!

cc @cescoffier

@cescoffier
Collaborator

I like the idea. I think it would require dedicated interfaces as the parameters and behavior would be slightly different.

For the output one, we need to design the resilience patterns we want. You already mentioned retry, but I'm wondering if we need to call the method again or ask the LLM to recompute the tool execution request.
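
Something along these lines, purely as a sketch (the interface and result type names are invented; only ToolExecutionRequest is an existing langchain4j type):

```java
// Sketch only: possible shapes for dedicated tool guardrail interfaces.
// ToolExecutionRequest is the existing langchain4j type carrying the tool name
// and the JSON arguments produced by the LLM; everything else here is invented.
public interface ToolInputGuardrail {
    // Runs before the tool method is invoked
    ToolGuardrailResult validate(ToolExecutionRequest request);
}

public interface ToolOutputGuardrail {
    // Runs after the tool method returned, before the result goes back to the LLM
    ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult);
}

// ToolGuardrailResult would carry the decision: success, fatal failure, or
// "reprompt" with a message asking the LLM to recompute the execution request.
```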

@LizeRaes
Author

I would opt for the LLM to recompute (retry) and have the option to provide a message (like "tool output contained customer email address, make sure not to use this tool to divulge private information" or whatever you want to check for).

Unless I'm overlooking something, I think the logic to call the method again (with the same parameters) could be handled within the tool itself if the output isn't satisfactory.

There is a related discussion on tool calling going on in the langchain4j core repo: langchain4j/langchain4j#1997
Maybe we should check what to implement/port from where to where?

@cescoffier
Collaborator

So, right now we have the following sequence of messages:

-> User message
<- Tool execution request
-> Tool execution result
<- Assistant message

When the tool execution fails, what do we have?

-> User message
<- Tool execution request
-> Tool execution failure
<- Assistant message with a finish reason indicating a failure, or does it retry the execution request?

The question is about where to insert the guardrails:

  • could we enhance the execution failure by asking the LLM to send another execution request (would be like a prompt)?
  • should we catch the error and ask another LLM to fix the failure (like some agentic approach)?

@langchain4j
Collaborator

langchain4j commented Oct 29, 2024

As far as I understand, guardrails act on inputs and outputs to/from the AI Service?
Tool calling is not exposed from the AI Service and happens internally, so the current guardrails cannot catch this, right?
Also, a guardrail would have to know the specifics of each tool to be able to validate the inputs...

tools break when the parameter syntax isn't respected

I guess cases like this (wrong tool name, wrong tool parameter name or type, etc.) should be handled by DefaultToolExecutor automatically. Instead of throwing an exception from ToolExecutor (e.g. when parameters cannot be parsed), we should return this error as plain text so that it is sent to the LLM for recovery.
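
Roughly what I mean (an illustration only, not the actual DefaultToolExecutor code; parseArguments and invokeToolMethod are placeholder helpers):

```java
// Illustration: instead of letting an argument-parsing exception bubble up,
// turn it into plain text that is returned as the tool execution result so
// the LLM can recover.
String execute(ToolExecutionRequest request) {
    try {
        Object[] arguments = parseArguments(request.arguments()); // placeholder helper
        return invokeToolMethod(request.name(), arguments);       // placeholder helper
    } catch (Exception e) {
        // The LLM sees this text and can issue a corrected tool execution request
        return "Error: " + e.getMessage() + ". Please fix the arguments and try again.";
    }
}
```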

When the tool execution fails, what do we have?

-> User message
<- Tool execution request
-> Tool execution failure
<- Tool execution request (LLM tries to fix the problem)
...

In this case we should probably implement some "max retries" mechanism to make sure smaller LLMs don't go into an endless loop.
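
For example (a sketch; every method name below is a placeholder, not an existing API):

```java
// Sketch of a "max retries" guard around the recovery loop, so that a smaller
// model that keeps producing broken tool calls cannot loop forever.
int maxRetries = 3;
int attempt = 0;
while (true) {
    ToolExecutionRequest request = nextToolExecutionRequestFromLlm();
    String result = executeReturningErrorAsText(request);
    if (!result.startsWith("Error:")) {
        break; // the tool call succeeded, continue the normal flow
    }
    if (++attempt >= maxRetries) {
        throw new IllegalStateException(
                "Tool execution still failing after " + maxRetries + " attempts");
    }
    sendErrorBackToLlm(result); // the model gets a chance to fix its request
}
```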

@langchain4j
Collaborator

We also need to distinguish between different types of issues here:

  • Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)
  • LLM provided "illegal" (from business point of view) tool inputs -> a guardrail-like mechanism could probably handle this
  • Tool was called, but threw an exception -> in this case we already convert the exception to text and send it to the LLM so it can recover. This seems to work pretty well. But perhaps we could make the strategy configurable here
  • Tool was called, but produced "illegal" output (e.g. sensitive info) -> a guardrail-like mechanism could probably handle this
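
One way these four cases could become configurable, just as a sketch (all names below are invented):

```java
// Sketch: a per-case strategy for tool failures.
public enum ToolFailureAction {
    FAIL,            // current behavior for hallucinated tool / parameter names
    SEND_TO_LLM,     // convert the problem to text and let the LLM retry
    APPLY_GUARDRAIL  // delegate the decision to a pre- / post-tools guardrail
}

// Hypothetical configuration object mapping the four cases above:
ToolErrorHandling config = ToolErrorHandling.builder()
        .onHallucinatedToolOrParameter(ToolFailureAction.SEND_TO_LLM)
        .onIllegalInput(ToolFailureAction.APPLY_GUARDRAIL)
        .onToolException(ToolFailureAction.SEND_TO_LLM)      // what happens today
        .onIllegalOutput(ToolFailureAction.APPLY_GUARDRAIL)
        .build();
```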

@cescoffier
Collaborator

Thanks @langchain4j ! That is exactly what I was looking for!

Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)

A pre-tools guardrail could handle this and decide what to do.

LLM provided "illegal" (from business point of view) tool inputs -> a guardrail-like mechanism could probably handle this

Yes, a pre-tools guardrail can handle this case.

Tool was called, but threw an exception -> in this case we already convert the exception to text and send it to the LLM so it can recover. This seems to work pretty well. But perhaps we could make the strategy configurable here

We could imagine having a post-tools guardrail that can update the message and "guide" the LLM

Tool was called, but produced "illegal" output (e.g. sensitive info) -> a guardrail-like mechanism could probably handle this

Yes, a post-tools guardrail can handle this.
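
To make it concrete, a post-tools guardrail for the sensitive-output case could look roughly like this (using the invented ToolOutputGuardrail / ToolGuardrailResult types from the sketch above; none of this exists yet):

```java
import java.util.regex.Pattern;

// Hypothetical post-tools guardrail blocking sensitive data in a tool result.
public class NoEmailLeakGuardrail implements ToolOutputGuardrail {

    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");

    @Override
    public ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult) {
        if (EMAIL.matcher(toolResult).find()) {
            // Do not forward the raw result; ask the LLM to recompute instead
            return ToolGuardrailResult.reprompt(
                    "tool output contained a customer email address, "
                            + "do not use this tool to divulge private information");
        }
        return ToolGuardrailResult.success();
    }
}
```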

@langchain4j
Collaborator

@cescoffier are there pre- and post-tools guardrails already? Or is this just a concept?

Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)

A pre-tools guardrail could handle this and decide what to do.

If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves

@cescoffier
Collaborator

are there pre- and post-tools guardrails already? Or is this just a concept?

It's just a concept for now. As I modify how Quarkus invokes tools, I can easily implement it - well except maybe in the virtual thread case.

If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves

Yes, or we could have a default strategy, or disable it when guardrails are used. It's not clear what's best for now.
