
Allow to rewrite LLM result in an OutputGuardrail #1021

Merged (1 commit) on Nov 3, 2024

Conversation

@mariofusco (Contributor) commented Oct 29, 2024

@mariofusco mariofusco requested a review from a team as a code owner October 29, 2024 16:26
@geoand geoand requested a review from cescoffier October 30, 2024 06:06
@cescoffier (Collaborator)

I looked at my notes, as I remember looking into this during the first implementation. There is a philosophical question that I was not able to crack.

Let's imagine 3 output guardrails: A, B, C.

  • A passes - all good
  • B passes but modifies the output value
  • C fails and reprompts

Now the problem is that we are going to reprompt based on a modified value, not the original value. So the LLM may answer the same or, worse, not understand the reprompt instruction and start hallucinating.

We could imagine modifying the LLM response in the context to update the value to the modified one, but again, the LLM may see this and freak out.

One possibility is that only the last guardrail can change the value.
Or we can say that once the value is changed, retry and reprompt are no longer allowed.

@lordofthejars (Contributor)

Not allowed, or ignored so that no exception is thrown; we just log why they were not executed. What I see here is that we can assume that most of the time there will be a single output guardrail (or you can implement one and put all the logic there), so I guess that ignoring might be the best option.

@mariofusco (Contributor, Author)

One possibility is that only the last guardrail can change the value.

I don't like this solution, mostly because I can totally see 2 guardrails changing the result in sequence one after the other (like in the test case that I added).

Or we can say that once the value is changed, retry and reprompt are no longer allowed.

That is already a better option. Another possibility is to keep both the original and the modified value in the response object and allow a guardrail to inspect both before deciding on a retry or a reprompt, but this probably overcomplicates things for users to no benefit.

I would avoid overthinking this (using more than one guardrail in a row is already an edge case, IMO) and simply ignore the problem, or at most prevent retry/reprompt on a modified value, as you suggested.

@cescoffier please let me know which option you prefer.
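To illustrate the "keep both values" alternative mentioned above (not what the PR ends up doing), a result carrying both values could look like the sketch below; all names here are invented for the example:

```java
// Hypothetical sketch of the "keep both values" option: a later guardrail could
// inspect both the original and the rewritten output before deciding whether a
// retry or a reprompt still makes sense.
record RewritableOutput(String originalValue, String rewrittenValue) {

    boolean isRewritten() {
        return rewrittenValue != null && !rewrittenValue.equals(originalValue);
    }

    // The value the next guardrail in the chain should validate.
    String effectiveValue() {
        return isRewritten() ? rewrittenValue : originalValue;
    }
}
```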

@lordofthejars (Contributor)

I totally agree with you @mariofusco, if my vote counts 😆

@cescoffier (Collaborator)

I may be biased, but I often use chains of guardrails.

But yes, maybe we could just add an option enabling/disabling retry and reprompt after a change of value.

The thing is that it is likely to work with smart LLMs and totally break with less smart ones.

@mariofusco (Contributor, Author)

But yes, maybe we could just add an option enabling/disabling retry and reprompt after a change of value.

Ok, I will try to implement this.
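Roughly, the behavior being agreed on here; isRewrittenResult() and blockRetry() are the names that appear in the diff further down, while the composition method itself is just a sketch, not the actual implementation in this PR:

```java
// Sketch only: once any guardrail has rewritten the output, a later result asking
// for a retry/reprompt is downgraded, so the LLM is never reprompted against text
// it did not actually produce.
final class GuardrailComposition {

    OutputGuardrailResult compose(OutputGuardrailResult accumulated, OutputGuardrailResult next) {
        return accumulated.isRewrittenResult() ? next.blockRetry() : next;
    }
}
```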

@mariofusco force-pushed the out_guard_with_result branch from 76201c2 to 7ffeb74 on October 30, 2024 at 10:51
@mariofusco (Contributor, Author)

@cescoffier Done, please give it a second look.

@cescoffier (Collaborator)

@cescoffier Done, please give it a second look.

First look... first look :-)

@cescoffier (Collaborator) left a comment

It looks great.

I would add a few tests with streamed responses, as, unfortunately, streams require a different approach for output guardrails.

}

@Test
@ActivateRequestContext
Collaborator

Question for @geoand - do you know if we can use @ActivateRequestContext on the class itself?

return result;
return accumulatedResults.isRewrittenResult() ? (GR) result.blockRetry() : result;
}
if (result.isRewrittenResult()) {
Collaborator

I can't remember if this method is invoked when using streamed responses. Streams make things slightly more convoluted.

Contributor (Author)

Yes, this method is used only for streamed responses. I'm keeping this rewriting here regardless, but now, if I find that the rewriting happened while streaming, I throw an exception, as discussed.

@mariofusco (Contributor, Author)

I would add a few tests with streamed responses, as, unfortunately, streams require a different approach for output guardrails.

Sure, I will add tests for streamed responses. I guess that in that case rewriting the output is never allowed, correct?

@mariofusco (Contributor, Author)

@cescoffier I'm adding some tests to cover the streamed scenarios, but I'm no longer sure how this is supposed to work. The problem is that in this case the output guardrail is invoked for each and every chunk. Now my rewriting implementation changes all of them one after the other, which probably makes the problem even worse, but even without this I don't see what kind of meaningful validation an output guardrail could perform on a single chunk.

In essence I believe that in general an output guardrail is not compatible with a streamed LLM output and the 2 features should be used in a mutually exclusive way. Can you please clarify your point of view on this?

/cc @geoand

@geoand (Collaborator) commented Oct 30, 2024

In essence I believe that in general an output guardrail is not compatible with a streamed LLM output and the 2 features should be used in a mutually exclusive way. Can you please clarify your point of view on this?

/cc @geoand

Tools, for example, work with streamed responses. I was actually surprised to find this, but it does work :)

@cescoffier (Collaborator)

@cescoffier I'm adding some tests to cover the streamed scenarios, but I'm no longer sure how this should work. The problem is that in this case the output guardrail is invoked for each and every chunk. Now my rewriting implementation changes all of them one after the other, which probably makes the problem even worse, but even without this I don't see which kind of meaningful validation an output guardrail could perform on a single chunk.

No, that's not the case; it depends on the "accumulator" strategy. It can accumulate the full response (which defeats a bit the purpose of streaming), or each token (which makes things very hard to validate), or anything in between (sentence, paragraph, JSON object...).

You can have plenty of validation with the right accumulation strategy. But, yes, the guardrails are called for every accumulated item.

In essence I believe that in general an output guardrail is not compatible with a streamed LLM output and the 2 features should be used in a mutually exclusive way. Can you please clarify your point of view on this?

It is not exclusive. As I wrote, it depends on the accumulation strategy. I am implementing a chatbot that is streamed because otherwise the experience is pretty bad, and I still check for hallucinations, off-topic content...
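As a concrete illustration of an accumulation strategy between those two extremes, a sentence-level accumulator might buffer tokens like the sketch below; the class and method names are made up for the example and are not the extension's accumulator API:

```java
import java.util.Optional;

// Buffers streamed tokens and releases a chunk only at a sentence boundary, so an
// output guardrail can validate whole sentences instead of individual tokens.
class SentenceAccumulator {

    private final StringBuilder buffer = new StringBuilder();

    Optional<String> accept(String token) {
        buffer.append(token);
        int end = buffer.lastIndexOf(".");
        if (end < 0) {
            return Optional.empty(); // no complete sentence yet, keep buffering
        }
        String sentence = buffer.substring(0, end + 1);
        buffer.delete(0, end + 1);
        return Optional.of(sentence.trim());
    }
}
```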

@mariofusco (Contributor, Author)

It is not exclusive. As I wrote, it depends on the accumulation strategy. I am implementing a chatbot that is streamed because otherwise the experience is pretty bad, and I still check for hallucinations, off-topic content...

Ok, I see your point. Let me rephrase this and restrict it to this pull request. As far as these rewriting output guardrails are concerned, I don't see how this feature could be usable and useful on a partial response. In other words, I don't see a scenario where you would want to rewrite each and every chunk of a streamed response. And yes, this also depends on the "accumulator" strategy, but even then, unless you accumulate the whole response (which, as you said, totally defeats the purpose of using a streaming API), I don't think that rewriting different subparts of a response would be of any use.

If you think that my point of view is wrong or limited and still believe that a rewriting guardrail could also be useful on a partial response, then I can confirm that everything works as expected, add the streamed tests that I just wrote to this pull request, and call my work finished.

Conversely, if you agree with me, maybe we should add a mechanism that prevents response rewriting (by throwing an exception?) when using the streamed version.

What do you think?

@lordofthejars (Contributor)

A good solution could be: when using a stream, if a guardrail tries to modify the output, throw an exception explaining the situation. I think that is understandable, and it also keeps us from overthinking this; in the end, the limitation only applies if you use a stream and want to modify the output, and in any case such a modification would affect the other output guards.

@cescoffier (Collaborator)

I took a few minutes to think about it, and I tend to agree that for streaming modifying the value does not make a lot of sense. It's technically possible (the guard would modify the emitted item and it would be emitted downstream).

We can revisit if there is a use case emerging.

@mariofusco (Contributor, Author)

I took a few minutes to think about it, and I tend to agree that for streaming modifying the value does not make a lot of sense. It's technically possible (the guard would modify the emitted item and it would be emitted downstream).

We can revisit if there is a use case emerging.

OK, I will change my pull request to throw an exception in case an output guardrail attempts to rewrite the LLM response while streaming.
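In other words, something along these lines; the surrounding method and the exception type are placeholders, and only isRewrittenResult() comes from the diff shown earlier:

```java
// Sketch: when the response is streamed, a guardrail that rewrote the value is
// reported as an error instead of silently emitting altered chunks downstream.
void checkRewriteAllowed(boolean streaming, OutputGuardrailResult result) {
    if (streaming && result.isRewrittenResult()) {
        throw new IllegalStateException(
                "Rewriting the LLM output is not supported when the response is streamed");
    }
}
```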

@mariofusco force-pushed the out_guard_with_result branch from 7ffeb74 to 4e13caa on October 31, 2024 at 09:12
@mariofusco (Contributor, Author)

I implemented all the discussed changes; @cescoffier, please review again.

@geoand geoand requested a review from cescoffier November 1, 2024 13:24
@cescoffier (Collaborator)

@mariofusco can you rebase?

@mariofusco force-pushed the out_guard_with_result branch from 4e13caa to 4ac0106 on November 3, 2024 at 18:13
@mariofusco (Contributor, Author)

@cescoffier Done.

@cescoffier cescoffier merged commit ed12693 into quarkiverse:main Nov 3, 2024
59 checks passed

Successfully merging this pull request may close these issues.

OutputGuardrails not permitting override generated output
4 participants