Allow to rewrite LLM result in an OutputGuardrail #1021
Conversation
I looked at my notes, as I remembered looking into this during the first implementation. There is a philosophical question that I was not able to crack. Let's imagine 3 output guardrails: A, B, C.
Now the problem is that, if one of them rewrites the value and a later one asks for a retry or a reprompt, we are going to reprompt based on the modified value, not the original one. So the LLM may answer the same thing, or may not understand the reprompt instruction at all and start hallucinating. We could imagine modifying the LLM response in the context, updating it to the modified value, but again, the LLM may notice this and freak out. One possibility is that only the last guardrail can change the value.
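To make this concrete, here is a minimal, purely illustrative sketch of that scenario (all types and method names below are invented for the example, they are not the actual guardrail API): B rewrites the value, and C then asks for a reprompt based on text the LLM never produced.

```java
// Illustrative only: hypothetical, simplified guardrail shapes.
interface Guardrail {
    Result check(String llmOutput);
}

record Result(boolean passed, String rewrittenValue, String repromptInstruction) {
    static Result pass() { return new Result(true, null, null); }
    static Result rewrite(String newValue) { return new Result(true, newValue, null); }
    static Result reprompt(String instruction) { return new Result(false, null, instruction); }
}

class GuardrailChain {
    String apply(String originalLlmOutput, Guardrail... chain) {
        String current = originalLlmOutput;
        for (Guardrail guardrail : chain) {
            Result result = guardrail.check(current);
            if (result.rewrittenValue() != null) {
                current = result.rewrittenValue(); // B replaces the value here...
            }
            if (!result.passed()) {
                // ...so C's reprompt refers to a value the LLM never produced,
                // which may confuse the model or make it hallucinate.
                throw new IllegalStateException("Reprompt requested: " + result.repromptInstruction());
            }
        }
        return current;
    }
}
```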
Not allowed or ignored, so no exception is thrown; we just log why they are not executed. What I see here is that we can assume that most of the time there will be only one output guardrail (or you can implement a single one and put all the logic there), so I guess that ignoring might be the best option.
I don't like this solution, mostly because I can totally see 2 guardrails changing the result in sequence one after the other (like in the test case that I added).
That is a better option already. Another possibility is to keep both the original and the modified value in the response object and let a guardrail inspect both before deciding on a retry or a reprompt, but that probably overcomplicates things for users for no real benefit. I would avoid overthinking this (using more than one guardrail in a row is already an edge case imo) and simply ignore the problem, or at most prevent retry/reprompt on a modified value as you suggested. @cescoffier please let me know which option you prefer.
I totally agree with you @mariofusco, if my vote counts 😆
I may be biased, but I often use chains of guardrails. Still, yes, maybe we could just add an option enabling/disabling retry and reprompt after a change of value. The thing is that this is likely to work with a smart LLM and totally break on less smart ones.
Ok, I will try to implement this.
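For context, a rough sketch of how that option could behave (isRewrittenResult() and blockRetry() mirror names that appear in the code reviewed below; everything else here is an assumption, not the actual implementation): once a guardrail has rewritten the value, any retry/reprompt requested by a later guardrail is dropped.

```java
// Hypothetical sketch: block retry/reprompt once a previous guardrail rewrote the value.
final class GuardrailResult {
    private final String value;
    private final boolean rewritten;
    private final boolean retryRequested;

    GuardrailResult(String value, boolean rewritten, boolean retryRequested) {
        this.value = value;
        this.rewritten = rewritten;
        this.retryRequested = retryRequested;
    }

    boolean isRewrittenResult() { return rewritten; }

    GuardrailResult blockRetry() {
        // Keep the (rewritten) value but drop the retry/reprompt request.
        return new GuardrailResult(value, rewritten, false);
    }

    static GuardrailResult accumulate(GuardrailResult accumulated, GuardrailResult next) {
        // If an earlier guardrail already rewrote the LLM output, retrying or
        // reprompting against it is likely to confuse the model, so block it.
        return accumulated.isRewrittenResult() && next.retryRequested ? next.blockRetry() : next;
    }
}
```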
Force-pushed from 76201c2 to 7ffeb74
@cescoffier Done, please give it a second look.
First look... first look :-)
It looks great.
I would add a few tests with streamed responses, as, unfortunately, streams require a different approach for output guardrails.
@Test
@ActivateRequestContext
Question for @geoand - do you know if we can use @ActivateRequestContext on the class itself?
return accumulatedResults.isRewrittenResult() ? (GR) result.blockRetry() : result;
}

if (result.isRewrittenResult()) {
I can't remember if this method is invoked when using streamed responses. Streams make things slightly more convoluted.
Yes, this method is used only for streamed responses. I'm keeping this rewriting here regardless, but now, if I find that the rewriting happened while streaming, I throw an exception as discussed.
Sure, I will add tests for streamed responses. I guess that in that case rewriting the output is never allowed, correct?
@cescoffier I'm adding some tests to cover the streamed scenarios, but I'm no longer sure how this is supposed to work. The problem is that in this case the output guardrail is invoked for each and every chunk. My rewriting implementation changes all of them one after the other, which probably makes the problem even worse, but even without that I don't see what kind of meaningful validation an output guardrail could perform on a single chunk. In essence, I believe that in general an output guardrail is not compatible with a streamed LLM output and the 2 features should be used in a mutually exclusive way. Can you please clarify your point of view on this? /cc @geoand
Tools, for example, work with streamed responses. I was actually surprised to find this, but it does work :)
No, that's not the case; it depends on the "accumulator" strategy. It can accumulate the full response (which defeats a bit the purpose of streaming), or each token (which makes things very hard to validate), or anything in between (sentence, paragraph, JSON object...). You can have plenty of validation with the right accumulation strategy. But, yes, the guardrails are called for every accumulated item.
It is not exclusive. As I wrote, it depends on the accumulation strategy. I am implementing a chatbot that is streamed because otherwise the experience is pretty bad, and I still check for hallucinations, off-topic content...
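To illustrate the "anything in between" idea, here is a hedged sketch of a sentence-level accumulation strategy (plain Java with invented names; it is not the project's real accumulator API): streamed tokens are buffered and the guardrails would only be handed complete sentences.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sentence-level accumulator: buffers streamed tokens and only
// releases a chunk for guardrail validation once a full sentence is available.
final class SentenceAccumulator {
    private final StringBuilder buffer = new StringBuilder();

    /** Feed one streamed token; returns the completed sentences ready for validation. */
    List<String> accept(String token) {
        buffer.append(token);
        List<String> ready = new ArrayList<>();
        int end;
        while ((end = indexOfSentenceEnd(buffer)) >= 0) {
            ready.add(buffer.substring(0, end + 1).trim());
            buffer.delete(0, end + 1);
        }
        return ready;
    }

    /** Whatever is left when the stream completes (a trailing partial sentence). */
    String remainder() {
        return buffer.toString().trim();
    }

    private static int indexOfSentenceEnd(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                return i;
            }
        }
        return -1;
    }
}
```

With such a strategy the guardrail is still called for every accumulated item, but each item is large enough to validate meaningfully (hallucination checks, off-topic detection, etc.).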
Ok, I see your point. Let me rephrase this and limit it to this pull request. As far as these rewriting output guardrails are concerned, I don't see how this feature could be usable and useful on a partial response. In other words, I don't see a scenario where you would want to rewrite each and every chunk of a streamed response. And yes, this also depends on the "accumulator" strategy, but even there, unless you accumulate the whole response (which, as you said, totally defeats the purpose of using a streaming API), I don't think that rewriting different subparts of a response would be of any use. If you think that my point of view is wrong or too limited and still believe that a rewriting guardrail could be useful on a partial response, then I confirm that everything works as expected: I can add the streamed tests that I just wrote to this pull request and my work is finished. Conversely, if you agree with me, maybe we should add a mechanism that prevents the response rewriting (throwing an exception?) when using the streamed version. What do you think?
A good solution could be to throw an exception, explaining the situation, when someone uses a stream and a guardrail tries to modify the output. I think it is understandable; also, let's not overthink it: in the end the limitation only applies if you use a stream and you want to modify the output, and in any case it does not affect the other output guardrails.
I took a few minutes to think about it, and I tend to agree that modifying the value does not make a lot of sense for streaming. It's technically possible (the guard would modify the emitted item and it would be emitted downstream). We can revisit if a use case emerges.
OK, I will change my pull request to throw an exception in case an output guardrail attempts to rewrite the LLM response while streaming.
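A minimal sketch of that behavior, with assumed names and signatures (not the actual code of this pull request): the framework-side check compares the guardrail's output with the accumulated chunk and fails fast when they differ for a streamed response.

```java
// Sketch only: reject any guardrail rewrite up front when the response is streamed.
final class StreamedRewriteCheck {
    static String applyGuardrailOutput(boolean streamedResponse, String originalChunk, String guardrailOutput) {
        boolean rewritten = !originalChunk.equals(guardrailOutput);
        if (streamedResponse && rewritten) {
            throw new IllegalArgumentException(
                    "Rewriting the LLM output in an output guardrail is not supported for streamed responses");
        }
        return guardrailOutput;
    }
}
```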
Force-pushed from 7ffeb74 to 4e13caa
I implemented all the changes we discussed; @cescoffier please review again.
@mariofusco can you rebase?
Force-pushed from 4e13caa to 4ac0106
@cescoffier Done.
Closes: #1020
/cc @gsmet @geoand @lordofthejars