
Commit d380fe1

Pouyanpi and miyoungc authored
docs: add guide for bot reasoning guardrails (#1479)
* docs: add guide for bot reasoning guardrails (update, update, simplify, cleanup)
* docs: clarify Colang version for bot reasoning guide. Add a note specifying that bot reasoning guardrails are supported only in Colang 1.0. Update example file references for improved clarity.
* add bot thinking guardrails to toctree
* docs: update self-check config link to develop branch
* fix typo
* fix references to use develop branch
* docs: edit #1479 (#1484)

Co-authored-by: Miyoung Choi <[email protected]>
1 parent 28826ec

File tree: 2 files changed, +215 −0 lines changed


docs/index.md (1 addition, 0 deletions)
````diff
@@ -68,6 +68,7 @@ user-guides/advanced/nemoguard-jailbreakdetect-deployment
 user-guides/advanced/kv-cache-reuse
 user-guides/advanced/safeguarding-ai-virtual-assistant-blueprint
 user-guides/advanced/tools-integration
+user-guides/advanced/bot-thinking-guardrails
 ```
 
 ```{toctree}
````
docs/user-guides/advanced/bot-thinking-guardrails.md (new file, 214 additions, 0 deletions)
# Guardrailing Bot Reasoning Content

Reasoning-capable large language models (LLMs) expose their internal thought process as reasoning traces. These traces reveal how the model arrives at its conclusions, providing transparency into the decision-making process. However, they may also contain sensitive information or problematic reasoning patterns that need to be monitored and controlled.

The NeMo Guardrails toolkit extracts these reasoning traces and makes them available to your guardrails so that you can inspect and control them. With this feature, you can configure guardrails that block responses based on the model's reasoning process, enhance moderation decisions with reasoning context, or monitor reasoning patterns.
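How the reasoning is delimited depends on the model. As a minimal sketch, assuming a model that wraps its reasoning in `<think>` tags, a `reasoning_config` entry in `config.yml` similar to the following tells the toolkit where a trace starts and ends; refer to the LLM configuration guide listed under Related Guides for the authoritative options.

```yaml
models:
  - type: main
    engine: <your_engine>
    model: <your_reasoning_model>
    # Assumed delimiters; adjust them to the format your model emits.
    reasoning_config:
      start_token: "<think>"
      end_token: "</think>"
```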
```{note}
This guide uses Colang 1.0 syntax. Bot reasoning guardrails are currently supported only in Colang 1.0.
```

```{important}
The examples in this guide range from minimal toy examples (for understanding the concepts) to complete reference implementations. They show how to access and work with `bot_thinking` in different contexts; they are not production-ready code to copy and paste. Adapt these patterns to your specific use case with appropriate validation, error handling, and business logic for your application.
```
---

## Accessing Reasoning Content

When an LLM generates a response with reasoning traces, the NeMo Guardrails toolkit extracts the reasoning and makes it available through the `bot_thinking` variable. You can use this variable in the following ways.
### In Colang Flows

The reasoning content is available as a context variable in Colang output rails. For example, in `config/rails.co`, you can set up a flow to capture the reasoning content by setting the `$captured_reasoning` variable to `$bot_thinking`.

```{code-block}
define flow check_reasoning
  if $bot_thinking
    $captured_reasoning = $bot_thinking
```
### In Custom Actions

When you write Python action functions in `config/actions.py`, you can access the reasoning through the context dictionary. For example, the following action function retrieves the reasoning with `context.get("bot_thinking")` and returns `False` if it contains the word `"sensitive"`.

```{code-block} python
from typing import Optional

from nemoguardrails.actions import action


@action(is_system_action=True)
async def check_reasoning(context: Optional[dict] = None):
    bot_thinking = context.get("bot_thinking")
    if bot_thinking and "sensitive" in bot_thinking:
        return False
    return True
```
### In Prompt Templates

When you render prompts for LLM tasks such as `self check output`, the reasoning is available as a Jinja2 template variable. For example, in `prompts.yml`, you can include the reasoning in the prompt so that the checking model considers it when deciding whether to block the response.

```yaml
prompts:
  - task: self_check_output
    content: |
      Bot message: "{{ bot_response }}"

      {% if bot_thinking %}
      Bot reasoning: "{{ bot_thinking }}"
      {% endif %}

      Should this be blocked (Yes or No)?
```
```{important}
Always check if reasoning exists before using it, as not all models provide reasoning traces.
```

---
## Guardrailing with Output Rails

You can use the `$bot_thinking` variable in output rails to inspect and control responses based on reasoning content.

1. Write a basic pattern-matching flow that uses the `$bot_thinking` variable in `config/rails.co` as follows:

   ```{code-block}
   define bot refuse to respond
     "I'm sorry, I can't respond to that."

   define flow block_sensitive_reasoning
     if $bot_thinking
       if "confidential" in $bot_thinking or "internal only" in $bot_thinking
         bot refuse to respond
         stop
   ```

2. Add the flow to your output rails in `config.yml` as follows:

   ```{code-block} yaml
   rails:
     output:
       flows:
         - block_sensitive_reasoning
   ```

```{note}
This example demonstrates a basic pattern-matching flow for learning purposes. Production implementations should use more comprehensive validation and consider edge cases.
```
---

## Guardrailing with Custom Actions

For complex validation logic or reusable checks across multiple flows, you can write custom Python actions. This approach provides better code organization and makes it easier to share validation logic across different guardrails.
1. Write the custom action function in `config/actions.py` as follows:

   ```{code-block} python
   from typing import Optional

   from nemoguardrails.actions import action


   @action(is_system_action=True)
   async def check_reasoning_quality(context: Optional[dict] = None):
       bot_thinking = context.get("bot_thinking")

       # Not all models provide reasoning traces, so pass when there is none.
       if not bot_thinking:
           return True

       forbidden_patterns = [
           "proprietary information",
           "trade secret",
           "confidential data",
       ]

       for pattern in forbidden_patterns:
           if pattern.lower() in bot_thinking.lower():
               return False

       return True
   ```
2. Write the flow that uses the custom action function in `config/rails.co` as follows:

   ```{code-block}
   define bot refuse to respond
     "I'm sorry, I can't respond to that."

   define flow quality_check_reasoning
     $is_safe = execute check_reasoning_quality

     if not $is_safe
       bot refuse to respond
       stop
   ```
3. Add the flow to your output rails in `config.yml` as follows:

   ```{code-block} yaml
   rails:
     output:
       flows:
         - quality_check_reasoning
   ```
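Because the action is plain Python, you can sanity-check its logic in isolation before running it behind the rails. The following is a minimal sketch, assuming you run it from a directory where `config/actions.py` is importable as a module:

```{code-block} python
import asyncio

from config.actions import check_reasoning_quality  # assumed import path

# Reasoning that matches a forbidden pattern fails the check.
blocked = asyncio.run(
    check_reasoning_quality(context={"bot_thinking": "this relies on a trade secret"})
)
print(blocked)  # False

# Absent reasoning passes, because not all models provide traces.
print(asyncio.run(check_reasoning_quality(context={})))  # True
```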
---

## Using Reasoning in Self-Check Output

The following steps show how to use `bot_thinking` in a self-check output rail. This pattern provides reasoning traces to your moderation LLM, allowing it to make more informed decisions by evaluating both the response and the reasoning process.

This extends the [self check thinking configuration examples](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/self_check_thinking) in the NeMo Guardrails toolkit repository.

1. Write the `config.yml` file as follows:
   ```yaml
   models:
     - type: main
       engine: <your_engine>
       model: <your_reasoning_model>
     - type: self_check_output
       model: <your_moderation_model>
       engine: <your_engine>

   rails:
     output:
       flows:
         - self check output
   ```
2. Write the `prompts.yml` file as follows:

   ```yaml
   prompts:
     - task: self_check_output
       content: |
         Your task is to check if the bot message below complies with the company policy.

         Company policy for the bot:
         - messages should not contain any explicit content
         - messages should not contain abusive language or offensive content
         - messages should not contain any harmful content
         - messages should not contain racially insensitive content
         - if a message is a refusal, it should be polite

         Bot message: "{{ bot_response }}"

         {% if bot_thinking %}
         Bot thinking/reasoning: "{{ bot_thinking }}"
         {% endif %}

         Question: Should the message be blocked (Yes or No)?
         Answer:
   ```
The `{% if bot_thinking %}` conditional ensures that the prompt works with both reasoning and non-reasoning models. When reasoning is available, the self-check LLM can evaluate both the final response and the reasoning process.
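To try the configuration end to end, you can load it with the Python API and send a test message. The following is a minimal sketch, assuming the `config.yml` and `prompts.yml` files above are stored in a `./config` directory:

```{code-block} python
from nemoguardrails import LLMRails, RailsConfig

# Load config.yml and prompts.yml from the configuration directory.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(
    messages=[{"role": "user", "content": "Tell me about your refund policy."}]
)
print(response["content"])
```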
---

## Related Guides

Use the following guides to learn more about the features covered here.

- [LLM Configuration - Using LLMs with Reasoning Traces](../configuration-guide/llm-configuration.md#using-llms-with-reasoning-traces): API response handling and breaking changes.
- [Output Rails](../../getting-started/5-output-rails/README.md): General guide on output rails.
- [Self-Check Output Example](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/self_check_thinking): Complete working configuration example in the NeMo Guardrails toolkit repository.
- [Custom Actions](../../colang-language-syntax-guide.md#actions): Guide on writing custom actions.
