From 3f8cf5a1537ab0e1cfa685b7c3da00a9a641eba5 Mon Sep 17 00:00:00 2001
From: Pouyanpi <13303554+Pouyanpi@users.noreply.github.com>
Date: Fri, 17 Oct 2025 12:11:51 +0200
Subject: [PATCH 1/2] docs(examples): add nemoguards cache configuration
 example

---
 examples/configs/nemoguards_cache/README.md  |  24 ++++
 examples/configs/nemoguards_cache/config.yml |  48 ++++++++
 .../configs/nemoguards_cache/prompts.yaml    | 105 ++++++++++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 examples/configs/nemoguards_cache/README.md
 create mode 100644 examples/configs/nemoguards_cache/config.yml
 create mode 100644 examples/configs/nemoguards_cache/prompts.yaml

diff --git a/examples/configs/nemoguards_cache/README.md b/examples/configs/nemoguards_cache/README.md
new file mode 100644
index 000000000..991975f55
--- /dev/null
+++ b/examples/configs/nemoguards_cache/README.md
@@ -0,0 +1,24 @@
+# NeMoGuard Safety Rails Example
+
+This example showcases the use of NVIDIA's NeMoGuard NIMs for comprehensive AI safety, including content moderation, topic control, and jailbreak detection.
+
+## Configuration Files
+
+- `config.yml` - Defines the models configuration, including the main LLM and three NeMoGuard NIMs for safety checks
+- `prompts.yaml` - Contains prompt templates for content safety and topic control checks
+
+## NeMoGuard NIMs Used
+
+1. **Content Safety** (`nvidia/llama-3.1-nemoguard-8b-content-safety`) - Checks for unsafe content across 23 safety categories
+2. **Topic Control** (`nvidia/llama-3.1-nemoguard-8b-topic-control`) - Ensures conversations stay within allowed topics
+3. **Jailbreak Detection** - Detects and prevents jailbreak attempts (configured via `nim_server_endpoint`)
+
+## Documentation
+
+For more details about NeMoGuard NIMs and deployment options, see:
+
+- [NeMo Guardrails Documentation](https://docs.nvidia.com/nemo/guardrails/index.html)
+- [Llama 3.1 NemoGuard 8B ContentSafety NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/)
+- [Llama 3.1 NemoGuard 8B TopicControl NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/)
+- [NemoGuard JailbreakDetect NIM](https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/)
+- [NeMoGuard Models on NVIDIA API Catalog](https://build.nvidia.com/search?q=nemoguard)
diff --git a/examples/configs/nemoguards_cache/config.yml b/examples/configs/nemoguards_cache/config.yml
new file mode 100644
index 000000000..1de285503
--- /dev/null
+++ b/examples/configs/nemoguards_cache/config.yml
@@ -0,0 +1,48 @@
+models:
+  - type: main
+    engine: nim
+    model: meta/llama-3.3-70b-instruct
+
+  - type: content_safety
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-content-safety
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+  - type: topic_control
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-topic-control
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+  - type: jailbreak_detection
+    engine: nim
+    model: jailbreak_detect
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+rails:
+  input:
+    flows:
+      - jailbreak detection model
+      - content safety check input $model=content_safety
+      - topic safety check input $model=topic_control
+
+  output:
+    flows:
+      - content safety check output $model=content_safety
+
+  config:
+    jailbreak_detection:
+      nim_base_url: "https://ai.api.nvidia.com"
+      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
+      api_key_env_var: NVIDIA_API_KEY
diff --git a/examples/configs/nemoguards_cache/prompts.yaml b/examples/configs/nemoguards_cache/prompts.yaml
new file mode 100644
index 000000000..d2e2075ee
--- /dev/null
+++ b/examples/configs/nemoguards_cache/prompts.yaml
@@ -0,0 +1,105 @@
+# Prompt templates for the NeMoGuard content safety and topic control checks.
+prompts:
+  - task: content_safety_check_input $model=content_safety
+    content: |
+      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.
+
+      <BEGIN UNSAFE CONTENT CATEGORIES>
+      S1: Violence.
+      S2: Sexual.
+      S3: Criminal Planning/Confessions.
+      S4: Guns and Illegal Weapons.
+      S5: Controlled/Regulated Substances.
+      S6: Suicide and Self Harm.
+      S7: Sexual (minor).
+      S8: Hate/Identity Hate.
+      S9: PII/Privacy.
+      S10: Harassment.
+      S11: Threat.
+      S12: Profanity.
+      S13: Needs Caution.
+      S14: Other.
+      S15: Manipulation.
+      S16: Fraud/Deception.
+      S17: Malware.
+      S18: High Risk Gov Decision Making.
+      S19: Political/Misinformation/Conspiracy.
+      S20: Copyright/Trademark/Plagiarism.
+      S21: Unauthorized Advice.
+      S22: Illegal Activity.
+      S23: Immoral/Unethical.
+      <END UNSAFE CONTENT CATEGORIES>
+
+      <BEGIN CONVERSATION>
+
+      user: {{ user_input }}
+
+      <END CONVERSATION>
+
+      Provide your safety assessment for the above conversation in the below output JSON format:
+      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}
+
+      Do not include anything other than the output JSON in your response.
+      Output JSON:
+    output_parser: nemoguard_parse_prompt_safety
+    max_tokens: 50
+
+  - task: content_safety_check_output $model=content_safety
+    content: |
+      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.
+
+      <BEGIN UNSAFE CONTENT CATEGORIES>
+      S1: Violence.
+      S2: Sexual.
+      S3: Criminal Planning/Confessions.
+      S4: Guns and Illegal Weapons.
+      S5: Controlled/Regulated Substances.
+      S6: Suicide and Self Harm.
+      S7: Sexual (minor).
+      S8: Hate/Identity Hate.
+      S9: PII/Privacy.
+      S10: Harassment.
+      S11: Threat.
+      S12: Profanity.
+      S13: Needs Caution.
+      S14: Other.
+      S15: Manipulation.
+      S16: Fraud/Deception.
+      S17: Malware.
+      S18: High Risk Gov Decision Making.
+      S19: Political/Misinformation/Conspiracy.
+      S20: Copyright/Trademark/Plagiarism.
+      S21: Unauthorized Advice.
+      S22: Illegal Activity.
+      S23: Immoral/Unethical.
+      <END UNSAFE CONTENT CATEGORIES>
+
+      <BEGIN CONVERSATION>
+
+      user: {{ user_input }}
+
+      response: agent: {{ bot_response }}
+
+      <END CONVERSATION>
+
+      Provide your safety assessment for the above conversation in the below output JSON format:
+      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}
+
+      Do not include anything other than the output JSON in your response.
+      Output JSON:
+    output_parser: nemoguard_parse_response_safety
+    max_tokens: 50
+
+  - task: topic_safety_check_input $model=topic_control
+    content: |
+      You are to act as a customer service agent, providing users with factual information in accordance with the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines.
+
+      Guidelines for the user messages:
+      - Do not answer questions asking for personal opinions or advice on a user's order or future recommendations.
+      - Do not provide any information on non-company products or services.
+      - Do not answer enquiries unrelated to the company policies.
+      - Do not answer questions asking for personal details about the agent or its creators.
+      - Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
+      - If a user asks about topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
+      - Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
+      - Allow user comments that are related to small talk and chit-chat.

From ebb5f0908d8c6515c25c3ded48e6371bac4c3dd9 Mon Sep 17 00:00:00 2001
From: Pouyanpi <13303554+Pouyanpi@users.noreply.github.com>
Date: Mon, 20 Oct 2025 12:53:46 +0200
Subject: [PATCH 2/2] docs: update README with caching and safety config
 details

---
 examples/configs/nemoguards_cache/README.md | 239 +++++++++++++++++++-
 1 file changed, 231 insertions(+), 8 deletions(-)

diff --git a/examples/configs/nemoguards_cache/README.md b/examples/configs/nemoguards_cache/README.md
index 991975f55..577f6f454 100644
--- a/examples/configs/nemoguards_cache/README.md
+++ b/examples/configs/nemoguards_cache/README.md
@@ -1,17 +1,240 @@
-# NeMoGuard Safety Rails Example
+# NeMoGuard Safety Rails with Caching
 
-This example showcases the use of NVIDIA's NeMoGuard NIMs for comprehensive AI safety, including content moderation, topic control, and jailbreak detection.
+This example demonstrates how to configure NeMo Guardrails with caching support for multiple NVIDIA NeMoGuard NIMs, including content safety, topic control, and jailbreak detection.
 
-## Configuration Files
+## Features
 
-- `config.yml` - Defines the models configuration, including the main LLM and three NeMoGuard NIMs for safety checks
-- `prompts.yaml` - Contains prompt templates for content safety and topic control checks
+- **Content Safety Checks**: Validates content against 23 safety categories (input and output)
+- **Topic Control**: Ensures conversations stay within allowed topics (input)
+- **Jailbreak Detection**: Detects and prevents jailbreak attempts (input)
+- **Per-Model Caching**: Each safety model has its own dedicated cache instance
+- **Thread Safety**: Fully thread-safe for use in multi-threaded web servers
+- **Cache Statistics**: Optional performance monitoring for each model
+
+## Folder Structure
+
+- `config.yml` - Main configuration file with model definitions, rails configuration, and cache settings
+- `prompts.yaml` - Prompt templates for content safety and topic control checks
+
+## Configuration Overview
+
+### Basic Configuration with Caching
+
+```yaml
+models:
+  - type: main
+    engine: nim
+    model: meta/llama-3.3-70b-instruct
+
+  - type: content_safety
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-content-safety
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+  - type: topic_control
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-topic-control
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+  - type: jailbreak_detection
+    engine: nim
+    model: jailbreak_detect
+    cache:
+      enabled: true
+      maxsize: 10000
+      stats:
+        enabled: true
+
+rails:
+  input:
+    flows:
+      - jailbreak detection model
+      - content safety check input $model=content_safety
+      - topic safety check input $model=topic_control
+
+  output:
+    flows:
+      - content safety check output $model=content_safety
+
+  config:
+    jailbreak_detection:
+      nim_base_url: "https://ai.api.nvidia.com"
+      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
+      api_key_env_var: NVIDIA_API_KEY
+```
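+
+To exercise this configuration from Python instead of through the server, the standard NeMo Guardrails loading pattern applies. The snippet below is a minimal sketch; it assumes the `nemoguardrails` package is installed, `NVIDIA_API_KEY` is exported, and the path is resolved relative to the repository root:
+
+```python
+from nemoguardrails import LLMRails, RailsConfig
+
+# Load this example's config.yml and prompts.yaml.
+config = RailsConfig.from_path("examples/configs/nemoguards_cache")
+rails = LLMRails(config)
+
+# The input rails (jailbreak detection, content safety, topic control) run
+# before the main model; their verdicts for this exact input are stored in
+# the per-model caches described below.
+response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
+print(response["content"])
+```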
 
 ## NeMoGuard NIMs Used
 
-1. **Content Safety** (`nvidia/llama-3.1-nemoguard-8b-content-safety`) - Checks for unsafe content across 23 safety categories
-2. **Topic Control** (`nvidia/llama-3.1-nemoguard-8b-topic-control`) - Ensures conversations stay within allowed topics
-3. **Jailbreak Detection** - Detects and prevents jailbreak attempts (configured via `nim_server_endpoint`)
+### 1. Content Safety (`nvidia/llama-3.1-nemoguard-8b-content-safety`)
+
+Checks for unsafe content across 23 safety categories, including violence, hate speech, and sexual content.
+
+**Cache Configuration:**
+
+```yaml
+- type: content_safety
+  engine: nim
+  model: nvidia/llama-3.1-nemoguard-8b-content-safety
+  cache:
+    enabled: true
+    maxsize: 10000
+    stats:
+      enabled: true
+```
+
+### 2. Topic Control (`nvidia/llama-3.1-nemoguard-8b-topic-control`)
+
+Ensures conversations stay within allowed topics and prevents topic drift.
+
+**Cache Configuration:**
+
+```yaml
+- type: topic_control
+  engine: nim
+  model: nvidia/llama-3.1-nemoguard-8b-topic-control
+  cache:
+    enabled: true
+    maxsize: 10000
+    stats:
+      enabled: true
+```
+
+### 3. Jailbreak Detection (`jailbreak_detect`)
+
+Detects and prevents jailbreak attempts that try to bypass safety measures.
+
+**IMPORTANT**: For jailbreak detection caching to work, the `type` and `model` **MUST** be set to these exact values:
+
+- `type: jailbreak_detection`
+- `model: jailbreak_detect`
+
+**Cache Configuration:**
+
+```yaml
+- type: jailbreak_detection
+  engine: nim
+  model: jailbreak_detect
+  cache:
+    enabled: true
+    maxsize: 10000
+    stats:
+      enabled: true
+```
+
+The actual NIM endpoint is configured separately in the `rails.config` section:
+
+```yaml
+rails:
+  config:
+    jailbreak_detection:
+      nim_base_url: "https://ai.api.nvidia.com"
+      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
+      api_key_env_var: NVIDIA_API_KEY
+```
+
+## How It Works
+
+1. **User Input**: When a user sends a message, it goes through multiple safety checks:
+   - Jailbreak detection evaluates the message for manipulation attempts
+   - Content safety checks for unsafe content
+   - Topic control validates topic adherence
+
+2. **Caching**: Each model has its own cache:
+   - First check: an API call goes to the NeMoGuard NIM and the result is cached
+   - Subsequent identical inputs: cache hit, no API call needed
+
+3. **Response Generation**: If all input checks pass, the main model generates a response
+
+4. **Output Check**: The response is checked by content safety before being returned to the user
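+
+One way to see the caching step in action is to send the same message twice and compare latencies. This is a rough sketch rather than a benchmark (timings also vary with the main model call), and it assumes the example config loads as shown earlier:
+
+```python
+import time
+
+from nemoguardrails import LLMRails, RailsConfig
+
+config = RailsConfig.from_path("examples/configs/nemoguards_cache")
+rails = LLMRails(config)
+
+message = [{"role": "user", "content": "What is your refund policy?"}]
+
+# Cold call: jailbreak detection, content safety, and topic control each
+# make a round trip to their NeMoGuard NIM.
+start = time.perf_counter()
+rails.generate(messages=message)
+print(f"cold: {time.perf_counter() - start:.2f}s")
+
+# Warm call with identical input: the safety verdicts should come from the
+# per-model caches, leaving roughly only the main model call.
+start = time.perf_counter()
+rails.generate(messages=message)
+print(f"warm: {time.perf_counter() - start:.2f}s")
+```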
+
+## Cache Configuration Options
+
+### Default Behavior (No Caching)
+
+Caching is **disabled** by default: a model without a `cache` section calls its NIM for every check.
+
+### Enabling Cache
+
+Add a cache configuration to any model definition:
+
+```yaml
+cache:
+  enabled: true          # Enable caching
+  maxsize: 10000         # Cache capacity (number of entries)
+  stats:
+    enabled: true        # Enable statistics tracking
+    log_interval: 300.0  # Log stats every 5 minutes (optional)
+```
+
+### Cache Configuration Parameters
+
+- **enabled**: `true` to enable caching, `false` to disable
+- **maxsize**: Maximum number of entries in the cache (LRU eviction when full)
+- **stats.enabled**: Track cache hit/miss rates and performance metrics
+- **stats.log_interval**: How often to log statistics, in seconds (optional)
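+
+Because each model carries its own `cache` block, these parameters can be tuned independently per model. The fragment below is illustrative only; the sizes are assumptions, not recommendations:
+
+```yaml
+models:
+  # High repetition expected: give content safety the largest cache and
+  # log its statistics periodically.
+  - type: content_safety
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-content-safety
+    cache:
+      enabled: true
+      maxsize: 20000
+      stats:
+        enabled: true
+        log_interval: 300.0
+
+  # Low repetition expected: leave caching disabled for topic control by
+  # simply omitting the cache section.
+  - type: topic_control
+    engine: nim
+    model: nvidia/llama-3.1-nemoguard-8b-topic-control
+```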
+
+## Architecture
+
+Each NeMoGuard model gets its own dedicated cache instance, providing:
+
+- **Isolated cache management** per model
+- **Different cache capacities** for different models
+- **Model-specific performance tuning**
+- **Thread-safe concurrent access**
+
+This architecture allows you to:
+
+- Set larger caches for frequently used models
+- Disable caching for specific models
+- Monitor performance per model
+
+## Thread Safety
+
+The implementation is fully thread-safe:
+
+- **Concurrent Requests**: Safely handles multiple simultaneous safety checks
+- **Efficient Locking**: Uses `RLock` for minimal performance impact
+- **Atomic Operations**: Prevents duplicate LLM calls for the same content
+
+This makes it suitable for:
+
+- Multi-threaded web servers (FastAPI, Flask, Django)
+- Concurrent request processing
+- High-traffic applications
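+
+A minimal sketch of concurrent use from Python, assuming the thread-safety guarantees above (the prompts and worker count are illustrative):
+
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+from nemoguardrails import LLMRails, RailsConfig
+
+config = RailsConfig.from_path("examples/configs/nemoguards_cache")
+rails = LLMRails(config)
+
+# Note the duplicate: identical inputs racing each other should still
+# result in a single NIM call per safety model, per the atomicity note above.
+prompts = ["Hi!", "What are your opening hours?", "Hi!"]
+
+def run_checked(text: str) -> str:
+    # Each call runs the full set of input and output rails.
+    response = rails.generate(messages=[{"role": "user", "content": text}])
+    return response["content"]
+
+with ThreadPoolExecutor(max_workers=4) as pool:
+    for reply in pool.map(run_checked, prompts):
+        print(reply)
+```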
+
+## Running the Example
+
+```bash
+export NVIDIA_API_KEY=your_api_key_here
+
+nemoguardrails server --config examples/configs/nemoguards_cache/
+```
+
+## Benefits
+
+1. **Performance**: Avoids redundant NeMoGuard API calls for repeated inputs
+2. **Cost Savings**: Reduces API usage by serving repeated checks from cache instead of re-querying the NIMs
+3. **Flexibility**: Caching can be enabled per model based on usage patterns
+4. **Clean Architecture**: Each model has its own dedicated cache
+5. **Scalability**: New models with different caching strategies are easy to add
+6. **Observability**: Cache statistics help monitor effectiveness
+
+## Tips
+
+- Start with moderate cache sizes (5,000-10,000 entries) and adjust based on usage
+- Enable stats logging to monitor cache effectiveness
+- Jailbreak detection typically has high cache hit rates
+- Content safety caching is most effective for chatbots with common queries
+- Topic control benefits from caching when topics are well-defined
+- Adjust cache sizes independently for each model based on its usage pattern
 
 ## Documentation