
Conversation

@tgasser-nv
Collaborator

@tgasser-nv tgasser-nv commented Oct 29, 2025

Description

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@tgasser-nv tgasser-nv self-assigned this Oct 29, 2025
@tgasser-nv tgasser-nv marked this pull request as draft October 29, 2025 01:08
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds documentation for the in-memory caching feature that allows users to configure caching for LLM models within Guardrails. The new file docs/user-guides/advanced/model-memory-cache.md demonstrates how to enable caching on both the main LLM and content-safety models through YAML configuration, with options for maxsize and statistics reporting. This fits into the advanced user guides section alongside other performance-optimization topics like KV-cache reuse, providing users with a mechanism to reduce redundant API calls by caching prompt/response pairs at the model level.

Important Files Changed

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 2/5 New documentation file for in-memory caching with incomplete sections (LFU Cache, Telemetry, Horizontal scaling) and minor punctuation error on line 6

Confidence score: 2/5

  • This PR contains incomplete documentation that should not be merged in its current state due to three empty section headers.
  • Score reflects missing critical content (cache eviction behavior, telemetry details, horizontal scaling implications) and lack of information about defaults, limitations, or cache key construction. The punctuation error on line 6 and empty sections indicate this is work-in-progress documentation.
  • Pay close attention to lines 66-71, where three section headers are present but have no content, and verify whether the cache configuration example accurately reflects the actual implementation before merging.

Sequence Diagram

sequenceDiagram
    participant User
    participant Runtime as Guardrails Runtime
    participant Cache as Cache Layer
    participant LLM as LLM Model

    User->>Runtime: Send user prompt
    Runtime->>Cache: Check cache for prompt
    alt Cache Hit
        Cache->>Runtime: Return cached response
        Runtime->>Cache: Update cache stats (hit)
        Runtime->>User: Return cached response
    else Cache Miss
        Cache->>Runtime: No cached entry found
        Runtime->>LLM: Send prompt to model
        LLM->>Runtime: Return LLM response
        Runtime->>Cache: Store prompt-response pair
        alt Cache Full
            Cache->>Cache: Evict LFU entry
        end
        Cache->>Runtime: Confirm storage
        Runtime->>Cache: Update cache stats (miss)
        Runtime->>User: Return LLM response
    end

    opt Telemetry Enabled
        Runtime->>Runtime: Report cache statistics
    end

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@github-actions
Contributor

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1480

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


@tgasser-nv tgasser-nv added the documentation label (Improvements or additions to documentation) Oct 29, 2025
@tgasser-nv tgasser-nv marked this pull request as ready for review October 29, 2025 21:49
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds comprehensive documentation for the in-memory LFU caching feature in NeMo Guardrails. The documentation covers configuration examples, cache mechanics, telemetry, and horizontal scaling considerations.

Key additions:

  • Configuration examples showing cache setup for Content-Safety, Topic-Control, and Jailbreak detection models
  • Explanation of LFU eviction algorithm and SHA256-based cache key hashing
  • Documentation of cache statistics logging format and metrics
  • Discussion of cache behavior in horizontally-scaled deployments
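The SHA256-based cache keying mentioned above can be sketched in a few lines. This is illustrative only: the function name and the exact whitespace normalization are assumptions drawn from the review's description, not the actual Guardrails implementation in `lfu.py`:

```python
import hashlib


def cache_key(prompt: str) -> str:
    """Illustrative cache-key derivation: normalize whitespace, then SHA256.

    Assumed behavior based on this review's description; the shipped
    implementation may normalize differently.
    """
    normalized = "".join(prompt.split())  # remove all whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two prompts differing only in whitespace map to the same key:
k1 = cache_key("Is this prompt safe?")
k2 = cache_key("Is  this prompt\nsafe?")
```

Hashing the key also means the raw prompt text never needs to be stored as a dictionary key, which is why the reviews describe the scheme as a security measure.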

Issues found:

  • Two TODO comments need resolution (stats reset behavior and log format)
  • Minor typos and grammatical issues in horizontal scaling section
  • The log format shown matches the implementation exactly, so the TODO comment can be removed

Confidence Score: 4/5

  • This PR is safe to merge after addressing minor documentation issues (TODOs and typos)
  • Documentation-only PR that accurately describes the caching implementation. The content aligns with the actual code in lfu.py. Main issues are incomplete TODO comments and minor typos that should be resolved before merge
  • The single documentation file needs minor cleanup of TODO comments and typo fixes

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 4/5 New documentation for in-memory LFU caching feature with config examples, minor TODOs and typos to resolve

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM

    User->>Guardrails: Send prompt
    Guardrails->>Cache: Check cache (SHA256 hash of prompt)
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails->>Cache: Update frequency count
        Guardrails-->>User: Return response (fast)
    else Cache Miss
        Cache-->>Guardrails: Not found
        Guardrails->>LLM: Send prompt
        LLM-->>Guardrails: Generate response
        Guardrails->>Cache: Store response
        alt Cache Full
            Cache->>Cache: Evict LFU entry
        end
        Guardrails-->>User: Return response
    end
    
    Note over Cache: Log stats every log_interval seconds

1 file reviewed, 4 comments


Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds comprehensive documentation for the new in-memory LFU caching feature for NeMo Guard models. The documentation covers configuration examples, cache mechanics (exact-match with SHA256 hashing, LFU eviction), telemetry/logging, and considerations for horizontally-scaled deployments.

Key additions:

  • Example configurations showing cache setup for Content-Safety, Topic-Control, and Jailbreak Detection models
  • Detailed explanation of cache operation with exact-match lookup and LFU eviction algorithm
  • Documentation of cache statistics and OTEL telemetry integration
  • Discussion of cache behavior in horizontally-scaled environments

Issues requiring attention:

  • Two TODOs remain in the documentation (lines 127, 131) that need to be resolved
  • Typo on line 158: "ne" should be "be"
  • Line 157 has grammatically incomplete sentence about load balancing
  • Previous comments have identified these issues with suggested fixes

Confidence Score: 3/5

  • Documentation is mostly complete but contains unresolved TODOs and typos that should be fixed before merging
  • Score reflects that this is documentation-only with no code changes (low risk), but the presence of explicit TODO markers and identified typos indicate the documentation is not yet production-ready. The author's commit message "some todos to fill in based on local integration testing" confirms this is work-in-progress.
  • docs/user-guides/advanced/model-memory-cache.md requires cleanup of TODOs and typo corrections before merge

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/index.rst 5/5 Added new cache documentation page to the table of contents
docs/user-guides/advanced/model-memory-cache.md 3/5 New comprehensive caching documentation with TODOs remaining and a few typos that need correction before merge

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache as LFU Cache
    participant LLM as NeMo Guard LLM
    
    User->>Guardrails: Send prompt
    Guardrails->>Cache: Check cache (SHA256 hash of prompt)
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>User: Return response (fast)
        Note over Cache: Update frequency counter<br/>Log hit to OTEL
    else Cache Miss
        Cache-->>Guardrails: Cache miss
        Guardrails->>LLM: Send prompt
        LLM-->>Guardrails: Return response
        Guardrails->>Cache: Store response (hash -> response)
        Note over Cache: Evict LFU entry if at maxsize<br/>Log miss to OTEL
        Guardrails-->>User: Return response
    end
    
    opt Every log_interval seconds
        Cache->>Cache: Log statistics<br/>(hits, misses, hit rate, etc.)
    end

1 file reviewed, no comments


@tgasser-nv tgasser-nv force-pushed the docs/in-memory-model-cache branch from 82ac81a to 35177fa on October 30, 2025 14:47
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds comprehensive documentation for the in-memory caching feature in NeMo Guardrails. The documentation covers LFU cache configuration for Nemoguard models (Content-Safety, Topic-Control, and Jailbreak Detection), explaining how the cache works with exact-match lookups using SHA256-hashed prompts, telemetry/logging capabilities, and considerations for horizontally-scaled deployments.

Key additions:

  • Example configurations showing before/after with cache enabled
  • Technical explanation of cache behavior (exact-match with LFU eviction)
  • Cache statistics and monitoring via OTEL telemetry and periodic logging
  • Discussion of cache behavior in multi-node deployments with load balancing

Issues to address:

  • Several TODOs remain in the document (lines 127, 131) that need resolution
  • Grammatical errors and typos (lines 6, 157, 158) identified in previous comments
  • Previous reviewers provided specific suggestions for corrections that should be applied

Confidence Score: 3/5

  • This documentation PR is mostly safe to merge but contains several TODO items and errors that should be resolved first
  • The documentation provides good technical coverage and accurate examples, but has unresolved TODOs (lines 127, 131) and several syntax errors (missing period line 6, typo "ne" vs "be" line 158, grammatically incomplete sentence line 157) that were identified by previous reviewers. These issues don't break functionality but should be cleaned up before merging for a polished user experience
  • docs/user-guides/advanced/model-memory-cache.md requires attention to resolve TODOs and fix syntax errors

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 3/5 New documentation for in-memory caching feature with several TODOs and typos that need fixing

Sequence Diagram

sequenceDiagram
    participant Client
    participant Guardrails
    participant Cache as In-Memory Cache
    participant LLM as Nemoguard Model

    Client->>Guardrails: User prompt
    Guardrails->>Guardrails: Remove whitespace
    Guardrails->>Guardrails: Hash prompt (SHA256)
    Guardrails->>Cache: Check cache (hashed key)
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>Client: Response (from cache)
    else Cache Miss
        Cache-->>Guardrails: Not found
        Guardrails->>LLM: Send prompt
        LLM-->>Guardrails: LLM response
        Guardrails->>Cache: Store response (LFU eviction if full)
        Guardrails-->>Client: Response (from LLM)
    end
    
    Note over Cache: Periodic stats logging<br/>(every log_interval seconds)

2 files reviewed, no comments


Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds comprehensive documentation for in-memory caching feature for NeMoGuard models.

  • Documents cache configuration with practical YAML examples showing Content-Safety, Topic-Control, and Jailbreak detection models
  • Explains LFU (Least Frequently Used) cache algorithm with exact-match lookup using SHA256 hashing
  • Details telemetry/logging capabilities including OTEL traces and configurable statistics logging
  • Covers horizontal scaling considerations for distributed deployments
  • Minor formatting issue in log example (line 132) shows decimal instead of fraction format

Confidence Score: 4/5

  • Safe to merge with one minor correction to the log format example
  • Documentation is comprehensive, technically accurate (verified against implementation), and well-structured. The only issue is a formatting error in the log output example. Previous reviewer comments have already identified the typos on lines 156-157 which should be addressed.
  • No files require special attention beyond fixing the log format on line 132

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 4/5 New cache documentation - comprehensive and well-written with minor formatting error in log example

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM
    
    User->>Guardrails: Send prompt
    Guardrails->>Cache: Check if prompt exists (SHA256 hash)
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>User: Return response (fast)
    else Cache Miss
        Cache-->>Guardrails: No cached response
        Guardrails->>LLM: Send prompt to LLM
        LLM-->>Guardrails: Return LLM response
        Guardrails->>Cache: Store response (with eviction if needed)
        Guardrails-->>User: Return response
    end
    
    Note over Cache: Statistics tracked:<br/>hits, misses, evictions,<br/>puts, updates, hit rate

1 file reviewed, 1 comment


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Miyoung Choi <[email protected]>
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds comprehensive documentation for in-memory caching feature supporting NeMoGuard models (Content-Safety, Topic-Control, and Jailbreak Detection). The documentation includes practical configuration examples, explains the LFU cache algorithm with SHA256 hashing, covers telemetry/logging, and discusses horizontal scaling considerations.

Key additions:

  • Complete configuration examples showing cache setup with maxsize, stats.enabled, and stats.log_interval parameters
  • Explanation of cache operation using exact-match lookup and LFU eviction policy
  • Documentation of cache statistics metrics and OTEL telemetry integration
  • Discussion of cache behavior in horizontally-scaled deployments

The documentation is well-structured and informative, though some previous comments about formatting issues (log output format on line 132, sentence structure on line 156) have not yet been fully addressed.

Confidence Score: 4/5

  • This PR is safe to merge with minor documentation formatting corrections
  • The documentation is comprehensive, accurate, and provides valuable guidance for users implementing caching. However, there are a few minor issues flagged in previous comments that remain unresolved: (1) the log format example on line 132 doesn't match the actual implementation output format, and (2) sentence structure on line 156 could be clearer. These are documentation-only issues that don't affect functionality.
  • No files require special attention - this is documentation-only with minor formatting issues

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 4/5 New comprehensive caching documentation with clear examples and explanations; minor formatting issues remain in log example

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM

    User->>Guardrails: Send prompt
    Guardrails->>Cache: Check cache (SHA256 hash of prompt)
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>User: Return response (fast)
    else Cache Miss
        Cache-->>Guardrails: Not found
        Guardrails->>LLM: Send prompt
        LLM-->>Guardrails: Return response
        Guardrails->>Cache: Store response (LFU eviction if full)
        Guardrails-->>User: Return response
    end
    
    Note over Cache: Stats logged every log_interval seconds
    Cache->>Cache: Update stats (hits/misses/evictions)

1 file reviewed, no comments


@miyoungc miyoungc mentioned this pull request Oct 31, 2025: Edits to the memory-cache docs
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds comprehensive documentation for the in-memory LLM cache feature in NeMo Guardrails. The documentation covers configuration examples, cache mechanics, telemetry/logging, and horizontal scaling considerations.

Key additions:

  • New documentation page docs/user-guides/advanced/model-memory-cache.md explaining the cache feature
  • Configuration examples showing before/after scenarios with cache enabled
  • Details on LFU (Least-Frequently-Used) eviction algorithm and SHA256 hashing for security
  • Explanation of OTEL telemetry integration and configurable statistics logging
  • Discussion of cache behavior in horizontally-scaled deployments

Issues to address:
The previous review comments identified several syntax issues that need correction before merging:

  • Missing period at end of sentence (line 6)
  • Empty placeholder sections that should be removed (line 66)
  • Incorrect log format example showing decimal instead of fraction format (line 156)
  • Grammatical issues in horizontal scaling section

Once these syntax corrections are applied, this documentation will provide clear guidance for users implementing caching in their Guardrails deployments.

Confidence Score: 4/5

  • This documentation PR is safe to merge after addressing the syntax corrections identified in previous comments
  • The documentation is well-structured and comprehensive, but contains several syntax and formatting errors that were flagged in previous review comments. These are straightforward corrections that don't affect the technical accuracy of the content.
  • Pay attention to docs/user-guides/advanced/model-memory-cache.md which needs syntax corrections before merging

Important Files Changed

File Analysis

Filename Score Overview
docs/index.md 5/5 Added reference to new model-memory-cache documentation page in the table of contents
docs/user-guides/advanced/model-memory-cache.md 4/5 New documentation page for in-memory LLM cache feature with configuration examples, telemetry details, and horizontal scaling considerations. Several syntax issues flagged in previous comments need correction.

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM

    User->>Guardrails: Send request with prompt
    Guardrails->>Guardrails: Remove whitespace from prompt
    Guardrails->>Guardrails: Hash prompt with SHA256
    Guardrails->>Cache: Check for cached response
    
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>User: Return response (fast)
    else Cache Miss
        Cache-->>Guardrails: No cached response
        Guardrails->>LLM: Send prompt to LLM
        LLM-->>Guardrails: Return LLM response
        Guardrails->>Cache: Store response (LFU eviction if full)
        Guardrails-->>User: Return response
    end
    
    Note over Cache: Statistics logged every log_interval seconds
    Note over Cache: Stats tracked via OTEL telemetry

3 files reviewed, no comments


Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Added comprehensive documentation for the in-memory caching feature for Nemoguard models (Content-Safety, Topic-Control, and Jailbreak Detection). The documentation covers configuration examples, cache mechanics (LFU algorithm with SHA256 hashing), telemetry/logging capabilities, and horizontal scaling considerations.

Key additions:

  • Configuration examples showing before/after enabling caching
  • Detailed explanation of cache operations (exact matching after whitespace removal, LFU eviction policy)
  • Documentation of cache statistics logging format and metrics
  • Discussion of cache behavior in horizontally-scaled deployments
  • Integration with OpenTelemetry for observability
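The LFU eviction policy described above can be illustrated with a minimal sketch. The class below is mine, not the shipped `nemoguardrails/llm/cache/lfu.py` implementation; it only demonstrates the least-frequently-used discipline and the hit/miss counters the reviews mention:

```python
from collections import Counter


class LFUCache:
    """Minimal least-frequently-used cache sketch (illustrative only)."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._data: dict[str, str] = {}
        self._freq: Counter = Counter()
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        if key in self._data:
            self.hits += 1
            self._freq[key] += 1  # accessing an entry raises its frequency
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key: str, value: str) -> None:
        if key not in self._data and len(self._data) >= self.maxsize:
            # Evict the least-frequently-used entry to make room.
            lfu_key = min(self._freq, key=self._freq.__getitem__)
            del self._data[lfu_key]
            del self._freq[lfu_key]
        self._data[key] = value
        self._freq[key] += 1
```

For example, with `maxsize=2`, inserting a third entry evicts whichever of the first two was touched least often, and the `hits`/`misses` counters give the hit rate that the periodic stats logging reports.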

Issues addressed in previous comments:

  • Several syntax corrections (punctuation, typos, grammatical fixes)
  • Log format corrections to match actual implementation
  • Style suggestions for placeholder sections

Confidence Score: 4/5

  • This documentation-only PR is safe to merge after addressing the previously noted syntax corrections
  • The PR adds new documentation without modifying any code. Previous review comments identified several minor syntax issues (punctuation, typos, log format accuracy) that should be addressed. The documentation is well-structured, accurate to the implementation, and provides comprehensive coverage of the caching feature
  • docs/user-guides/advanced/model-memory-cache.md requires syntax corrections from previous comments before merge

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 4/5 New documentation file for in-memory caching feature; well-structured with examples, implementation details, and operational considerations; previous comments addressed syntax issues

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM

    User->>Guardrails: Send prompt
    Guardrails->>Guardrails: Remove whitespace & hash with SHA256
    Guardrails->>Cache: Check if prompt exists (exact match)
    
    alt Cache Hit
        Cache->>Guardrails: Return cached response
        Guardrails->>User: Return response (fast path)
    else Cache Miss
        Cache->>Guardrails: No cached response
        Guardrails->>LLM: Forward prompt
        LLM->>Guardrails: Return LLM response
        Guardrails->>Cache: Store response (LFU eviction if full)
        Cache->>Cache: Update frequency counters
        Guardrails->>User: Return response
    end
    
    Note over Cache: Stats logged every log_interval seconds
    Cache->>Cache: Track hits, misses, evictions, hit rate

1 file reviewed, no comments


Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds comprehensive documentation for the in-memory caching feature that optimizes LLM calls by storing and reusing responses for repeated prompts.

Major additions:

  • Complete guide for configuring caches on NeMoGuard models (Content-Safety, Topic-Control, Jailbreak Detection)
  • Detailed explanation of LFU (Least Frequently Used) cache eviction algorithm
  • Documentation of cache statistics, telemetry, and logging capabilities
  • Discussion of performance considerations in horizontally-scaled deployments
  • Before/after configuration examples showing cache setup

The documentation accurately reflects the implementation in nemoguardrails/llm/cache/lfu.py. Previous review comments have already identified minor syntax issues that should be addressed before merging.

Confidence Score: 4/5

  • Safe to merge after addressing minor syntax issues already identified in previous comments
  • Documentation-only PR with accurate technical content. Previous comments caught syntax/formatting issues that need fixes. No logical errors or security concerns.
  • docs/user-guides/advanced/model-memory-cache.md requires attention to address the syntax issues flagged in previous comments

Important Files Changed

File Analysis

Filename Score Overview
docs/user-guides/advanced/model-memory-cache.md 4/5 New comprehensive documentation for in-memory caching feature. Previous comments identified minor syntax issues that need resolution.

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails
    participant Cache
    participant LLM

    User->>Guardrails: Send prompt
    Guardrails->>Cache: Check cache (hash of prompt)
    alt Cache Hit
        Cache-->>Guardrails: Return cached response
        Guardrails-->>User: Return response (no LLM call)
    else Cache Miss
        Cache-->>Guardrails: Not found
        Guardrails->>LLM: Send prompt
        LLM-->>Guardrails: Return response
        Guardrails->>Cache: Store response (hash -> value)
        alt Cache Full
            Cache->>Cache: Evict LFU entry
        end
        Guardrails-->>User: Return response
    end
    
    Note over Cache: Stats tracked: hits, misses,<br/>evictions, hit rate
    Cache->>Cache: Log stats (configurable interval)

2 files reviewed, no comments


@tgasser-nv
Collaborator Author

The Python 3.10 pr-tests-matrix tests are failing with a strange error:

/home/runner/work/_temp/2e06cfe6-b7ff-47d3-a958-b6f22ed389fb.sh: line 1: 2770 Illegal instruction (core dumped) poetry run pytest -v

I've seen this flaky behaviour in another PR. There's no way this PR could cause it to fail, as there are no code changes. Bypassing this check and merging.

@tgasser-nv tgasser-nv merged commit 121fc8f into develop Oct 31, 2025
17 of 20 checks passed
@tgasser-nv tgasser-nv deleted the docs/in-memory-model-cache branch October 31, 2025 16:28

Labels

documentation: Improvements or additions to documentation


5 participants