docs(cache): Documentation on in-memory caching #1480
Conversation
Greptile Overview
Greptile Summary
This PR adds documentation for the in-memory caching feature that allows users to configure caching for LLM models within Guardrails. The new file docs/user-guides/advanced/model-memory-cache.md demonstrates how to enable caching on both the main LLM and content-safety models through YAML configuration, with options for maxsize and statistics reporting. This fits into the advanced user guides section alongside other performance-optimization topics like KV-cache reuse, providing users with a mechanism to reduce redundant API calls by caching prompt/response pairs at the model level.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 2/5 | New documentation file for in-memory caching with incomplete sections (LFU Cache, Telemetry, Horizontal scaling) and minor punctuation error on line 6 |
Confidence score: 2/5
- This PR contains incomplete documentation that should not be merged in its current state due to three empty section headers.
- Score reflects missing critical content (cache eviction behavior, telemetry details, horizontal scaling implications) and lack of information about defaults, limitations, or cache key construction. The punctuation error on line 6 and empty sections indicate this is work-in-progress documentation.
- Pay close attention to lines 66-71, where three section headers are present but have no content, and verify whether the cache configuration example accurately reflects the actual implementation before merging.
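Based on the summary above, enabling the cache on a model might look roughly like the following YAML sketch. The nesting under a `cache` key, the `enabled` flag, and the engine/model values are assumptions drawn from this PR summary (`maxsize` and statistics reporting are the options it names); consult the merged `model-memory-cache.md` for the authoritative schema.

```yaml
models:
  - type: content_safety
    engine: nim                 # engine/model values are placeholders
    model: <content-safety-model>
    cache:
      enabled: true             # key layout is an assumption from the PR summary
      maxsize: 10000            # maximum number of cached prompt/response pairs
      stats:
        enabled: true
        log_interval: 60        # seconds between statistics log lines
```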
Sequence Diagram
sequenceDiagram
participant User
participant "Guardrails Runtime" as Runtime
participant "Cache Layer" as Cache
participant "LLM Model" as LLM
User->>Runtime: "Send user prompt"
Runtime->>Cache: "Check cache for prompt"
alt Cache Hit
Cache->>Runtime: "Return cached response"
Runtime->>Cache: "Update cache stats (hit)"
Runtime->>User: "Return cached response"
else Cache Miss
Cache->>Runtime: "No cached entry found"
Runtime->>LLM: "Send prompt to model"
LLM->>Runtime: "Return LLM response"
Runtime->>Cache: "Store prompt-response pair"
alt Cache Full
Cache->>Cache: "Evict LFU entry"
end
Cache->>Runtime: "Confirm storage"
Runtime->>Cache: "Update cache stats (miss)"
Runtime->>User: "Return LLM response"
end
opt Telemetry Enabled
Runtime->>Runtime: "Report cache statistics"
end
1 file reviewed, 2 comments
Documentation preview
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Greptile Overview
Greptile Summary
Adds comprehensive documentation for the in-memory LFU caching feature in NeMo Guardrails. The documentation covers configuration examples, cache mechanics, telemetry, and horizontal scaling considerations.
Key additions:
- Configuration examples showing cache setup for Content-Safety, Topic-Control, and Jailbreak detection models
- Explanation of LFU eviction algorithm and SHA256-based cache key hashing
- Documentation of cache statistics logging format and metrics
- Discussion of cache behavior in horizontally-scaled deployments
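The SHA256-based cache key construction described above can be sketched as follows. This is illustrative only, not the code in `lfu.py`: the exact normalization step (the docs describe whitespace removal before hashing) and function name are assumptions.

```python
import hashlib

def cache_key(prompt: str) -> str:
    # Assumption for illustration: all whitespace is removed before hashing,
    # as the documentation describes, then the result is hashed with SHA256
    # so raw prompt text is never used directly as a dictionary key.
    normalized = "".join(prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Prompts that differ only in whitespace map to the same cache entry.
key_a = cache_key("Is this message safe?")
key_b = cache_key("Is this   message safe? ")
```

Hashing the prompt keeps the key size fixed and avoids retaining user text in cache keys; the trade-off is that lookup is exact-match only, so any semantic paraphrase is a cache miss.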
Issues found:
- Two TODO comments need resolution (stats reset behavior and log format)
- Minor typos and grammatical issues in horizontal scaling section
- The log format shown matches the implementation exactly, so the TODO comment can be removed
Confidence Score: 4/5
- This PR is safe to merge after addressing minor documentation issues (TODOs and typos)
- Documentation-only PR that accurately describes the caching implementation. The content aligns with the actual code in lfu.py. Main issues are incomplete TODO comments and minor typos that should be resolved before merge
- The single documentation file needs minor cleanup of TODO comments and typo fixes
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New documentation for in-memory LFU caching feature with config examples, minor TODOs and typos to resolve |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send prompt
Guardrails->>Cache: Check cache (SHA256 hash of prompt)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails->>Cache: Update frequency count
Guardrails-->>User: Return response (fast)
else Cache Miss
Cache-->>Guardrails: Not found
Guardrails->>LLM: Send prompt
LLM-->>Guardrails: Generate response
Guardrails->>Cache: Store response
alt Cache Full
Cache->>Cache: Evict LFU entry
end
Guardrails-->>User: Return response
end
Note over Cache: Log stats every log_interval seconds
1 file reviewed, 4 comments
Greptile Overview
Greptile Summary
Adds comprehensive documentation for the new in-memory LFU caching feature for NeMo Guard models. The documentation covers configuration examples, cache mechanics (exact-match with SHA256 hashing, LFU eviction), telemetry/logging, and considerations for horizontally-scaled deployments.
Key additions:
- Example configurations showing cache setup for Content-Safety, Topic-Control, and Jailbreak Detection models
- Detailed explanation of cache operation with exact-match lookup and LFU eviction algorithm
- Documentation of cache statistics and OTEL telemetry integration
- Discussion of cache behavior in horizontally-scaled environments
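The LFU eviction behavior the documentation explains can be sketched with a minimal cache like the one below. This is a toy illustration under stated assumptions, not the actual implementation in `nemoguardrails/llm/cache/lfu.py` (a production LFU uses frequency buckets for O(1) eviction rather than a linear scan).

```python
from collections import defaultdict

class LFUCache:
    """Minimal LFU cache sketch: evicts the least-frequently-used key
    when maxsize is reached. Illustrative only."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self.data = {}                  # key -> cached value
        self.freq = defaultdict(int)    # key -> access count
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.freq[key] += 1
            self.hits += 1
            return self.data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.maxsize:
            # Evict the entry with the lowest access count (linear scan
            # for clarity; real LFU implementations avoid this).
            victim = min(self.data, key=lambda k: self.freq[k])
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1

cache = LFUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now used more frequently than "b"
cache.put("c", 3)   # cache full: evicts "b", the LFU entry
```

Under LFU, frequently repeated prompts stay resident while one-off prompts are evicted first, which matches the goal of avoiding redundant guard-model calls for common inputs.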
Issues requiring attention:
- Two TODOs remain in the documentation (lines 127, 131) that need to be resolved
- Typo on line 158: "ne" should be "be"
- Line 157 has grammatically incomplete sentence about load balancing
- Previous comments have identified these issues with suggested fixes
Confidence Score: 3/5
- Documentation is mostly complete but contains unresolved TODOs and typos that should be fixed before merging
- Score reflects that this is documentation-only with no code changes (low risk), but the presence of explicit TODO markers and identified typos indicate the documentation is not yet production-ready. The author's commit message "some todos to fill in based on local integration testing" confirms this is work-in-progress.
- docs/user-guides/advanced/model-memory-cache.md requires cleanup of TODOs and typo corrections before merge
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/index.rst | 5/5 | Added new cache documentation page to the table of contents |
| docs/user-guides/advanced/model-memory-cache.md | 3/5 | New comprehensive caching documentation with TODOs remaining and a few typos that need correction before merge |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache as LFU Cache
participant LLM as NeMo Guard LLM
User->>Guardrails: Send prompt
Guardrails->>Cache: Check cache (SHA256 hash of prompt)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>User: Return response (fast)
Note over Cache: Update frequency counter<br/>Log hit to OTEL
else Cache Miss
Cache-->>Guardrails: Cache miss
Guardrails->>LLM: Send prompt
LLM-->>Guardrails: Return response
Guardrails->>Cache: Store response (hash -> response)
Note over Cache: Evict LFU entry if at maxsize<br/>Log miss to OTEL
Guardrails-->>User: Return response
end
opt Every log_interval seconds
Cache->>Cache: Log statistics<br/>(hits, misses, hit rate, etc.)
end
1 file reviewed, no comments
Force-pushed from 82ac81a to 35177fa
Greptile Overview
Greptile Summary
This PR adds comprehensive documentation for the in-memory caching feature in NeMo Guardrails. The documentation covers LFU cache configuration for Nemoguard models (Content-Safety, Topic-Control, and Jailbreak Detection), explaining how the cache works with exact-match lookups using SHA256-hashed prompts, telemetry/logging capabilities, and considerations for horizontally-scaled deployments.
Key additions:
- Example configurations showing before/after with cache enabled
- Technical explanation of cache behavior (exact-match with LFU eviction)
- Cache statistics and monitoring via OTEL telemetry and periodic logging
- Discussion of cache behavior in multi-node deployments with load balancing
Issues to address:
- Several TODOs remain in the document (lines 127, 131) that need resolution
- Grammatical errors and typos (lines 6, 157, 158) identified in previous comments
- Previous reviewers provided specific suggestions for corrections that should be applied
Confidence Score: 3/5
- This documentation PR is mostly safe to merge but contains several TODO items and errors that should be resolved first
- The documentation provides good technical coverage and accurate examples, but has unresolved TODOs (lines 127, 131) and several syntax errors (missing period line 6, typo "ne" vs "be" line 158, grammatically incomplete sentence line 157) that were identified by previous reviewers. These issues don't break functionality but should be cleaned up before merging for a polished user experience
- docs/user-guides/advanced/model-memory-cache.md requires attention to resolve TODOs and fix syntax errors
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 3/5 | New documentation for in-memory caching feature with several TODOs and typos that need fixing |
Sequence Diagram
sequenceDiagram
participant Client
participant Guardrails
participant Cache as In-Memory Cache
participant LLM as Nemoguard Model
Client->>Guardrails: User prompt
Guardrails->>Guardrails: Remove whitespace
Guardrails->>Guardrails: Hash prompt (SHA256)
Guardrails->>Cache: Check cache (hashed key)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>Client: Response (from cache)
else Cache Miss
Cache-->>Guardrails: Not found
Guardrails->>LLM: Send prompt
LLM-->>Guardrails: LLM response
Guardrails->>Cache: Store response (LFU eviction if full)
Guardrails-->>Client: Response (from LLM)
end
Note over Cache: Periodic stats logging<br/>(every log_interval seconds)
2 files reviewed, no comments
Greptile Overview
Greptile Summary
Adds comprehensive documentation for in-memory caching feature for NeMoGuard models.
- Documents cache configuration with practical YAML examples showing Content-Safety, Topic-Control, and Jailbreak detection models
- Explains LFU (Least Frequently Used) cache algorithm with exact-match lookup using SHA256 hashing
- Details telemetry/logging capabilities including OTEL traces and configurable statistics logging
- Covers horizontal scaling considerations for distributed deployments
- Minor formatting issue in log example (line 132) shows decimal instead of fraction format
Confidence Score: 4/5
- Safe to merge with one minor correction to the log format example
- Documentation is comprehensive, technically accurate (verified against implementation), and well-structured. The only issue is a formatting error in the log output example. Previous reviewer comments have already identified the typos on lines 156-157 which should be addressed.
- No files require special attention beyond fixing the log format on line 132
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New cache documentation - comprehensive and well-written with minor formatting error in log example |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send prompt
Guardrails->>Cache: Check if prompt exists (SHA256 hash)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>User: Return response (fast)
else Cache Miss
Cache-->>Guardrails: No cached response
Guardrails->>LLM: Send prompt to LLM
LLM-->>Guardrails: Return LLM response
Guardrails->>Cache: Store response (with eviction if needed)
Guardrails-->>User: Return response
end
Note over Cache: Statistics tracked:<br/>hits, misses, evictions,<br/>puts, updates, hit rate
1 file reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Miyoung Choi <[email protected]>
Greptile Overview
Greptile Summary
Adds comprehensive documentation for in-memory caching feature supporting NeMoGuard models (Content-Safety, Topic-Control, and Jailbreak Detection). The documentation includes practical configuration examples, explains the LFU cache algorithm with SHA256 hashing, covers telemetry/logging, and discusses horizontal scaling considerations.
Key additions:
- Complete configuration examples showing cache setup with `maxsize`, `stats.enabled`, and `stats.log_interval` parameters
- Explanation of cache operation using exact-match lookup and LFU eviction policy
- Documentation of cache statistics metrics and OTEL telemetry integration
- Discussion of cache behavior in horizontally-scaled deployments
The documentation is well-structured and informative, though some previous comments about formatting issues (log output format on line 132, sentence structure on line 156) have not yet been fully addressed.
Confidence Score: 4/5
- This PR is safe to merge with minor documentation formatting corrections
- The documentation is comprehensive, accurate, and provides valuable guidance for users implementing caching. However, there are a few minor issues flagged in previous comments that remain unresolved: (1) the log format example on line 132 doesn't match the actual implementation output format, and (2) sentence structure on line 156 could be clearer. These are documentation-only issues that don't affect functionality.
- No files require special attention - this is documentation-only with minor formatting issues
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New comprehensive caching documentation with clear examples and explanations; minor formatting issues remain in log example |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send prompt
Guardrails->>Cache: Check cache (SHA256 hash of prompt)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>User: Return response (fast)
else Cache Miss
Cache-->>Guardrails: Not found
Guardrails->>LLM: Send prompt
LLM-->>Guardrails: Return response
Guardrails->>Cache: Store response (LFU eviction if full)
Guardrails-->>User: Return response
end
Note over Cache: Stats logged every log_interval seconds
Cache->>Cache: Update stats (hits/misses/evictions)
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This PR adds comprehensive documentation for the in-memory LLM cache feature in NeMo Guardrails. The documentation covers configuration examples, cache mechanics, telemetry/logging, and horizontal scaling considerations.
Key additions:
- New documentation page `docs/user-guides/advanced/model-memory-cache.md` explaining the cache feature
- Configuration examples showing before/after scenarios with cache enabled
- Details on LFU (Least-Frequently-Used) eviction algorithm and SHA256 hashing for security
- Explanation of OTEL telemetry integration and configurable statistics logging
- Discussion of cache behavior in horizontally-scaled deployments
Issues to address:
The previous review comments identified several syntax issues that need correction before merging:
- Missing period at end of sentence (line 6)
- Empty placeholder sections that should be removed (line 66)
- Incorrect log format example showing decimal instead of fraction format (line 156)
- Grammatical issues in horizontal scaling section
Once these syntax corrections are applied, this documentation will provide clear guidance for users implementing caching in their Guardrails deployments.
Confidence Score: 4/5
- This documentation PR is safe to merge after addressing the syntax corrections identified in previous comments
- The documentation is well-structured and comprehensive, but contains several syntax and formatting errors that were flagged in previous review comments. These are straightforward corrections that don't affect the technical accuracy of the content.
- Pay attention to `docs/user-guides/advanced/model-memory-cache.md`, which needs syntax corrections before merging
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/index.md | 5/5 | Added reference to new model-memory-cache documentation page in the table of contents |
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New documentation page for in-memory LLM cache feature with configuration examples, telemetry details, and horizontal scaling considerations. Several syntax issues flagged in previous comments need correction. |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send request with prompt
Guardrails->>Guardrails: Remove whitespace from prompt
Guardrails->>Guardrails: Hash prompt with SHA256
Guardrails->>Cache: Check for cached response
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>User: Return response (fast)
else Cache Miss
Cache-->>Guardrails: No cached response
Guardrails->>LLM: Send prompt to LLM
LLM-->>Guardrails: Return LLM response
Guardrails->>Cache: Store response (LFU eviction if full)
Guardrails-->>User: Return response
end
Note over Cache: Statistics logged every log_interval seconds
Note over Cache: Stats tracked via OTEL telemetry
3 files reviewed, no comments
Greptile Overview
Greptile Summary
Added comprehensive documentation for the in-memory caching feature for Nemoguard models (Content-Safety, Topic-Control, and Jailbreak Detection). The documentation covers configuration examples, cache mechanics (LFU algorithm with SHA256 hashing), telemetry/logging capabilities, and horizontal scaling considerations.
Key additions:
- Configuration examples showing before/after enabling caching
- Detailed explanation of cache operations (exact matching after whitespace removal, LFU eviction policy)
- Documentation of cache statistics logging format and metrics
- Discussion of cache behavior in horizontally-scaled deployments
- Integration with OpenTelemetry for observability
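The periodic statistics logging mentioned above can be illustrated with a small helper that formats the tracked metrics (hits, misses, evictions, hit rate). The field names and format below are assumptions for illustration; the reviews note the actual log format in the implementation differs (e.g., a fraction rather than a decimal), so treat this purely as a sketch.

```python
def stats_line(hits: int, misses: int, evictions: int) -> str:
    # Hypothetical log line; the real implementation's format differs.
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    return (f"LFU cache stats: hits={hits} misses={misses} "
            f"evictions={evictions} hit_rate={hit_rate:.2f}")

print(stats_line(3, 1, 0))
```

In the documented design, a line like this would be emitted every `stats.log_interval` seconds while `stats.enabled` is true, alongside OTEL telemetry.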
Issues addressed in previous comments:
- Several syntax corrections (punctuation, typos, grammatical fixes)
- Log format corrections to match actual implementation
- Style suggestions for placeholder sections
Confidence Score: 4/5
- This documentation-only PR is safe to merge after addressing the previously noted syntax corrections
- The PR adds new documentation without modifying any code. Previous review comments identified several minor syntax issues (punctuation, typos, log format accuracy) that should be addressed. The documentation is well-structured, accurate to the implementation, and provides comprehensive coverage of the caching feature
- docs/user-guides/advanced/model-memory-cache.md requires syntax corrections from previous comments before merge
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New documentation file for in-memory caching feature; well-structured with examples, implementation details, and operational considerations; previous comments addressed syntax issues |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send prompt
Guardrails->>Guardrails: Remove whitespace & hash with SHA256
Guardrails->>Cache: Check if prompt exists (exact match)
alt Cache Hit
Cache->>Guardrails: Return cached response
Guardrails->>User: Return response (fast path)
else Cache Miss
Cache->>Guardrails: No cached response
Guardrails->>LLM: Forward prompt
LLM->>Guardrails: Return LLM response
Guardrails->>Cache: Store response (LFU eviction if full)
Cache->>Cache: Update frequency counters
Guardrails->>User: Return response
end
Note over Cache: Stats logged every log_interval seconds
Cache->>Cache: Track hits, misses, evictions, hit rate
1 file reviewed, no comments
Signed-off-by: Miyoung Choi <[email protected]>
Greptile Overview
Greptile Summary
Adds comprehensive documentation for the in-memory caching feature that optimizes LLM calls by storing and reusing responses for repeated prompts.
Major additions:
- Complete guide for configuring caches on NeMoGuard models (Content-Safety, Topic-Control, Jailbreak Detection)
- Detailed explanation of LFU (Least Frequently Used) cache eviction algorithm
- Documentation of cache statistics, telemetry, and logging capabilities
- Discussion of performance considerations in horizontally-scaled deployments
- Before/after configuration examples showing cache setup
The documentation accurately reflects the implementation in nemoguardrails/llm/cache/lfu.py. Previous review comments have already identified minor syntax issues that should be addressed before merging.
Confidence Score: 4/5
- Safe to merge after addressing minor syntax issues already identified in previous comments
- Documentation-only PR with accurate technical content. Previous comments caught syntax/formatting issues that need fixes. No logical errors or security concerns.
- docs/user-guides/advanced/model-memory-cache.md requires attention to address the syntax issues flagged in previous comments
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| docs/user-guides/advanced/model-memory-cache.md | 4/5 | New comprehensive documentation for in-memory caching feature. Previous comments identified minor syntax issues that need resolution. |
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails
participant Cache
participant LLM
User->>Guardrails: Send prompt
Guardrails->>Cache: Check cache (hash of prompt)
alt Cache Hit
Cache-->>Guardrails: Return cached response
Guardrails-->>User: Return response (no LLM call)
else Cache Miss
Cache-->>Guardrails: Not found
Guardrails->>LLM: Send prompt
LLM-->>Guardrails: Return response
Guardrails->>Cache: Store response (hash -> value)
alt Cache Full
Cache->>Cache: Evict LFU entry
end
Guardrails-->>User: Return response
end
Note over Cache: Stats tracked: hits, misses,<br/>evictions, hit rate
Cache->>Cache: Log stats (configurable interval)
2 files reviewed, no comments
The Python 3.10 check failed with `/home/runner/work/_temp/2e06cfe6-b7ff-47d3-a958-b6f22ed389fb.sh: line 1: 2770 Illegal instruction (core dumped) poetry run pytest -v`. I've seen this flaky behaviour in another PR. There's no way this PR could cause it to fail; there are no code changes. Bypassing this check and merging.