chore: change log levels #3333

MasterPtato · 2025-11-03T23:27:57Z

No description provided.

vercel · 2025-11-03T23:28:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
rivetkit-serverless	Ready	Preview	Comment	Nov 5, 2025 2:08am

3 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Nov 5, 2025 2:08am
rivet-inspector	Ignored	Preview	Nov 5, 2025 2:08am
rivet-site	Ignored	Preview	Nov 5, 2025 2:08am

MasterPtato · 2025-11-03T23:28:19Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2025-11-03T23:29:18Z

Code Review

This PR systematically adjusts logging levels across the codebase, primarily changing tracing::info! calls to tracing::debug! and removing some startup/shutdown log messages. Here's my detailed feedback:

✅ Positive Aspects

Consistent approach: The changes follow a clear pattern of reducing log verbosity for detailed operational information
Reduced log noise: Moving verbose consensus protocol messages (PreAccept, Accept, Commit, etc.) from INFO to DEBUG is appropriate for production environments
Code formatting improvements: The PR also standardizes spacing in structured logging (e.g., runner_name=?path.runner_name instead of runner_name = ?path.runner_name)

🔍 Areas of Concern

1. Critical Event Visibility ⚠️

Some changes might hide important operational events:

engine/packages/cache-purge/src/lib.rs:

Removed "starting cache purge subscriber service" and "cache purge subscriber service stopped" messages
Recommendation: Keep these at INFO level. Service lifecycle events are typically important for operators to track service health and restarts.

engine/packages/epoxy/src/workflows/coordinator/reconfigure.rs:

Lines like "added N replicas as joining" (line 202) changed to DEBUG
Recommendation: Configuration changes in a distributed consensus system are significant events that operators should see by default. Consider keeping at INFO or even WARN level.

engine/packages/epoxy/src/workflows/replica/setup.rs:

"notifying coordinator that replica is active" and "notified coordinator of active status" moved to DEBUG
Recommendation: Replica lifecycle transitions should remain at INFO level for visibility into cluster health.

2. Incomplete Context in Service Manager 📝

engine/packages/service-manager/src/lib.rs:
The diff was truncated but shows changes to service startup/restart logging. Based on the visible changes:

service=%service.name formatting changes are good
The "service exited unexpectedly" and "service crashed" logs appear to remain at ERROR, which is correct; moving them to DEBUG would be inappropriate

3. Warning Level Changes ⚠️

engine/packages/gasoline/src/ctx/message.rs:165:

-tracing::warn!(?err, "publish message failed, trying again");
+tracing::debug!(?err, "publish message failed, trying again");

Concern: Transient failures that trigger retries should probably remain at WARN level, especially in a workflow engine where message delivery is critical
Recommendation: Keep this at WARN level to help identify flaky network conditions or overloaded systems

engine/packages/gasoline/src/worker.rs:155:
The improved message "stop channel closed, workflow likely already stopped" is better, but the context suggests this might be a race condition worth tracking at WARN level.
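One possible middle ground between WARN and DEBUG for retry logs is to escalate only after repeated failures. The sketch below is an illustration, not what the PR does: `Level`, `retry_log_level`, and the `warn_after` parameter are hypothetical names, and a real implementation would call the matching `tracing` macro instead of returning a value.

```rust
/// Hypothetical stand-in for a tracing level; only the two levels discussed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Debug,
    Warn,
}

/// Picks the level for a "publish message failed, trying again" log.
/// Attempts are counted from 1. Early retries stay quiet; once the
/// attempt number reaches `warn_after`, the log escalates to WARN.
fn retry_log_level(attempt: u32, warn_after: u32) -> Level {
    if attempt >= warn_after {
        Level::Warn
    } else {
        Level::Debug
    }
}
```

With `warn_after = 3`, the first two failures log at DEBUG and the third and later ones at WARN, so a single transient failure stays out of default logs while a persistent one does not.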

4. Sleep Duration Warning Threshold

engine/packages/gasoline/src/ctx/workflow.rs:1054:

-if !replay && duration < -50 {
+if !replay && duration < -25 {

Question: Why change the threshold from -50ms to -25ms? This makes the warning more sensitive, so it now fires for smaller overshoots
Recommendation: Add a comment explaining the rationale, or consider reverting this change as it's unrelated to log level adjustments
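To make the effect of the threshold concrete, here is a minimal sketch of the same comparison as a standalone function. The function name and the explicit `threshold_ms` parameter are assumptions for illustration; the diff only shows the inline condition.

```rust
/// Mirrors the condition `!replay && duration < threshold` from the diff.
/// `duration_ms` is negative when the sleep deadline has already passed.
fn should_warn_negative_sleep(replay: bool, duration_ms: i64, threshold_ms: i64) -> bool {
    !replay && duration_ms < threshold_ms
}
```

A sleep that is 30ms late (`duration_ms = -30`) is silent with the old threshold of -50 but warns with the new threshold of -25, so the change can only increase the number of warnings.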

📊 Performance Considerations

✅ No performance concerns. The changes reduce string formatting overhead when DEBUG level is disabled, which is a minor improvement.

🔒 Security Concerns

✅ No security issues identified.

🧪 Test Coverage

Recommendation: Consider adding integration tests that verify:

Critical lifecycle events are logged at appropriate levels
Operational visibility is maintained for cluster health monitoring
Error conditions are still visible at default log levels
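A rough sketch of what such a check could look like, using a hand-rolled in-memory recorder rather than a real subscriber. Everything here (`Level`, `Recorder`, its methods) is hypothetical scaffolding; an actual test would capture events from `tracing` instead.

```rust
/// Hypothetical levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Minimal in-memory log sink that drops events below `min_level`.
struct Recorder {
    min_level: Level,
    lines: Vec<(Level, String)>,
}

impl Recorder {
    fn new(min_level: Level) -> Self {
        Recorder { min_level, lines: Vec::new() }
    }

    /// Records the message only if its level passes the filter.
    fn log(&mut self, level: Level, msg: &str) {
        if level >= self.min_level {
            self.lines.push((level, msg.to_string()));
        }
    }

    /// Returns true if a message with exactly this text was recorded.
    fn contains(&self, msg: &str) -> bool {
        self.lines.iter().any(|(_, m)| m == msg)
    }
}
```

A test would then configure the recorder at the default production level, drive the code under test, and assert that lifecycle and error messages are present while protocol chatter is absent.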

📋 Suggested Changes

Revert lifecycle events to INFO:
- Cache purge service start/stop
- Replica joining/leaving events
- Configuration epoch changes
- Replica status transitions
Keep transient errors at WARN:
- Message publishing retries
- Workflow stop channel failures (if they indicate race conditions)
Consider using INFO for configuration changes:
- engine/packages/epoxy/src/workflows/coordinator/reconfigure.rs - config changes
- engine/packages/epoxy/src/workflows/coordinator/replica_status_change.rs - status updates
Add context to the PR description:
- What prompted this change?
- What's the target default log level for production?
- Are there specific log aggregation/analysis tools being used?

📚 Reference

From CLAUDE.md:

Use tracing for logging. Do not format parameters into the main message, instead use tracing's structured logging.

Log messages should be lowercase unless mentioning specific code symbols.

✅ The PR follows these guidelines correctly.

Summary

The direction is good - reducing log verbosity is important for production systems. However, I recommend being more selective about which events move to DEBUG level. Lifecycle events, configuration changes, and transient failures should generally remain visible at INFO or WARN levels to maintain operational visibility.

Would you like me to suggest specific lines to revert, or do you have additional context about the operational requirements that informed these changes?

claude · 2025-11-04T21:29:54Z

PR Review: Log Level Adjustments

Summary

This PR adjusts logging levels across multiple packages in the Rivet engine, primarily converting tracing::info! calls to tracing::debug! for verbose operational logs and removing some redundant startup/shutdown messages. This is a good housekeeping change to reduce log noise in production.

Code Quality & Best Practices ✅

Strengths:

Consistent application: The changes are systematically applied across the codebase, converting verbose operational logs to debug level
Follows CLAUDE.md conventions: Uses structured logging with ? and =? for debug formatting, lowercase messages (e.g., engine/packages/api-public/src/runner_configs/upsert.rs:174)
Preserves important logs: Appropriately keeps tracing::warn! and tracing::error! calls at their current levels
Improves signal-to-noise ratio: Moving granular consensus protocol messages (pre-accept, accept, prepare, commit) to debug level is appropriate

Minor Observations:

Spacing consistency (engine/packages/api-public/src/runner_configs/upsert.rs:174): Changed runner_name = ?path.runner_name to runner_name=?path.runner_name (removed space around =). While this is consistent with other changes in the PR, I noticed the codebase has mixed spacing styles. Consider standardizing this across the project in a future PR.
Message clarity improvements (engine/packages/gasoline/src/worker.rs:155-158): Excellent improvement! The updated message "stop channel closed, workflow likely already stopped" provides better context than the previous version.
Log removal (engine/packages/cache-purge/src/lib.rs:7-8, 43-44): Removed "starting cache purge subscriber service" and "cache purge subscriber service stopped" messages. This is reasonable since the subscription message at line 12 provides sufficient context.

Specific Changes Worth Noting

EPaxos Protocol Logs (Multiple files in engine/packages/epoxy/):

✅ Converting consensus protocol messages (pre-accept, accept, prepare, commit) from info to debug is appropriate - these are high-frequency messages that should only be visible during debugging
✅ HTTP client fanout messages also appropriately moved to debug level

Gasoline Workflow Engine (engine/packages/gasoline/src/ctx/workflow.rs:1054):

✅ Adjusted threshold from -50 to -25 for negative sleep duration warning. This is a reasonable change, though it would be helpful to understand the rationale: with duration < -25 the warning fires for smaller overshoots, so it will trigger more often than before. Consider adding a comment explaining the threshold choice.

Service Manager (engine/packages/service-manager/src/lib.rs):

✅ Service restart/crash logs remain at appropriate levels (info for restarts, error for crashes)
✅ Only the verbose per-service debug logs were moved to debug level

Tracing Reconfigure (engine/packages/tracing-reconfigure/src/lib.rs:21):

✅ Changed initial subscription message from info to debug and removed startup/shutdown messages, which is appropriate for a background service

Potential Issues 🔍

No critical issues identified, but a few considerations:

Loss of production visibility: While these changes improve log cleanliness, ensure that important operational events are still visible at info level. The changes look good, but consider:
- Do ops teams have proper access to debug logs when troubleshooting?
- Are there metrics/alerts to compensate for reduced log visibility?
Removed shutdown messages: Several services had shutdown/exit messages removed (e.g., cache-purge, tracing-reconfigure). While this reduces noise, these messages can be helpful for debugging unexpected service terminations. Consider if any should be retained at warn level.

Performance Considerations ✅

Positive impact: Reducing log volume at info level will improve performance in production by reducing I/O, serialization overhead, and log processing costs
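Low runtime cost: Disabled debug events are skipped by a cheap level check before their fields are recorded. They are only compiled out entirely if the tracing crate's static max-level features (e.g. release_max_level_info) are enabled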
The changes to high-frequency paths (EPaxos consensus, HTTP message sending) will have the most noticeable impact
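The early filtering described above can be modeled in a few lines. This is a simplified sketch of the idea, not how `tracing` is implemented: `Level` and `emit` are hypothetical names, and the closure stands in for the field formatting that a disabled event never performs.

```rust
/// Hypothetical levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Builds the message only when `level` passes the `enabled_from` threshold.
/// When the level is filtered out, `make_message` is never called, so the
/// formatting work is skipped entirely.
fn emit(
    enabled_from: Level,
    level: Level,
    make_message: impl FnOnce() -> String,
) -> Option<String> {
    if level >= enabled_from {
        Some(make_message())
    } else {
        None
    }
}
```

With `enabled_from` set to `Level::Info`, a `Level::Debug` call returns `None` without running the closure, which is where the savings on high-frequency paths come from.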

Security Concerns ✅

No security concerns identified. The changes only affect logging levels, not application logic.

Test Coverage

Question: Are there any tests that assert on specific log messages at info level that might now fail? Consider running:

cargo test -- --nocapture 2>&1 | grep -i "assert.*log\|expect.*log\|verify.*log"

Recommendations

Documentation: Consider documenting the logging level conventions in CLAUDE.md or a separate LOGGING.md:
- error: Application errors that require attention
- warn: Unexpected conditions that don't cause failures
- info: Important state changes and lifecycle events
- debug: Verbose operational details for troubleshooting
Future improvement: Consider using log levels more semantically:
- Configuration changes → info (retained correctly)
- Health checks → debug (changed correctly)
- Consensus protocol details → debug (changed correctly)
- Service lifecycle → info (some were removed; consider retaining at info or warn)
Consistency check: Run this to ensure consistent spacing in structured logging:
```
rg -U 'tracing::(trace|debug|info|warn|error)!\([^)]*\s=\s[?%]' engine/
```
Then decide on a standard (space or no space around =).
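The level conventions listed in these recommendations could also be written down as a small lookup, which makes the policy easy to review in one place. This is only a sketch: `EventKind`, `Level`, and `default_level` are hypothetical names, and the categories are taken from the bullets above.

```rust
/// Hypothetical stand-in for the tracing levels discussed in this review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Event categories named in the recommendations.
#[derive(Debug, Clone, Copy)]
enum EventKind {
    ApplicationError,
    UnexpectedCondition,
    ConfigChange,
    ServiceLifecycle,
    HealthCheck,
    ConsensusDetail,
}

/// Maps each category to the level suggested by the convention.
fn default_level(kind: EventKind) -> Level {
    match kind {
        EventKind::ApplicationError => Level::Error,
        EventKind::UnexpectedCondition => Level::Warn,
        EventKind::ConfigChange | EventKind::ServiceLifecycle => Level::Info,
        EventKind::HealthCheck | EventKind::ConsensusDetail => Level::Debug,
    }
}
```

Because the `match` is exhaustive, adding a new category forces an explicit decision about its level.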

Conclusion

Overall Assessment: ✅ Approved with minor suggestions

This is a well-executed cleanup PR that improves production log quality by moving verbose operational logs to debug level. The changes are consistent, preserve important information, and follow project conventions. The specific improvements to EPaxos, Gasoline, and service manager logs are particularly valuable for reducing noise in production environments.

Suggested follow-ups (not blockers):

Document logging level conventions
Standardize structured logging spacing
Verify no test failures due to log level changes
Consider if any removed shutdown messages should be retained

Great work on improving the logging hygiene! 🎉

claude · 2025-11-04T21:29:56Z

Code Review

Thank you for this PR! I've reviewed the changes to standardize log levels across the codebase. Here's my feedback:

Overview

This PR systematically changes log levels from info to debug for verbose operational messages, and makes minor formatting improvements to structured logging. The changes span 30 files with 92 additions and 94 deletions.

Positive Aspects

Improved Log Signal-to-Noise Ratio: Converting verbose operational messages to debug level is a good practice. Messages like "handling pre-accept message", "sending message to replica", and similar low-level protocol operations are more appropriate at debug level.
Consistent Structured Logging: The formatting changes (e.g., runner_name = ?path.runner_name → runner_name=?path.runner_name) improve consistency, though this is a minor style point.
Removal of Redundant Logs: Removing logs like "starting cache purge subscriber service" at the beginning of functions when there's already a more informative log right after is good cleanup.
Appropriate Level Selection: The changes show good judgment about what should be debug vs info vs warn. For example:
- Protocol message handling (debug) ✓
- Health checks (debug) ✓
- Config updates (debug) ✓
- Service crashes (error) ✓ (kept as-is)

Concerns and Suggestions

1. Sleep Duration Warning Threshold Change (`packages/gasoline/src/ctx/workflow.rs:1054`)

-if !replay && duration < -50 {
+if !replay && duration < -25 {
    tracing::warn!(%duration, "tried to sleep for a negative duration");
}

Concern: This changes the warning threshold from -50ms to -25ms, making warnings more sensitive. This could lead to more noise if there are legitimate cases of small negative sleep durations due to clock skew or timing variations.

Question: What's the rationale for this change? Is there evidence that the -50ms threshold was masking problematic behavior? This seems unrelated to the log level changes and might deserve its own investigation.

Recommendation: Consider reverting this change or providing justification in the PR description.

2. Message Publishing Retry Logging (`packages/gasoline/src/ctx/message.rs:165`)

-tracing::warn!(?err, "publish message failed, trying again");
+tracing::debug!(?err, "publish message failed, trying again");

Concern: Downgrading publish failures from warn to debug might make it harder to detect message delivery issues in production. While retries are expected, frequent failures could indicate network issues, NATS problems, or resource constraints.

Recommendation: Consider keeping this at warn level, or at minimum info, since failed message publishes (even with retry) are operationally significant events that operators should be aware of.

3. Removed Service Start/Stop Logs (`packages/cache-purge/src/lib.rs`)

-tracing::info!("starting cache purge subscriber service");
 // ... service code ...
-tracing::warn!("cache purge subscriber service stopped");

Concern: While the "starting" message might be redundant with the subscription log, the "stopped" warning is valuable for understanding service lifecycle, especially for debugging unexpected shutdowns.

Recommendation: Consider keeping the "service stopped" log at warn or info level, as it signals an important state transition.
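If the "stopped" message is kept, one way to guarantee it on every exit path is a drop guard. The following is a sketch under assumptions: `StopLogGuard`, `Sink`, and `run_service` are hypothetical, and the guard pushes into an in-memory sink where the real service would call `tracing::warn!`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// In-memory stand-in for the log output, used only for this illustration.
type Sink = Rc<RefCell<Vec<String>>>;

/// Emits a "stopped" line when it goes out of scope, whether the service
/// returned normally or bailed out early with an error.
struct StopLogGuard {
    service: &'static str,
    sink: Sink,
}

impl Drop for StopLogGuard {
    fn drop(&mut self) {
        // The real service would log at warn level here.
        self.sink.borrow_mut().push(format!("{} stopped", self.service));
    }
}

/// Hypothetical service body; `fail_early` simulates an unexpected exit.
fn run_service(sink: Sink, fail_early: bool) -> Result<(), String> {
    let _guard = StopLogGuard {
        service: "cache purge subscriber service",
        sink,
    };
    if fail_early {
        return Err("subscription closed".to_string());
    }
    Ok(())
}
```

Both the early-return and the normal path produce exactly one "stopped" line, without repeating the log call at each return site.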

4. Sampler Ratio Updates (`packages/metrics/src/providers.rs:69`)

-tracing::info!(?ratio, "updated sampler ratio");
+tracing::debug!(?ratio, "updated sampler ratio");

Concern: Sampling ratio changes affect what metrics are collected and can impact observability. This is a configuration change that operators might want to track.

Recommendation: Consider keeping this at info level as it's an operationally significant event.

5. Log Filter Reload (`packages/runtime/src/traces.rs:155`)

Similar concern as #4 - reloading log filters is a significant operational event that changes observability behavior and should probably remain at info level.

Testing Recommendations

Verify Production Observability: Ensure that with these changes, you can still effectively monitor and debug production issues. Consider:
- Can you detect and diagnose EPaxos consensus issues?
- Can you identify message delivery problems?
- Can you track configuration changes?
Integration Testing: Test that important operational events are still visible at default log levels in your monitoring systems.
Performance: While not a primary concern here, verify that the changes don't inadvertently affect performance (though moving to debug should generally improve it).

Minor Issues

Inconsistent Formatting: Some changes standardize the spacing in structured logging (removing spaces around =), but this appears inconsistent across the codebase. Consider a follow-up to standardize this everywhere if it's the preferred style.

Worker Stop Channel (packages/gasoline/src/worker.rs:155-158):

-tracing::warn!(?workflow_id, "stop channel closed");
+tracing::warn!(?workflow_id, "stop channel closed, workflow likely already stopped");

Good improvement! The additional context makes the log message more actionable.

Summary

Overall Assessment: This PR makes valuable improvements to log hygiene by reducing noise at info level. However, there are a few cases where the downgrade might reduce operational visibility for important events.

Recommendation:

✅ Approve the majority of changes (protocol operations, health checks, etc. → debug)
⚠️ Reconsider: message publish failures, service lifecycle events, config changes, and the sleep threshold change
📝 Document: Consider adding a comment in the PR description about the criteria used for choosing log levels

Questions for the Author

What prompted this change? Was production logging too noisy?
Have you validated these changes against production logs to ensure critical events remain visible?
Is the sleep threshold change intentional and related to this PR?

Let me know if you'd like to discuss any of these points!

claude · 2025-11-05T01:58:32Z

PR Review: Log Level Changes

Summary

This PR adjusts logging levels across the codebase, primarily downgrading info logs to debug and removing some redundant startup/shutdown logs. The changes touch 30 files with 92 additions and 94 deletions.

Positive Observations

Reduced Log Noise: Moving verbose protocol-level logs to debug (e.g., EPaxos message handling in epoxy/) is appropriate for production environments
Consistent Formatting: Fixed spacing inconsistencies in structured logging (e.g., runner_name = ? → runner_name=?)
Strategic Downgrades: Protocol-level operations like pre-accept, accept, commit messages are correctly moved to debug level
Cleanup: Removed redundant "starting/stopping" wrapper logs in cache-purge/src/lib.rs

Issues & Concerns

1. Critical: Potential Loss of Important Operational Visibility

Several downgraded logs may be important for production operations:

Service Manager (service-manager/src/lib.rs):

Lines 141, 148: Service crash/restart logs (error/info) were changed to use service name formatting
While the formatting change is fine, ensure crash logs remain at error level for alerting

Workflow Worker (gasoline/src/worker.rs:155):

Changed warning message about closed stop channel, but this might indicate a real issue
Consider if "workflow likely already stopped" assumption is always safe

Epoxy Configuration (epoxy/src/workflows/coordinator/reconfigure.rs):

Lines 269-285: Config changes during reconfigure downgraded to debug
These seem like important operational events that should remain at info or warn

2. Inconsistency: Selective Downgrades

Some similar operations have different log levels:

Health checks are debug, but some coordinator operations remain at higher levels
Consider if there's a clear policy for what should be info vs debug

3. Code Style: Adheres to CLAUDE.md Guidelines ✅

Per the project's logging patterns, the structured logging format changes are correct:

Using tracing::info!(?x, "foo") pattern properly
Lowercase messages maintained
Proper use of structured fields

4. Specific Concern: Sleep Duration Warning Threshold

gasoline/src/ctx/workflow.rs:1054:

-if !replay && duration < -50 {
+if !replay && duration < -25 {

This tightens the warning threshold for negative sleep durations
Question: Was the old -50ms threshold hiding real delays? A tighter threshold produces more warnings, not fewer. Is -25ms the right threshold?
Consider documenting why this specific threshold was chosen

5. Removed Logs Without Replacement

cache-purge/src/lib.rs:

Removed "starting cache purge subscriber service" log
Removed "cache purge subscriber service stopped" log
Concern: These lifecycle events can be useful for debugging service startup issues
Consider keeping at least one log at startup to confirm subscription succeeded

tracing-reconfigure/src/lib.rs:

Similar removal of startup logs
May make it harder to verify the service is running

Recommendations

Review Downgraded Error/Warn Logs: Ensure configuration change failures in coordinator remain at warn level for visibility
Document Threshold Change: Add a comment explaining the -25ms sleep duration threshold
Consider Keeping Lifecycle Logs: At least one info log per service on successful startup helps operations
Test Log Volume: Verify these changes achieve the desired reduction in production log volume without losing critical information

Testing Coverage

No test changes included. Consider:

Manual verification of log output at different levels in development
Checking production log volume impact in staging environment

Security Concerns

None identified - purely logging changes.

Performance Impact

Positive - reducing log volume should improve performance slightly, especially in high-throughput paths like EPaxos message handling.

Verdict

The PR is generally well-intentioned and improves log hygiene, but please address:

The sleep duration threshold change rationale
Whether config change failures should remain at warn
Consider keeping minimal lifecycle logs for operational visibility

Overall: Approve with minor suggestions ✅

The core changes are sound, but some specific downgrades may need reconsideration based on operational requirements.

claude · 2025-11-05T02:01:21Z

PR Review: Log Level Changes

Thanks for this PR @MasterPtato! I've reviewed the changes and have the following feedback:

✅ Positive Aspects

Consistent approach: The PR systematically downgrades many tracing::info! calls to tracing::debug! across multiple packages, which will reduce log noise in production
Adheres to logging conventions: The spacing changes (e.g., runner_name = ?path.runner_name → runner_name=?path.runner_name) align with the structured logging patterns shown in CLAUDE.md
Removes redundant logs: Removing duplicate "starting" messages in cache-purge/src/lib.rs (lines 8 & 14) improves log clarity
Appropriate scope: Changes are focused on logging only, making the PR easy to review and unlikely to introduce functional bugs

📊 Key Changes Summary

92 additions, 94 deletions across 30 files
Main change: info! → debug! for verbose operational messages
Spacing standardization: key = ?value → key=?value
Removal of some startup/shutdown log messages

🔍 Observations & Suggestions

1. Sleep Duration Warning Threshold Change (gasoline/src/ctx/workflow.rs:1054)

-if !replay && duration < -50 {
+if !replay && duration < -25 {

Question: This changes the warning threshold from -50ms to -25ms for negative sleep durations. This makes the warning more sensitive (will trigger more often). Was this intentional? If so, it would be helpful to understand the reasoning in the PR description.

2. Critical Path Logging (epoxy/src/http_client.rs)

The changes downgrade several HTTP message sending logs from info! to debug!:

send_message_to_address function logs (lines 118, 129, 186)
fanout_quorum function logs (line 60)

Consideration: These are core consensus protocol operations. While I understand the desire to reduce log noise, consider whether you want visibility into message passing for debugging distributed consensus issues in production. You might want to keep some of these at info! level, especially for failed attempts or slow operations.
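The idea of keeping only failed or slow sends visible can be sketched as a level choice based on the outcome. This is an illustration only: `Level`, `send_log_level`, and the `slow_after` parameter are hypothetical and do not appear in the PR.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the tracing levels relevant here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Debug,
    Info,
    Warn,
}

/// Chooses a level for a "message sent to replica" style log:
/// failures at WARN, slow successes at INFO, everything else at DEBUG.
fn send_log_level(succeeded: bool, elapsed: Duration, slow_after: Duration) -> Level {
    if !succeeded {
        Level::Warn
    } else if elapsed >= slow_after {
        Level::Info
    } else {
        Level::Debug
    }
}
```

This keeps the common fast path out of default logs while still surfacing the cases that matter when debugging consensus issues in production.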

3. Removed Service Lifecycle Logs (cache-purge/src/lib.rs)

-tracing::info!("starting cache purge subscriber service");
// ...
-tracing::warn!("cache purge subscriber service stopped");

Suggestion: While the "starting" message was redundant (there's already a subscription log), the "stopped" message could be valuable for diagnosing unexpected service terminations. Consider keeping it at warn! level since a stopped service is potentially problematic.

4. Worker Stop Channel Warning (gasoline/src/worker.rs:155-158)

 if wf.stop.send(()).is_err() {
     tracing::warn!(
         ?workflow_id,
-        "stop channel closed"
+        "stop channel closed, workflow likely already stopped"
     );
 }

Good change: The improved message provides better context for this expected race condition.
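The race behind this warning can be reproduced with a plain channel: sending fails once the receiving side has been dropped. The sketch uses `std::sync::mpsc` as a stand-in, since the actual type of `wf.stop` is not shown in the diff; `signal_stop` is a hypothetical helper.

```rust
use std::sync::mpsc;

/// Sends the stop signal and reports whether anyone was there to receive it.
/// Returns false when the receiver has already been dropped, which is the
/// case the warning describes as "workflow likely already stopped".
fn signal_stop(stop: &mpsc::Sender<()>) -> bool {
    stop.send(()).is_ok()
}
```

While the receiver is alive the call returns `true`; after the workflow side drops its receiver, the same call returns `false` and the caller would log the warning.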

5. Performance Impact

✅ Positive: Downgrading verbose logs to debug! will improve performance in production since debug logs are typically filtered out. The tracing crate is efficient, but reducing log volume still helps.

🐛 Potential Issues

None identified. The changes are mechanical and don't alter control flow or logic.

🔒 Security Concerns

None identified. The changes don't affect authentication, authorization, or data handling.

🧪 Test Coverage

Note: This PR doesn't include test changes, which is appropriate since it only modifies logging. However, you might want to:

Verify that critical error paths still have adequate logging at warn! or error! levels
Ensure that diagnostic information needed for debugging production issues is still available

📝 Recommendations

Add PR description: Explain the motivation (e.g., "reducing log noise in production environments" or "improving signal-to-noise ratio in logs")
Clarify the sleep threshold change: Document why -25ms threshold is better than -50ms
Consider selective info logs: For critical distributed consensus operations (EPaxos protocol), you might want to keep some info-level logs for production debugging
Document log level philosophy: Consider adding guidance to CLAUDE.md about when to use debug! vs info! vs warn! to maintain consistency going forward

✨ Overall Assessment

This is a solid cleanup PR that will improve log quality by reducing noise. The changes are mechanical, safe, and follow established patterns. The main suggestion is to add context in the PR description and reconsider a few strategically important logs.

Recommendation: ✅ Approve with minor suggestions - The changes are good, just add some context to the PR description.

Generated with Claude Code

claude · 2025-11-05T02:10:54Z

Code Review - PR #3333: Change Log Levels

Summary

This PR makes systematic changes to logging levels throughout the codebase, primarily converting tracing::info! calls to tracing::debug! and standardizing formatting. Overall, this is a positive change that will reduce log noise in production environments.

Strengths

Consistent approach: The changes follow a clear pattern - demoting verbose operational logs from info to debug level
Good targeting: Correctly identifies logs that are more useful for debugging than production monitoring:
- Protocol message handling (pre-accept, accept, commit, etc.)
- HTTP client communication details
- Consensus algorithm internals
- Workflow state transitions
Formatting improvements: Standardizes parameter formatting (e.g., runner_name=?path.runner_name instead of runner_name = ?path.runner_name)
Cleanup: Removes redundant startup/shutdown messages in cache-purge and tracing-reconfigure services

Issues & Concerns

1. Potentially Important Logs Demoted (Minor)

Some demoted logs might still be valuable at info level for production monitoring:

engine/packages/epoxy/src/workflows/coordinator/replica_status_change.rs:88-92: Replica status changes are significant events that operators likely want to see
engine/packages/service-manager/src/lib.rs:125: Service start count is useful startup information
engine/packages/cache-purge/src/lib.rs:12: Initial subscription confirmation helps verify the service is working

Recommendation: Consider keeping these at info level, or document the rationale for demoting them.

2. Sleep Duration Warning Threshold Change (Needs Clarification)

engine/packages/gasoline/src/ctx/workflow.rs:1054:

-if !replay && duration < -50 {
+if !replay && duration < -25 {

This changes the threshold for warning about negative sleep durations from -50ms to -25ms. This will produce more warnings for smaller timing discrepancies.

Questions:

Was this intentional or an oversight?
Is there a specific reason for the tighter threshold?
Should this be in a separate commit/PR since it's a behavioral change, not just a log level change?

3. Missing Context in Commit Message

The commit message "chore: change log levels" doesn't explain:

Why these changes were made
What problem they solve (e.g., "reduce log noise in production")
Whether any alerting or monitoring depends on these log levels

Recommendation: Add more context to help future maintainers understand the rationale.

4. Worker Stop Channel Warning (Good Fix)

engine/packages/gasoline/src/worker.rs:155-158: This is a good improvement - explaining that a closed channel likely means the workflow already stopped provides helpful context.

Code Quality

✅ Follows CLAUDE.md conventions:

Lowercase log messages ✓
Structured logging with proper parameter syntax ✓
Consistent formatting ✓

✅ No functionality changes (except the sleep warning threshold)

✅ No security concerns

✅ No performance impact

Test Coverage

⚠️ No tests included - This is acceptable for a logging change, but consider:

Manual verification that critical operational events are still visible
Checking if any monitoring/alerting relies on these log levels

Recommendations

Split the sleep threshold change: Consider moving the duration < -25 change to a separate PR with explanation
Document the strategy: Add a comment in the PR description explaining the criteria for info vs debug:
- info: Service lifecycle, significant state changes, errors
- debug: Protocol details, individual message handling, detailed workflow steps
Audit remaining info logs: Consider if other protocol-level logs should also be demoted for consistency
Consider structured logging levels: For critical services like Epoxy coordinator, consider if replica status changes merit info level

Verdict

Approve with minor suggestions ✅

This PR improves the signal-to-noise ratio in production logs. The changes are safe and follow good practices. The sleep threshold change should be clarified, but the overall direction is sound.

Generated by Claude Code - Review based on CLAUDE.md conventions and Rust best practices

claude · 2025-11-05T02:13:17Z

PR Review: Log Level Changes

Overall Assessment

This PR makes systematic changes to adjust log levels across the codebase. The changes are consistent and follow a clear pattern. Overall, this is a clean refactoring that improves observability by reducing log noise.

Positive Observations

✅ Consistent Pattern: The PR consistently moves verbose operational logs from info to debug level, which is appropriate for:

Internal protocol messages (Epoxy consensus messages: pre-accept, accept, commit, etc.)
HTTP client operations
Routine state transitions
Health checks and configuration updates

✅ CLAUDE.md Compliance: The changes follow the structured logging convention correctly:

Uses ? for debug formatting (e.g., ?replica_id, ?instance)
Properly structured field names
Consistent spacing around = in log statements

✅ Removes Redundant Logs: Appropriately removes service start/stop logs in cache-purge/src/lib.rs:7-8,42-43 that don't add value

✅ Improves Message Clarity: In gasoline/src/worker.rs:155-157, the warning message was enhanced to be more descriptive: "stop channel closed" → "stop channel closed, workflow likely already stopped"

Areas of Concern

⚠️ Inconsistent Spacing in One Location: In engine/packages/api-public/src/runner_configs/upsert.rs:174, the change introduces inconsistent spacing:

// Changed from:
runner_name = ?path.runner_name
// To:
runner_name=?path.runner_name  // Missing space around =

This is inconsistent with the rest of the codebase, which uses field = value (with spaces). Recommendation: Change to runner_name = ?path.runner_name for consistency.

⚠️ Threshold Change Without Context: In gasoline/src/ctx/workflow.rs:1054, the threshold for negative duration warnings was changed from -50 to -25 ms:

if !replay && duration < -25 {  // Was -50
    tracing::warn!(%duration, "tried to sleep for a negative duration");
}

Question: Was this an intentional tightening of the tolerance, or related to the log level changes? This seems orthogonal to the log level refactoring. Consider explaining this change in the PR description or commit message.
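
To make the effect of the tighter threshold concrete, the condition can be pulled out into a pure function and evaluated at both values. This is a minimal sketch: the function name `should_warn_negative_sleep` and the `i64` millisecond type are assumptions for illustration, not the actual signature in gasoline/src/ctx/workflow.rs.

```rust
/// Returns true when a negative sleep duration should produce a warning.
/// `threshold_ms` is the (negative) tolerance; the names and the i64
/// millisecond type are assumptions for this sketch.
fn should_warn_negative_sleep(replay: bool, duration_ms: i64, threshold_ms: i64) -> bool {
    !replay && duration_ms < threshold_ms
}

fn main() {
    // A sleep that is 30 ms overdue.
    let duration_ms = -30;

    // Old threshold (-50): -30 is within tolerance, so no warning.
    println!("old threshold warns: {}", should_warn_negative_sleep(false, duration_ms, -50));

    // New threshold (-25): the same value is now outside tolerance.
    println!("new threshold warns: {}", should_warn_negative_sleep(false, duration_ms, -25));

    // During replay the check is skipped at either threshold.
    println!("replay warns: {}", should_warn_negative_sleep(true, duration_ms, -25));
}
```

Under these assumptions, any duration in the range -50..-26 ms that was previously silent now emits a warning, which is why the change affects log volume even though it is not a log level change.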

Impact Analysis

Performance: ✅ Positive - Assuming production runs with an info-level filter, the demoted messages are no longer emitted, which reduces I/O and serialization overhead.

Observability: ✅ Positive - Moving routine operational logs to debug level reduces noise while keeping important information available when needed.

Security: ✅ No concerns - No security-sensitive data is being exposed or hidden.

Test Coverage: ℹ️ N/A - Log level changes don't require test coverage changes.

Recommendations

Fix spacing inconsistency in api-public/src/runner_configs/upsert.rs:174 to use runner_name = ?path.runner_name
Document the threshold change in gasoline/src/ctx/workflow.rs:1054 or move it to a separate commit if it's unrelated to log levels
Consider: If you haven't already, verify that important error conditions are still logged at warn or error levels (though this PR doesn't seem to downgrade any error logs)

Verdict

Approve with minor suggestions - This is a solid cleanup that improves the signal-to-noise ratio in production logs. The inconsistency and undocumented threshold change are minor issues that should be addressed, but they don't block the PR.

🤖 Generated with Claude Code

graphite-app · 2025-11-05T02:20:54Z

Merge activity

Nov 5, 2:20 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
Nov 5, 2:21 AM UTC: CI is running for this pull request on a draft pull request (#3367) due to your merge queue CI optimization settings.
Nov 5, 2:29 AM UTC: Merged by the Graphite merge queue via draft PR: #3367.

MasterPtato requested a review from NathanFlurry November 3, 2025 23:27

MasterPtato mentioned this pull request Nov 3, 2025

chore: misc bug fixes, add logs for outbound req #3332

Closed

vercel bot had a problem deploying to Preview – rivetkit-serverless November 3, 2025 23:28 Failure

NathanFlurry approved these changes Nov 4, 2025


MasterPtato force-pushed the 11-03-chore_misc_bug_fixes_add_logs_for_outbound_req branch from 564f956 to 58f931d Compare November 4, 2025 21:28

MasterPtato force-pushed the 11-03-chore_change_log_levels branch from 64217ec to 9c1bfa7 Compare November 4, 2025 21:28

MasterPtato requested a review from NathanFlurry November 4, 2025 21:28

MasterPtato mentioned this pull request Nov 4, 2025

fix: fix all dashboards #3358

Closed

vercel bot deployed to Preview – rivetkit-serverless November 4, 2025 21:29 View deployment

MasterPtato mentioned this pull request Nov 4, 2025

feat(pb): add reschedule_ts to actors #3363

Closed

NathanFlurry force-pushed the 11-03-chore_change_log_levels branch from 9c1bfa7 to e78aa83 Compare November 5, 2025 01:54

NathanFlurry force-pushed the 11-03-chore_misc_bug_fixes_add_logs_for_outbound_req branch from 58f931d to 867707c Compare November 5, 2025 01:54

vercel bot deployed to Preview – rivetkit-serverless November 5, 2025 01:55 View deployment

chore: change log levels

f41d8a5

NathanFlurry force-pushed the 11-03-chore_change_log_levels branch from e78aa83 to f41d8a5 Compare November 5, 2025 02:07

NathanFlurry force-pushed the 11-03-chore_misc_bug_fixes_add_logs_for_outbound_req branch from 867707c to f73435d Compare November 5, 2025 02:07

vercel bot deployed to Preview – rivetkit-serverless November 5, 2025 02:08 View deployment

graphite-app bot pushed a commit that referenced this pull request Nov 5, 2025

chore: change log levels (#3333)

f0f4d2b

claude bot mentioned this pull request Nov 5, 2025

[Graphite MQ] Draft PR GROUP:spec_33b0f8 (PRs 3332, 3333, 3358) #3367

Closed

graphite-app bot closed this Nov 5, 2025

graphite-app bot deleted the 11-03-chore_change_log_levels branch November 5, 2025 02:29

This was referenced Nov 5, 2025

fix(next-js): auto shut down runners when source code updates #3369

Merged

chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

Merged


3 participants


