# fix(langchain): ensure HITL middleware edit decisions persist in agent state #33789
base: master
## Conversation
Force-pushed from 1c13e07 to 5bd39be.
Force-pushed from 5bd39be to 5a2c343.
Fix issues langchain-ai#33787 and langchain-ai#33784, where Human-in-the-Loop middleware edits were not persisting correctly in the agent's message history. The problem occurred because the middleware directly mutated the `AIMessage.tool_calls` attribute, and LangGraph's state management doesn't persist direct object mutations. This caused the agent to see the original (unedited) tool calls in subsequent model invocations, leading to duplicate or incorrect tool executions.

Changes:
- Create a new `AIMessage` instance instead of mutating the original
- Ensure the message has an ID (generate a UUID if needed) so the `add_messages` reducer properly replaces instead of appending
- Add a comprehensive test case that reproduces and verifies the fix
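A minimal sketch of the pattern this commit describes, assuming `langchain_core` message types; the helper name is hypothetical and the actual middleware code may differ:

```python
from uuid import uuid4

from langchain_core.messages import AIMessage


def replace_tool_calls(original: AIMessage, edited_tool_calls: list[dict]) -> AIMessage:
    """Build a new AIMessage instead of mutating the original, keeping an id
    so LangGraph's add_messages reducer replaces rather than appends."""
    return AIMessage(
        content=original.content,
        tool_calls=edited_tool_calls,
        # Reuse the existing id when present; generate one otherwise.
        id=original.id or str(uuid4()),
    )
```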
Force-pushed from 5a2c343 to 69d4f40.
@sydney-runkle Hi! I've investigated the CI failure and found:

- The failing test is unrelated to my PR.
- The test only fails on Python 3.12.
- All my HITL tests pass.

This appears to be a Python 3.12-specific issue in the core test suite.
Force-pushed from e50a698 to 69d4f40.
Hi @sydney-runkle, I've reverted the master merge that was causing CI failures. Here's what happened:

**Timeline:**
**Analysis:**
**Resolution:**

The PR is ready for review. I can merge master again after the core test issue is resolved.
Enhances the fix for issues langchain-ai#33787 and langchain-ai#33784 by adding a HumanMessage that informs the AI when a tool call has been edited by a human operator. This ensures that the AI's subsequent responses reference the edited parameters rather than the original request parameters.

Changes:
- Modified `_process_decision` to create a HumanMessage on edit
- The message informs the AI about the edited tool call arguments
- Uses HumanMessage instead of ToolMessage to avoid interfering with actual tool execution
- Updated all affected tests to expect the context message
- All 70 middleware agent tests pass

This complements the previous fix, which ensured tool calls persist correctly in state, by also providing context to the AI about the edit.
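An illustration of the kind of context message described; the wording and arguments below are placeholders, not the middleware's real string:

```python
import json

from langchain_core.messages import HumanMessage

edited_args = {"to": "[email protected]"}  # hypothetical edited arguments
context_message = HumanMessage(
    content=(
        "[System Note] The 'send_email' tool call was edited by a human "
        f"operator and will run with these arguments: {json.dumps(edited_args)}"
    )
)
```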
- Updated `_process_decision` return type to allow `HumanMessage`
- Updated `artificial_tool_messages` list type annotation
- Removed unused `BaseMessage` import
This commit adds a `before_model` hook to inject a reminder message after tool execution for edited tool calls. This ensures the AI's final response references the edited parameters rather than the original user request.

The fix addresses issue langchain-ai#33787, where the AI would generate a final response referencing the original parameters despite the tool being executed with edited parameters. Now a [System Reminder] message is injected after tool execution to provide context about the edited parameters.

Changes:
- Added `_pending_edit_contexts` dict to track edited tool calls
- Added `before_model` hook to inject post-execution reminder messages
- Updated test to expect two context messages (pre and post execution)
- Added type guard for `tool_call_id` to satisfy mypy

Fixes langchain-ai#33787
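A hedged sketch of the hook mechanics described above; the hook signature and state shape are simplifying assumptions, not the real middleware API:

```python
from langchain_core.messages import HumanMessage


class HITLReminderSketch:
    def __init__(self) -> None:
        # Maps tool_call_id -> human-readable description of the edit.
        self._pending_edit_contexts: dict[str, str] = {}

    def before_model(self, state: dict) -> dict | None:
        """Inject reminders about edited tool calls before the model runs."""
        if not self._pending_edit_contexts:
            return None
        reminders = [
            HumanMessage(content=f"[System Reminder] {context}")
            for context in self._pending_edit_contexts.values()
        ]
        self._pending_edit_contexts.clear()
        return {"messages": reminders}
```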
**CI Failure Analysis**

The failing test is unrelated to this PR's changes. Details:
**Root Cause:** This is a flaky, timing-sensitive test: it expects the producer and consumer to run in parallel with minimal delay, but CI machine load caused a 90 ms delay, exceeding the 20 ms tolerance.

**Evidence this is unrelated:**
**Request:** Could a maintainer please re-run the failed CI job? This appears to be a transient infrastructure issue.
Enhances the fix for issue langchain-ai#33787 by improving the context messages that inform LLMs about edited tool calls. This helps prevent LLMs from attempting to re-execute tools after they've already completed with edited parameters.

## Problem

After implementing the state persistence fix for langchain-ai#33787, tool calls are correctly persisted with edited parameters and context messages are injected. However, some LLMs (e.g., llama-3.3-70b, gpt-3.5-turbo) may still attempt to re-execute the original tool call, trying to be "helpful" by fulfilling the user's original request even though the task is already complete.

## Solution

Strengthen the post-execution reminder message with more explicit language:

- Replace "[System Reminder]" with "[IMPORTANT - DO NOT IGNORE]"
- Add "ALREADY BEEN EXECUTED SUCCESSFULLY" emphasis
- Include an explicit "DO NOT execute this tool again" instruction
- Emphasize "The task is COMPLETE"

This makes the context messages more effective at guiding LLM behavior without requiring changes to the framework's architecture.

## Changes

1. **human_in_the_loop.py**
   - Strengthen the post-execution reminder message language
   - Extract args_json to avoid long lines
   - Use more directive language to prevent tool re-execution
2. **test_middleware_agent.py**
   - Update test expectations for the stronger message format
   - Verify the "ALREADY BEEN EXECUTED" language is present
   - All 16 HITL tests pass

## Testing

- ✅ All 16 HITL middleware tests pass
- ✅ Lint checks pass (ruff, mypy)
- ✅ Verified with a real LLM (GROQ llama-3.3-70b-versatile)

## Documentation

A best-practices guide was created for using HITL middleware with appropriate system prompts. See /tmp/HITL_BEST_PRACTICES.md for recommendations on system prompt configuration to ensure optimal LLM behavior.

## Design Decision

This change keeps the fix localized to the middleware layer rather than modifying the `create_agent` factory. This approach:

- Maintains separation of concerns (middleware manages its own messages)
- Avoids tight coupling between the factory and specific middleware
- Keeps the architecture clean and extensible
- Lets users control LLM behavior via system prompts (as documented)

Fixes langchain-ai#33787 (enhancement to the state persistence fix)
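An illustrative reconstruction of the strengthened reminder, composed from the phrases listed above (the exact string lives in human_in_the_loop.py; the tool and arguments here are hypothetical):

```python
import json

tool_name = "send_email"
edited_args = {"to": "[email protected]"}
args_json = json.dumps(edited_args)  # extracted to keep lines short

reminder = (
    f"[IMPORTANT - DO NOT IGNORE] The tool '{tool_name}' has ALREADY BEEN "
    f"EXECUTED SUCCESSFULLY with edited arguments: {args_json}. "
    "DO NOT execute this tool again. The task is COMPLETE."
)
```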
**Latest Update: Message Enhancement (Commit cba1fa2)**

**What Changed:**
**Key Improvements:**
**Testing:**
**Architecture Decision: Why This Approach?**

Code changes: only +10/-5 lines in 2 files, but they significantly improve LLM behavior.
## Human-in-the-Loop Middleware Best Practices

### Overview

When using `HumanInTheLoopMiddleware`:

### Problem

After a human edits a tool call and it executes successfully:
This happens because LLMs try to be "helpful" and fulfill the user's original request, even though the task is already complete.

### Solution

Provide clear system instructions that tell the LLM:
### Recommended System Prompt

When creating an agent with HITL middleware:

```python
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware

agent = create_agent(
    model="groq/llama-3.3-70b-versatile",  # or any model
    tools=[your_tools],
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                "dangerous_tool": {
                    "allowed_decisions": ["approve", "edit", "reject"]
                }
            }
        )
    ],
    system_prompt="""You are a helpful assistant.

IMPORTANT INSTRUCTIONS FOR TOOL EXECUTION:
1. When you see a ToolMessage (tool execution result), that tool has ALREADY been executed. Do NOT execute it again.
2. If you see a message about edited parameters (e.g., "[IMPORTANT - DO NOT IGNORE]"), you MUST reference those edited parameters in your response, NOT the user's original request.
3. After a tool execution completes successfully, provide a summary of what was accomplished and STOP. Do not re-attempt the same tool call.
4. The presence of a ToolMessage means the action is COMPLETE - your job is to report the result, not to repeat the action.
""",
)
```

### Example

#### Without System Instructions (❌ May Fail)

```python
# Agent without proper system prompt
agent = create_agent(
    model="groq/llama-3.3-70b-versatile",
    tools=[send_email],
    middleware=[HumanInTheLoopMiddleware(...)],
    # No system_prompt - LLM may misbehave
)

# User: "Send email to [email protected]"
# Agent proposes: send_email(to="[email protected]", ...)
# Human edits to: send_email(to="[email protected]", ...)
# Tool executes: ✅ Email sent to [email protected]
# Agent may then try: send_email(to="[email protected]", ...) again ❌
```

#### With System Instructions (✅ Works Correctly)

```python
# Agent with proper system prompt
agent = create_agent(
    model="groq/llama-3.3-70b-versatile",
    tools=[send_email],
    middleware=[HumanInTheLoopMiddleware(...)],
    system_prompt="""You are a helpful assistant.
IMPORTANT: When you see a ToolMessage, the tool has already been executed.
Do not re-execute it. Report the result and stop.""",
)

# User: "Send email to [email protected]"
# Agent proposes: send_email(to="[email protected]", ...)
# Human edits to: send_email(to="[email protected]", ...)
# Tool executes: ✅ Email sent to [email protected]
# Agent responds: ✅ "Email successfully sent to [email protected]" (references edited params)
```

### Model-Specific Considerations

Some models are more prone to re-execution than others:

**More Sensitive Models**
**Less Sensitive Models**
**Recommendation:** Always include system instructions regardless of model, as a defensive practice.

### Minimal System Prompt

If you prefer brevity, this minimal version also works:

```python
system_prompt = """You are a helpful assistant.
When you see a ToolMessage, that tool has already been executed.
Do not execute it again. Report the result."""
```

### Testing Your Setup

To verify your system prompt works correctly:
**Expected behavior:**
### Technical Details

The HITL middleware injects two context messages when a tool is edited:
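Roughly, the two messages look like this (illustrative shapes only; the middleware's exact wording differs):

```python
from langchain_core.messages import HumanMessage

# 1. Pre-execution note: tells the model the call was edited by a human.
pre_execution_note = (
    "[System Note] The tool call was edited by a human and will run "
    "with the edited arguments."
)

# 2. Post-execution reminder: tells the model the tool already ran.
post_execution_reminder = HumanMessage(
    content=(
        "[IMPORTANT - DO NOT IGNORE] The tool has ALREADY BEEN EXECUTED "
        "SUCCESSFULLY with the edited arguments."
    )
)
```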
The system prompt helps the LLM understand and follow these context messages.

### Related Issues
### Summary

- ✅ DO: Provide clear system instructions about tool execution
- ❌ DON'T: Rely solely on context messages without system instructions

The HITL middleware handles state management correctly; system prompts are needed to guide LLM behavior.
## Problem

The previous implementation violated OpenAI's strict message ordering rule: an AIMessage with tool_calls MUST be immediately followed by ToolMessage(s). User lesong36 reported in issue langchain-ai#33787:

> BadRequestError: An assistant message with 'tool_calls' must be followed
> by tool messages responding to each 'tool_call_id'

This happened because we inserted a HumanMessage between the AIMessage and the ToolMessage:

1. AIMessage (with tool_calls)
2. HumanMessage ("[System Note] edited...") ❌ Breaks OpenAI rule!
3. ToolMessage (execution result)

## Solution

Embed pre-execution edit context directly in AIMessage.content instead of creating a separate HumanMessage:

1. AIMessage (with tool_calls and edit info in content) ✅
2. ToolMessage (immediately follows) ✅
3. HumanMessage (post-execution reminder, after tool completes) ✅

### Changes

**Core middleware (`human_in_the_loop.py`):**
- Added `_build_updated_content()` helper method to embed edit information
- Modified `_process_decision()` to return None instead of HumanMessage for edits
- Updated `after_model()` to embed edit context in AIMessage.content

**Tests (`test_middleware_agent.py`):**
- Updated 6 tests to expect embedded edit context in AIMessage.content
- Changed assertions to verify only one HumanMessage (post-execution)
- Verified pre-execution context is in AIMessage.content

## Testing

- ✅ All 16 HITL middleware tests pass
- ✅ Lint checks pass (ruff, mypy)
- ✅ Message ordering complies with OpenAI API requirements

## Impact

- Fixes OpenAI API compatibility issue reported in langchain-ai#33787
- Maintains functionality with all LLM providers
- Backward compatible (no breaking changes to public API)

Fixes langchain-ai#33787
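A hedged sketch of the embedding approach (simplified; the actual `_build_updated_content()` implementation may differ):

```python
import json

from langchain_core.messages import AIMessage


def build_updated_content(message: AIMessage, tool_name: str, edited_args: dict) -> str:
    """Fold the edit notice into the AIMessage content itself, so nothing is
    inserted between the AIMessage and its ToolMessage (OpenAI ordering)."""
    note = (
        f"\n\n[System Note] The '{tool_name}' call was edited by a human and "
        f"will run with arguments: {json.dumps(edited_args)}"
    )
    base = message.content if isinstance(message.content, str) else ""
    return base + note
```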
### 🚨 Critical Fix: OpenAI Message Ordering Violation

**Problem Identified**

Thank you @lesong36 for reporting the OpenAI API error! You were absolutely correct: the previous implementation violated OpenAI's strict message ordering rule.

**The Issue:**
**Root Cause:**
**Solution Implemented (Commit 68549cf):**
…ation

## Improvements

### 1. Enhanced System Notification Format
- Added clear visual separators (60 "=" characters)
- More explicit header: "[SYSTEM NOTIFICATION - NOT AI RESPONSE]"
- Direct instruction to avoid attribution: "Do not attribute to AI"
- Warning emoji and clear guidance: "⚠️ IMPORTANT: Do not reference..."
- This significantly reduces the risk of semantic confusion

### 2. Comprehensive Design Documentation
- Added a detailed "Design Note" in the class docstring explaining:
  - Why edit notifications are embedded (OpenAI compatibility)
  - How semantic confusion is minimized
  - Recommendation to use get_recommended_system_prompt()
  - Future enhancement direction (provider-specific adapters)

### 3. New Helper Function: get_recommended_system_prompt()
- Static method to generate provider-specific system prompts
- Supports: openai, anthropic, groq, google
- Provides clear instructions to avoid:
  - Referencing system notifications as the AI's own words
  - Re-executing already completed tools
- Includes examples of correct and incorrect responses

## Benefits

- ✅ Reduces semantic confusion risk (AI mistaking system notes as its own)
- ✅ Provides clear guidance to users via helper function
- ✅ Documents design trade-offs transparently
- ✅ Maintains OpenAI API compatibility
- ✅ Preserves backward compatibility (no breaking changes)

## Testing

- ✅ All 16 HITL middleware tests pass
- ✅ Lint checks pass (ruff, mypy)
- ✅ Code formatted correctly

## Architecture Philosophy

This refactor embodies the recommended "improved current approach": balancing OpenAI API compatibility with semantic clarity through enhanced formatting and comprehensive documentation, while keeping the door open for future provider-specific adapters.

Related: langchain-ai#33789
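A hedged sketch of what the described helper might look like; the provider notes below are illustrative placeholders, not the middleware's actual strings:

```python
def get_recommended_system_prompt(provider: str = "openai") -> str:
    """Generate a provider-aware system prompt, per the commit description."""
    base = (
        "When you see a ToolMessage, that tool has ALREADY been executed; do "
        "not execute it again. If a [SYSTEM NOTIFICATION] reports edited "
        "parameters, reference the edited parameters and do not attribute "
        "the notification text to yourself."
    )
    # Illustrative provider-specific guidance (placeholders).
    notes = {
        "openai": "Keep tool messages immediately after assistant tool calls.",
        "anthropic": "Treat system notifications as context, not dialogue.",
        "groq": "Never re-attempt a completed tool call.",
        "google": "Summarize completed tool results instead of repeating them.",
    }
    return f"{base} {notes.get(provider, '')}".strip()
```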
## Description
Fixes critical bugs in the Human-in-the-Loop middleware where edited tool calls weren't properly handled, leading to the agent re-executing tools or referencing original parameters instead of edited ones.
## The Problem
When a user edited a tool call via HITL middleware:
- `AIMessage.tool_calls` would be updated in state

**Example:**
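A reconstruction of the flow, using the send_email scenario from the discussion above (addresses illustrative):

```python
# 1. User asks:      "Send email to [email protected]"
# 2. Agent proposes:  send_email(to="[email protected]", ...)
# 3. Human edits to:  send_email(to="[email protected]", ...)
# 4. The tool runs with the edited arguments, but the unedited tool call is
#    what the agent still sees in state - so it may re-send to [email protected].
```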
## Root Causes & Progressive Fixes
This PR contains five progressive fixes addressing different layers of the problem:
### Fix 1: Data Persistence (Commit 69d4f40)

**Problem:** Direct mutations to `AIMessage.tool_calls` weren't persisting in LangGraph's state.

**Solution:** Create a new `AIMessage` instance instead of mutating.
### Fix 2: Pre-execution Context (Commit ce9892f)

**Problem:** Even with persisted edits, the AI didn't know the tool call had been edited.
**Solution:** Add a `[System Note]` message before tool execution.

### Fix 3: Post-execution Reminder (Commit 4d4039f)

**Problem:** The pre-execution context was too far from the AI's final response generation.
**Solution:** Add a `before_model()` hook that injects a reminder immediately before the AI generates its final response.
### Fix 4: Strengthen Message Language (Commit cba1fa2)

**Problem:** Some LLMs (e.g., llama-3.3-70b, gpt-3.5-turbo) would still attempt to re-execute tools despite the context messages. They tried to be "helpful" by fulfilling the user's original request even though the task was already complete.
**Solution:** Use more explicit, directive language in the post-execution reminder.

**Before:**

> "[System Reminder] The tool was executed with edited parameters..."

**After:** stronger phrasing per commit cba1fa2 - "[IMPORTANT - DO NOT IGNORE]", "ALREADY BEEN EXECUTED SUCCESSFULLY", "DO NOT execute this tool again", "The task is COMPLETE".
### Fix 5: OpenAI Message Ordering Compliance (Commit 68549cf) ← Critical Fix

**Problem:** The implementation violated OpenAI's strict message ordering rule: an `AIMessage` with `tool_calls` MUST be immediately followed by a `ToolMessage`. User @lesong36 reported:
Solution: Embed pre-execution edit context directly in
AIMessage.contentinstead of creating a separateHumanMessage:Design rationale: This fix maintains OpenAI API compatibility while preserving functionality across all LLM providers. Added
_build_updated_content()helper method for clean separation of concerns.Message Flow After All Fixes
Changes
Core middleware changes:
_build_updated_content(): Helper method to embed edit information in AIMessage.content_pending_edit_contexts: Dictionary to track edited tool calls across middleware hooksbefore_model()hook: Injects post-execution reminder messagesafter_model(): Embeds edit context in AIMessage.content (OpenAI compliance)_process_decision(): ReturnsNoneinstead ofHumanMessagefor editsTest updates:
Testing
✅ All 16 HITL middleware tests pass
✅ Key test
test_human_in_the_loop_middleware_edit_actually_executes_with_edited_argsvalidates:✅ Linting passes (ruff, mypy)
✅ Type checking passes
✅ Verified with real LLM (GROQ llama-3.3-70b-versatile)
✅ OpenAI API compatible (message ordering complies with requirements)
## Best Practices
For optimal results with HITL middleware, users should provide appropriate system prompts:
See the PR discussion for a complete best practices guide.
## Architecture
This solution maintains clean architecture by:
## Issue
Fixes #33787
Fixes #33784
## Dependencies
No new dependencies added.
**Summary:** Five progressive fixes that together solve the HITL edit persistence problem at multiple layers, from state management to LLM behavior guidance to OpenAI API compliance.