Feature Area
Core functionality
Is your feature request related to an existing bug? Please link it here.
Human in the loop with CrewAI: #258 (closed)
Related forum
Describe the solution you'd like
I'd appreciate your help with designing a production human-in-the-loop system. I don't think it's covered by the existing documentation.
My use case is pretty generic:
- A crew is doing mission-critical work, so it must take input from a human while executing a task.
- If the human says the result is no good, the crew must redo the current task, and potentially previous tasks, taking the human feedback into account.
- This iterative process repeats until the human approves the task output.
I know people have suggested implementing an "ask human" tool and a dedicated agent that uses this tool to get human input.
However, this is not sufficient once we consider how the crew is deployed: the crew runs on a backend in a background task, with the frontend connected through a WebSocket. (It could also run within an HTTP request-handling context; that makes little difference.)
Once the crew decides to ask for human input, it must:
- save the current state/context to persistent storage, such as a database, so it can continue even if the crew dies before the user provides input (a rough persistence sketch follows this list);
- yield control to the calling process so that the user prompt can be sent over the WebSocket;
- wait for the user to provide input without timing out or dying;
- restore the crew context if needed;
- continue the crew based on the human input.
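To make the first and fourth points concrete, here is a minimal, framework-agnostic sketch of the persistence half of that cycle. `CheckpointStore` is a hypothetical helper, not a crewai API; it only shows the shape of the data I think needs to be kept.

```python
# Hypothetical checkpoint store; none of this exists in crewai today.
import json
import sqlite3
import uuid


class CheckpointStore:
    """Persists the pending question and crew context so a restarted worker can resume."""

    def __init__(self, db_path: str = "checkpoints.db"):
        # check_same_thread=False because the crew may call this from a worker thread.
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(id TEXT PRIMARY KEY, context TEXT, prompt TEXT, answer TEXT)"
        )

    def save(self, context: dict, prompt: str) -> str:
        """Store the serialized context plus the question we are about to ask the user."""
        checkpoint_id = str(uuid.uuid4())
        self.conn.execute(
            "INSERT INTO checkpoints (id, context, prompt, answer) VALUES (?, ?, ?, NULL)",
            (checkpoint_id, json.dumps(context), prompt),
        )
        self.conn.commit()
        return checkpoint_id

    def load(self, checkpoint_id: str) -> dict:
        row = self.conn.execute(
            "SELECT context FROM checkpoints WHERE id = ?", (checkpoint_id,)
        ).fetchone()
        return json.loads(row[0])

    def store_answer(self, checkpoint_id: str, answer: str) -> None:
        self.conn.execute(
            "UPDATE checkpoints SET answer = ? WHERE id = ?", (answer, checkpoint_id)
        )
        self.conn.commit()
```

The idea is that the WebSocket layer records the user's reply via `store_answer()`, and a restarted worker can `load()` the context and continue from there.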
So this setup raises some questions:
1. How do I make sure the "ask human" tool is called every time a task produces a result?
2. How do I make sure tasks are re-run, as many times as needed, if the "ask human" tool requests changes?
3. How do I make the "ask human" tool pause the crew and yield control to the calling process?
4. How do I save and restore the context of a crew?
I see that the human_input=True flag of a Task can solve the first two questions, but it looks like it is limited to stdin input.
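For reference, this is roughly how I understand the flag is used today (standard crewai usage; the agent and task definitions are just placeholders), with the feedback collected from stdin after the task finishes:

```python
from crewai import Agent, Crew, Task

analyst = Agent(
    role="Analyst",
    goal="Produce a report the user signs off on",
    backstory="A careful analyst who incorporates reviewer feedback.",
)

report_task = Task(
    description="Draft the report.",
    expected_output="A short report.",
    agent=analyst,
    human_input=True,  # pauses after the task and asks for feedback on stdin
)

crew = Crew(agents=[analyst], tasks=[report_task])
result = crew.kickoff()
```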
Alternatively, I could abuse(?) the task guardrail mechanism to ask for user confirmation after the task finishes and return a validation error if the user requests changes. However, I would need an additional LLM call to figure out what the user actually said.
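A sketch of what that could look like, assuming the `guardrail` callable on a Task that receives a TaskOutput and returns a (success, data) tuple, as described in the task guardrail docs. `ask_user_over_websocket` is a hypothetical blocking helper, and the keyword check stands in for the extra LLM call mentioned above:

```python
from crewai import Task
from crewai.tasks.task_output import TaskOutput


def ask_user_over_websocket(prompt: str) -> str:
    """Hypothetical helper: sends the prompt to the frontend and blocks for a reply."""
    ...


def human_approval_guardrail(output: TaskOutput) -> tuple[bool, str]:
    feedback = ask_user_over_websocket(f"Please review:\n{output.raw}")
    # Naive check; in practice an extra LLM call would classify the free-form reply.
    if feedback.strip().lower() in {"ok", "approved", "lgtm"}:
        return True, output.raw
    # Returning False should make crewai re-run the task, feeding back the message below.
    return False, f"User requested changes: {feedback}"


review_task = Task(
    description="Draft the report.",
    expected_output="A short report.",
    guardrail=human_approval_guardrail,
    max_retries=5,  # assuming this caps how often the task is re-attempted after rejection
)
```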
Memory looks promising for saving state, but won't do the trick.
The agents have two types of memory that save interactions:
- Short-term memory saves the agent's outputs in a ChromaDB vector store, which is separate for each agent.
- Long-term memory saves an evaluation (0-10) of the agent's output (not the output itself) to a SQLite database, one database for all agents.
Neither type of memory seems to save the actual messages that led to the agent's output. That is a bit confusing to me, since the agent may lose useful details provided by the user earlier that are not detected as entities.
In conclusion, neither short-term nor long-term memory can help save the full context of all agents when the crew is interrupted by the human input.
There is a bigger problem with data isolation: memories from interactions with different users are shared across all users through the common databases used by the default storage classes. I think this can be solved by using a different storage backend for memories.
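For example, something along these lines should give each user an isolated memory store. This assumes the memory-customization hooks (LongTermMemory, ShortTermMemory, LTMSQLiteStorage, RAGStorage) exposed by recent crewai versions; the exact constructor arguments may differ, so treat it as a sketch:

```python
# Hedged sketch: one memory directory per user so nothing is shared across users.
from crewai import Crew
from crewai.memory import LongTermMemory, ShortTermMemory
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage
from crewai.memory.storage.rag_storage import RAGStorage


def build_crew_for_user(user_id: str, agents, tasks) -> Crew:
    storage_dir = f"./memory/{user_id}"
    return Crew(
        agents=agents,
        tasks=tasks,
        memory=True,
        long_term_memory=LongTermMemory(
            storage=LTMSQLiteStorage(db_path=f"{storage_dir}/long_term.db")
        ),
        short_term_memory=ShortTermMemory(
            storage=RAGStorage(type="short_term", path=storage_dir)
        ),
    )
```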
Describe alternatives you've considered
Currently, the way I see it, this could be implemented as follows:
- Let the "ask human" tool simply block the whole crew until the user provides input. This is not ideal, since I then need to run the crew in a separate process so the socket remains usable while the tool blocks; async execution should help here.
- Provide the "ask human" tool with two queues, so it can send the user prompt on one and wait for the user's answer on the other (see the sketch after this list).
- In the WebSocket process, read/write on these queues and hope the crew process doesn't die in the meantime, losing the context.
- Probably use a Flow, with the first crew talking to the human and producing results, then Python code verifying that the human input was taken into account, then another crew acting on the verified results of the first crew.
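A sketch of the queue-based bridge, using FastAPI for the WebSocket side (my assumption; any WebSocket server would do) and a worker thread instead of a separate process for brevity. `run_crew`, `ask_human_via_queues`, and the queue names are illustrative, not crewai APIs:

```python
import asyncio
import queue

from fastapi import FastAPI, WebSocket

app = FastAPI()

prompt_queue: "queue.Queue[str]" = queue.Queue()  # crew -> frontend
answer_queue: "queue.Queue[str]" = queue.Queue()  # frontend -> crew


def ask_human_via_queues(prompt: str) -> str:
    """Called from the crew thread: publish the prompt, then block until the user answers."""
    prompt_queue.put(prompt)
    return answer_queue.get()  # no timeout: the crew just waits


def run_crew() -> str:
    """Placeholder for kicking off a crew whose 'ask human' path calls ask_human_via_queues()."""
    ...


@app.websocket("/crew")
async def crew_socket(ws: WebSocket):
    await ws.accept()
    # Run the blocking crew in a worker thread so the event loop stays responsive.
    crew_job = asyncio.get_running_loop().run_in_executor(None, run_crew)
    while not crew_job.done():
        try:
            prompt = prompt_queue.get_nowait()
        except queue.Empty:
            await asyncio.sleep(0.1)
            continue
        await ws.send_text(prompt)                  # forward the crew's question to the user
        answer_queue.put(await ws.receive_text())   # and the user's answer back to the crew
    await ws.send_text(str(await crew_job))         # finally, send the crew's result
```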
The quick-and-dirty solution is simply to patch CrewAgentExecutorMixin._ask_human_input on the CrewAgentExecutor class in the agent.py module:
```python
from functools import partial

import crewai
from crewai.agents.crew_agent_executor import CrewAgentExecutor


def replace_class(new_class=None, *, class_module, class_name):
    # Decorator that swaps `class_name` in `class_module` for the decorated class.
    if new_class is None:
        return partial(replace_class, class_module=class_module, class_name=class_name)
    original_class = class_module.__dict__[class_name]
    class_module.__dict__[class_name] = new_class
    assert original_class in new_class.__bases__
    new_class.replaces_class = original_class
    return new_class


@replace_class(class_module=crewai.agent, class_name="CrewAgentExecutor")
class CustomCrewAgentExecutor(CrewAgentExecutor):
    is_custom = True

    def _ask_human_input(self, final_answer: str) -> str:
        # Send final_answer to the frontend somehow and wait for the user feedback.
        user_feedback = ...
        return user_feedback
```
Obviously, it ignores saving/restoring the state for all agents.
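One way to close that gap at least partially, building on the hypothetical CheckpointStore and queue bridge sketched above: persist the executor's message history before blocking. This assumes CrewAgentExecutor keeps its running conversation in self.messages (true in the versions I looked at, but an internal detail that may change):

```python
class PersistingCrewAgentExecutor(CustomCrewAgentExecutor):
    checkpoints = CheckpointStore()  # hypothetical helper from the sketch above

    def _ask_human_input(self, final_answer: str) -> str:
        # Save the conversation so a restarted worker could rebuild it later.
        checkpoint_id = self.checkpoints.save(
            context={"messages": self.messages}, prompt=final_answer
        )
        # Block until the frontend answers, e.g. via the queue bridge above.
        user_feedback = ask_human_via_queues(final_answer)
        self.checkpoints.store_answer(checkpoint_id, user_feedback)
        return user_feedback
```

It would still need to be registered with the same replace_class trick (or whatever swap mechanism) to actually take effect.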
Additional context
No response
Willingness to Contribute
Yes, I'd be happy to submit a pull request