Is it possible to combine Response Model structured output and Function Calling? #709
I'm not quite sure I fully understand the desired behavior. Can you elaborate further on what you expect for the downstream usage of such a function? What does the
The idea is that I'd like to impose a specific structure on the model's responses. For instance, I want it to generate a brainstorming phase or a step-by-step reasoning process before providing a final answer. However, when using the response model, I'm unable to leverage the tool_call functionality. For example, I want to achieve something like this:

from mirascope.core import (
    openai,
    BaseTool,
    prompt_template,
    Messages,
    BaseMessageParam,
    BaseDynamicConfig,
    litellm,
)
from pydantic import BaseModel, Field
from typing import cast
class ReAct(BaseModel):
    thought: str
    action: str
    have_final_answer: bool


class GetBookAuthor(BaseTool):
    """Returns the author of the book with the given title."""

    title: str = Field(..., description="The title of the book.")

    def call(self) -> str:
        if self.title == "The Name of the Wind":
            return "Patrick Rothfuss"
        elif self.title == "Mistborn: The Final Empire":
            return "Brandon Sanderson"
        else:
            return "Unknown"


class AuthorProfile(BaseTool):
    """Returns the profile of the author with the given name."""

    name: str = Field(..., description="The name of the author.")

    def call(self) -> str:
        return f"Author {self.name} has written many books. He was born in 1977."


@litellm.call(
    "gemini/gemini-1.5-flash-002",
    response_model=ReAct,
    tools=[GetBookAuthor, AuthorProfile],
    json_mode=True,
)
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    Provide a thought based on the observation.
    Determine the most optimal action to take.
    If you have the final answer in this step, set have_final_answer to True.
    Available tools:
    - GetBookAuthor: Returns the author of the book with the given title.
    - AuthorProfile: Returns the profile of the author with the given name.
    MESSAGES: {history}
    """
)
def reasoning_call(history: list[BaseMessageParam]) -> BaseDynamicConfig:
    return {"computed_fields": {"history": history}}
@litellm.call(
    "gemini/gemini-1.5-flash-002",
)
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    MESSAGES: {history}
    """
)
def final_call(history: list[BaseMessageParam]): ...
text = "Can you tell me more about the author of the book 'The Name of the Wind'?"
history = []
history.append(Messages.User(content=text))
while True:
    ai_response = reasoning_call(history)
    history.append(Messages.Assistant(content=str(ai_response)))
    print(ai_response)
    if tool := ai_response.tool:
        tools_and_outputs = [(tool, tool.call())]
        history += ai_response.tool_message_params(tools_and_outputs)
        print(tools_and_outputs)
    if ai_response.have_final_answer:
        break
# final call
print(final_call(history))
But I get an error like this:

uv run .\test.py
thought="To find the author of 'The Name of the Wind', I need to use the available tool 'GetBookAuthor'." action="Use the GetBookAuthor tool with the book title 'The Name of the Wind'" have_final_answer=False
Traceback (most recent call last):
File "D:\Work\personal\personal-agent\test.py", line 82, in <module>
if tool := ai_response.tool:
^^^^^^^^^^^^^^^^
File "D:\Work\personal\personal-agent\.venv\Lib\site-packages\pydantic\main.py", line 856, in __getattr__
raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'ReAct' object has no attribute 'tool'
When you set response_model, the call returns the validated response model itself (here a ReAct instance), which is why there is no tool attribute on the result. It's also worth noting that we use tools under the hood (unless json_mode=True). IIUC, it seems like what you really want is a structured output that is powered by tool calls? Do you care to be calling the tools yourself? What if we provided a second-layer agent decorator that called the tools under the hood and structured the final response for you?
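A rough sketch of that flow for illustration, reusing the ReAct model and tools defined above (the act_step, structure_step, and structured_agent names are hypothetical, not an existing Mirascope decorator): a plain tool-calling loop runs first, then one extra response_model call structures the final answer.

@litellm.call("gemini/gemini-1.5-flash-002", tools=[GetBookAuthor, AuthorProfile])
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    MESSAGES: {history}
    """
)
def act_step(history: list[BaseMessageParam]): ...  # tools only, no response_model


@litellm.call("gemini/gemini-1.5-flash-002", response_model=ReAct, json_mode=True)
@prompt_template(
    """
    SYSTEM: Summarize the conversation so far as a final structured answer.
    MESSAGES: {history}
    """
)
def structure_step(history: list[BaseMessageParam]): ...  # response_model only, no tools


def structured_agent(question: str) -> ReAct:
    # Hypothetical second-layer wrapper: run the tool loop under the hood,
    # then make one extra call to structure the final response.
    history: list[BaseMessageParam] = [Messages.User(content=question)]
    while True:
        response = act_step(history)
        history.append(Messages.Assistant(content=str(response)))
        if tool := response.tool:
            history += response.tool_message_params([(tool, tool.call())])
        else:
            break  # the model stopped requesting tools
    return structure_step(history)  # the second LLM call, just for structuring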
What if we provided a second-layer agent decorator that called the tools under the hood and structured the final response for you?

Yes, I currently implement this solution. But the problem is that I need to make two calls per turn: one reasoning call, and one action call to execute the tool. The cost will be double.

I'm not even sure how we would be able to use both at the same time. The only thing that comes to mind would be to return type ReAct | BaseCallResponse in this case, but I'm not sure that interface makes sense (and I'm honestly not sure it's possible to implement).

It's okay, this is just a question to see if it's possible to improve further, not a feature request. Thank you very much for taking the time to help.
Of course! We're always happy to help and discuss ways to improve the library. We want Mirascope to be the best version of itself it can be.
I'm curious what you mean by the cost being double. When calling tools, you'll have to iteratively call the LLM with the updated history (including the tool calls) until the agent returns its final response. Are you saying it would be double because you would then have to take the final response and make another call to structure it?

One solution here would be to provide another tool (e.g. a FinalResponse tool) that the agent calls to return the structured final answer. Maybe we could do this as part of the proposed agent decorator. Of course, this would require that we do some additional prompt engineering under the hood to make this work. We would also need to update all of the tool properties to be cached so that the internal tool construction is not duplicated when the user accesses the properties. Might be worth caching those properties anyway.
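To illustrate that idea with a rough sketch (the FinalResponse tool below is hypothetical and user-defined, reusing the imports and tools from the snippet above): the final answer becomes just another tool, so the agent keeps calling GetBookAuthor/AuthorProfile and finishes by calling FinalResponse, whose validated fields act as the structured output without an extra structuring call.

class FinalResponse(BaseTool):
    """Call this tool exactly once when you have the final answer."""

    thought: str = Field(..., description="Final reasoning behind the answer.")
    answer: str = Field(..., description="The final answer for the user.")

    def call(self) -> str:
        return self.answer


@litellm.call(
    "gemini/gemini-1.5-flash-002",
    tools=[GetBookAuthor, AuthorProfile, FinalResponse],
)
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    Use the tools to gather information, then call FinalResponse with your answer.
    MESSAGES: {history}
    """
)
def agent_step(history: list[BaseMessageParam]): ...


history = [Messages.User(content="Who wrote 'The Name of the Wind'?")]
while True:
    response = agent_step(history)
    history.append(Messages.Assistant(content=str(response)))
    if tool := response.tool:
        if isinstance(tool, FinalResponse):
            print(tool.thought, tool.answer)  # structured final output, no extra call
            break
        history += response.tool_message_params([(tool, tool.call())])
    else:
        break  # the model answered in plain text without calling FinalResponse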
Sorry, I may have misunderstood your previous response in some instances. It would be helpful if you could provide a simple code snippet to illustrate your idea (FinalResponse and the second-layer decorator).

Currently, I am implementing the ReAct (Reasoning + Acting) engine as follows. I would appreciate any advice to enhance this flow:

ReAct Engine Implementation Overview

🎯 Purpose

The ReAct (Reasoning + Acting) engine implements a decision-making loop that allows an AI agent to:
- reason about the current state of the task (Reasoning Step)
- take an action, e.g. execute a tool (Action Step)
- decide whether to continue or stop (Check Conditions)
🔄 Core Flow

graph TD
A[Start] --> B[Reasoning Step]
B --> C[Action Step]
C --> D{Check Conditions}
D -->|Continue| B
D -->|Complete/Stuck| E[End]
💡 Key Components

1. Reasoning Action Model

class ReasoningAction(BaseModel):
    thought: str  # Analysis and observations
    action: str  # Next action to take
    goal_completed: bool  # Task completion status
    # ... other control flags
2. Main Loop Structure

async def run(self, agent):
    while not_finished and within_max_attempts:
        # 1. Reasoning Phase
        tools_names = [tool.name for tool in agent.available_tools]
        reasoning_response: ReasoningAction = await self._reasoning_step(
            messages=agent.messages,
            tools_names=tools_names,  # tool names to inject into the prompt
        )

        # 2. Action Phase
        result = await self._action_step(agent, reasoning_response)
        # action call to the LLM to execute the action, use the tool, and append the result to the history
        # example code in the action step:
        # msg = f"system-automessage@noreply: INFO: executing action: {reasoning_response.action}"
        # agent.history.append(Messages.User(msg))
        # result["break"] = reasoning_response.goal_completed or other flag

        # 3. Break if conditions met
        if result["break"]:
            break

📝 Example Output Format
As you can see, for each question, each turn the agent executes two calls: one reasoning call and one action call.
That makes the cost double.
Ah, I think I understand the issue. The ReAct agent flow is somewhat outdated by the introduction of tools (function calling) in LLMs. ReAct was published before tools existed, so the idea was that you would have the LLM reason and provide an action, take the action on the LLM's behalf, and then give the LLM the action's output and have it continue in the following steps. With the introduction of tools, you can implement a more modern ReAct agent flow simply by calling tools until the agent no longer calls them. The LLM calling a "tool" is the equivalent of the reasoning + action step, and taking the action is equivalent to calling the tool on the LLM's behalf and providing the output as part of the message array. I recommend reading through our agent docs, which cover this more modern "react" agent flow using tools. If I've misunderstood, let me know!
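For reference, a minimal sketch of that tool-based loop, reusing the imports and tools from the earlier snippet (the step function name is just illustrative); note there is no response_model on the call, so the tool attribute is available:

@litellm.call("gemini/gemini-1.5-flash-002", tools=[GetBookAuthor, AuthorProfile])
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    MESSAGES: {history}
    """
)
def step(history: list[BaseMessageParam]): ...


history = [Messages.User(content="Can you tell me more about the author of the book 'The Name of the Wind'?")]
while True:
    response = step(history)
    history.append(Messages.Assistant(content=str(response)))
    if tool := response.tool:
        # The tool call is the "reasoning + action"; executing it and feeding the
        # output back into the history is taking the action on the LLM's behalf.
        history += response.tool_message_params([(tool, tool.call())])
    else:
        # No more tool calls: this response is the final answer.
        print(response)
        break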
Oh, I see. Thank you; maybe I was making things too complicated. But it would still be interesting to see the response model used at the same time as tools. Since parallel tool calls are possible, maybe it is possible to use them at the same time, or perhaps with structured output via json mode. This is just an idea and not a feature request or anything, so please don't take it too seriously. I don't have a deep understanding of how LLMs generate tokens for tool calls. However, this could lead to advanced prompt engineering use cases for agents that utilize structured output and tool calling. Thanks again; I will try the simple ReAct agent flow.
Sounds good. I'm going to continue thinking on this to see if there is a way to enable setting both response_model and tools at the same time. I'm going to leave this open as I continue to think on it. We always appreciate the questions, so keep them coming!
Question
Can we use the response model structured output and function calling at the same time? This would enhance the flexibility and capability of the API by allowing structured outputs alongside the execution of specific functions.
Something like this:
Motivation:
Alternatives:
Currently, users may need to implement separate logic to handle function calls and response structuring, which can lead to increased complexity in code. A built-in mechanism to support this feature would simplify the development process.