[Bug] stream_chat() does not trigger tool calling, though chat() does #11013

Comments
🤖 Hey @tslmy, great to see you diving into the intricacies of LlamaIndex again! Hope all is well on your side. 😊 Thank you for bringing this to our attention. The difference in behavior between chat() and stream_chat() lies in the mode each passes to the internal _chat() call: chat() uses ChatResponseMode.WAIT, while stream_chat() uses ChatResponseMode.STREAM. Here is the relevant code:

def chat(
    self,
    message: str,
    chat_history: Optional[List[ChatMessage]] = None,
    tool_choice: Union[str, dict] = "auto",
) -> AgentChatResponse:
    chat_response = self._chat(
        message, chat_history, tool_choice, mode=ChatResponseMode.WAIT
    )
    return chat_response

def stream_chat(
    self,
    message: str,
    chat_history: Optional[List[ChatMessage]] = None,
    tool_choice: Union[str, dict] = "auto",
) -> StreamingAgentChatResponse:
    chat_response = self._chat(
        message, chat_history, tool_choice, mode=ChatResponseMode.STREAM
    )
    return chat_response

You can find this code in the llama-index-integrations/agent/llama-index-agent-openai-legacy/llama_index/agent/openai_legacy/openai_agent.py file. I hope this clarifies the difference between the two functions. If you have any further questions, feel free to ask.
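A hedged sketch of what that difference means for the caller (the attribute names below, .response and .response_gen, are the ones used later in this thread; the agent setup itself is assumed):

# Blocking variant: AgentChatResponse already holds the full text.
chat_response = agent.chat("What is (1242 * (5 + 3))?")
print(chat_response.response)

# Streaming variant: StreamingAgentChatResponse exposes a token generator
# that must be iterated to drive the generation.
stream_response = agent.stream_chat("What is (1242 * (5 + 3))?")
for token in stream_response.response_gen:
    print(token, end="", flush=True)
print()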
hmm, very sus, will take a look
Hey @tslmy -- I tried with both OpenAI (which worked as expected) and Ollama. Code below (for testing with OpenAI, I just commented out the line with Settings):

from llama_index.core import Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama
Settings.llm = Ollama(model="starling-lm", request_timeout=300)
def multiply(a: int, b: int) -> int:
"""Multiply two numbers."""
return a * b
def add(a: int, b: int) -> int:
"""Add two numbers."""
return a + b
all_tools = [
FunctionTool.from_defaults(fn=multiply),
FunctionTool.from_defaults(fn=add),
]
QUERY = "What is (1242 * (5 + 3))?"
print(">>>>>>>> With stream_chat:")
agent = ReActAgent.from_tools(
tools=all_tools,
verbose=True,
)
response = agent.stream_chat(QUERY)
print(f">>>>>>>> Response: ", end="", flush=True)
for token in response.response_gen:
print(token, end="", flush=True)
print()
agent = ReActAgent.from_tools(
tools=all_tools,
verbose=True,
)
print(">>>>>>>> With chat:")
response = agent.chat(QUERY)
print(f">>>>>>>> Response: {response.response}") You need to iterate over the stream_chat response, putting it right into |
@logan-markewich , I'm sorry, I still can't achieve consistent behaviors across chat() and stream_chat(). I used the script you shared, with only the following changes for consistency's sake:

Settings.llm = Ollama(
model="starling-lm",
request_timeout=300,
+ temperature=0.01,
+ seed=42,
+ additional_kwargs={"stop": ["Observation:"]},
)

And this is what I got. Note that, with stream_chat, the thought-action-input triplet wasn't extracted as a step. Could you share a screenshot of running this on your side, so that we can compare the color-coded console output? Also, could you check if you're using the same versions of dependencies as my https://github.com/tslmy/agent/blob/main/poetry.lock file? For Ollama, the version is 0.1.25.
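For reference, this is the resulting LLM configuration once those changes are applied (nothing new here, just the diff above folded into one block):

# Temperature and seed pinned for reproducibility; stop token ends the ReAct step.
Settings.llm = Ollama(
    model="starling-lm",
    request_timeout=300,
    temperature=0.01,
    seed=42,
    additional_kwargs={"stop": ["Observation:"]},
)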
Seems like in this case, the LLM hallucinated the function call and result? Will give it a try. It might just be a difference in how tracebacks end up getting handled.
I'm having the same issue - the agent returns its initial internal thought as the response. I used the create-llama template with a FastAPI backend. Relevant code snippet (inside the /chat endpoint):

versions:
@logan-markewich your code worked for me. I used a Mistral 7B and here's the response:

I made a small change in the QUERY by removing the outer parentheses (QUERY = "What is 1242 * (5 + 3)?") and it worked. I am new to this, so can you clarify a few things about function calling? Can we take any open-source language model and use it to build agents for function-calling purposes? I previously thought this was only possible with OpenAI models, as all the documentation on LlamaIndex agents points towards how to build or modify OpenAI agents, and I couldn't find any for open-source models.
I can also confirm that with a very simple RAG setup using Bedrock (Claude 3 Sonnet), chat and stream_chat give very different answers. Code to reproduce:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.llms.bedrock import Bedrock
llm = Bedrock(
temperature=0,
model='anthropic.claude-3-sonnet-20240229-v1:0',
region_name='us-east-1',
)
embed_model = BedrockEmbedding(
model_id="amazon.titan-embed-text-v2:0",
region_name="us-east-1"
)
reader = SimpleDirectoryReader(
input_dir="./data_bob",
recursive=True,
)
all_docs = []
for docs in reader.iter_data():
for doc in docs:
all_docs.append(doc)
splitter = SentenceSplitter(chunk_size=1024)
index = VectorStoreIndex.from_documents(
all_docs, transformations=[splitter], embed_model=embed_model
)
chat_engine = index.as_chat_engine(llm=llm)
question = "Who is bob?"
response = chat_engine.chat(question)
print("Chat response\n*******")
print(response)
stream_response = chat_engine.stream_chat(question)
print("\nStream chat response\n*******")
stream_response.print_response_stream()

The ./data_bob directory contains a single document with the text:

Bob is a civil engineer whose expertise is in creating interesting projects for other people. He has three children and a cat.

The output of the code above:

Chat response
*******
Based on the information provided, Bob is a civil engineer who specializes in creating interesting projects. He has a family with three children and also owns a pet cat.

Stream chat response
*******
I'm sorry, but I don't have enough context to determine exactly who "Bob" is referring to. Bob is a very common name, so without any additional details about the person, I cannot provide specifics about their identity, background, occupation, etc. If you could provide some more context about which Bob you are asking about, that would help me better understand and answer your question.
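For anyone wanting to reproduce this, a minimal sketch of the assumed data setup (the directory name comes from the code above; the file name is arbitrary, since SimpleDirectoryReader reads everything in the directory):

from pathlib import Path

# Create ./data_bob with a single document containing the two sentences quoted above.
data_dir = Path("./data_bob")
data_dir.mkdir(exist_ok=True)
(data_dir / "bob.txt").write_text(
    "Bob is a civil engineer whose expertise is in creating interesting projects "
    "for other people. He has three children and a cat."
)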
@logan-markewich Hi Logan, did you see our new comments about this issue? Perhaps this can help pinpoint the issue?
@omrihar try with the latest LlamaIndex; it could have been an issue with some pydantic class under the hood consuming the first token of the stream 🤷🏻 But it works fine for me.
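As an illustration of that suspected failure mode (not the actual LlamaIndex code, just a plain-Python sketch): if anything peeks at the token generator before handing it to the caller, the first token silently disappears from the stream.

# Plain-Python illustration of a wrapper consuming the first token of a stream.
def token_stream():
    yield from ["Bob ", "is ", "a ", "civil ", "engineer."]

gen = token_stream()
_ = next(gen)        # hypothetical wrapper "peeks" at the stream, e.g. for validation
print("".join(gen))  # prints "is a civil engineer." -- the first token is gone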
@logan-markewich I upgraded to the latest version of llama-index (0.10.43) but I'm still not able to make it work.
I cannot test Bedrock. But using OpenAI, Anthropic, and Ollama, it works fine.
@logan-markewich That's the point, maybe. It's working for me too when I use OpenAI, but not with Bedrock. Is there another developer who can test and possibly debug the code using Bedrock? Unfortunately it's a requirement for the application I'm developing.
EDIT: I can confirm that it's an issue with the model. I was using Mistral 7B, which gave me faulty results. Llama3 and gpt-3.5-turbo generate proper responses.

Original: To bring some awareness, I'm also encountering this issue using a more or less stock version of the FastAPI template generated by create-llama. Template-Code: https://github.com/run-llama/create-llama/blob/main/templates/types/streaming/fastapi/app/api/routers/chat.py#L41_L45 Calling
@sahilshaheen Hi, did you find a solution for this issue?
@jp-kh-kim I found that in my case it was a problem with the system prompt. I opened a PR that got released in the latest version: #14814. You may want to see if the newest version fixes the issue for you?
@garritfra Thanks! Same here. The issue was resolved when I updated the version! Thx a lot :)
Bug Description
I have a ReAct Agent (never tested it with an OpenAI Agent). I used to interact with it using .chat(). It was able to wield tools. Today, I felt curious and replaced it with .stream_chat(). It stopped calling functions / using tools. It didn't even seem to bother extracting a thought-action-input triplet from the LLM generation.

Version
0.10.7; 0.9.43
Steps to Reproduce
I wrote a minimal reproducing script here. It uses identical settings, the same query, and a fixed temperature & seed for the LLM. It runs agent.stream_chat(QUERY) first and then agent.chat(QUERY), so you can compare the different behavior. When you run it, you can see that stream_chat didn't trigger the "parse a tool use" procedure, while chat did.

Update: Also observed this with OpenAILike.
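The linked script itself is not reproduced here; a minimal sketch of the comparison it describes, assuming the tools and Settings.llm are configured as in the Ollama comment above:

# Run the streaming variant first, then the blocking variant, on the same query.
agent = ReActAgent.from_tools(tools=all_tools, verbose=True)
QUERY = "What is (1242 * (5 + 3))?"

stream_response = agent.stream_chat(QUERY)
for token in stream_response.response_gen:
    print(token, end="", flush=True)
print()

chat_response = agent.chat(QUERY)
print(chat_response.response)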
Relevant Logs/Tracebacks
No response