
finish_reason not set in AzureOpenAIChatCompletionClient.create_stream #4213

Closed
MohMaz opened this issue Nov 15, 2024 · 5 comments · Fixed by #4311

Comments

@MohMaz
Contributor

MohMaz commented Nov 15, 2024

What happened?

The provided code snippet works fine for the .create call of AzureOpenAIChatCompletionClient but errors on the .create_stream call:

Creating client with config: {'model': 'gpt-4o', 'azure_endpoint': 'https://xxxxxxxx.openai.azure.com', 'azure_deployment': 'gpt-4o', 'api_version': '2024-08-01-preview', 'model_capabilities': {'vision': False, 'function_calling': False, 'json_output': False}, 'azure_ad_token_provider': <function get_bearer_token_provider.<locals>.wrapper at 0x108205da0>}
-----> Print output of .create call
/Users/mohammadmazraeh/Projects/autogen/python/packages/autogen-core/samples/distributed-group-chat/test_aoi.py:26: UserWarning: Resolved model mismatch: gpt-4o-2024-08-06 != gpt-4o-2024-05-13. Model mapping may be incorrect.
  single_output = await client.create(messages=messages)
-----> CreateResult(finish_reason='stop', content='The autumn leaves painted the park in vibrant shades of red and gold.', usage=RequestUsage(prompt_tokens=17, completion_tokens=14), cached=False, logprobs=None) - * 50
-----> Print output of .create_stream call
Traceback (most recent call last):
  File "/Users/mohammadmazraeh/Projects/autogen/python/packages/autogen-core/samples/distributed-group-chat/test_aoi.py", line 34, in <module>
    asyncio.run(main())
  File "/usr/local/Cellar/[email protected]/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/mohammadmazraeh/Projects/autogen/python/packages/autogen-core/samples/distributed-group-chat/test_aoi.py", line 31, in main
    async for chunk in stream_output:
  File "/Users/mohammadmazraeh/Projects/autogen/python/packages/autogen-ext/src/autogen_ext/models/_openai/_openai_client.py", line 662, in create_stream
    choice.finish_reason
AttributeError: 'NoneType' object has no attribute 'finish_reason'
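
For context, the crash happens because some streamed chunks arrive with an empty choices list, leaving choice as None when it is later dereferenced. A minimal illustration with hypothetical chunk shapes (not autogen's actual internals):

from dataclasses import dataclass, field

# Hypothetical stand-in for the OpenAI SDK's ChatCompletionChunk,
# whose .choices list can occasionally be empty on Azure responses.
@dataclass
class Chunk:
    choices: list = field(default_factory=list)

chunk = Chunk(choices=[])  # an "empty" chunk from the stream
choice = chunk.choices[0] if chunk.choices else None
choice.finish_reason       # AttributeError: 'NoneType' object has no attribute 'finish_reason'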

What did you expect to happen?

I expected to get a reasonable response from the .create_stream call as well.

How can we reproduce it (as minimally and precisely as possible)?

import asyncio

from autogen_core.components.models._types import UserMessage
from autogen_ext.models._openai._openai_client import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

if __name__ == "__main__":

    async def main():
        config = {
            "model": "gpt-4o",
            "azure_endpoint": "https://xxxxxxx.openai.azure.com",
            "azure_deployment": "gpt-4o",
            "api_version": "2024-08-01-preview",
            "model_capabilities": {"vision": False, "function_calling": False, "json_output": False},
            "azure_ad_token_provider": get_bearer_token_provider(
                DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
            ),
        }

        print(f"Creating client with config: {config}")
        client = AzureOpenAIChatCompletionClient(**config)

        messages = [UserMessage(content="Generate one short sentence on some topic!", source="system")]
        print("-----> Print output of .create call")
        single_output = await client.create(messages=messages)
        print("----->", single_output, "- * 50")

        print("-----> Print output of .create_stream call")
        stream_output = client.create_stream(messages=messages)
        async for chunk in stream_output:
            print(chunk)

    asyncio.run(main())

AutoGen version

0.4.0.dev6

Which package was this bug in

Extensions

Model used

gpt-4o

Python version

3.11.10

Operating system

macOS Sequoia Version 15.1 (24B83)

Any additional info you think would be helpful for fixing this bug

No response

@ekzhu ekzhu added this to the 0.4.0 milestone Nov 15, 2024
@MohMaz MohMaz self-assigned this Nov 15, 2024
@ekzhu
Collaborator

ekzhu commented Jan 31, 2025

@MohMaz @jackgerrits

I think the behavior should just be to set the finish reason to unknown in this case. Under the hood, simply skip the empty chunks.

Having to set this parameter is too confusing and error-prone.
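
A minimal sketch of that proposal (a hypothetical helper, assuming chunks shaped like the OpenAI SDK's ChatCompletionChunk with a possibly empty choices list; not autogen's actual implementation):

from typing import AsyncIterator, Optional

async def stream_finish_reason(stream: AsyncIterator) -> str:
    """Consume a chat completion stream, skipping empty chunks, and
    return the last finish_reason seen, or "unknown" if none arrived."""
    finish_reason: Optional[str] = None
    async for chunk in stream:
        if not chunk.choices:
            continue  # skip empty chunks instead of dereferencing choices[0]
        choice = chunk.choices[0]
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
        # ... accumulate choice.delta.content here as usual ...
    return finish_reason or "unknown"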

@jackgerrits
Member

If no more specific reason is known, then unknown is the right thing to set it to. I agree on removing empty chunks; it kinda feels like some sort of bug on their end, and we should smooth it over in our client.

@ekzhu
Collaborator

ekzhu commented Jan 31, 2025

I have seen this happen when the endpoint is presumably under heavy load. For example, the Hugging Face Inference API currently often gives this error.

Another argument for removing this from the create_stream method is that the agent calling the model client can't know for sure which client type it is using, so it can't just pass the tolerance parameter in. This parameter is also currently not settable in the constructor.

To reduce the complexity of the interface, let's remove this altogether and just handle empty chunks by skipping them. @MohMaz

@MohMaz
Contributor Author

MohMaz commented Jan 31, 2025

I agree with defaulting to an unknown finish reason if a better one is not available. I can create a PR to clean this up.

The only catch is that if the API keeps returning empty chunks, we will get stuck in a loop; I don't think that should be considered an AutoGen bug that needs a fix.
Please let me know and I can submit the PR for it.

@ekzhu
Collaborator

ekzhu commented Jan 31, 2025

> The only catch is that if the API keeps returning empty chunks, we will get stuck in a loop; I don't think that should be considered an AutoGen bug that needs a fix.

I think in this case we can throw a warning once we get, say, 10 empty chunks in a row. This will remind users that something is not right and that they need to address it separately from the library.
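
A sketch of how that warning could sit on top of the skipping logic (hypothetical names; the threshold of 10 is the value suggested in this thread):

import warnings
from typing import AsyncIterator

EMPTY_CHUNK_WARNING_THRESHOLD = 10  # value suggested in this discussion

async def skip_empty_chunks(stream: AsyncIterator) -> AsyncIterator:
    """Yield only chunks that carry at least one choice, warning once
    after a long run of consecutive empty chunks."""
    empty_run = 0
    async for chunk in stream:
        if not chunk.choices:
            empty_run += 1
            if empty_run == EMPTY_CHUNK_WARNING_THRESHOLD:
                warnings.warn(
                    "Received 10 consecutive empty chunks from the model "
                    "endpoint; the service may be overloaded.",
                    stacklevel=2,
                )
            continue
        empty_run = 0
        yield chunk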

> Please let me know and I can submit the PR for it.

Yes please, thanks!
