I need this too.
Feature request
After long withholding token usage statistics when streaming, OpenAI has finally implemented this feature and now returns usage statistics in streaming responses. Testing with the newest OpenAI npm package, the information is available through the SDK. When using the `ChatOpenAI` integration in `@langchain/openai`, however, the `response_metadata` field always shows 0 for prompt and completion tokens. It would be nice if we could get this information, since it is already nicely tallied up in the final completion chunk.
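For reference, a rough sketch of how the raw SDK exposes this (assuming an `openai` package version with `stream_options` support; the model name is illustrative):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await openai.chat.completions.create({
  model: "gpt-4o", // illustrative
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
  // Ask the API to append one final usage-only chunk to the stream.
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  // Content chunks carry `usage: null`; the last chunk has an empty
  // `choices` array and the tallied totals.
  if (chunk.usage) {
    console.log(chunk.usage); // { prompt_tokens, completion_tokens, total_tokens }
  }
}
```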
Motivation
Usage information is generally important for businesses building services on top of LLMs. Since streaming provides the best user experience, it is an important feature to have, and it should support usage tracking.
Proposal (If applicable)
I've overridden the `ChatOpenAI._streamResponseChunks` method and set `modelKwargs` to include usage, as sketched below.
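A minimal sketch of the idea (not my exact override, which re-implements the stock streaming loop; the helper name `usageToGenerationChunk` is illustrative). The only addition relative to the stock method is this branch: when the final chunk arrives with an empty `choices` array and a non-null `usage`, convert it into a generation chunk and yield it:

```typescript
import { AIMessageChunk } from "@langchain/core/messages";
import { ChatGenerationChunk } from "@langchain/core/outputs";

// Called from the overridden `_streamResponseChunks` when the final chunk
// arrives with an empty `choices` array and a non-null `usage`; the rest of
// the override matches the stock implementation in @langchain/openai.
function usageToGenerationChunk(usage: {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}): ChatGenerationChunk {
  return new ChatGenerationChunk({
    text: "",
    // An empty message chunk whose only job is to ferry the usage totals;
    // once the chunks are merged, the numbers land in response_metadata.
    message: new AIMessageChunk({
      content: "",
      response_metadata: { usage },
    }),
  });
}
```

And the model configured so the API actually sends that final usage chunk:

```typescript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  modelName: "gpt-4o", // illustrative
  streaming: true,
  // Forwarded verbatim to the completions request, so the stream ends with
  // one usage-only chunk.
  modelKwargs: {
    stream_options: { include_usage: true },
  },
});
```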
The API is a little clunky: the usage info is sent as the last chunk, with no choices, but we need to yield a message chunk in order to pass the info to the `BaseLanguageModel`, which then emits the `on_llm_end` event and compiles the `response_metadata`. Maybe there is a more convenient way to do this that does not involve emitting an empty message.
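For illustration, with the override and `modelKwargs` from the sketch above, the totals would surface once the streamed chunks are merged (`concat` merges `response_metadata`):

```typescript
import type { BaseMessageChunk } from "@langchain/core/messages";

// `model` is the ChatOpenAI instance configured in the sketch above.
let final: BaseMessageChunk | undefined;
for await (const chunk of await model.stream("Hello!")) {
  final = final === undefined ? chunk : final.concat(chunk);
}
console.log(final?.response_metadata); // now includes the usage totals
```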