
[Bug]: TokenCountingHandler remains 0 #12214

Open
BastiaanRudolf opened this issue Mar 24, 2024 · 8 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

@BastiaanRudolf

Bug Description

Hi! Love this project, and it's a blessing to work with.

I ran into a small problem. When using the new TokenCountingHandler through the new global Settings, the token counts remain 0. No warning or error is raised.

What am I missing? Any help is much appreciated 🙏

Version

0.10.23

Steps to Reproduce

import os
import time

import openai
import tiktoken
from llama_index.llms.openai import OpenAI

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.legacy.indices.empty import EmptyIndex
from .utils import create_chat_message, update_chat_message

openai.api_key = os.environ["OPENAI_API_KEY"]
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=True,  # set to true to see usage printed to the console
)
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.7)
Settings.callback_manager = CallbackManager([token_counter])

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Index
index = EmptyIndex()
engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    verbose=True
)

response = engine.stream_chat(
    message="A random message",
)

# Generate response & store in message
resp_string = ""
for resp in response.response_gen:
    resp_string += resp

# Get tokens & reset counter
update_chat_message(
    id=chat_message_responder_id,
    chat=chat,
    duration=time.time() - start,
    total_tokens=token_counter.total_llm_token_count,
    embedding_tokens=token_counter.total_embedding_token_count,
    prompt_tokens=token_counter.prompt_llm_token_count,
    completion_tokens=token_counter.completion_llm_token_count
)
token_counter.reset_counts()

Relevant Logs/Tracebacks

No response

@BastiaanRudolf added the bug and triage labels on Mar 24, 2024

dosubot bot commented Mar 24, 2024

Based on the information you've provided and the context from the LlamaIndex repository, it seems like there might be an issue with how the Settings.callback_manager is being set or how the TokenCountingHandler is being used in version 0.10.23 of the LlamaIndex project.

The TokenCountingHandler is designed to track and count tokens for two kinds of events: LLM (large language model) calls and embedding calls. It maintains a separate list for each (llm_token_counts and embedding_token_counts), appending a record with that event's token count each time an event ends. It also keeps running totals for both event types (total_llm_token_count and total_embedding_token_count), which are updated as events end and reflect the cumulative counts across all processed events of each type. You can find more details about this in the test_token_counter.py file.
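
For reference, once calls are actually flowing through the handler, the per-event records and running totals can be inspected directly. A minimal sketch (the attribute names match the handler's public fields, but verify them against your installed version):

# Sketch: inspecting a TokenCountingHandler after a call has gone through
# its CallbackManager. llm_token_counts holds one record per LLM call;
# the totals accumulate across all events seen so far.
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
)
callback_manager = CallbackManager([token_counter])

# ... run an LLM or embedding call through a component that uses callback_manager ...

for event in token_counter.llm_token_counts:
    print(event.prompt_token_count, event.completion_token_count, event.total_token_count)

print("LLM tokens:", token_counter.total_llm_token_count)
print("Embedding tokens:", token_counter.total_embedding_token_count)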

However, if Settings or the CallbackManager expects a specific initialization process, or if components capture the callback manager when they are constructed, simply assigning a new CallbackManager to Settings.callback_manager may not propagate it everywhere, and the framework may never route events to the new handler.
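
If ordering is the culprit, one pattern worth trying (a hedged sketch, not a confirmed fix) is to assign Settings.callback_manager before any LLM, index, or engine is constructed, so that components created afterwards inherit it:

# Hedged sketch: set the global callback manager *first*, then create the
# LLM and any indices/engines, so they are built against the manager that
# holds the TokenCountingHandler. Whether this resolves the zero counts
# depends on the installed version.
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
)
Settings.callback_manager = CallbackManager([token_counter])  # set first
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.7)  # created afterwards
# ... only now build the index and chat engine ...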

To troubleshoot this issue, consider the following steps:

  • Review the release notes or documentation for version 0.10.23 of the LlamaIndex project to identify any changes related to the Settings or callback system.
  • Ensure that after setting the Settings.callback_manager, the settings are correctly applied or saved as required by the LlamaIndex framework.
  • Check if the CallbackManager or TokenCountingHandler requires any additional configuration or initialization steps in the new version.
  • Look into the source code or documentation for any deprecated methods or properties that might affect the token counting functionality.

Unfortunately, I wasn't able to find specific changes made to the TokenCountingHandler or the Settings and CallbackManager classes in the LlamaIndex project between version 0.10.22 and 0.10.23. Also, I couldn't find any known bugs or issues related to the TokenCountingHandler class in the LlamaIndex project. However, there might be more information available elsewhere or I could have missed it.

If you could provide more details about the behavior you're experiencing or any additional context, it would be helpful in further diagnosing the issue. For example, are there any specific conditions or steps that lead to the TokenCountingHandler not updating the token count as expected? Are there any other parts of your code that interact with the Settings or CallbackManager that might affect their behavior? Any additional information you can provide would be greatly appreciated.


@logan-markewich
Collaborator

It's kind of janky; try attaching it directly to the LLM too:

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.7, callback_manager=callback_manager)

@BastiaanRudolf
Author

Hi @logan-markewich, thanks for your reply!

I changed to the following:

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=True,  # set to true to see usage printed to the console
)
callback_manager = CallbackManager([token_counter])
Settings.callback_manager = callback_manager
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.7, callback_manager=callback_manager)

Unfortunately, no luck.

@logan-markewich
Collaborator

So weird.

OK, one more thing:

index.as_chat_engine(..., callback_manager=callback_manager)

@BastiaanRudolf
Author

Thanks for the help. I tried that, and unfortunately it's still all zeroes! Very weird behaviour. I'm now using the legacy ServiceContext object to make it work.
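
For reference, the ServiceContext workaround looks roughly like this (a sketch of what I mean; ServiceContext is deprecated in 0.10.x, so treat it as a stopgap):

# Sketch of the legacy ServiceContext workaround (deprecated in 0.10.x):
# the callback manager is passed into the service context, which is then
# handed to the index/engine explicitly instead of relying on Settings.
import tiktoken
from llama_index.core import ServiceContext
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
)
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.7),
    callback_manager=CallbackManager([token_counter]),
)
# e.g. index = VectorStoreIndex.from_documents(docs, service_context=service_context)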

@andreyka26-git

Any progress on it? I am having the same problem

@MisteFr

MisteFr commented Jun 19, 2024

Same issue here.

@GillesJ

GillesJ commented Sep 4, 2024

Same issue here: the first invocation shows a zero count, but subsequent calls update correctly and the tokens are counted fine.
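
One thing that fits this pattern (an assumption on my part, not verified against the library internals): with streaming, the handler only records the LLM tokens once generation finishes, so counters read before the response generator is drained will still show 0. A minimal check, continuing from the snippet at the top of the issue:

# Assumption: counts are recorded when the LLM event ends, i.e. only after
# the stream has been fully consumed. Drain the generator first, then read.
response = engine.stream_chat("A random message")
text = "".join(response.response_gen)  # drain the stream

print(token_counter.total_llm_token_count)
print(token_counter.prompt_llm_token_count)
print(token_counter.completion_llm_token_count)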
