
[Bug]: last_token_time is equal to arrival_time #9998

Open
wolfgangsmdt opened this issue Nov 4, 2024 · 7 comments · May be fixed by #11376

Labels
bug Something isn't working

Comments

@wolfgangsmdt

wolfgangsmdt commented Nov 4, 2024

Your current environment

The bug is not related to the environment.

Model Input Dumps

The bug is not related to the model.

🐛 Describe the bug

QUESTION 1:

How do you calculate the RequestMetrics in RequestOutput? Please look at the screenshot below (highlighted in yellow):

[screenshot: RequestMetrics fields of a RequestOutput, with the relevant values highlighted in yellow]

I have found here, at L. 696, that last_token_time is set equal to arrival_time. Is this a bug?

Could you please tell me what unit the time is in: seconds? nanoseconds? I believe it is set with something like the example below (correct me if I am wrong):

import time
arrival_time = time.perf_counter()
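
One way I can sanity-check which clock and unit are used (not an authoritative answer, just an assumption-checking snippet; arrival_time below is a placeholder standing in for the value read from RequestOutput.metrics.arrival_time):

import time

arrival_time = time.time()  # placeholder: substitute the real metrics value here

print("arrival_time:       ", arrival_time)
print("time.time():        ", time.time())          # seconds since the Unix epoch
print("time.perf_counter():", time.perf_counter())  # seconds, arbitrary origin
print("time.monotonic():   ", time.monotonic())     # seconds, arbitrary origin
# All three clocks report float seconds (not nanoseconds); whichever reference
# the real arrival_time is close to is most likely the clock that produced it.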

QUESTION 2:

How can I calculate tokens/second (for output), TTFT, TBT, throughput, and total time?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@DarkLight1337
Member

cc @comaniac

@comaniac
Collaborator

comaniac commented Nov 6, 2024

The line you pointed out isn't a bug and isn't the root cause of the issue you're reporting. After all, we don't know the last token time when creating this object upon receiving the request. It will be updated when seq_group.get_last_latency() is called, so we should try to understand why it's not being called for the request.
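
As a rough sketch of that update pattern (simplified stand-in code, not the actual vLLM source; field and method names mirror the discussion above):

import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestMetrics:  # simplified stand-in for vLLM's RequestMetrics
    arrival_time: float
    last_token_time: float
    first_token_time: Optional[float] = None
    finished_time: Optional[float] = None

# On request creation no token exists yet, so last_token_time can only be
# initialized to arrival_time (the line flagged in this issue).
now = time.time()
metrics = RequestMetrics(arrival_time=now, last_token_time=now)

def get_last_latency(metrics: RequestMetrics, now: float) -> float:
    # Return the latency since the previous token and advance last_token_time.
    # If this is never called for a request, last_token_time stays stuck at
    # arrival_time, which is the symptom reported here.
    latency = now - metrics.last_token_time
    metrics.last_token_time = now
    return latency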

@wolfgangsmdt
Author

wolfgangsmdt commented Nov 6, 2024

The line you pointed out isn't a bug and isn't the root cause of the issue you're reporting. After all, we don't know the last token time when creating this object upon receiving the request. It will be updated when seq_group.get_last_latency() is called, so we should try to understand why it's not being called for the request.

Thanks for the reply @comaniac.
What information do you need from me to help fix the issue?

My goal is to obtain TTFT, TBT, and tokens/second.

Thanks

@comaniac
Collaborator

comaniac commented Nov 6, 2024

I don't have the bandwidth to look into it right now, but it shouldn't be hard to fix. Please feel free to submit a PR if you are able to fix it and I'll help review; otherwise I may try to fix it in my spare time (but I cannot guarantee any timeline).

@sri-fiddler

sri-fiddler commented Nov 14, 2024

@wolfgangsmdt, the last_token_time is updated in the following line.

self.metrics.last_token_time = now

But I am curious as to how the time to first token can be calculated. Would it be first_token_time - arrival_time?

@wolfgangsmdt
Author

@sri-fiddler
I do not know why they come out the same for me.

To get the time to first token (TTFT), it is as you said: first_token_time - arrival_time.

So for me:

# TTFT (seconds)
ttft_s = first_token_time - arrival_time

# Total time (seconds)
total_time_s = finished_time - arrival_time

# Output tokens/second (decode only)
tps = len(outputs[0].token_ids) / abs(last_token_time - first_token_time)

# Total throughput (tokens/second over the whole request, I believe)
total_throughput = len(outputs[0].token_ids) / total_time_s

However, what is weird to me is that first_token_time is higher than last_token_time; this is why I use abs (absolute value) to get the tokens/second.

Any idea how to get the time between tokens (TBT)?

@sri-fiddler

Yes, @wolfgangsmdt, you're right. I've made the changes on my end too.

I figured that last_token_time - first_token_time would give the time between tokens; however, I get a negative value here, which is surprising. Presumably this is consistent with the bug above: if last_token_time is never updated past arrival_time, it stays earlier than first_token_time, so the difference comes out negative.
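
For reference, here is a small sketch of how an average TBT could be computed once last_token_time is updated correctly (assuming request_output is one element of the list returned by llm.generate, and that averaging over token gaps is an acceptable definition of TBT here):

# request_output is one RequestOutput produced by vLLM
metrics = request_output.metrics
num_output_tokens = len(request_output.outputs[0].token_ids)

if num_output_tokens > 1 and metrics.first_token_time is not None:
    # Decode span divided by the number of gaps between consecutive tokens.
    tbt_s = (metrics.last_token_time - metrics.first_token_time) / (num_output_tokens - 1)
else:
    tbt_s = 0.0  # a single output token has no inter-token gap to measure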

rajveerb linked a pull request on Dec 20, 2024 that will close this issue.