[Bug]: last_token_time is equal to arrival_time #9998
Comments
cc @comaniac
The line you pointed out isn't a bug and isn't the root cause of the issue you're reporting. After all, we don't know the last token time when creating this object upon receiving the request. It will be updated when …
Thanks for the reply, @comaniac. My goal is to measure TTFT, TBT, and tokens/second. Thanks!
I don't have bandwidth to look into it now, but it shouldn't be hard to fix. Please feel free to submit a PR if you are able to fix it and I'll help review, or I may try to fix it in my spare time (but I cannot guarantee any timeline).
@wolfgangsmdt, the Line 801 in ccb5376
But I am curious how the time to first token can be calculated. Would it be …
@sri-fiddler For me, the time to first token (TTFT) is as you said, but in reverse. So for me:

    # TTFT (seconds)
    ttft_s = first_token_time - arrival_time
    # Total time (seconds)
    total_time_s = finished_time - arrival_time
    # Tokens/second
    tps = len(outputs[0].token_ids) / abs(last_token_time - first_token_time)
    # Total throughput (I believe)
    total_throughput = len(outputs[0].token_ids) / total_time_s

However, what is weird to me is that … Any idea on how to get the time between tokens (TBT)?
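The formulas above can be wrapped in a small helper. This is a minimal sketch, not vLLM's API: `Timings` and `request_latency_stats` are hypothetical names, the fields simply mirror the `RequestMetrics` timestamps named in this thread, and every time is assumed to be a wall-clock float in seconds. Instead of wrapping the decode interval in `abs()`, it raises when `last_token_time` precedes `first_token_time`, since that is exactly the inconsistent state reported in this issue. It also derives a mean TBT by dividing the first-to-last-token interval by the n - 1 gaps between n tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timings:
    """Hypothetical container for the per-request timestamps discussed here.

    Assumption: every field is a wall-clock time in seconds (e.g. from time.time()).
    """
    arrival_time: float
    first_token_time: Optional[float]
    last_token_time: float
    finished_time: Optional[float]


def request_latency_stats(t: Timings, num_output_tokens: int) -> dict:
    """Hypothetical helper: TTFT, total time, mean TBT and tokens/s for one request."""
    if t.first_token_time is None or t.finished_time is None:
        raise ValueError("request has no first token or has not finished yet")
    if num_output_tokens < 1:
        raise ValueError("need at least one output token")

    ttft_s = t.first_token_time - t.arrival_time       # time to first token
    total_time_s = t.finished_time - t.arrival_time    # end-to-end latency
    decode_s = t.last_token_time - t.first_token_time  # first token -> last token

    # A negative interval means the timestamps are inconsistent (e.g. last_token_time
    # still equal to arrival_time); report it instead of hiding it with abs().
    if decode_s < 0:
        raise ValueError("last_token_time is earlier than first_token_time")

    # n output tokens have n - 1 gaps between them, so TBT needs at least 2 tokens.
    gaps = num_output_tokens - 1
    has_gaps = gaps > 0 and decode_s > 0
    mean_tbt_s = decode_s / gaps if has_gaps else None
    decode_tps = gaps / decode_s if has_gaps else None

    return {
        "ttft_s": ttft_s,
        "total_time_s": total_time_s,
        "mean_tbt_s": mean_tbt_s,
        "decode_tokens_per_s": decode_tps,
        # Output tokens over the whole request lifetime, including queueing and prefill.
        "e2e_tokens_per_s": num_output_tokens / total_time_s if total_time_s > 0 else None,
    }
```

For example, `request_latency_stats(Timings(100.0, 100.5, 102.5, 102.75), 5)` gives a TTFT of 0.5 s, a total time of 2.75 s, a mean TBT of 0.5 s, and a decode rate of 2.0 tokens/s.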
Yes, @wolfgangsmdt, you're right. I've made the changes on my end too. I figured that …
Your current environment
The bug is not related to the environment.
Model Input Dumps
The bug is not related to the model.
🐛 Describe the bug
QUESTION 1:
How do you calculate the `RequestMetrics` in `RequestOutput`? Please look at the screenshot below (in YELLOW): I have found here in L. 696 that `last_token_time` is equal to `arrival_time`! IS IT A BUG?
Could you please tell me what unit the time is in: seconds? Nanoseconds? I believe it is something like the example below (correct me if I am wrong):
QUESTION 2:
How can I calculate the output tokens/second, TTFT, TBT, throughput, and total time?
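For the TBT and throughput parts of Question 2, the timestamp fields named in this thread (`arrival_time`, `first_token_time`, `last_token_time`, `finished_time`) only give the first and last token times, so a full TBT distribution needs one timestamp per token. A minimal sketch, assuming you record those timestamps yourself (for example client-side as each chunk arrives while streaming; this is an assumption about your setup, not a vLLM feature), is shown below. The function names are hypothetical. Batch throughput is taken as total output tokens divided by the window from the earliest arrival to the latest finish, which, unlike summing per-request tokens/second, does not double-count time when requests overlap.

```python
import statistics
from typing import Sequence


def tbt_from_timestamps(token_times: Sequence[float]) -> list:
    """Gaps (seconds) between consecutive token timestamps recorded by the caller."""
    return [later - earlier for earlier, later in zip(token_times, token_times[1:])]


def tbt_summary(token_times: Sequence[float]) -> dict:
    """Mean, median and worst-case time between tokens for one request."""
    gaps = tbt_from_timestamps(token_times)
    if not gaps:
        raise ValueError("need at least two token timestamps")
    return {
        "mean_s": statistics.mean(gaps),
        "median_s": statistics.median(gaps),
        "max_s": max(gaps),
    }


def batch_throughput(arrival_times: Sequence[float],
                     finished_times: Sequence[float],
                     output_token_counts: Sequence[int]) -> float:
    """Total output tokens per second over the batch's wall-clock window."""
    window_s = max(finished_times) - min(arrival_times)
    if window_s <= 0:
        raise ValueError("time window must be positive")
    return sum(output_token_counts) / window_s
```

For instance, token timestamps `[0.0, 0.25, 0.5, 1.0]` give gaps `[0.25, 0.25, 0.5]`, and two requests arriving at 0.0 s that finish at 2.0 s and 4.0 s with 10 and 30 output tokens give a batch throughput of 40 / 4.0 = 10.0 tokens/s.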