benchmark_temperature.txt
7541 lines (7252 loc) · 318 KB
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Open ttft file
Started DCGMI
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.1352256337801616
P50 ttft = 0.876136302947998
P99 ttft = 2.2022947049140935
Average tbt = 0.5435085753599803
P50 tbt = 0.6201989889144899
P99 tbt = 1.733678161859513
All GPUs and memories are cold after 1.0298666954040527
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.6017358757200695
P50 ttft = 1.5017907619476318
P99 ttft = 3.2912034034729007
Average tbt = 1.0999141114098685
P50 tbt = 0.6281604290008547
P99 tbt = 3.455548877716066
All GPUs and memories are cold after 5.0813376903533936
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 5.159543718610491
P50 ttft = 4.687931299209595
P99 ttft = 10.50176710605621
Average tbt = 10.075611960547311
P50 tbt = 10.071581983566286
P99 tbt = 19.888732741832737
All GPUs and memories are cold after 12.066996097564697
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 8.711873164991053
P50 ttft = 8.106292724609375
P99 ttft = 15.35182752609253
Average tbt = 11.895955259625506
P50 tbt = 11.888936614990238
P99 tbt = 23.442219319343575
All GPUs and memories are cold after 16.08412218093872
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 12.551698861122132
P50 ttft = 12.110552072525024
P99 ttft = 23.79429818868637
Average tbt = 14.509407296180722
P50 tbt = 14.503122210502628
P99 tbt = 28.646445067405708
All GPUs and memories are cold after 17.042044639587402
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 17.719152756035328
P50 ttft = 18.35385036468506
P99 ttft = 34.327722125053405
Average tbt = 18.656084727868443
P50 tbt = 18.625362789630895
P99 tbt = 36.91130081748963
All GPUs and memories are cold after 20.20615792274475
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 21.713141545857468
P50 ttft = 21.870787620544434
P99 ttft = 42.29313683509827
Average tbt = 21.41989534064515
P50 tbt = 21.39086329936982
P99 tbt = 42.31246268272401
All GPUs and memories are cold after 20.05538296699524
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 25.84684233780367
P50 ttft = 25.599838495254517
P99 ttft = 50.09847380161285
Average tbt = 24.37608161863074
P50 tbt = 24.341833615303045
P99 tbt = 48.12956105041504
All GPUs and memories are cold after 21.130449056625366
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.7459004124005636
P50 ttft = 0.5905699729919434
P99 ttft = 1.450362391471863
Average tbt = 0.2257074157396953
P50 tbt = 0.023198771476745608
P99 tbt = 0.7919998803138738
All GPUs and memories are cold after 4.02316427230835
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.9712855361756825
P50 ttft = 0.8701717853546143
P99 ttft = 2.0539129734039308
Average tbt = 0.5273195720854261
P50 tbt = 0.4305848121643068
P99 tbt = 1.5720602989196784
All GPUs and memories are cold after 12.091125965118408
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.5478432519095284
P50 ttft = 1.3512358665466309
P99 ttft = 3.7517982530593854
Average tbt = 1.6524826635633199
P50 tbt = 1.2526120901107791
P99 tbt = 4.780488272666931
All GPUs and memories are cold after 20.108954668045044
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 3.29976786055216
P50 ttft = 2.8963170051574707
P99 ttft = 6.288105154037476
Average tbt = 8.254025662236096
P50 tbt = 8.291367459297183
P99 tbt = 16.266743226051336
All GPUs and memories are cold after 25.04439616203308
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 5.918560452461243
P50 ttft = 5.478826999664307
P99 ttft = 10.931826589107514
Average tbt = 10.030714231014255
P50 tbt = 10.029394137859347
P99 tbt = 19.7999940340519
All GPUs and memories are cold after 27.06801700592041
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 9.254296746104956
P50 ttft = 9.881670951843262
P99 ttft = 17.853433957099913
Average tbt = 12.904798230901363
P50 tbt = 12.883860814571385
P99 tbt = 25.534525032997138
All GPUs and memories are cold after 33.057289123535156
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 12.050372257624588
P50 ttft = 12.483715772628784
P99 ttft = 23.359632825851442
Average tbt = 14.770976258630629
P50 tbt = 14.722983121871952
P99 tbt = 29.209009356498726
All GPUs and memories are cold after 31.058574676513672
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 14.898976196725684
P50 ttft = 14.653162956237793
P99 ttft = 28.698010416030883
Average tbt = 16.849098265601924
P50 tbt = 16.82198548316956
P99 tbt = 33.25991248464585
All GPUs and memories are cold after 33.08184266090393
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.5806034207344055
P50 ttft = 0.4738748073577881
P99 ttft = 1.1139018869400026
Average tbt = 0.15552484393119817
P50 tbt = 0.020467746257781985
P99 tbt = 0.6346551132202152
All GPUs and memories are cold after 32.05880880355835
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.7099099386306036
P50 ttft = 0.6069660186767578
P99 ttft = 1.487411642074585
Average tbt = 0.3908513818468367
P50 tbt = 0.3441334009170533
P99 tbt = 1.2526388740539556
All GPUs and memories are cold after 42.07590985298157
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.048140573501587
P50 ttft = 0.8363847732543945
P99 ttft = 2.2609828519821153
Average tbt = 0.9878769002641954
P50 tbt = 0.6699353456497195
P99 tbt = 3.8110121078491206
All GPUs and memories are cold after 56.11322283744812
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.3631215269972639
P50 ttft = 1.1003496646881104
P99 ttft = 3.2562735080719
Average tbt = 2.415040567444593
P50 tbt = 2.2972812652587895
P99 tbt = 5.825597548484804
All GPUs and memories are cold after 63.10582709312439
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 2.9081860542297364
P50 ttft = 2.8809804916381836
P99 ttft = 5.134545903205871
Average tbt = 7.996061216354369
P50 tbt = 7.989631962776186
P99 tbt = 15.782534518480304
All GPUs and memories are cold after 68.11416697502136
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 5.432557161897421
P50 ttft = 5.869601011276245
P99 ttft = 10.380073151588439
Average tbt = 10.289361194521184
P50 tbt = 10.26706328392029
P99 tbt = 20.374068925857546
All GPUs and memories are cold after 72.1
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0
Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0
Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0
Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0
Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0
Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0
Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0
Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0
Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
72.12426424026489
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 7.65628526635366
P50 ttft = 7.961462497711182
P99 ttft = 14.783538103103638
Average tbt = 11.761313095811296
P50 tbt = 11.736917662620549
P99 tbt = 23.260148169517525
All GPUs and memories are cold after 74.11718392372131
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 9.976863220513586
P50 ttft = 9.731972932815552
P99 ttft = 19.027587227821346
Average tbt = 13.448294464076854
P50 tbt = 13.421716213226322
P99 tbt = 26.56646044158936
All GPUs and memories are cold after 76.13772249221802
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.5461323459943136
P50 ttft = 0.44780755043029785
P99 ttft = 1.0404448342323305
Average tbt = 0.14800097743670151
P50 tbt = 0.019137001037597655
P99 tbt = 0.602126130819321
All GPUs and memories are cold after 62.20918416976929
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.6577123914446149
P50 ttft = 0.5194463729858398
P99 ttft = 1.3729460239410403
Average tbt = 0.355010341462635
P50 tbt = 0.32712445259094247
P99 tbt = 1.1928163766860966
All GPUs and memories are cold after 79.12298774719238
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 0.9552283832005092
P50 ttft = 0.7916269302368164
P99 ttft = 1.978054947853087
Average tbt = 0.9373453535352437
P50 tbt = 0.6352513790130616
P99 tbt = 3.6178555655479427
All GPUs and memories are cold after 86.14342427253723
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.1504706289710067
P50 ttft = 0.8854672908782959
P99 ttft = 2.8429336547851567
Average tbt = 1.8248011594865383
P50 tbt = 1.6982678890228273
P99 tbt = 5.146566648483278
All GPUs and memories are cold after 85.1427218914032
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 2.41979350566864
P50 ttft = 2.520090341567993
P99 ttft = 4.282202658653259
Average tbt = 7.6114223971366926
P50 tbt = 7.59765387773514
P99 tbt = 15.082071592807774
All GPUs and memories are cold after 97.16065502166748
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 4.650351241230965
P50 ttft = 5.074915289878845
P99 ttft = 8.931202642917633
Average tbt = 9.80494931451976
P50 tbt = 9.757636535167697
P99 tbt = 19.367487744092948
All GPUs and memories are cold after 99.15201306343079
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 6.789819808855449
P50 ttft = 7.119378328323364
P99 ttft = 13.187811222076416
Average tbt = 11.22867522076385
P50 tbt = 11.166554117202761
P99 tbt = 22.152746355056767
All GPUs and memories are cold after 129.21574473381042
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 8.94313437680164
P50 ttft = 8.726092338562012
P99 ttft = 17.137001399993895
Average tbt = 12.80624394359359
P50 tbt = 12.752905344963077
P99 tbt = 25.239237537860873
All GPUs and memories are cold after 116.20280933380127
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.1328892509142559
P50 ttft = 0.8683412075042725
P99 ttft = 2.2019528698921205
Average tbt = 0.5437449375788371
P50 tbt = 0.6199451208114626
P99 tbt = 1.7351550815105448
All GPUs and memories are cold after 1.046881914138794
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 1.6029504594348727
P50 ttft = 1.501896619796753
P99 ttft = 3.2798263549804694
Average tbt = 1.0976803677422662
P50 tbt = 0.6271430730819704
P99 tbt = 3.4461487579345715
All GPUs and memories are cold after 9.024489402770996
92886
288 - 96
288 - 256
288 - 1024
1053 - 96
1053 - 256
1053 - 600
8170 - 5
8170 - 256
8170 - 600
8170 - 5
Average ttft = 5.143965148925782
P50 ttft = 4.670084476470947