benchmark_temperature.txt

GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Open ttft file
Started DCGMI
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1352256337801616
P50 ttft =  0.876136302947998
P99 ttft =  2.2022947049140935
Average tbt =  0.5435085753599803
P50 tbt =  0.6201989889144899
P99 tbt =  1.733678161859513
All GPUs and memories are cold after  1.0298666954040527
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6017358757200695
P50 ttft =  1.5017907619476318
P99 ttft =  3.2912034034729007
Average tbt =  1.0999141114098685
P50 tbt =  0.6281604290008547
P99 tbt =  3.455548877716066
All GPUs and memories are cold after  5.0813376903533936
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.159543718610491
P50 ttft =  4.687931299209595
P99 ttft =  10.50176710605621
Average tbt =  10.075611960547311
P50 tbt =  10.071581983566286
P99 tbt =  19.888732741832737
All GPUs and memories are cold after  12.066996097564697
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.711873164991053
P50 ttft =  8.106292724609375
P99 ttft =  15.35182752609253
Average tbt =  11.895955259625506
P50 tbt =  11.888936614990238
P99 tbt =  23.442219319343575
All GPUs and memories are cold after  16.08412218093872
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.551698861122132
P50 ttft =  12.110552072525024
P99 ttft =  23.79429818868637
Average tbt =  14.509407296180722
P50 tbt =  14.503122210502628
P99 tbt =  28.646445067405708
All GPUs and memories are cold after  17.042044639587402
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.719152756035328
P50 ttft =  18.35385036468506
P99 ttft =  34.327722125053405
Average tbt =  18.656084727868443
P50 tbt =  18.625362789630895
P99 tbt =  36.91130081748963
All GPUs and memories are cold after  20.20615792274475
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.713141545857468
P50 ttft =  21.870787620544434
P99 ttft =  42.29313683509827
Average tbt =  21.41989534064515
P50 tbt =  21.39086329936982
P99 tbt =  42.31246268272401
All GPUs and memories are cold after  20.05538296699524
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.84684233780367
P50 ttft =  25.599838495254517
P99 ttft =  50.09847380161285
Average tbt =  24.37608161863074
P50 tbt =  24.341833615303045
P99 tbt =  48.12956105041504
All GPUs and memories are cold after  21.130449056625366
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7459004124005636
P50 ttft =  0.5905699729919434
P99 ttft =  1.450362391471863
Average tbt =  0.2257074157396953
P50 tbt =  0.023198771476745608
P99 tbt =  0.7919998803138738
All GPUs and memories are cold after  4.02316427230835
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9712855361756825
P50 ttft =  0.8701717853546143
P99 ttft =  2.0539129734039308
Average tbt =  0.5273195720854261
P50 tbt =  0.4305848121643068
P99 tbt =  1.5720602989196784
All GPUs and memories are cold after  12.091125965118408
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5478432519095284
P50 ttft =  1.3512358665466309
P99 ttft =  3.7517982530593854
Average tbt =  1.6524826635633199
P50 tbt =  1.2526120901107791
P99 tbt =  4.780488272666931
All GPUs and memories are cold after  20.108954668045044
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.29976786055216
P50 ttft =  2.8963170051574707
P99 ttft =  6.288105154037476
Average tbt =  8.254025662236096
P50 tbt =  8.291367459297183
P99 tbt =  16.266743226051336
All GPUs and memories are cold after  25.04439616203308
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.918560452461243
P50 ttft =  5.478826999664307
P99 ttft =  10.931826589107514
Average tbt =  10.030714231014255
P50 tbt =  10.029394137859347
P99 tbt =  19.7999940340519
All GPUs and memories are cold after  27.06801700592041
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.254296746104956
P50 ttft =  9.881670951843262
P99 ttft =  17.853433957099913
Average tbt =  12.904798230901363
P50 tbt =  12.883860814571385
P99 tbt =  25.534525032997138
All GPUs and memories are cold after  33.057289123535156
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.050372257624588
P50 ttft =  12.483715772628784
P99 ttft =  23.359632825851442
Average tbt =  14.770976258630629
P50 tbt =  14.722983121871952
P99 tbt =  29.209009356498726
All GPUs and memories are cold after  31.058574676513672
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.898976196725684
P50 ttft =  14.653162956237793
P99 ttft =  28.698010416030883
Average tbt =  16.849098265601924
P50 tbt =  16.82198548316956
P99 tbt =  33.25991248464585
All GPUs and memories are cold after  33.08184266090393
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5806034207344055
P50 ttft =  0.4738748073577881
P99 ttft =  1.1139018869400026
Average tbt =  0.15552484393119817
P50 tbt =  0.020467746257781985
P99 tbt =  0.6346551132202152
All GPUs and memories are cold after  32.05880880355835
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7099099386306036
P50 ttft =  0.6069660186767578
P99 ttft =  1.487411642074585
Average tbt =  0.3908513818468367
P50 tbt =  0.3441334009170533
P99 tbt =  1.2526388740539556
All GPUs and memories are cold after  42.07590985298157
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.048140573501587
P50 ttft =  0.8363847732543945
P99 ttft =  2.2609828519821153
Average tbt =  0.9878769002641954
P50 tbt =  0.6699353456497195
P99 tbt =  3.8110121078491206
All GPUs and memories are cold after  56.11322283744812
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3631215269972639
P50 ttft =  1.1003496646881104
P99 ttft =  3.2562735080719
Average tbt =  2.415040567444593
P50 tbt =  2.2972812652587895
P99 tbt =  5.825597548484804
All GPUs and memories are cold after  63.10582709312439
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.9081860542297364
P50 ttft =  2.8809804916381836
P99 ttft =  5.134545903205871
Average tbt =  7.996061216354369
P50 tbt =  7.989631962776186
P99 tbt =  15.782534518480304
All GPUs and memories are cold after  68.11416697502136
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.432557161897421
P50 ttft =  5.869601011276245
P99 ttft =  10.380073151588439
Average tbt =  10.289361194521184
P50 tbt =  10.26706328392029
P99 tbt =  20.374068925857546
All GPUs and memories are cold after  72.1
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
72.12426424026489
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.65628526635366
P50 ttft =  7.961462497711182
P99 ttft =  14.783538103103638
Average tbt =  11.761313095811296
P50 tbt =  11.736917662620549
P99 tbt =  23.260148169517525
All GPUs and memories are cold after  74.11718392372131
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.976863220513586
P50 ttft =  9.731972932815552
P99 ttft =  19.027587227821346
Average tbt =  13.448294464076854
P50 tbt =  13.421716213226322
P99 tbt =  26.56646044158936
All GPUs and memories are cold after  76.13772249221802
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5461323459943136
P50 ttft =  0.44780755043029785
P99 ttft =  1.0404448342323305
Average tbt =  0.14800097743670151
P50 tbt =  0.019137001037597655
P99 tbt =  0.602126130819321
All GPUs and memories are cold after  62.20918416976929
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6577123914446149
P50 ttft =  0.5194463729858398
P99 ttft =  1.3729460239410403
Average tbt =  0.355010341462635
P50 tbt =  0.32712445259094247
P99 tbt =  1.1928163766860966
All GPUs and memories are cold after  79.12298774719238
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9552283832005092
P50 ttft =  0.7916269302368164
P99 ttft =  1.978054947853087
Average tbt =  0.9373453535352437
P50 tbt =  0.6352513790130616
P99 tbt =  3.6178555655479427
All GPUs and memories are cold after  86.14342427253723
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1504706289710067
P50 ttft =  0.8854672908782959
P99 ttft =  2.8429336547851567
Average tbt =  1.8248011594865383
P50 tbt =  1.6982678890228273
P99 tbt =  5.146566648483278
All GPUs and memories are cold after  85.1427218914032
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.41979350566864
P50 ttft =  2.520090341567993
P99 ttft =  4.282202658653259
Average tbt =  7.6114223971366926
P50 tbt =  7.59765387773514
P99 tbt =  15.082071592807774
All GPUs and memories are cold after  97.16065502166748
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.650351241230965
P50 ttft =  5.074915289878845
P99 ttft =  8.931202642917633
Average tbt =  9.80494931451976
P50 tbt =  9.757636535167697
P99 tbt =  19.367487744092948
All GPUs and memories are cold after  99.15201306343079
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.789819808855449
P50 ttft =  7.119378328323364
P99 ttft =  13.187811222076416
Average tbt =  11.22867522076385
P50 tbt =  11.166554117202761
P99 tbt =  22.152746355056767
All GPUs and memories are cold after  129.21574473381042
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.94313437680164
P50 ttft =  8.726092338562012
P99 ttft =  17.137001399993895
Average tbt =  12.80624394359359
P50 tbt =  12.752905344963077
P99 tbt =  25.239237537860873
All GPUs and memories are cold after  116.20280933380127
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1328892509142559
P50 ttft =  0.8683412075042725
P99 ttft =  2.2019528698921205
Average tbt =  0.5437449375788371
P50 tbt =  0.6199451208114626
P99 tbt =  1.7351550815105448
All GPUs and memories are cold after  1.046881914138794
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6029504594348727
P50 ttft =  1.501896619796753
P99 ttft =  3.2798263549804694
Average tbt =  1.0976803677422662
P50 tbt =  0.6271430730819704
P99 tbt =  3.4461487579345715
All GPUs and memories are cold after  9.024489402770996
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.143965148925782
P50 ttft =  4.670084476470947
P99 ttft =  10.480300216674802
Average tbt =  10.07071492467608
P50 tbt =  10.076262235641483
P99 tbt =  19.873124314785006
All GPUs and memories are cold after  13.028811931610107
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.774045234773217
P50 ttft =  8.215699911117554
P99 ttft =  15.463761806488037
Average tbt =  11.886699114194732
P50 tbt =  11.848542666435245
P99 tbt =  23.477869486808782
All GPUs and memories are cold after  15.087594032287598
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.529647793769836
P50 ttft =  12.09360682964325
P99 ttft =  23.75369563102722
Average tbt =  14.495994144916535
P50 tbt =  14.486993563175206
P99 tbt =  28.621028536796576
All GPUs and memories are cold after  18.04016089439392
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.69232589751482
P50 ttft =  18.31268835067749
P99 ttft =  34.263877675533294
Average tbt =  18.628466182202096
P50 tbt =  18.614384472370155
P99 tbt =  36.86351949620248
All GPUs and memories are cold after  19.049339056015015
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.691252212001853
P50 ttft =  21.831607341766357
P99 ttft =  42.15102486610413
Average tbt =  21.336955250126042
P50 tbt =  21.311828589439397
P99 tbt =  42.202500370979315
All GPUs and memories are cold after  22.042132139205933
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.820269395069904
P50 ttft =  25.55871295928955
P99 ttft =  50.03374200344085
Average tbt =  24.34281697732856
P50 tbt =  24.32729406356812
P99 tbt =  48.070785931587224
All GPUs and memories are cold after  20.03896188735962
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7458096345265707
P50 ttft =  0.591633677482605
P99 ttft =  1.4480286979675294
Average tbt =  0.2255924244721731
P50 tbt =  0.02327929735183716
P99 tbt =  0.7904116063117984
All GPUs and memories are cold after  4.015582799911499
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9658468223753429
P50 ttft =  0.8669524192810059
P99 ttft =  2.043401384353638
Average tbt =  0.5258302257174539
P50 tbt =  0.42924222946167
P99 tbt =  1.566368188858033
All GPUs and memories are cold after  11.037595272064209
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5448218618120466
P50 ttft =  1.3513140678405762
P99 ttft =  3.7383999300003032
Average tbt =  1.6493749931880408
P50 tbt =  1.2486215591430667
P99 tbt =  4.771219321727752
All GPUs and memories are cold after  20.07015633583069
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.3377870234047493
P50 ttft =  2.998831033706665
P99 ttft =  6.235821390151978
Average tbt =  8.187639269014685
P50 tbt =  8.177973103523257
P99 tbt =  16.230018863677984
All GPUs and memories are cold after  23.046549320220947
92886
288
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
All done.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
 -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.962553272247314
P50 ttft =  5.464605927467346
P99 ttft =  11.042067551612854
Average tbt =  10.075769394874573
P50 tbt =  10.117878139019016
P99 tbt =  19.877450129270557
All GPUs and memories are cold after  26.04703140258789
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.126421969383955
P50 ttft =  9.735241293907166
P99 ttft =  17.679254765510557
Average tbt =  12.874269320443274
P50 tbt =  12.870675277709964
P99 tbt =  25.41453617715836
All GPUs and memories are cold after  30.06803250312805
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.058052206692631
P50 ttft =  12.45660948753357
P99 ttft =  23.344397945404054
Average tbt =  14.746247540761347
P50 tbt =  14.721755051612858
P99 tbt =  29.190062061309824
All GPUs and memories are cold after  32.061097621917725
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.91758198048695
P50 ttft =  14.651443481445312
P99 ttft =  28.700545873641964
Average tbt =  16.83898883239333
P50 tbt =  16.81894421577454
P99 tbt =  33.26273381137849
All GPUs and memories are cold after  33.11512470245361
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5817038218180338
P50 ttft =  0.47502994537353516
P99 ttft =  1.1147744655609133
Average tbt =  0.15528699755668646
P50 tbt =  0.020011198520660398
P99 tbt =  0.6349208917617801
All GPUs and memories are cold after  31.060751914978027
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7111181531633649
P50 ttft =  0.6142394542694092
P99 ttft =  1.4891967773437502
Average tbt =  0.3915987048830306
P50 tbt =  0.3452656507492066
P99 tbt =  1.2533859348297123
All GPUs and memories are cold after  40.071194887161255
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.053516456059047
P50 ttft =  0.8377029895782471
P99 ttft =  2.2532429027557357
Average tbt =  1.0735545097078598
P50 tbt =  0.6730317354202272
P99 tbt =  3.8172404766082764
All GPUs and memories are cold after  56.09646224975586
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.321421140577735
P50 ttft =  1.1009330749511719
P99 ttft =  3.1320484638214117
Average tbt =  2.408081518149958
P50 tbt =  2.302102899551392
P99 tbt =  5.7381131696701075
All GPUs and memories are cold after  56.135481119155884
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.985011034011841
P50 ttft =  2.887200713157654
P99 ttft =  5.2504288291931145
Average tbt =  8.020414000511174
P50 tbt =  7.987855243682863
P99 tbt =  15.856799500942234
All GPUs and memories are cold after  63.09596395492554
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.390954937785864
P50 ttft =  5.79792582988739
P99 ttft =  10.392157440185546
Average tbt =  10.327693362161519
P50 tbt =  10.325530588626863
P99 tbt =  20.379813013553626
All GPUs and memories are cold after  69.12094235420227
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.672261189108026
P50 ttft =  7.876096248626709
P99 ttft =  14.897186317443849
Average tbt =  11.82893346760371
P50 tbt =  11.878022933006289
P99 tbt =  23.336356740951544
All GPUs and memories are cold after  79.13144946098328
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.902453232960529
P50 ttft =  9.636761903762817
P99 ttft =  18.965774068832395
Average tbt =  13.444478463839333
P50 tbt =  13.4281590461731
P99 tbt =  26.51367531776429
All GPUs and memories are cold after  82.13496947288513
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5651833415031433
P50 ttft =  0.44880855083465576
P99 ttft =  1.155666069984436
Average tbt =  0.15417166352272038
P50 tbt =  0.019440460205078124
P99 tbt =  0.6741531522274021
All GPUs and memories are cold after  82.14732623100281
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6556210063752674
P50 ttft =  0.5121841430664062
P99 ttft =  1.3601244926452638
Average tbt =  0.3543972299212501
P50 tbt =  0.32640902996063237
P99 tbt =  1.1861847591400152
All GPUs and memories are cold after  87.15665435791016
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9510524136679513
P50 ttft =  0.7909903526306152
P99 ttft =  1.9715263938903795
Average tbt =  0.9370972592490062
P50 tbt =  0.6329424619674684
P99 tbt =  3.6145794858932496
All GPUs and memories are cold after  141.2296326160431
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1534571705794916
P50 ttft =  0.8847506046295166
P99 ttft =  2.859198236465455
Average tbt =  1.8967179845019084
P50 tbt =  1.7188613653182987
P99 tbt =  5.304887404441835
All GPUs and memories are cold after  123.19200277328491
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.423143916130066
P50 ttft =  2.5271719694137573
P99 ttft =  4.29526742696762
Average tbt =  7.623727654457094
P50 tbt =  7.607834148406985
P99 tbt =  15.093152840137485
All GPUs and memories are cold after  123.20266151428223
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.6638739965856075
P50 ttft =  5.085948467254639
P99 ttft =  8.951184222698211
Average tbt =  9.819416957348583
P50 tbt =  9.781001591682436
P99 tbt =  19.384316574335102
All GPUs and memories are cold after  133.20904397964478
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.830117415075433
P50 ttft =  7.157175779342651
P99 ttft =  13.24936450958252
Average tbt =  11.24884733337246
P50 tbt =  11.196567130088809
P99 tbt =  22.193086478233344
All GPUs and memories are cold after  131.20550918579102
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.953087085700897
P50 ttft =  8.737645626068115
P99 ttft =  17.146823792457578
Average tbt =  12.807884272609853
P50 tbt =  12.762384343147282
P99 tbt =  25.24708339738846
All GPUs and memories are cold after  146.24657893180847
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1328998804092407
P50 ttft =  0.8733927011489868
P99 ttft =  2.1993320441246036
Average tbt =  0.54257523616155
P50 tbt =  0.618705713748932
P99 tbt =  1.7298000760078438
All GPUs and memories are cold after  1.0485215187072754
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5986026582263766
P50 ttft =  1.5011253356933594
P99 ttft =  3.2864791870117194
Average tbt =  1.0985813458760585
P50 tbt =  0.6263314962387087
P99 tbt =  3.4495696258544934
All GPUs and memories are cold after  7.037396669387817
92886
288  -  96
288  -  256
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.132326010295323
P50 ttft =  4.66460657119751
P99 ttft =  10.46813377857208
Average tbt =  10.071061985833307
P50 tbt =  10.074358630180361
P99 tbt =  19.867491279602053
All GPUs and memories are cold after  15.053906202316284
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.753187185380517
P50 ttft =  8.198802471160889
P99 ttft =  15.434652328491211
Average tbt =  11.878878670785484
P50 tbt =  11.841996145248416
P99 tbt =  23.45706776142121
All GPUs and memories are cold after  16.030864477157593
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.520358686447144
P50 ttft =  12.08852481842041
P99 ttft =  23.734238052368163
Average tbt =  14.489271126270298
P50 tbt =  14.479944884777073
P99 tbt =  28.608632034063344
All GPUs and memories are cold after  18.039225339889526
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.60621400922537
P50 ttft =  18.21474778652191
P99 ttft =  34.19477960824966
Average tbt =  18.642761915177108
P50 tbt =  18.63807573318482
P99 tbt =  36.81798331141473
All GPUs and memories are cold after  19.067026376724243
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.662659729996772
P50 ttft =  21.808367013931274
P99 ttft =  42.12634575843811
Average tbt =  21.34225141283585
P50 tbt =  21.32638609409333
P99 tbt =  42.187750865936295
All GPUs and memories are cold after  20.0402410030365
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.870442697800787
P50 ttft =  25.62555456161499
P99 ttft =  50.11820909976959
Average tbt =  24.369129761443084
P50 tbt =  24.34219932556153
P99 tbt =  48.124446588039405
All GPUs and memories are cold after  21.0457124710083
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7462186018625895
P50 ttft =  0.5920901298522949
P99 ttft =  1.4513595628738405
Average tbt =  0.22599131266276043
P50 tbt =  0.023232781887054445
P99 tbt =  0.7927824048995976
All GPUs and memories are cold after  3.0150930881500244
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9709397951761881
P50 ttft =  0.8670380115509033
P99 ttft =  2.048234796524048
Average tbt =  0.5276449271610806
P50 tbt =  0.4311678171157838
P99 tbt =  1.5687939596176157
All GPUs and memories are cold after  14.135589838027954
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5446656840188162
P50 ttft =  1.3512403964996338
P99 ttft =  3.747894325256346
Average tbt =  1.650886551312038
P50 tbt =  1.2572900295257572
P99 tbt =  4.777322337150573
All GPUs and memories are cold after  21.037065267562866
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.3068549923780486
P50 ttft =  2.891706943511963
P99 ttft =  6.272087526321412
Average tbt =  8.239319507087153
P50 tbt =  8.19054391384125
P99 tbt =  16.25500833034516
All GPUs and memories are cold after  27.054311752319336
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.907210211753846
P50 ttft =  5.468451380729675
P99 ttft =  10.91301515340805
Average tbt =  10.025857951641086
P50 tbt =  10.024031639099125
P99 tbt =  19.788027157545095
All GPUs and memories are cold after  27.05254888534546
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.39138262718916
P50 ttft =  10.06744647026062
P99 ttft =  18.043441998958585
Average tbt =  12.942080727964644
P50 tbt =  12.882484078407291
P99 tbt =  25.65639041423798
All GPUs and memories are cold after  31.12694525718689
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.030606939368052
P50 ttft =  12.476455926895142
P99 ttft =  23.34965626716614
Average tbt =  14.777367412880677
P50 tbt =  14.727530527114872
P99 tbt =  29.194443328857428
All GPUs and memories are cold after  33.0626175403595
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.929735789816064
P50 ttft =  14.698076486587524
P99 ttft =  28.759142484664913
Average tbt =  16.86859979169915
P50 tbt =  16.814543795585635
P99 tbt =  33.29827654600144
All GPUs and memories are cold after  33.07147693634033
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5809538960456848
P50 ttft =  0.4749901294708252
P99 ttft =  1.11202529668808
Average tbt =  0.1558171967665355
P50 tbt =  0.02041875123977661
P99 tbt =  0.6344038872718815
All GPUs and memories are cold after  33.07471966743469
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7120852356865293
P50 ttft =  0.6147556304931641
P99 ttft =  1.4902016639709474
Average tbt =  0.3915352809996834
P50 tbt =  0.34511084556579597
P99 tbt =  1.2546843624114996
All GPUs and memories are cold after  40.1314902305603
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0488184997013636
P50 ttft =  0.8365952968597412
P99 ttft =  2.263865675926207
Average tbt =  0.98726856981005
P50 tbt =  0.6682294845581056
P99 tbt =  3.8147936339378354
All GPUs and memories are cold after  55.090174436569214
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3195169669825857
P50 ttft =  1.0996615886688232
P99 ttft =  3.1312316417694097
Average tbt =  2.406844201320555
P50 tbt =  2.3003231525421146
P99 tbt =  5.737077112197878
All GPUs and memories are cold after  65.12708616256714
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.0135614490509033
P50 ttft =  3.003894567489624
P99 ttft =  5.2789698767662045
Average tbt =  8.025013115406034
P50 tbt =  7.998978471755984
P99 tbt =  15.88316539859772
All GPUs and memories are cold after  69.13461971282959
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.397857125848532
P50 ttft =  5.8733649253845215
P99 ttft =  10.387719457149505
Average tbt =  10.320182300359011
P50 tbt =  10.273513901233676
P99 tbt =  20.38222074556351
All GPUs and memories are cold after  76.13154625892639
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.72990832590077
P50 ttft =  7.97041392326355
P99 ttft =  15.009832286834717
Average tbt =  11.865272399497355
P50 tbt =  11.888567543029788
P99 tbt =  23.424667634963996
All GPUs and memories are cold after  78.20514440536499
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.953733033444507
P50 ttft =  9.756535053253174
P99 ttft =  19.043260803222655
Average tbt =  13.49786070513438
P50 tbt =  13.437312698364261
P99 tbt =  26.604783189296725
All GPUs and memories are cold after  80.13404750823975
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0
All done.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0
All done.
256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5463017423947653
P50 ttft =  0.4477452039718628
P99 ttft =  1.0393025732040406
Average tbt =  0.1482503294944764
P50 tbt =  0.019883561134338378
P99 tbt =  0.6018697760105136
All GPUs and memories are cold after  74.16045618057251
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6579194750104632
P50 ttft =  0.5219447612762451
P99 ttft =  1.371272897720337
Average tbt =  0.3551489818663825
P50 tbt =  0.3271533966064454
P99 tbt =  1.19234359741211
All GPUs and memories are cold after  90.26257419586182
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9547024931226458
P50 ttft =  0.787421464920044
P99 ttft =  1.9871265745162952
Average tbt =  0.9386846242632186
P50 tbt =  0.6328460216522218
P99 tbt =  3.627201166152954
All GPUs and memories are cold after  111.1717312335968
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.147651079224377
P50 ttft =  0.8831770420074463
P99 ttft =  2.8363740444183354
Average tbt =  1.8229371187163563
P50 tbt =  1.6981892824172977
P99 tbt =  5.140870451927187
All GPUs and memories are cold after  127.19597387313843
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.3825798416137696
P50 ttft =  2.4510854482650757
P99 ttft =  4.296357548236847
Average tbt =  7.651107896327975
P50 tbt =  7.614240550994875
P99 tbt =  15.096293273687365
All GPUs and memories are cold after  120.18404722213745
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.62813463434577
P50 ttft =  4.9371577501297
P99 ttft =  8.954815821647642
Average tbt =  9.850117361918093
P50 tbt =  9.895758461952212
P99 tbt =  19.397276709079748
All GPUs and memories are cold after  132.22735381126404
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.772109756731007
P50 ttft =  7.026822328567505
P99 ttft =  13.133384017944337
Average tbt =  11.206001429035242
P50 tbt =  11.198995447158817
P99 tbt =  22.115942568778998
All GPUs and memories are cold after  148.2370080947876
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.016655330198358
P50 ttft =  8.738089561462402
P99 ttft =  17.16869776725769
Average tbt =  12.77797676511558
P50 tbt =  12.763127899169925
P99 tbt =  25.251870625019077
All GPUs and memories are cold after  158.26735305786133
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1472533543904622
P50 ttft =  0.8905855417251587
P99 ttft =  2.2016827082633976
Average tbt =  0.5781694173812868
P50 tbt =  0.6198657274246218
P99 tbt =  1.7417465777397163
All GPUs and memories are cold after  1.0232205390930176
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5996960458301364
P50 ttft =  1.500957727432251
P99 ttft =  3.287005472183228
Average tbt =  1.0981836716334026
P50 tbt =  0.6260180234909059
P99 tbt =  3.448503036499025
All GPUs and memories are cold after  8.020715951919556
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.132257052830288
P50 ttft =  4.661484003067017
P99 ttft =  10.469026842117307
Average tbt =  10.07073296615056
P50 tbt =  10.073246836662296
P99 tbt =  19.865615709781647
All GPUs and memories are cold after  15.034432411193848
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.770334540343866
P50 ttft =  8.215379238128662
P99 ttft =  15.44905047416687
Average tbt =  11.868061830939318
P50 tbt =  11.833068609237674
P99 tbt =  23.45934125900269
All GPUs and memories are cold after  17.11393976211548
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.529458837509155
P50 ttft =  12.095076441764832
P99 ttft =  23.752220010757448
Average tbt =  14.496529545307162
P50 tbt =  14.4892302274704
P99 tbt =  28.619715418815616
All GPUs and memories are cold after  19.045589447021484
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.708323273807764
P50 ttft =  18.32601833343506
P99 ttft =  34.28065097093582
Average tbt =  18.629752299934633
P50 tbt =  18.61551300287247
P99 tbt =  36.87565071892739
All GPUs and memories are cold after  20.04246711730957
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.650189598945722
P50 ttft =  21.78481411933899
P99 ttft =  42.078150596618656
Average tbt =  21.316428214883157
P50 tbt =  21.302888464927676
P99 tbt =  42.15060421657563
All GPUs and memories are cold after  24.047776460647583
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.808539114802716
P50 ttft =  25.54026746749878
P99 ttft =  49.99097245693206
Average tbt =  24.326675338342973
P50 tbt =  24.305347585678106
P99 tbt =  48.048516973972326
All GPUs and memories are cold after  21.061675786972046
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7457912166913351
P50 ttft =  0.5916273593902588
P99 ttft =  1.448739881515503
Average tbt =  0.22569479544957485
P50 tbt =  0.02312204837799072
P99 tbt =  0.7909696354866032
All GPUs and memories are cold after  4.035749912261963
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9661815393538702
P50 ttft =  0.864609956741333
P99 ttft =  2.0460016727447514
Average tbt =  0.5264949310393563
P50 tbt =  0.4296400547027589
P99 tbt =  1.5694774055480964
All GPUs and memories are cold after  14.029494762420654
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.566218478339059
P50 ttft =  1.351919174194336
P99 ttft =  3.8651122236251814
Average tbt =  1.66722074849265
P50 tbt =  1.2521096229553226
P99 tbt =  4.859740510940552
All GPUs and memories are cold after  23.051459312438965
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.2444531452365037
P50 ttft =  2.8992655277252197
P99 ttft =  6.129525089263916
Average tbt =  8.184045950377863
P50 tbt =  8.182650852203372
P99 tbt =  16.145297136306766
All GPUs and memories are cold after  26.050692319869995
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.948354682922363
P50 ttft =  5.560978651046753
P99 ttft =  10.979979622364043
Average tbt =  10.04260085487366
P50 tbt =  9.998076379299167
P99 tbt =  19.839470379114157
All GPUs and memories are cold after  28.04849362373352
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.147465713322163
P50 ttft =  9.758108258247375
P99 ttft =  17.691659343242645
Average tbt =  12.865969668701293
P50 tbt =  12.858341240882876
P99 tbt =  25.409249892473227
All GPUs and memories are cold after  32.065509557724
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
  -  256
8170  -  600
8170 - 5
Average ttft =  12.037213377756615
P50 ttft =  12.42839503288269
P99 ttft =  23.30615735054016
Average tbt =  14.733118544539362
P50 tbt =  14.71760132312775
P99 tbt =  29.157806834220892
All GPUs and memories are cold after  34.059550762176514
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.923827561987451
P50 ttft =  14.65181279182434
P99 ttft =  28.68871480941772
Average tbt =  16.82184167252966
P50 tbt =  16.805496263504033
P99 tbt =  33.258792599678046
All GPUs and memories are cold after  35.097086906433105
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5998302896817526
P50 ttft =  0.4756091833114624
P99 ttft =  1.2140873098373415
Average tbt =  0.15534894267718002
P50 tbt =  0.021032023429870608
P99 tbt =  0.6333041591644291
All GPUs and memories are cold after  33.06074023246765
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7110198111761183
P50 ttft =  0.6140339374542236
P99 ttft =  1.4853719234466554
Average tbt =  0.39148289703187505
P50 tbt =  0.3449338674545289
P99 tbt =  1.2520748853683477
All GPUs and memories are cold after  46.11260962486267
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0433461870465959
P50 ttft =  0.8338489532470703
P99 ttft =  2.2441577720642076
Average tbt =  0.9846828160967149
P50 tbt =  0.6673346519470216
P99 tbt =  3.800924714565277
All GPUs and memories are cold after  59.14123773574829
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3173455843111364
P50 ttft =  1.099931240081787
P99 ttft =  3.1384695053100593
Average tbt =  2.4038593135228985
P50 tbt =  2.2908763170242317
P99 tbt =  5.743539390563967
All GPUs and memories are cold after  64.10297703742981
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.00133047580719
P50 ttft =  2.966359853744507
P99 ttft =  5.248684065341949
Average tbt =  8.006709448814393
P50 tbt =  7.990283668041231
P99 tbt =  15.855841452121737
All GPUs and memories are cold after  66.11396956443787
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.40581563860178
P50 ttft =  5.876902222633362
P99 ttft =  10.394440352916716
Average tbt =  10.319008996710185
P50 tbt =  10.27740848064423
P99 tbt =  20.37965485596657
All GPUs and memories are cold after  78.12577319145203
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.659128368717351
P50 ttft =  8.022163152694702
P99 ttft =  14.837658472061158
Average tbt =  11.794248118792495
P50 tbt =  11.729514169692997
P99 tbt =  23.278774346351632
All GPUs and memories are cold after  77.1187756061554
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.8875909397401
P50 ttft =  9.543224096298218
P99 ttft =  18.956943073272704
Average tbt =  13.444510671604117
P50 tbt =  13.480088400840764
P99 tbt =  26.49324939870835
All GPUs and memories are cold after  88.1672260761261
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5460538069407145
P50 ttft =  0.44749581813812256
P99 ttft =  1.0382777237892151
Average tbt =  0.14752034942309064
P50 tbt =  0.02004406452178955
P99 tbt =  0.6007306580543521
All GPUs and memories are cold after  109.21776580810547
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6541103181384859
P50 ttft =  0.5155138969421387
P99 ttft =  1.3605526924133302
Average tbt =  0.3548165979839507
P50 tbt =  0.32712748050689705
P99 tbt =  1.1869967031478885
All GPUs and memories are cold after  86.14578795433044
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9500134127480643
P50 ttft =  0.7866988182067871
P99 ttft =  1.9705581140518176
Average tbt =  0.9355598102297105
P50 tbt =  0.6330716609954836
P99 tbt =  3.611762409687042
All GPUs and memories are cold after  111.18808436393738
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1481536888494723
P50 ttft =  0.888094425201416
P99 ttft =  2.836683368682862
Average tbt =  1.8235951749289916
P50 tbt =  1.695177984237671
P99 tbt =  5.141583747863772
All GPUs and memories are cold after  131.24182748794556
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.331671242713928
P50 ttft =  2.4223231077194214
P99 ttft =  4.212874991893768
Average tbt =  7.609462417125705
P50 tbt =  7.567196941375734
P99 tbt =  15.024249219417575
All GPUs and memories are cold after  127.2011604309082
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.561941083520651
P50 ttft =  4.925778150558472
P99 ttft =  8.779161572456358
Average tbt =  9.757128115743402
P50 tbt =  9.751379477977755
P99 tbt =  19.252518864870076
All GPUs and memories are cold after  151.22318530082703
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.790752580721084
P50 ttft =  7.037930250167847
P99 ttft =  13.086201667785645
Average tbt =  11.15861303414384
P50 tbt =  11.145784258842472
P99 tbt =  22.082316161155706
All GPUs and memories are cold after  141.22946190834045
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.95966472970434
P50 ttft =  8.699437141418457
P99 ttft =  17.041455917358398
Average tbt =  12.73410435383579
P50 tbt =  12.715941333770754
P99 tbt =  25.170154660701755
All GPUs and memories are cold after  170.2751784324646
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1316426992416382
P50 ttft =  0.8718193769454956
P99 ttft =  2.1984321570396426
Average tbt =  0.5434907813866935
P50 tbt =  0.6198573350906373
P99 tbt =  1.7326465706825265
All GPUs and memories are cold after  1.02298903465271
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6007124242328463
P50 ttft =  1.501129150390625
P99 ttft =  3.2881843566894537
Average tbt =  1.0990075815291636
P50 tbt =  0.6269150733947755
P99 tbt =  3.4508621549606335
All GPUs and memories are cold after  8.026557445526123
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.248586388996669
P50 ttft =  4.785797595977783
P99 ttft =  10.579632778167722
Average tbt =  10.068034513337272
P50 tbt =  10.069490933418276
P99 tbt =  19.91454136180878
All GPUs and memories are cold after  15.142144441604614
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.754338438917951
P50 ttft =  8.085744142532349
P99 ttft =  15.464860582351685
Average tbt =  11.895934952759163
P50 tbt =  11.934331107139592
P99 tbt =  23.472640094757086
All GPUs and memories are cold after  17.036224842071533
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0
All done.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0
All done.
All done.
 - 5
Average ttft =  12.545424256324768
P50 ttft =  12.115773916244507
P99 ttft =  23.773862595558167
Average tbt =  14.502343233585366
P50 tbt =  14.4948733329773
P99 tbt =  28.6365155453682
All GPUs and memories are cold after  19.039141178131104
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.744196373969316
P50 ttft =  18.372841238975525
P99 ttft =  34.37902154445648
Average tbt =  18.67898722402752
P50 tbt =  18.65743416547776
P99 tbt =  36.93471876358986
All GPUs and memories are cold after  21.052936553955078
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.59949596287453
P50 ttft =  21.73551845550537
P99 ttft =  42.02525409698487
Average tbt =  21.3160638355229
P50 tbt =  21.308357954025276
P99 tbt =  42.10427989768983
All GPUs and memories are cold after  21.039056539535522
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.822018629097077
P50 ttft =  25.57817816734314
P99 ttft =  50.06224522113799
Average tbt =  24.369824060761772
P50 tbt =  24.338225769996647
P99 tbt =  48.10541183042527
All GPUs and memories are cold after  22.070722818374634
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7504573861757914
P50 ttft =  0.5941774845123291
P99 ttft =  1.4532237625122073
Average tbt =  0.22655451496442167
P50 tbt =  0.02327885627746582
P99 tbt =  0.7931749060153965
All GPUs and memories are cold after  3.0299370288848877
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9772967724573045
P50 ttft =  0.8714356422424316
P99 ttft =  2.055049896240235
Average tbt =  0.5283666304179603
P50 tbt =  0.4315510511398316
P99 tbt =  1.5724459362030034
All GPUs and memories are cold after  14.028406143188477
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5529327733176095
P50 ttft =  1.3551342487335205
P99 ttft =  3.7649118185043315
Average tbt =  1.8816080113819673
P50 tbt =  2.0693585157394416
P99 tbt =  4.78742800951004
All GPUs and memories are cold after  22.041709899902344
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.3064916192031486
P50 ttft =  2.9078140258789062
P99 ttft =  6.342239904403687
Average tbt =  8.30426758207926
P50 tbt =  8.31810214519501
P99 tbt =  16.319874277114874
All GPUs and memories are cold after  27.09386134147644
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.956346340179444
P50 ttft =  5.524749994277954
P99 ttft =  11.004725084304809
Average tbt =  10.056089982986453
P50 tbt =  10.050624728202822
P99 tbt =  19.850773834943773
All GPUs and memories are cold after  28.049019813537598
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.334044251590967
P50 ttft =  9.994194984436035
P99 ttft =  18.006154403686523
Average tbt =  12.956471096351743
P50 tbt =  12.914124798774722
P99 tbt =  25.630646522760394
All GPUs and memories are cold after  32.055505990982056
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.11653417430512
P50 ttft =  12.575578927993774
P99 ttft =  23.516489219665527
Average tbt =  14.823444444839275
P50 tbt =  14.766863322258
P99 tbt =  29.29884165763856
All GPUs and memories are cold after  33.05445885658264
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  15.00506609893707
P50 ttft =  14.80375623703003
P99 ttft =  28.89470550060272
Average tbt =  16.91415176075625
P50 tbt =  16.85163545608521
P99 tbt =  33.39936126995087
All GPUs and memories are cold after  35.062925577163696
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5834554235140482
P50 ttft =  0.4756007194519043
P99 ttft =  1.1177033829689027
Average tbt =  0.1558666308720907
P50 tbt =  0.020385098457336423
P99 tbt =  0.6382482988834384
All GPUs and memories are cold after  37.05991864204407
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7164810044424874
P50 ttft =  0.6175692081451416
P99 ttft =  1.502080774307251
Average tbt =  0.4163947059994653
P50 tbt =  0.3468879222869874
P99 tbt =  1.2614606761932379
All GPUs and memories are cold after  41.06364059448242
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0623783656529018
P50 ttft =  0.8427808284759521
P99 ttft =  2.299206271171568
Average tbt =  0.9944305780955727
P50 tbt =  0.6710685253143311
P99 tbt =  3.8400854635238644
All GPUs and memories are cold after  62.1616268157959
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3394744047304479
P50 ttft =  1.1077818870544434
P99 ttft =  3.160427379608155
Average tbt =  2.4166900268415135
P50 tbt =  2.3106890678405767
P99 tbt =  5.762675342559817
All GPUs and memories are cold after  64.09686613082886
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.052447247505188
P50 ttft =  2.9414085149765015
P99 ttft =  5.3620262145996085
Average tbt =  8.069454820632933
P50 tbt =  8.036135756969454
P99 tbt =  15.94491548418999
All GPUs and memories are cold after  66.177419424057
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.5183792896568775
P50 ttft =  5.943197846412659
P99 ttft =  10.605017075538635
Average tbt =  10.38183655887842
P50 tbt =  10.376824939250948
P99 tbt =  20.5123750140667
All GPUs and memories are cold after  73.16526389122009
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.730001096856104
P50 ttft =  7.912184000015259
P99 ttft =  15.015055046081544
Average tbt =  11.869982508763872
P50 tbt =  11.906732082366947
P99 tbt =  23.4152498703003
All GPUs and memories are cold after  76.11316323280334
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.937750744532389
P50 ttft =  9.669135093688965
P99 ttft =  19.032055144309997
Average tbt =  13.461585062095915
P50 tbt =  13.438914203643801
P99 tbt =  26.54303364515305
All GPUs and memories are cold after  90.1346583366394
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5489243467648824
P50 ttft =  0.44994986057281494
P99 ttft =  1.043765242099762
Average tbt =  0.14778856436411542
P50 tbt =  0.01958796977996826
P99 tbt =  0.6027869172096255
All GPUs and memories are cold after  84.13389873504639
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6903793698265439
P50 ttft =  0.5697393417358398
P99 ttft =  1.4945820331573487
Average tbt =  0.35560759476252973
P50 tbt =  0.32789940834045417
P99 tbt =  1.1940080833435065
All GPUs and memories are cold after  78.11747479438782
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0
All done.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0
All done.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0
All done.
Average ttft =  0.9620928355625697
P50 ttft =  0.7943511009216309
P99 ttft =  1.9974751329421985
Average tbt =  0.9409146996906829
P50 tbt =  0.6368371963500978
P99 tbt =  3.631995142459869
All GPUs and memories are cold after  110.17086219787598
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1661517271181432
P50 ttft =  0.8916275501251221
P99 ttft =  2.874595975875855
Average tbt =  1.8952785148853213
P50 tbt =  1.70899875164032
P99 tbt =  5.305653576850893
All GPUs and memories are cold after  126.20471286773682
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.4442610502243043
P50 ttft =  2.5378679037094116
P99 ttft =  4.336150965690613
Average tbt =  7.641598092555996
P50 tbt =  7.629763150215151
P99 tbt =  15.12587807273865
All GPUs and memories are cold after  132.31099796295166
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.725596752017736
P50 ttft =  5.153467535972595
P99 ttft =  9.06745573759079
Average tbt =  9.844897183775903
P50 tbt =  9.806587290763858
P99 tbt =  19.45617238736153
All GPUs and memories are cold after  167.2469937801361
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.88620040188097
P50 ttft =  7.221330642700195
P99 ttft =  13.35166395187378
Average tbt =  11.283635686195062
P50 tbt =  11.226698374748233
P99 tbt =  22.27101154708863
All GPUs and memories are cold after  219.38585448265076
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.042808664850442
P50 ttft =  8.83728575706482
P99 ttft =  17.311488981246946
Average tbt =  12.863026082659346
P50 tbt =  12.810284328460696
P99 tbt =  25.362047212600714
All GPUs and memories are cold after  166.23297572135925
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1350215474764507
P50 ttft =  0.8717269897460938
P99 ttft =  2.202925026416779
Average tbt =  0.543799541393916
P50 tbt =  0.620820438861847
P99 tbt =  1.734048592329026
All GPUs and memories are cold after  2.1328506469726562
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6050570692334856
P50 ttft =  1.501929521560669
P99 ttft =  3.2971310615539555
Average tbt =  1.100029471942357
P50 tbt =  0.6258841753005984
P99 tbt =  3.455066652297975
All GPUs and memories are cold after  7.129590272903442
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.184862899780273
P50 ttft =  4.700725555419922
P99 ttft =  10.554285144805906
Average tbt =  10.09545187132699
P50 tbt =  10.09803376197815
P99 tbt =  19.92299834823609
All GPUs and memories are cold after  15.03552532196045
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.808100560816323
P50 ttft =  8.265753269195557
P99 ttft =  15.521460247039796
Average tbt =  11.906505866167027
P50 tbt =  11.857763576507573
P99 tbt =  23.519657602310186
All GPUs and memories are cold after  16.036859035491943
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.617490911483765
P50 ttft =  12.190272688865662
P99 ttft =  23.894788992404937
Average tbt =  14.531215906620027
P50 tbt =  14.512752568721774
P99 tbt =  28.716492401599893
All GPUs and memories are cold after  19.038853406906128
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.78017434105277
P50 ttft =  18.4176664352417
P99 ttft =  34.42960322856903
Average tbt =  18.6832043517381
P50 tbt =  18.65991135835648
P99 tbt =  36.97519305753708
All GPUs and memories are cold after  20.034451007843018
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.73365678525951
P50 ttft =  21.891655683517456
P99 ttft =  42.23209497451782
Average tbt =  21.369914838712514
P50 tbt =  21.33841743469239
P99 tbt =  42.259287096977246
All GPUs and memories are cold after  21.04872703552246
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.906240072595068
P50 ttft =  25.66718077659607
P99 ttft =  50.18458750724792
Average tbt =  24.394564952907793
P50 tbt =  24.363523578643807
P99 tbt =  48.179493528366095
All GPUs and memories are cold after  22.04551362991333
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.750865618387858
P50 ttft =  0.5963824987411499
P99 ttft =  1.4562671995162966
Average tbt =  0.2282301882902782
P50 tbt =  0.025621449947357176
P99 tbt =  0.7975388500690465
All GPUs and memories are cold after  3.013644218444824
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9758050668807257
P50 ttft =  0.8737764358520508
P99 ttft =  2.0629904270172124
Average tbt =  0.5288987647919429
P50 tbt =  0.43159623146057136
P99 tbt =  1.576823225021363
All GPUs and memories are cold after  14.042022466659546
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5579863616398402
P50 ttft =  1.360750675201416
P99 ttft =  3.7801578378677347
Average tbt =  1.8842498043605258
P50 tbt =  2.071510362625123
P99 tbt =  4.797498468399048
All GPUs and memories are cold after  23.040199279785156
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.29161135743304
P50 ttft =  2.9053285121917725
P99 ttft =  6.211101961135865
Average tbt =  8.327411916779312
P50 tbt =  8.329084253311159
P99 tbt =  16.333231482505802
All GPUs and memories are cold after  27.05965757369995
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.988580808639527
P50 ttft =  5.554895281791687
P99 ttft =  11.041706840991974
Average tbt =  10.059509782791142
P50 tbt =  10.051853394508363
P99 tbt =  19.86480080533028
All GPUs and memories are cold after  29.050423622131348
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.357574008405209
P50 ttft =  10.016199827194214
P99 ttft =  18.039858846664426
Average tbt =  12.963751739636066
P50 tbt =  12.915772414207462
P99 tbt =  25.647975441455845
All GPUs and memories are cold after  31.055670261383057
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.020965618629978
P50 ttft =  12.41089415550232
P99 ttft =  23.356998367309572
Average tbt =  14.78338522355851
P50 tbt =  14.764107799530034
P99 tbt =  29.18389284992219
All GPUs and memories are cold after  34.09226131439209
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.998077231717398
P50 ttft =  14.794083833694458
P99 ttft =  28.89454841136932
Average tbt =  16.91926808012537
P50 tbt =  16.869112181663517
P99 tbt =  33.40525089216233
All GPUs and memories are cold after  36.057276010513306
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5837544798851013
P50 ttft =
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0.4765048027038574
P99 ttft =  1.1145569348335267
Average tbt =  0.15563434163729353
P50 tbt =  0.020682775974273683
P99 tbt =  0.6343993735313419
All GPUs and memories are cold after  35.0561318397522
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7160388742174421
P50 ttft =  0.6196305751800537
P99 ttft =  1.498488998413086
Average tbt =  0.3930856432233539
P50 tbt =  0.3463352203369141
P99 tbt =  1.2600908279418952
All GPUs and memories are cold after  44.135719776153564
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0579750674111503
P50 ttft =  0.8410096168518066
P99 ttft =  2.2903717899322493
Average tbt =  0.9931489951269968
P50 tbt =  0.6722138643264772
P99 tbt =  3.8326637129783627
All GPUs and memories are cold after  61.13222360610962
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3425622509747017
P50 ttft =  1.1088330745697021
P99 ttft =  3.16066279411316
Average tbt =  2.420419797664736
P50 tbt =  2.3126556158065803
P99 tbt =  5.763089904785158
All GPUs and memories are cold after  62.10940217971802
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.056877498626709
P50 ttft =  2.942051410675049
P99 ttft =  5.374103064537048
Average tbt =  8.068983978271485
P50 tbt =  8.036890327930452
P99 tbt =  15.954649342298511
All GPUs and memories are cold after  67.14883708953857
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.499296419322491
P50 ttft =  5.919567704200745
P99 ttft =  10.585459637641906
Average tbt =  10.381268548592926
P50 tbt =  10.427546834945682
P99 tbt =  20.50071796703339
All GPUs and memories are cold after  84.14385342597961
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.7389804729043625
P50 ttft =  7.912906169891357
P99 ttft =  15.028863716125489
Average tbt =  11.87471309459373
P50 tbt =  11.921176457405092
P99 tbt =  23.42443574714661
All GPUs and memories are cold after  80.11704874038696
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.943394594881909
P50 ttft =  9.673306226730347
P99 ttft =  19.05357667446136
Average tbt =  13.471350212269524
P50 tbt =  13.465002679824831
P99 tbt =  26.56440108251572
All GPUs and memories are cold after  83.12282276153564
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5497660835584005
P50 ttft =  0.4498938322067261
P99 ttft =  1.046013641357422
Average tbt =  0.14779715935389204
P50 tbt =  0.019461941719055173
P99 tbt =  0.6034539968967442
All GPUs and memories are cold after  100.14550495147705
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6886828286307198
P50 ttft =  0.5672276020050049
P99 ttft =  1.490352487564087
Average tbt =  0.35662155378432514
P50 tbt =  0.3277287483215333
P99 tbt =  1.197592997550965
All GPUs and memories are cold after  79.17835116386414
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9607125827244349
P50 ttft =  0.7925944328308105
P99 ttft =  1.9977580928802476
Average tbt =  0.9407100793293548
P50 tbt =  0.6357156991958619
P99 tbt =  3.6332395682334897
All GPUs and memories are cold after  119.17387461662292
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1650203786245206
P50 ttft =  0.8910007476806641
P99 ttft =  2.8705675125122077
Average tbt =  1.9349535023293842
P50 tbt =  1.7115482568740847
P99 tbt =  5.3021543312072765
All GPUs and memories are cold after  148.2516074180603
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.464609532356262
P50 ttft =  2.552892327308655
P99 ttft =  4.361400241851807
Average tbt =  7.648804273128511
P50 tbt =  7.638481545448306
P99 tbt =  15.148049452543262
All GPUs and memories are cold after  133.22363376617432
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.723889283835888
P50 ttft =  5.14985191822052
P99 ttft =  9.064896867275237
Average tbt =  9.844482930004597
P50 tbt =  9.803119397163393
P99 tbt =  19.456243276357654
All GPUs and memories are cold after  157.21672868728638
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.867170382852423
P50 ttft =  7.202141284942627
P99 ttft =  13.334468250274659
Average tbt =  11.278590074630634
P50 tbt =  11.223707318305973
P99 tbt =  22.252492073059088
All GPUs and memories are cold after  205.30996012687683
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.051962295210505
P50 ttft =  8.841268539428711
P99 ttft =  17.32301846027374
Average tbt =  12.860197458497012
P50 tbt =  12.810070252418521
P99 tbt =  25.359502261161808
All GPUs and memories are cold after  267.40416836738586
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.13708100716273
P50 ttft =  0.8765194416046143
P99 ttft =  2.205664386749268
Average tbt =  0.544555554787318
P50 tbt =  0.6204539179801942
P99 tbt =  1.7366956632137305
All GPUs and memories are cold after  1.0476627349853516
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6171317327590216
P50 ttft =  1.5027830600738525
P99 ttft =  3.292365503311158
Average tbt =  1.1003904637836275
P50 tbt =  0.6271165609359743
P99 tbt =  3.4573565292358412
All GPUs and memories are cold after  11.029151916503906
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.176243707111904
P50 ttft =  4.697578191757202
P99 ttft =  10.538783698081968
Average tbt =  10.09091992855072
P50 tbt =  10.088286685943606
P99 tbt =  19.91583096456528
All GPUs and memories are cold after  16.03289818763733
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.891471176612669
P50 ttft =  8.335607051849365
P99 ttft =  15.61797194480896
Average tbt =  11.915612367304359
P50 tbt =  11.885413789749148
P99 tbt =  23.549754014015203
All GPUs and memories are cold after  17.08076000213623
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.609872035980224
P50 ttft =  12.180503726005554
P99 ttft =  23.876019661426543
Average tbt =  14.527022618770602
P50 tbt =  14.52083982229233
P99 tbt =  28.685277031660085
All GPUs and memories are cold after  20.040636777877808
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.77288816869259
P50 ttft =  18.406066060066223
P99 ttft =  34.41513056278229
Average tbt =  18.679471037909394
P50 tbt =  18.653129649162295
P99 tbt =  36.970718935966495
All GPUs and memories are cold after  20.042270183563232
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.7304914748832
P50 ttft =  21.88096523284912
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
All done.
P99 ttft =  42.22908169746399
Average tbt =  21.36418241213445
P50 tbt =  21.33953268527985
P99 tbt =  42.25110346508027
All GPUs and memories are cold after  23.041107416152954
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.901759883007372
P50 ttft =  25.64304757118225
P99 ttft =  50.16116569519043
Average tbt =  24.381282767617563
P50 tbt =  24.353585839271553
P99 tbt =  48.1642135667801
All GPUs and memories are cold after  24.046991109848022
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7477110822995504
P50 ttft =  0.5944826602935791
P99 ttft =  1.451159613132477
Average tbt =  0.22586839596430464
P50 tbt =  0.022707664966583253
P99 tbt =  0.7925059275627141
All GPUs and memories are cold after  5.016364336013794
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9731903757367816
P50 ttft =  0.8724815845489502
P99 ttft =  2.0557229042053224
Average tbt =  0.5279016869408745
P50 tbt =  0.43107001781463633
P99 tbt =  1.5725653457641606
All GPUs and memories are cold after  14.029839038848877
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5541079998016358
P50 ttft =  1.360703468322754
P99 ttft =  3.7723785448074323
Average tbt =  1.8813017797470097
P50 tbt =  2.070672512054444
P99 tbt =  4.791669890403748
All GPUs and memories are cold after  24.04457449913025
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.360112853166534
P50 ttft =  3.0504815578460693
P99 ttft =  6.3175865650177006
Average tbt =  8.23563549693038
P50 tbt =  8.209101319313051
P99 tbt =  16.293258533477786
All GPUs and memories are cold after  28.054030895233154
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.959396266937256
P50 ttft =  5.52685821056366
P99 ttft =  11.004685316085816
Average tbt =  10.052938561916353
P50 tbt =  10.045660758018496
P99 tbt =  19.84962202572823
All GPUs and memories are cold after  27.054210424423218
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.3419672511518
P50 ttft =  9.960712432861328
P99 ttft =  17.95800808906555
Average tbt =  12.91650389693678
P50 tbt =  12.90023664236069
P99 tbt =  25.60172578835488
All GPUs and memories are cold after  30.124887466430664
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.115293747758212
P50 ttft =  12.521446466445923
P99 ttft =  23.447786102294923
Average tbt =  14.780849303284736
P50 tbt =  14.754698204994206
P99 tbt =  29.25598573875428
All GPUs and memories are cold after  34.063854694366455
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.998780695788831
P50 ttft =  14.739585638046265
P99 ttft =  28.85220854759216
Average tbt =  16.88402241540242
P50 tbt =  16.86325209140778
P99 tbt =  33.36166817331315
All GPUs and memories are cold after  35.05693817138672
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5833728512128195
P50 ttft =  0.4759131669998169
P99 ttft =  1.1164132857322695
Average tbt =  0.1552327632904053
P50 tbt =  0.020074343681335448
P99 tbt =  0.6349839124679568
All GPUs and memories are cold after  33.059696674346924
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7163215024130685
P50 ttft =  0.6215214729309082
P99 ttft =  1.4998627662658692
Average tbt =  0.39253570692879824
P50 tbt =  0.34538850784301767
P99 tbt =  1.258572936058045
All GPUs and memories are cold after  43.07231092453003
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0955419404166085
P50 ttft =  0.8417387008666992
P99 ttft =  2.3999988555908187
Average tbt =  0.9927434478487287
P50 tbt =  0.6705309629440309
P99 tbt =  3.8780067391395563
All GPUs and memories are cold after  59.1218695640564
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.4796615809929081
P50 ttft =  1.2926604747772217
P99 ttft =  3.499652814865113
Average tbt =  2.4583081204716755
P50 tbt =  2.3109704017639165
P99 tbt =  6.106159610748294
All GPUs and memories are cold after  62.110013008117676
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.0326621437072756
P50 ttft =  2.9504891633987427
P99 ttft =  5.341132040023803
Average tbt =  8.082598905086515
P50 tbt =  8.030942428112033
P99 tbt =  15.95628054428101
All GPUs and memories are cold after  69.13011312484741
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.39795982465148
P50 ttft =  5.820793271064758
P99 ttft =  10.369634187221527
Average tbt =  10.360979343578222
P50 tbt =  10.351889550685886
P99 tbt =  20.417903558969503
All GPUs and memories are cold after  82.12410092353821
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.653913112535869
P50 ttft =  7.904551267623901
P99 ttft =  14.842310914993286
Average tbt =  11.80458517139905
P50 tbt =  11.802366638183596
P99 tbt =  23.30278433799744
All GPUs and memories are cold after  90.13708281517029
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  10.055232341030994
P50 ttft =  9.784093379974365
P99 ttft =  19.181726665496825
Average tbt =  13.481652404314067
P50 tbt =  13.455236434936527
P99 tbt =  26.657390913009646
All GPUs and memories are cold after  89.1466076374054
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5490790804227194
P50 ttft =  0.44980669021606445
P99 ttft =  1.045431070327759
Average tbt =  0.14769199490547183
P50 tbt =  0.01940364837646484
P99 tbt =  0.603173269748688
All GPUs and memories are cold after  85.15538191795349
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6696361360095796
P50 ttft =  0.5656235218048096
P99 ttft =  1.371409606933594
Average tbt =  0.37918681530725395
P50 tbt =  0.32781004905700695
P99 tbt =  1.1927415418624883
All GPUs and memories are cold after  86.13004851341248
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9640100479125977
P50 ttft =  0.7887861728668213
P99 ttft =  2.004234852790831
Average tbt =  0.9415085956028532
P50 tbt =  0.6369772911071778
P99 tbt =  3.6366985983848568
All GPUs and memories are cold after  105.15540146827698
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1628363074325934
P50 ttft =  0.8919603824615479
P99 ttft =  2.8644335746765144
Average tbt =  1.9323402835101617
P50 tbt =  1.7074202299118046
P99 tbt =  5.295958237648011
All GPUs and memories are cold after  144.2815601825714
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.3412489557266234
P50 ttft =  2.4190521240234375
P99 ttft =
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
4.212837750911713
Average tbt =  7.623815183639525
P50 tbt =  7.624627172946932
P99 tbt =  15.036327836036685
All GPUs and memories are cold after  170.25464725494385
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.738004866987467
P50 ttft =  5.144971489906311
P99 ttft =  9.063030953407287
Average tbt =  9.834157284349198
P50 tbt =  9.80731022357941
P99 tbt =  19.45919561004639
All GPUs and memories are cold after  171.25928854942322
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.889684944936674
P50 ttft =  7.200398921966553
P99 ttft =  13.32914059638977
Average tbt =  11.262891756998348
P50 tbt =  11.217052865028384
P99 tbt =  22.25607776355744
All GPUs and memories are cold after  188.26628994941711
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.068778164415475
P50 ttft =  8.829992771148682
P99 ttft =  17.278437514305114
Average tbt =  12.843686297715434
P50 tbt =  12.807132673263553
P99 tbt =  25.352318798065188
All GPUs and memories are cold after  199.28198862075806
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.135843276977539
P50 ttft =  0.8723037242889404
P99 ttft =  2.2060292100906373
Average tbt =  0.5440495947996775
P50 tbt =  0.6205633640289309
P99 tbt =  1.7350749382972726
All GPUs and memories are cold after  1.0722072124481201
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.6061698028019495
P50 ttft =  1.5049560070037842
P99 ttft =  3.29909782409668
Average tbt =  1.1013848077683224
P50 tbt =  0.6280784606933596
P99 tbt =  3.4584700584411636
All GPUs and memories are cold after  7.170566082000732
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.22317179271153
P50 ttft =  4.697582006454468
P99 ttft =  10.673991365432736
Average tbt =  10.152530080250331
P50 tbt =  10.182445144653322
P99 tbt =  20.005925163745882
All GPUs and memories are cold after  14.035788774490356
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.73190154098883
P50 ttft =  8.126653909683228
P99 ttft =  15.385441207885743
Average tbt =  11.864705662029545
P50 tbt =  11.856606149673464
P99 tbt =  23.423523721694952
All GPUs and memories are cold after  17.03772282600403
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.691043663024903
P50 ttft =  12.307403326034546
P99 ttft =  24.032758939266206
Average tbt =  14.579888561725626
P50 tbt =  14.528724408149724
P99 tbt =  28.814741106987007
All GPUs and memories are cold after  18.03193950653076
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.67227566614747
P50 ttft =  18.28253746032715
P99 ttft =  34.30526584863662
Average tbt =  18.67380291596055
P50 tbt =  18.668718135356908
P99 tbt =  36.884329561710366
All GPUs and memories are cold after  19.03576636314392
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.895307828302254
P50 ttft =  22.053592681884766
P99 ttft =  42.40624296188354
Average tbt =  21.377486760648974
P50 tbt =  21.351826190948493
P99 tbt =  42.37686932086945
All GPUs and memories are cold after  19.13896369934082
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.934292695608484
P50 ttft =  25.6770498752594
P99 ttft =  50.226152362823484
Average tbt =  24.403439042654377
P50 tbt =  24.387771630287176
P99 tbt =  48.21169171333314
All GPUs and memories are cold after  20.04109215736389
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7508815328280131
P50 ttft =  0.600553035736084
P99 ttft =  1.4557727789878847
Average tbt =  0.22649025917053225
P50 tbt =  0.023131024837493897
P99 tbt =  0.794091315031052
All GPUs and memories are cold after  3.0131421089172363
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.97617354847136
P50 ttft =  0.8706076145172119
P99 ttft =  2.0608211994171146
Average tbt =  0.5284408421743486
P50 tbt =  0.43109562397003187
P99 tbt =  1.575808215141297
All GPUs and memories are cold after  14.05269479751587
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.565001222065517
P50 ttft =  1.3539373874664307
P99 ttft =  3.75687940597534
Average tbt =  1.9294641235896524
P50 tbt =  2.0652436733245856
P99 tbt =  4.781458285331726
All GPUs and memories are cold after  21.036794900894165
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.34014914093948
P50 ttft =  2.987755537033081
P99 ttft =  6.26286120414734
Average tbt =  8.221697009482035
P50 tbt =  8.221980118751528
P99 tbt =  16.219196600914007
All GPUs and memories are cold after  24.045145511627197
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.06851743221283
P50 ttft =  5.6746193170547485
P99 ttft =  11.151025862693785
Average tbt =  10.090435733795168
P50 tbt =  10.05362721681595
P99 tbt =  19.94835259389878
All GPUs and memories are cold after  29.055567979812622
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.300068449229002
P50 ttft =  9.990708470344543
P99 ttft =  17.99291554689407
Average tbt =  12.977710276469592
P50 tbt =  12.914306592941287
P99 tbt =  25.63056798291207
All GPUs and memories are cold after  30.0522723197937
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.00807059301089
P50 ttft =  12.396482467651367
P99 ttft =  23.327051868438723
Average tbt =  14.779763160013177
P50 tbt =  14.758853960037236
P99 tbt =  29.17798527050019
All GPUs and memories are cold after  32.07570672035217
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  15.084626689014664
P50 ttft =  14.818156719207764
P99 ttft =  28.961095976829526
Average tbt =  16.932311281813202
P50 tbt =  16.9185337305069
P99 tbt =  33.483613684177406
All GPUs and memories are cold after  32.07535791397095
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5984625816345215
P50 ttft =  0.48495233058929443
P99 ttft =  1.1126985049247744
Average tbt =  0.1560088117917379
P50 tbt =  0.020381498336791995
P99 tbt =  0.6364348104000095
All GPUs and memories are cold after  36.06812882423401
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7178051812308175
P50 ttft =  0.6175260543823242
P99 ttft =  1.4893520355224612
Average tbt =  0.4157440367199127
P50 tbt =  0.3485918283462525
P99 tbt =  1.2567513942718511
All GPUs and memories are cold after  47.08097839355469
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0572935444968088
P50 ttft =  0.8429572582244873
P99 ttft =  2.287553911209105
Average tbt =
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000001:00:00.0

Warning: persistence mode is disabled on device 00000001:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000002:00:00.0

Warning: persistence mode is disabled on device 00000002:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000003:00:00.0

Warning: persistence mode is disabled on device 00000003:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000008:00:00.0

Warning: persistence mode is disabled on device 00000008:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 00000009:00:00.0

Warning: persistence mode is disabled on device 00000009:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000A:00:00.0

Warning: persistence mode is disabled on device 0000000A:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000B:00:00.0

Warning: persistence mode is disabled on device 0000000B:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
GPU clocks set to "(gpuClkMin 800, gpuClkMax 800)" for GPU 0000000C:00:00.0

Warning: persistence mode is disabled on device 0000000C:00:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1200, gpuClkMax 1200)" for GPU 0000000C:00:00.0
All done.
 0.9938959101268227
P50 tbt =  0.6714792966842653
P99 tbt =  3.834782179355621
All GPUs and memories are cold after  57.096017599105835
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.340292546807266
P50 ttft =  1.1077218055725098
P99 ttft =  3.1618604660034184
Average tbt =  2.421979614001949
P50 tbt =  2.313294076919556
P99 tbt =  5.776686177253725
All GPUs and memories are cold after  66.14072585105896
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.0735857391357424
P50 ttft =  3.015382766723633
P99 ttft =  5.382559604644775
Average tbt =  8.06767417907715
P50 tbt =  8.035425221920015
P99 tbt =  15.963908036470416
All GPUs and memories are cold after  70.10242033004761
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.561454337090254
P50 ttft =  6.101895689964294
P99 ttft =  10.680427684783934
Average tbt =  10.403684005513787
P50 tbt =  10.312321853637698
P99 tbt =  20.565442366361623
All GPUs and memories are cold after  78.11201763153076
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.746124172863895
P50 ttft =  7.9117591381073
P99 ttft =  14.989450368881226
Average tbt =  11.887139252440575
P50 tbt =  11.931605029106144
P99 tbt =  23.450873637199408
All GPUs and memories are cold after  79.11550450325012
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.961102652262491
P50 ttft =  9.691328287124634
P99 ttft =  19.096628389358518
Average tbt =  13.569783900444765
P50 tbt =  13.560059714317326
P99 tbt =  26.670858499050144
All GPUs and memories are cold after  91.12616991996765
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.551956315835317
P50 ttft =  0.452160120010376
P99 ttft =  1.0483686161041261
Average tbt =  0.14827263752619427
P50 tbt =  0.019920945167541504
P99 tbt =  0.6069793465137485
All GPUs and memories are cold after  100.15799832344055
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6694199471246629
P50 ttft =  0.5257172584533691
P99 ttft =  1.3754849433898928
Average tbt =  0.3646946555092222
P50 tbt =  0.3281843662261964
P99 tbt =  1.1957270812988285
All GPUs and memories are cold after  90.21049618721008
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9604031154087611
P50 ttft =  0.7904694080352783
P99 ttft =  1.9967438364028918
Average tbt =  0.9411894627979825
P50 tbt =  0.6357306957244875
P99 tbt =  3.632031413078308
All GPUs and memories are cold after  130.20519995689392
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1621778592830752
P50 ttft =  0.8909463882446289
P99 ttft =  2.858137702941895
Average tbt =  1.8303598345779792
P50 tbt =  1.7060023069381716
P99 tbt =  5.159647545814516
All GPUs and memories are cold after  131.1873710155487
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.4236895751953127
P50 ttft =  2.496879458427429
P99 ttft =  4.366791105270385
Average tbt =  7.680225663185122
P50 tbt =  7.6831501126289385
P99 tbt =  15.151270636796955
All GPUs and memories are cold after  138.19204187393188
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.642700146883726
P50 ttft =  5.011944890022278
P99 ttft =  8.921726024150848
Average tbt =  9.84972034394741
P50 tbt =  9.847262668609622
P99 tbt =  19.40452143979073
All GPUs and memories are cold after  126.23830509185791
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.849678898510867
P50 ttft =  7.074620485305786
P99 ttft =  13.231082801818848
Average tbt =  11.227587972928402
P50 tbt =  11.222336316108706
P99 tbt =  22.183430362701422
All GPUs and memories are cold after  258.39306473731995
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.106072948639651
P50 ttft =  8.830841779708862
P99 ttft =  17.314579620361325
Average tbt =  12.820803584822686
P50 tbt =  12.810456323623661
P99 tbt =  25.348038239955905
All GPUs and memories are cold after  147.22284388542175
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.134310523668925
P50 ttft =  0.8707743883132935
P99 ttft =  2.2027401018142703
Average tbt =  0.5440565407276156
P50 tbt =  0.6194318890571595
P99 tbt =  1.735876979589463
All GPUs and memories are cold after  1.077955722808838
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.639825707390195
P50 ttft =  1.5054726600646973
P99 ttft =  3.2908134460449223
Average tbt =  1.1044013977050788
P50 tbt =  0.6268979310989382
P99 tbt =  3.524482836723329
All GPUs and memories are cold after  7.043551683425903
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.1705366815839495
P50 ttft =  4.692310094833374
P99 ttft =  10.522983331680296
Average tbt =  10.08282102584839
P50 tbt =  10.078893709182742
P99 tbt =  19.901470288753515
All GPUs and memories are cold after  14.033201932907104
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.809576319485176
P50 ttft =  8.264519691467285
P99 ttft =  15.518499708175659
Average tbt =  11.901359498791582
P50 tbt =  11.859001779556277
P99 tbt =  23.514809718132025
All GPUs and memories are cold after  16.036461353302002
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  12.712934913635253
P50 ttft =  12.216371774673462
P99 ttft =  24.082859485149385
Average tbt =  14.598580652713778
P50 tbt =  14.636964917182926
P99 tbt =  28.82264272284508
All GPUs and memories are cold after  18.072526931762695
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  17.759436804801226
P50 ttft =  18.38995134830475
P99 ttft =  34.392983648777005
Average tbt =  18.672683344781404
P50 tbt =  18.64690870046616
P99 tbt =  36.95024842977524
All GPUs and memories are cold after  19.037243604660034
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  21.829513330982156
P50 ttft =  21.992575883865356
P99 ttft =  42.38778889656067
Average tbt =  21.41458532254989
P50 tbt =  21.38575160503388
P99 tbt =  42.37485490989686
All GPUs and memories are cold after  22.037260055541992
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  25.990307069686523
P50 ttft =  25.755134105682373
P99 ttft =  50.244912624359124
Average tbt =  24.375262007943117
P50 tbt =  24.334311413764958
P99 tbt =  48.2217446551323
All GPUs and memories are cold after  21.04013156890869
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7492405374844869
P50 ttft =  0.595531702041626
P99 ttft =  1.4522977042198184
Average tbt =  0.22663272619247443
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1600, gpuClkMax 1600)" for GPU 0000000C:00:00.0
All done.
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000001:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000002:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000003:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000008:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 00000009:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000A:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000B:00:00.0
GPU clocks set to "(gpuClkMin 1980, gpuClkMax 1980)" for GPU 0000000C:00:00.0
All done.
All done.
P50 tbt =  0.02536144256591797
P99 tbt =  0.7919070856571201
All GPUs and memories are cold after  3.1145191192626953
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9722396078563872
P50 ttft =  0.8682584762573242
P99 ttft =  2.0453702449798588
Average tbt =  0.5275561832246328
P50 tbt =  0.4302253007888795
P99 tbt =  1.5689291572570807
All GPUs and memories are cold after  12.026264905929565
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.5639987945556642
P50 ttft =  1.357285976409912
P99 ttft =  3.753458251953123
Average tbt =  1.9737178155354091
P50 tbt =  2.066809272766114
P99 tbt =  4.798093564510345
All GPUs and memories are cold after  21.042568922042847
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.2175151836581346
P50 ttft =  2.871007204055786
P99 ttft =  6.108478450775147
Average tbt =  8.184866142854462
P50 tbt =  8.183715152740481
P99 tbt =  16.14191009998322
All GPUs and memories are cold after  26.044533491134644
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.021784996986389
P50 ttft =  5.592182755470276
P99 ttft =  11.0444313454628
Average tbt =  10.038692170619967
P50 tbt =  10.027232384681703
P99 tbt =  19.879675009012225
All GPUs and memories are cold after  28.062769174575806
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.300657767802477
P50 ttft =  9.95752763748169
P99 ttft =  17.931818785667417
Average tbt =  12.929130458086734
P50 tbt =  12.884575593471531
P99 tbt =  25.57987828111649
All GPUs and memories are cold after  33.05428981781006
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  11.973403411368801
P50 ttft =  12.365278720855713
P99 ttft =  23.272522926330566
Average tbt =  14.756284752937212
P50 tbt =  14.7478452205658
P99 tbt =  29.135724131584176
All GPUs and memories are cold after  36.05741286277771
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  14.926685804344086
P50 ttft =  14.696501731872559
P99 ttft =  28.753409276008604
Average tbt =  16.8660169828369
P50 tbt =  16.824895215034488
P99 tbt =  33.29436078405381
All GPUs and memories are cold after  35.05665040016174
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5808550516764323
P50 ttft =  0.47553586959838867
P99 ttft =  1.1093441891670228
Average tbt =  0.1554610311985016
P50 tbt =  0.01994593143463135
P99 tbt =  0.6338045203685764
All GPUs and memories are cold after  31.060405731201172
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.7133504663194928
P50 ttft =  0.6158785820007324
P99 ttft =  1.4920165061950685
Average tbt =  0.39244058813367577
P50 tbt =  0.3452918767929078
P99 tbt =  1.257706303596497
All GPUs and memories are cold after  45.07873797416687
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.0513344151633126
P50 ttft =  0.8406121730804443
P99 ttft =  2.269365391731261
Average tbt =  0.9894442510604862
P50 tbt =  0.6690176010131837
P99 tbt =  3.817369951248169
All GPUs and memories are cold after  61.085890769958496
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.3264528076823165
P50 ttft =  1.1027820110321045
P99 ttft =  3.1452647209167486
Average tbt =  2.412781471740909
P50 tbt =  2.3024515628814703
P99 tbt =  5.755867075920107
All GPUs and memories are cold after  67.09736967086792
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  3.0471237897872925
P50 ttft =  2.997869372367859
P99 ttft =  5.316589472293853
Average tbt =  8.029182988643647
P50 tbt =  8.0132558465004
P99 tbt =  15.909460026741032
All GPUs and memories are cold after  77.1241066455841
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  5.430686801671982
P50 ttft =  5.904379606246948
P99 ttft =  10.447739281654357
Average tbt =  10.329404920339586
P50 tbt =  10.287552249431613
P99 tbt =  20.415474239110953
All GPUs and memories are cold after  97.14539837837219
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  7.694208350900102
P50 ttft =  8.03076982498169
P99 ttft =  14.924113931655885
Average tbt =  11.82995719452427
P50 tbt =  11.767171692848208
P99 tbt =  23.357604082107553
All GPUs and memories are cold after  93.12807607650757
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  9.986842069281153
P50 ttft =  9.79281497001648
P99 ttft =  19.13808597564697
Average tbt =  13.499246959226683
P50 tbt =  13.427715206146242
P99 tbt =  26.618054511070255
All GPUs and memories are cold after  87.15594005584717
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.5500190059343973
P50 ttft =  0.4511638879776001
P99 ttft =  1.049937357902527
Average tbt =  0.14777108430862432
P50 tbt =  0.0198577880859375
P99 tbt =  0.6011848001480107
All GPUs and memories are cold after  185.3361461162567
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.6582307702019101
P50 ttft =  0.5187015533447266
P99 ttft =  1.37230863571167
Average tbt =  0.3549688884190151
P50 tbt =  0.32717726230621347
P99 tbt =  1.1938485383987432
All GPUs and memories are cold after  168.26360416412354
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  0.9577064786638533
P50 ttft =  0.7889766693115234
P99 ttft =  1.9952414751052843
Average tbt =  0.9426863472802298
P50 tbt =  0.6346466779708864
P99 tbt =  3.635158727645874
All GPUs and memories are cold after  104.19079232215881
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  1.1553370080343106
P50 ttft =  0.8871958255767822
P99 ttft =  2.844951486587525
Average tbt =  1.8308328116812358
P50 tbt =  1.7092926740646366
P99 tbt =  5.154932179450991
All GPUs and memories are cold after  116.18029379844666
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  2.449708409309387
P50 ttft =  2.522797107696533
P99 ttft =  4.347230129241943
Average tbt =  7.64717886781693
P50 tbt =  7.606131958961488
P99 tbt =  15.111596189737325
All GPUs and memories are cold after  117.19650769233704
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  4.663822926580906
P50 ttft =  5.034390330314636
P99 ttft =  8.993161652088164
Average tbt =  9.837870055437088
P50 tbt =  9.888041317462925
P99 tbt =  19.399496476650242
All GPUs and memories are cold after  150.26357054710388
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  6.81734035766288
P50 ttft =  7.006958246231079
P99 ttft =  13.255236721038818
Average tbt =  11.261986496350533
P50 tbt =  11.295104074478152
P99 tbt =  22.194896837234506
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
All GPUs and memories are cold after  245.34324288368225
92886
288  -  96
288  -  256
288  -  1024
1053  -  96
1053  -  256
1053  -  600
8170  -  5
8170  -  256
8170  -  600
8170 - 5
Average ttft =  8.962397848267154
P50 ttft =  8.686521291732788
P99 ttft =  17.157136888504027
Average tbt =  12.810019782939593
P50 tbt =  12.800416040420536
P99 tbt =  25.252460960865022
All GPUs and memories are cold after  164.25711107254028
Exception ignored in: <module 'threading' from '/usr/lib/python3.8/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 1388, in _shutdown
    lock.acquire()
KeyboardInterrupt: