benchmarks_51826.out
1.13.1
[2023-11-20 17:11:32,043] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2023-11-20 17:11:32,821] [INFO] [distributed.py:83:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=26.0.144.211, master_port=6000
[2023-11-20 17:11:32,821] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2023-11-20 17:11:35,949] [INFO] [checkpointing.py:223:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
num_attention_heads: 20, hidden_size: 2560, train_micro_batch_size_per_gpu: 4, tensor_mp_size: 1, pipeline_mp_size: 1, dp_size: 1
Actual
------
QKV Transform: 2.2935431003570557
using-flash
Attention linproj: 0.0068302154541015625
QKV Transform: 0.0025315284729003906
using-flash
Attention linproj: 0.0005497932434082031
QKV Transform: 0.002099752426147461
using-flash
Attention linproj: 0.0005397796630859375
QKV Transform: 0.002203226089477539
using-flash
Attention linproj: 0.0005326271057128906
QKV Transform: 0.002100706100463867
using-flash
Attention linproj: 0.0005319118499755859
QKV Transform: 0.002194643020629883
using-flash
Attention linproj: 0.0005285739898681641
QKV Transform: 0.0022051334381103516
using-flash
Attention linproj: 0.0005309581756591797
QKV Transform: 0.002261638641357422
using-flash
Attention linproj: 0.0005290508270263672
QKV Transform: 0.0022644996643066406
using-flash
Attention linproj: 0.0005307197570800781
QKV Transform: 0.0022056102752685547
using-flash
Attention linproj: 0.000545501708984375
QKV Transform: 0.0022215843200683594
using-flash
Attention linproj: 0.0005359649658203125
QKV Transform: 0.0021631717681884766
using-flash
Attention linproj: 0.0005736351013183594
QKV Transform: 0.0025277137756347656
using-flash
Attention linproj: 0.0005366802215576172
QKV Transform: 0.0021715164184570312
using-flash
Attention linproj: 0.0005390644073486328
QKV Transform: 0.002094268798828125
using-flash
Attention linproj: 0.000530242919921875
QKV Transform: 0.002180337905883789
using-flash
Attention linproj: 0.0005326271057128906
QKV Transform: 0.002150297164916992
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.0021309852600097656
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.002238035202026367
using-flash
Attention linproj: 0.0005288124084472656
QKV Transform: 0.0022611618041992188
using-flash
Attention linproj: 0.0005309581756591797
QKV Transform: 0.002292633056640625
using-flash
Attention linproj: 0.0005345344543457031
QKV Transform: 0.0021758079528808594
using-flash
Attention linproj: 0.0005371570587158203
QKV Transform: 0.0029397010803222656
using-flash
Attention linproj: 0.0005285739898681641
QKV Transform: 0.002133607864379883
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.002216339111328125
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.002207517623901367
using-flash
Attention linproj: 0.0005331039428710938
QKV Transform: 0.0022933483123779297
using-flash
Attention linproj: 0.0005364418029785156
QKV Transform: 0.002293825149536133
using-flash
Attention linproj: 0.0005366802215576172
QKV Transform: 0.0022516250610351562
using-flash
Attention linproj: 0.0005359649658203125
QKV Transform: 0.0021970272064208984
using-flash
Attention linproj: 0.0005393028259277344
QKV Transform: 0.0021660327911376953
using-flash
Attention linproj: 0.0005309581756591797
QKV Transform: 0.002161741256713867
using-flash
Attention linproj: 0.0005292892456054688
QKV Transform: 0.002085447311401367
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.001965761184692383
using-flash
Attention linproj: 0.0005297660827636719
QKV Transform: 0.0021419525146484375
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.0022835731506347656
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.0022902488708496094
using-flash
Attention linproj: 0.0005464553833007812
QKV Transform: 0.002212047576904297
using-flash
Attention linproj: 0.0005197525024414062
QKV Transform: 0.002223491668701172
using-flash
Attention linproj: 0.0005342960357666016
QKV Transform: 0.002292633056640625
using-flash
Attention linproj: 0.0005249977111816406
QKV Transform: 0.002217531204223633
using-flash
Attention linproj: 0.0005402565002441406
QKV Transform: 0.0022459030151367188
using-flash
Attention linproj: 0.0005385875701904297
QKV Transform: 0.0021135807037353516
using-flash
Attention linproj: 0.0005409717559814453
QKV Transform: 0.002129077911376953
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.00212860107421875
using-flash
Attention linproj: 0.0005290508270263672
QKV Transform: 0.0020880699157714844
using-flash
Attention linproj: 0.0005290508270263672
QKV Transform: 0.002305269241333008
using-flash
Attention linproj: 0.0005359649658203125
QKV Transform: 0.0022115707397460938
using-flash
Attention linproj: 0.0005371570587158203
QKV Transform: 0.0022149085998535156
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.0022177696228027344
using-flash
Attention linproj: 0.0005307197570800781
QKV Transform: 0.0019495487213134766
using-flash
Attention linproj: 0.0005345344543457031
QKV Transform: 0.0021331310272216797
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.0022084712982177734
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.002137422561645508
using-flash
Attention linproj: 0.0005297660827636719
QKV Transform: 0.0022525787353515625
using-flash
Attention linproj: 0.0005321502685546875
QKV Transform: 0.002262115478515625
using-flash
Attention linproj: 0.0005357265472412109
QKV Transform: 0.002220630645751953
using-flash
Attention linproj: 0.0005373954772949219
QKV Transform: 0.0022602081298828125
using-flash
Attention linproj: 0.0005357265472412109
QKV Transform: 0.0021643638610839844
using-flash
Attention linproj: 0.0005476474761962891
QKV Transform: 0.002218484878540039
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.0021524429321289062
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.0021867752075195312
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.002172708511352539
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.0021691322326660156
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.0022563934326171875
using-flash
Attention linproj: 0.0005333423614501953
QKV Transform: 0.0021719932556152344
using-flash
Attention linproj: 0.0005292892456054688
QKV Transform: 0.0021049976348876953
using-flash
Attention linproj: 0.0005307197570800781
QKV Transform: 0.002215147018432617
using-flash
Attention linproj: 0.0005321502685546875
QKV Transform: 0.006205558776855469
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.002298593521118164
using-flash
Attention linproj: 0.00052642822265625
QKV Transform: 0.002238750457763672
using-flash
Attention linproj: 0.0005390644073486328
QKV Transform: 0.002241849899291992
using-flash
Attention linproj: 0.0005373954772949219
QKV Transform: 0.0021491050720214844
using-flash
Attention linproj: 0.0005285739898681641
QKV Transform: 0.0022115707397460938
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.002123594284057617
using-flash
Attention linproj: 0.0005304813385009766
QKV Transform: 0.0021872520446777344
using-flash
Attention linproj: 0.0005419254302978516
QKV Transform: 0.0021734237670898438
using-flash
Attention linproj: 0.0005292892456054688
QKV Transform: 0.0022335052490234375
using-flash
Attention linproj: 0.0005397796630859375
QKV Transform: 0.002257108688354492
using-flash
Attention linproj: 0.0005328655242919922
QKV Transform: 0.002222776412963867
using-flash
Attention linproj: 0.0005362033843994141
QKV Transform: 0.002269744873046875
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.0024900436401367188
using-flash
Attention linproj: 0.0005486011505126953
QKV Transform: 0.0021860599517822266
using-flash
Attention linproj: 0.0005321502685546875
QKV Transform: 0.0021257400512695312
using-flash
Attention linproj: 0.0005328655242919922
QKV Transform: 0.002088785171508789
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.0021431446075439453
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.002117633819580078
using-flash
Attention linproj: 0.0005292892456054688
QKV Transform: 0.0022978782653808594
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.0022821426391601562
using-flash
Attention linproj: 0.0005335807800292969
QKV Transform: 0.002240896224975586
using-flash
Attention linproj: 0.0005276203155517578
QKV Transform: 0.0022363662719726562
using-flash
Attention linproj: 0.0005371570587158203
QKV Transform: 0.0021893978118896484
using-flash
Attention linproj: 0.0005366802215576172
QKV Transform: 0.0022144317626953125
using-flash
Attention linproj: 0.0005400180816650391
QKV Transform: 0.0021538734436035156
using-flash
Attention linproj: 0.0005307197570800781
QKV Transform: 0.0021047592163085938
using-flash
Attention linproj: 0.0005183219909667969
QKV Transform: 0.002176523208618164
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.0021326541900634766
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.002281665802001953
using-flash
Attention linproj: 0.0005304813385009766
QKV Transform: 0.0022394657135009766
using-flash
Attention linproj: 0.0005338191986083984
QKV Transform: 0.0023179054260253906
using-flash
Attention linproj: 0.000537872314453125
QKV Transform: 0.0022552013397216797
using-flash
Attention linproj: 0.0005376338958740234
QKV Transform: 0.0022554397583007812
using-flash
Attention linproj: 0.0005357265472412109
QKV Transform: 0.0021970272064208984
using-flash
Attention linproj: 0.0005383491516113281
QKV Transform: 0.0021271705627441406
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.002131223678588867
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.0020990371704101562
using-flash
Attention linproj: 0.0005304813385009766
QKV Transform: 0.0022122859954833984
using-flash
Attention linproj: 0.0005304813385009766
QKV Transform: 0.002184152603149414
using-flash
Attention linproj: 0.0005300045013427734
QKV Transform: 0.002298593521118164
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.002313375473022461
using-flash
Attention linproj: 0.0005354881286621094
QKV Transform: 0.002252340316772461
using-flash
Attention linproj: 0.0005376338958740234
QKV Transform: 0.002240896224975586
using-flash
Attention linproj: 0.000537872314453125
QKV Transform: 0.0021119117736816406
using-flash
Attention linproj: 0.0005373954772949219
QKV Transform: 0.0021457672119140625
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.002161264419555664
using-flash
Attention linproj: 0.0005309581756591797
QKV Transform: 0.001962900161743164
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.0021893978118896484
using-flash
Attention linproj: 0.000530242919921875
QKV Transform: 0.0021691322326660156
using-flash
Attention linproj: 0.0005297660827636719
QKV Transform: 0.002276897430419922
using-flash
Attention linproj: 0.0005326271057128906
QKV Transform: 0.0022792816162109375
using-flash
Attention linproj: 0.0005323886871337891
QKV Transform: 0.002293109893798828
using-flash
Attention linproj: 0.000537872314453125
QKV Transform: 0.0022826194763183594
using-flash
Attention linproj: 0.0005474090576171875
QKV Transform: 0.0022034645080566406
using-flash
Attention linproj: 0.0005366802215576172
QKV Transform: 0.0021820068359375
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.0025446414947509766
using-flash
Attention linproj: 0.0005371570587158203
QKV Transform: 0.0022118091583251953
using-flash
Attention linproj: 0.0005295276641845703
QKV Transform: 0.002209901809692383
using-flash
Attention linproj: 0.0005283355712890625
QKV Transform: 0.0022058486938476562
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.002272367477416992
using-flash
Attention linproj: 0.0005252361297607422
QKV Transform: 0.0022292137145996094
using-flash
Attention linproj: 0.0005366802215576172
QKV Transform: 0.0022897720336914062
using-flash
Attention linproj: 0.0005381107330322266
QKV Transform: 0.0021729469299316406
using-flash
Attention linproj: 0.0005369186401367188
QKV Transform: 0.0022306442260742188
using-flash
Attention linproj: 0.0005385875701904297
QKV Transform: 0.0022995471954345703
using-flash
Attention linproj: 0.0005373954772949219
QKV Transform: 0.002244710922241211
using-flash
Attention linproj: 0.0005257129669189453
QKV Transform: 0.0022237300872802734
using-flash
Attention linproj: 0.0005311965942382812
QKV Transform: 0.002153635025024414
using-flash
Attention linproj: 0.000530242919921875
QKV Transform: 0.0020813941955566406
using-flash
Attention linproj: 0.0005297660827636719
QKV Transform: 0.0021321773529052734
using-flash
Attention linproj: 0.0005307197570800781
QKV Transform: 0.002132415771484375
using-flash
Attention linproj: 0.0005328655242919922
QKV Transform: 0.0022847652435302734
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.0022890567779541016
using-flash
Attention linproj: 0.0005426406860351562
QKV Transform: 0.002311229705810547
using-flash
Attention linproj: 0.000537872314453125
QKV Transform: 0.0020537376403808594
using-flash
Attention linproj: 0.0005376338958740234
QKV Transform: 0.0022115707397460938
using-flash
Attention linproj: 0.0005514621734619141
QKV Transform: 0.002167940139770508
using-flash
Attention linproj: 0.0005314350128173828
QKV Transform: 0.0021028518676757812
using-flash
Attention linproj: 0.0005309581756591797
QKV Transform: 0.002086162567138672
using-flash
Attention linproj: 0.0005321502685546875
QKV Transform: 0.0021457672119140625
using-flash
Attention linproj: 0.0005316734313964844
QKV Transform: 0.0021851062774658203
using-flash
Attention linproj: 0.000530242919921875
Attention duration (in seconds): 0.0116
Attention throughput (in TFLOP/s): 51.724
MLP_h_4h: 2.0699191093444824
MLP_4h_h: 0.0018258094787597656
MLP_h_4h: 0.0022826194763183594
MLP_4h_h: 0.0017294883728027344
MLP_h_4h: 0.002253293991088867
MLP_4h_h: 0.0017085075378417969
MLP_h_4h: 0.002249002456665039
MLP_4h_h: 0.0017049312591552734
MLP_h_4h: 0.0022499561309814453
MLP_4h_h: 0.0017006397247314453
MLP_h_4h: 0.002248048782348633
MLP_4h_h: 0.0017001628875732422
MLP_h_4h: 0.0022482872009277344
MLP_4h_h: 0.0016994476318359375
MLP_h_4h: 0.0022478103637695312
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.0022535324096679688
MLP_4h_h: 0.0017025470733642578
MLP_h_4h: 0.002259969711303711
MLP_4h_h: 0.0017094612121582031
MLP_h_4h: 0.0022525787353515625
MLP_4h_h: 0.0017099380493164062
MLP_h_4h: 0.002265453338623047
MLP_4h_h: 0.0017063617706298828
MLP_h_4h: 0.002258777618408203
MLP_4h_h: 0.001705169677734375
MLP_h_4h: 0.002263307571411133
MLP_4h_h: 0.0017046928405761719
MLP_h_4h: 0.002263307571411133
MLP_4h_h: 0.0017125606536865234
MLP_h_4h: 0.0022614002227783203
MLP_4h_h: 0.001714468002319336
MLP_h_4h: 0.002279520034790039
MLP_4h_h: 0.0017104148864746094
MLP_h_4h: 0.0022759437561035156
MLP_4h_h: 0.0024139881134033203
MLP_h_4h: 0.0023131370544433594
MLP_4h_h: 0.0017192363739013672
MLP_h_4h: 0.002276182174682617
MLP_4h_h: 0.0017113685607910156
MLP_h_4h: 0.002275228500366211
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.0022766590118408203
MLP_4h_h: 0.001729726791381836
MLP_h_4h: 0.0022737979888916016
MLP_4h_h: 0.00173187255859375
MLP_h_4h: 0.002274751663208008
MLP_4h_h: 0.0017175674438476562
MLP_h_4h: 0.002269744873046875
MLP_4h_h: 0.0017168521881103516
MLP_h_4h: 0.002268075942993164
MLP_4h_h: 0.0017154216766357422
MLP_h_4h: 0.0022728443145751953
MLP_4h_h: 0.0017158985137939453
MLP_h_4h: 0.002271413803100586
MLP_4h_h: 0.0017163753509521484
MLP_h_4h: 0.002271413803100586
MLP_4h_h: 0.0017430782318115234
MLP_h_4h: 0.002282381057739258
MLP_4h_h: 0.0017180442810058594
MLP_h_4h: 0.0022656917572021484
MLP_4h_h: 0.001720428466796875
MLP_h_4h: 0.0022687911987304688
MLP_4h_h: 0.0017185211181640625
MLP_h_4h: 0.002268075942993164
MLP_4h_h: 0.0017201900482177734
MLP_h_4h: 0.0022666454315185547
MLP_4h_h: 0.0017359256744384766
MLP_h_4h: 0.0022716522216796875
MLP_4h_h: 0.0017201900482177734
MLP_h_4h: 0.002267122268676758
MLP_4h_h: 0.0017197132110595703
MLP_h_4h: 0.0022699832916259766
MLP_4h_h: 0.0017201900482177734
MLP_h_4h: 0.002284526824951172
MLP_4h_h: 0.0017230510711669922
MLP_h_4h: 0.0022830963134765625
MLP_4h_h: 0.0017237663269042969
MLP_h_4h: 0.002285480499267578
MLP_4h_h: 0.0017235279083251953
MLP_h_4h: 0.002283811569213867
MLP_4h_h: 0.0017249584197998047
MLP_h_4h: 0.002284526824951172
MLP_4h_h: 0.0017242431640625
MLP_h_4h: 0.0022840499877929688
MLP_4h_h: 0.0017244815826416016
MLP_h_4h: 0.002290487289428711
MLP_4h_h: 0.0017311573028564453
MLP_h_4h: 0.0022852420806884766
MLP_4h_h: 0.001730203628540039
MLP_h_4h: 0.002283811569213867
MLP_4h_h: 0.0017309188842773438
MLP_h_4h: 0.002283811569213867
MLP_4h_h: 0.0017247200012207031
MLP_h_4h: 0.0022847652435302734
MLP_4h_h: 0.0017251968383789062
MLP_h_4h: 0.002282381057739258
MLP_4h_h: 0.001733541488647461
MLP_h_4h: 0.0022840499877929688
MLP_4h_h: 0.001729726791381836
MLP_h_4h: 0.002282857894897461
MLP_4h_h: 0.0017299652099609375
MLP_h_4h: 0.0022830963134765625
MLP_4h_h: 0.0017261505126953125
MLP_h_4h: 0.0022857189178466797
MLP_4h_h: 0.001722097396850586
MLP_h_4h: 0.002285003662109375
MLP_4h_h: 0.0017266273498535156
MLP_h_4h: 0.002282857894897461
MLP_4h_h: 0.0017313957214355469
MLP_h_4h: 0.002283811569213867
MLP_4h_h: 0.0017313957214355469
MLP_h_4h: 0.0022840499877929688
MLP_4h_h: 0.0017211437225341797
MLP_h_4h: 0.0022842884063720703
MLP_4h_h: 0.0017273426055908203
MLP_h_4h: 0.0022852420806884766
MLP_4h_h: 0.0017285346984863281
MLP_h_4h: 0.002285480499267578
MLP_4h_h: 0.0017323493957519531
MLP_h_4h: 0.0022840499877929688
MLP_4h_h: 0.0017328262329101562
MLP_h_4h: 0.0022859573364257812
MLP_4h_h: 0.0017316341400146484
MLP_h_4h: 0.002286672592163086
MLP_4h_h: 0.001729726791381836
MLP_h_4h: 0.002302408218383789
MLP_4h_h: 0.0017337799072265625
MLP_h_4h: 0.0022912025451660156
MLP_4h_h: 0.0017333030700683594
MLP_h_4h: 0.0022974014282226562
MLP_4h_h: 0.0017361640930175781
MLP_h_4h: 0.0023026466369628906
MLP_4h_h: 0.001729726791381836
MLP_h_4h: 0.002295255661010742
MLP_4h_h: 0.0017282962799072266
MLP_h_4h: 0.002295255661010742
MLP_4h_h: 0.001720428466796875
MLP_h_4h: 0.0023119449615478516
MLP_4h_h: 0.001718759536743164
MLP_h_4h: 0.002297639846801758
MLP_4h_h: 0.0017180442810058594
MLP_h_4h: 0.0022995471954345703
MLP_4h_h: 0.0017185211181640625
MLP_h_4h: 0.002301454544067383
MLP_4h_h: 0.0017189979553222656
MLP_h_4h: 0.0022966861724853516
MLP_4h_h: 0.001718282699584961
MLP_h_4h: 0.0022950172424316406
MLP_4h_h: 0.0017199516296386719
MLP_h_4h: 0.002299785614013672
MLP_4h_h: 0.0017199516296386719
MLP_h_4h: 0.0022983551025390625
MLP_4h_h: 0.0017194747924804688
MLP_h_4h: 0.0022962093353271484
MLP_4h_h: 0.0017185211181640625
MLP_h_4h: 0.002294778823852539
MLP_4h_h: 0.001718282699584961
MLP_h_4h: 0.002298116683959961
MLP_4h_h: 0.0017135143280029297
MLP_h_4h: 0.0022771358489990234
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.002274036407470703
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.0022742748260498047
MLP_4h_h: 0.0017006397247314453
MLP_h_4h: 0.002283334732055664
MLP_4h_h: 0.0017018318176269531
MLP_h_4h: 0.002282381057739258
MLP_4h_h: 0.0017066001892089844
MLP_h_4h: 0.0022950172424316406
MLP_4h_h: 0.0017054080963134766
MLP_h_4h: 0.002287626266479492
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.002285003662109375
MLP_4h_h: 0.0017094612121582031
MLP_h_4h: 0.0023109912872314453
MLP_4h_h: 0.0017063617706298828
MLP_h_4h: 0.0022971630096435547
MLP_4h_h: 0.0017056465148925781
MLP_h_4h: 0.00229644775390625
MLP_4h_h: 0.001703023910522461
MLP_h_4h: 0.002292633056640625
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.002270221710205078
MLP_4h_h: 0.0017004013061523438
MLP_h_4h: 0.002283334732055664
MLP_4h_h: 0.0017025470733642578
MLP_h_4h: 0.002269744873046875
MLP_4h_h: 0.0017120838165283203
MLP_h_4h: 0.0022683143615722656
MLP_4h_h: 0.0017011165618896484
MLP_h_4h: 0.0022726058959960938
MLP_4h_h: 0.0017008781433105469
MLP_h_4h: 0.0022690296173095703
MLP_4h_h: 0.0017008781433105469
MLP_h_4h: 0.0022726058959960938
MLP_4h_h: 0.0017011165618896484
MLP_h_4h: 0.002271890640258789
MLP_4h_h: 0.0017001628875732422
MLP_h_4h: 0.0022742748260498047
MLP_4h_h: 0.0017018318176269531
MLP_h_4h: 0.0022737979888916016
MLP_4h_h: 0.0017008781433105469
MLP_h_4h: 0.002274036407470703
MLP_4h_h: 0.0017011165618896484
MLP_h_4h: 0.0022721290588378906
MLP_4h_h: 0.00170135498046875
MLP_h_4h: 0.0022733211517333984
MLP_4h_h: 0.0017015933990478516
MLP_h_4h: 0.0022766590118408203
MLP_4h_h: 0.001718759536743164
MLP_h_4h: 0.0022749900817871094
MLP_4h_h: 0.0017113685607910156
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.0017290115356445312
MLP_h_4h: 0.002275705337524414
MLP_4h_h: 0.0017087459564208984
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.002270221710205078
MLP_4h_h: 0.0017066001892089844
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.002278566360473633
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.002269744873046875
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.0022673606872558594
MLP_4h_h: 0.001706838607788086
MLP_h_4h: 0.002272367477416992
MLP_4h_h: 0.0017352104187011719
MLP_h_4h: 0.002279996871948242
MLP_4h_h: 0.001708984375
MLP_h_4h: 0.0022649765014648438
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.002264738082885742
MLP_4h_h: 0.0017096996307373047
MLP_h_4h: 0.0022678375244140625
MLP_4h_h: 0.001708984375
MLP_h_4h: 0.0022666454315185547
MLP_4h_h: 0.001708984375
MLP_h_4h: 0.002266407012939453
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.0022644996643066406
MLP_4h_h: 0.001707315444946289
MLP_h_4h: 0.002266407012939453
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.0022644996643066406
MLP_4h_h: 0.0017085075378417969
MLP_h_4h: 0.0022630691528320312
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.0022652149200439453
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.0022649765014648438
MLP_4h_h: 0.0017080307006835938
MLP_h_4h: 0.002265453338623047
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.002262592315673828
MLP_4h_h: 0.0017096996307373047
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.0017173290252685547
MLP_h_4h: 0.0022661685943603516
MLP_4h_h: 0.0017271041870117188
MLP_h_4h: 0.0022695064544677734
MLP_4h_h: 0.001710653305053711
MLP_h_4h: 0.0022649765014648438
MLP_4h_h: 0.001712799072265625
MLP_h_4h: 0.002259969711303711
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.0022652149200439453
MLP_4h_h: 0.0017099380493164062
MLP_h_4h: 0.0022644996643066406
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.002267599105834961
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.0022673606872558594
MLP_4h_h: 0.0017085075378417969
MLP_h_4h: 0.0022649765014648438
MLP_4h_h: 0.0017082691192626953
MLP_h_4h: 0.002265453338623047
MLP_4h_h: 0.001711130142211914
MLP_h_4h: 0.0022661685943603516
MLP_4h_h: 0.0017104148864746094
MLP_h_4h: 0.002267122268676758
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.002269268035888672
MLP_4h_h: 0.0017096996307373047
MLP_h_4h: 0.0022673606872558594
MLP_4h_h: 0.0017108917236328125
MLP_h_4h: 0.0022649765014648438
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.002268552780151367
MLP_4h_h: 0.0017101764678955078
MLP_h_4h: 0.002267122268676758
MLP_4h_h: 0.0017092227935791016
MLP_h_4h: 0.002266407012939453
MLP_4h_h: 0.0017101764678955078
MLP duration (in seconds): 0.0040
MLP throughput (in TFLOP/s): 213.390
LN1: 0.0030465126037597656
QKV Transform: 0.0016868114471435547
using-flash
Attention linproj: 0.0005350112915039062
Post-attention Dropout: 0.06502366065979004
Post-attention residual: 0.003414630889892578
LN2: 0.00018930435180664062
MLP_h_4h: 0.002358675003051758
MLP_4h_h: 0.0017104148864746094
Post-MLP residual: 0.002012014389038086
Attention layer time: 0.08253693580627441
LN1: 0.00013399124145507812
QKV Transform: 0.002765655517578125
using-flash
Attention linproj: 0.0005352497100830078
Post-attention Dropout: 0.00033783912658691406
Post-attention residual: 0.00011086463928222656
LN2: 0.00011682510375976562
MLP_h_4h: 0.002650737762451172
MLP_4h_h: 0.001722574234008789
Post-MLP residual: 0.00033664703369140625
Attention layer time: 0.01636958122253418
LN1: 0.00013494491577148438
QKV Transform: 0.0020415782928466797
using-flash
Attention linproj: 0.0005304813385009766
Post-attention Dropout: 0.0003464221954345703
Post-attention residual: 0.00011467933654785156
LN2: 0.0001201629638671875
MLP_h_4h: 0.0036211013793945312
MLP_4h_h: 0.0017266273498535156
Post-MLP residual: 0.00033473968505859375
Attention layer time: 0.01664590835571289
LN1: 0.00013136863708496094
QKV Transform: 0.0025882720947265625
using-flash
Attention linproj: 0.0005326271057128906
Post-attention Dropout: 0.0003483295440673828
Post-attention residual: 0.00011324882507324219
LN2: 0.00013780593872070312
MLP_h_4h: 0.003581523895263672
MLP_4h_h: 0.0017209053039550781
Post-MLP residual: 0.00034046173095703125
Attention layer time: 0.017192840576171875
LN1: 0.0001327991485595703
QKV Transform: 0.002669095993041992
using-flash
Attention linproj: 0.0005364418029785156
Post-attention Dropout: 0.0003337860107421875
Post-attention residual: 0.00011014938354492188
LN2: 0.00011467933654785156
MLP_h_4h: 0.002718687057495117
MLP_4h_h: 0.0017158985137939453
Post-MLP residual: 0.0003273487091064453
Attention layer time: 0.03281712532043457
LN1: 0.00012922286987304688
QKV Transform: 0.0018532276153564453
using-flash
Attention linproj: 0.0005366802215576172
Post-attention Dropout: 0.0003421306610107422
Post-attention residual: 0.00011229515075683594
LN2: 0.00011515617370605469
MLP_h_4h: 0.003618478775024414
MLP_4h_h: 0.001711130142211914
Post-MLP residual: 0.0003352165222167969
Attention layer time: 0.017490863800048828
LN1: 0.00013375282287597656
QKV Transform: 0.002536773681640625
using-flash
Attention linproj: 0.0005278587341308594
Post-attention Dropout: 0.0003647804260253906
Post-attention residual: 0.00011515617370605469
LN2: 0.00011873245239257812
MLP_h_4h: 0.0036153793334960938
MLP_4h_h: 0.0017404556274414062
Post-MLP residual: 0.00033783912658691406
Attention layer time: 0.01715850830078125
LN1: 0.00013017654418945312
QKV Transform: 0.0025322437286376953
using-flash
Attention linproj: 0.0005259513854980469
Post-attention Dropout: 0.00033593177795410156
Post-attention residual: 0.00011539459228515625
LN2: 0.00011706352233886719
MLP_h_4h: 0.003632783889770508
MLP_4h_h: 0.0017211437225341797
Post-MLP residual: 0.00034165382385253906
Attention layer time: 0.017138957977294922
LN1: 0.0001316070556640625
QKV Transform: 0.0025644302368164062
using-flash
Attention linproj: 0.0005414485931396484
Post-attention Dropout: 0.00033926963806152344
Post-attention residual: 0.00011348724365234375
LN2: 0.00011682510375976562
MLP_h_4h: 0.0036263465881347656
MLP_4h_h: 0.0017189979553222656
Post-MLP residual: 0.0003476142883300781
Attention layer time: 0.01717853546142578
LN1: 0.0001285076141357422
QKV Transform: 0.002597332000732422
using-flash
Attention linproj: 0.0005352497100830078
Post-attention Dropout: 0.0003447532653808594
Post-attention residual: 0.00012564659118652344
LN2: 0.00012826919555664062
MLP_h_4h: 0.0035965442657470703
MLP_4h_h: 0.0017330646514892578
Post-MLP residual: 0.0003352165222167969
Attention layer time: 0.017197608947753906
LN1: 0.0001342296600341797
QKV Transform: 0.00254058837890625
using-flash
Attention linproj: 0.0005283355712890625
Post-attention Dropout: 0.0003368854522705078
Post-attention residual: 0.00011491775512695312
LN2: 0.00011968612670898438
MLP_h_4h: 0.003627777099609375
MLP_4h_h: 0.001722574234008789
Post-MLP residual: 0.000335693359375
Attention layer time: 0.017145872116088867
LN1: 0.00013065338134765625
QKV Transform: 0.002521991729736328
using-flash
Attention linproj: 0.0005259513854980469
Post-attention Dropout: 0.00033593177795410156
Post-attention residual: 0.00011372566223144531
LN2: 0.00011467933654785156
MLP_h_4h: 0.003651857376098633
MLP_4h_h: 0.001718282699584961
Post-MLP residual: 0.000339508056640625
Attention layer time: 0.0171356201171875
LN1: 0.00013208389282226562
QKV Transform: 0.0025098323822021484
using-flash
Attention linproj: 0.0005335807800292969
Post-attention Dropout: 0.0003364086151123047
Post-attention residual: 0.00011277198791503906
LN2: 0.00011539459228515625
MLP_h_4h: 0.003655672073364258
MLP_4h_h: 0.0017199516296386719
Post-MLP residual: 0.0003414154052734375
Attention layer time: 0.01714181900024414
LN1: 0.0001277923583984375
QKV Transform: 0.0024077892303466797
using-flash
Attention linproj: 0.0005354881286621094
Post-attention Dropout: 0.0003418922424316406
Post-attention residual: 0.00011467933654785156
LN2: 0.00011730194091796875
MLP_h_4h: 0.003641366958618164
MLP_4h_h: 0.0017135143280029297
Post-MLP residual: 0.00033783912658691406
Attention layer time: 0.01699233055114746
LN1: 0.00013375282287597656
QKV Transform: 0.002516031265258789
using-flash
Attention linproj: 0.0005288124084472656
Post-attention Dropout: 0.0003380775451660156
Post-attention residual: 0.00011563301086425781
LN2: 0.00011897087097167969
MLP_h_4h: 0.003634929656982422
MLP_4h_h: 0.001722574234008789
Post-MLP residual: 0.0003376007080078125
Attention layer time: 0.0171201229095459
LN1: 0.0001285076141357422
QKV Transform: 0.0026998519897460938
using-flash
Attention linproj: 0.0005254745483398438
Post-attention Dropout: 0.0003337860107421875
Post-attention residual: 0.00011348724365234375
LN2: 0.00011563301086425781
MLP_h_4h: 0.0036537647247314453
MLP_4h_h: 0.0017180442810058594
Post-MLP residual: 0.0003402233123779297
Attention layer time: 0.017301559448242188
LN1: 0.0001308917999267578
QKV Transform: 0.002580881118774414
using-flash
Attention linproj: 0.0005326271057128906
Post-attention Dropout: 0.0003376007080078125
Post-attention residual: 0.00011348724365234375
LN2: 0.00011515617370605469
MLP_h_4h: 0.003650665283203125
MLP_4h_h: 0.0017192363739013672
Post-MLP residual: 0.00033593177795410156
Attention layer time: 0.017155885696411133
LN1: 0.00012874603271484375
QKV Transform: 0.0025453567504882812
using-flash
Attention linproj: 0.0005350112915039062
Post-attention Dropout: 0.0003437995910644531
Post-attention residual: 0.00011372566223144531
LN2: 0.00012683868408203125
MLP_h_4h: 0.0036094188690185547
MLP_4h_h: 0.0017230510711669922
Post-MLP residual: 0.0003364086151123047
Attention layer time: 0.017138004302978516
LN1: 0.0001342296600341797
QKV Transform: 0.0025370121002197266
using-flash
Attention linproj: 0.0005307197570800781
Post-attention Dropout: 0.0003383159637451172
Post-attention residual: 0.00011420249938964844
LN2: 0.00011992454528808594
MLP_h_4h: 0.0036318302154541016
MLP_4h_h: 0.0017147064208984375
Post-MLP residual: 0.0003368854522705078
Attention layer time: 0.01713418960571289
LN1: 0.00013828277587890625
QKV Transform: 0.0026111602783203125
using-flash
Attention linproj: 0.0005273818969726562
Post-attention Dropout: 0.00033473968505859375
Post-attention residual: 0.00011277198791503906
LN2: 0.00011515617370605469
MLP_h_4h: 0.0036580562591552734
MLP_4h_h: 0.0017180442810058594
Post-MLP residual: 0.00033855438232421875
Attention layer time: 0.01721358299255371
LN1: 0.00013184547424316406
QKV Transform: 0.002522706985473633
using-flash
Attention linproj: 0.0005309581756591797
Post-attention Dropout: 0.0003368854522705078
Post-attention residual: 0.00011205673217773438
LN2: 0.00011563301086425781
MLP_h_4h: 0.003633737564086914
MLP_4h_h: 0.0017189979553222656
Post-MLP residual: 0.00033354759216308594
Attention layer time: 0.017108440399169922
LN1: 0.00012874603271484375
QKV Transform: 0.002454519271850586
using-flash
Attention linproj: 0.0005469322204589844