forked from chenzomi12/AISystem
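1
00:00:00,000 --> 00:00:04,000
[Subtitles: BLACK | Proofreading: 志宇]
2
00:00:07,000 --> 00:00:09,000
I was scrolling Douyin until late at night,
3
00:00:09,000 --> 00:00:11,000
and now I can't sleep, so I got up to record a lesson.
4
00:00:17,000 --> 00:00:21,000
Today we've reached the third part of the LLVM series:
5
00:00:21,000 --> 00:00:23,000
a deep dive into LLVM.
6
00:00:23,000 --> 00:00:27,000
Today's content centers on the LLVM backend,
7
00:00:27,000 --> 00:00:32,000
that is, how the backend's CodeGen produces assembly code or other instructions.
8
00:00:32,000 --> 00:00:36,000
After that, we'll talk about some AI projects built on LLVM.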
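9
00:00:38,000 --> 00:00:39,000
Before getting into the main content,
10
00:00:39,000 --> 00:00:41,000
I actually have a really big question.
11
00:00:41,000 --> 00:00:44,000
When I first made some advanced MySQL videos,
12
00:00:44,000 --> 00:00:47,000
my follower count never went above 100.
13
00:00:47,000 --> 00:00:51,000
In other words, not a single video got more than 100 views;
14
00:00:51,000 --> 00:00:53,000
almost nobody watched.
15
00:00:58,000 --> 00:01:02,000
Then I wondered whether I should cover AI systems,
16
00:01:02,000 --> 00:01:05,000
more general topics like AI frameworks and computation graphs.
17
00:01:05,000 --> 00:01:07,000
So I made some videos on those,
18
00:01:07,000 --> 00:01:11,000
and none of them got more than 200 or 300 views.
19
00:01:11,000 --> 00:01:14,000
My follower growth back then was pitiful:
20
00:01:14,000 --> 00:01:18,000
after two or three months I had only a little over 300 followers.
21
00:01:18,000 --> 00:01:21,000
Then I thought, large models are really hot right now,
22
00:01:21,000 --> 00:01:25,000
so why not make some large-model tutorials?
23
00:01:26,000 --> 00:01:28,000
So I went through large models systematically,
24
00:01:28,000 --> 00:01:31,000
from tensor parallelism and pipeline parallelism to communication primitives.
25
00:01:31,000 --> 00:01:34,000
After working through every large-model technique I could,
26
00:01:34,000 --> 00:01:36,000
I found
27
00:01:36,000 --> 00:01:39,000
that nobody seemed to want to watch that either.
28
00:01:39,000 --> 00:01:42,000
My follower count never got past 600.
29
00:01:42,000 --> 00:01:44,000
After two months of hard work,
30
00:01:44,000 --> 00:01:46,000
there were still hardly any views.
31
00:01:46,000 --> 00:01:56,000
And then, what the heck?
32
00:01:56,000 --> 00:02:01,000
I assumed LLVM, this ancient,
33
00:02:01,000 --> 00:02:03,000
hard, tough-to-crack technology,
34
00:02:03,000 --> 00:02:06,000
would get few views, yet its views,
35
00:02:06,000 --> 00:02:10,000
its views turned out to be the highest of any video I've posted.
36
00:02:10,000 --> 00:02:11,000
So I'd like to ask
37
00:02:11,000 --> 00:02:13,000
those of you who followed me for this:
38
00:02:13,000 --> 00:02:14,000
leave me a danmaku comment,
39
00:02:14,000 --> 00:02:16,000
or anyone who comes across this video,
40
00:02:16,000 --> 00:02:20,000
tell me why you watch LLVM content.
41
00:02:20,000 --> 00:02:22,000
Once I've finished LLVM,
42
00:02:22,000 --> 00:02:24,000
and covered traditional compilers,
43
00:02:24,000 --> 00:02:25,000
I'll move on to AI compilers,
44
00:02:25,000 --> 00:02:28,000
and I bet hardly anyone will watch again.
45
00:02:28,000 --> 00:02:29,000
All right.
46
00:02:29,000 --> 00:02:31,000
As for my rant, or my question,
47
00:02:31,000 --> 00:02:36,000
I'm honestly scratching my head right now.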
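48
00:02:36,000 --> 00:02:37,000
Back to the main content.
49
00:02:37,000 --> 00:02:40,000
This part continues with the LLVM architecture,
50
00:02:40,000 --> 00:02:41,000
focusing on the backend.
51
00:02:41,000 --> 00:02:43,000
First, a quick review of the previous two parts.
52
00:02:43,000 --> 00:02:46,000
Last time we covered the LLVM front end.
53
00:02:46,000 --> 00:02:48,000
The front end takes the high-level language
54
00:02:48,000 --> 00:02:50,000
and performs lexical analysis on it,
55
00:02:50,000 --> 00:02:51,000
turning its words,
56
00:02:51,000 --> 00:02:53,000
the elements of the high-level language, into tokens,
57
00:02:53,000 --> 00:02:55,000
which are then handed to syntax analysis.
58
00:02:55,000 --> 00:02:56,000
Syntax analysis checks
59
00:02:56,000 --> 00:02:58,000
whether a statement is written correctly,
60
00:02:58,000 --> 00:03:00,000
while semantic analysis is what actually checks
61
00:03:00,000 --> 00:03:03,000
whether the logic of the code is sound.
62
00:03:03,000 --> 00:03:05,000
From syntax analysis to semantic analysis,
63
00:03:05,000 --> 00:03:07,000
what gets passed along is an AST,
64
00:03:07,000 --> 00:03:09,000
an abstract syntax tree.
65
00:03:09,000 --> 00:03:12,000
This is what we call a syntax tree.
66
00:03:12,000 --> 00:03:14,000
The syntax analysis stage
67
00:03:14,000 --> 00:03:15,000
outputs a syntax tree
68
00:03:15,000 --> 00:03:16,000
to semantic analysis,
69
00:03:16,000 --> 00:03:17,000
which then analyzes each statement's
70
00:03:17,000 --> 00:03:18,000
logic:
71
00:03:18,000 --> 00:03:20,000
where exactly the code is wrong,
72
00:03:20,000 --> 00:03:21,000
or whether it is wrong at all.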
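The front-end flow recapped above (lexical analysis into tokens, syntax analysis into an AST, then semantic checks on that tree) can be sketched for a toy arithmetic language. This is a minimal illustration, not Clang's implementation: the token kinds, the tuple-based AST, and the `lex` / `Parser` / `check` names are hypothetical, and the only "semantic" rule checked is that every variable has been declared.

```python
from typing import List, Set, Tuple

# Hypothetical toy front end. Tokens are (kind, value), e.g. ("num", 2), ("op", "+").
Token = Tuple[str, object]


def lex(src: str) -> List[Token]:
    """Lexical analysis: split source text into tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("num", int(src[i:j])))
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("id", src[i:j]))
            i = j
        elif ch in "+-*/()":
            tokens.append(("op", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    """Syntax analysis: recursive descent that builds a tuple-based AST."""

    def __init__(self, tokens: List[Token]):
        self.toks = tokens
        self.pos = 0

    def peek(self) -> Token:
        return self.toks[self.pos] if self.pos < len(self.toks) else ("eof", None)

    def take(self) -> Token:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "eof":
            raise SyntaxError(f"unexpected token {self.peek()}")
        return node

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take()[1]
            node = ("bin", op, node, self.term())
        return node

    def term(self):  # term := atom (('*' | '/') atom)*
        node = self.atom()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take()[1]
            node = ("bin", op, node, self.atom())
        return node

    def atom(self):  # atom := num | id | '(' expr ')'
        kind, val = self.take()
        if kind == "num":
            return ("num", val)
        if kind == "id":
            return ("var", val)
        if (kind, val) == ("op", "("):
            node = self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {(kind, val)}")


def check(node, declared: Set[str]) -> None:
    """Semantic analysis: the tree is grammatical, but does it make sense?"""
    if node[0] == "var" and node[1] not in declared:
        raise NameError(f"use of undeclared variable {node[1]!r}")
    if node[0] == "bin":
        check(node[2], declared)
        check(node[3], declared)


ast = Parser(lex("a * (b + 2)")).parse()
check(ast, {"a", "b"})  # passes; check(ast, {"a"}) would raise NameError
```

Note the division of labor: `"a +"` fails in the parser (syntax error), while `"a * b"` with `b` undeclared parses fine and is only rejected by `check` (semantic error).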
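73
00:03:21,000 --> 00:03:24,000
Next we reach LLVM's optimization layer.
74
00:03:24,000 --> 00:03:26,000
The optimization layer has a great many passes,
75
00:03:26,000 --> 00:03:28,000
and different passes handle different tasks.
76
00:03:28,000 --> 00:03:30,000
All the arrows in the middle operate on
77
00:03:30,000 --> 00:03:33,000
the LLVM IR data structure.
78
00:03:33,000 --> 00:03:35,000
There are two main kinds of passes:
79
00:03:35,000 --> 00:03:37,000
the first is analysis passes,
80
00:03:37,000 --> 00:03:38,000
and the second is transformation passes.
81
00:03:38,000 --> 00:03:41,000
The transformation passes are the ones that actually change the code.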
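To make the analysis/transformation split concrete, here is a minimal sketch over a hypothetical straight-line IR (the `Inst` tuple and both function names are invented for illustration; this is not LLVM's pass API). The analysis pass only reads the IR and reports which SSA values are needed; the transformation pass consumes that result and deletes dead instructions. It assumes no instruction has side effects; a real dead-code elimination must also keep stores, calls, and similar instructions.

```python
from typing import List, NamedTuple, Set, Tuple


class Inst(NamedTuple):
    """Hypothetical three-address instruction in SSA form."""
    dest: str              # SSA value this instruction defines, e.g. "%t1"
    op: str                # opcode, e.g. "add"
    args: Tuple[str, ...]  # operands: SSA names ("%x") or literals ("1")


def needed_values(block: List[Inst], live_out: Set[str]) -> Set[str]:
    """Analysis pass: find every SSA value the block's results depend on.
    It only reads the IR and returns facts about it; nothing is modified."""
    needed = set(live_out)
    # Walk backwards: in SSA a value is defined before its uses, so when we
    # reach a definition we already know whether anything later needs it.
    for inst in reversed(block):
        if inst.dest in needed:
            needed.update(a for a in inst.args if a.startswith("%"))
    return needed


def dead_code_elimination(block: List[Inst], live_out: Set[str]) -> List[Inst]:
    """Transformation pass: consume the analysis result and rewrite the IR."""
    needed = needed_values(block, live_out)
    return [inst for inst in block if inst.dest in needed]


block = [
    Inst("%a", "add", ("%x", "1")),
    Inst("%b", "mul", ("%a", "2")),   # %b is never used afterwards: dead
    Inst("%c", "sub", ("%a", "%y")),
]
optimized = dead_code_elimination(block, live_out={"%c"})
```

Keeping the analysis separate means other transformations can reuse its result without recomputing it, which is the motivation for the split.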
82
00:03:41,000 --> 00:03:43,000
Next, today we'll look at
83
00:03:43,000 --> 00:03:45,000
the LLVM backend, CodeGen,
84
00:03:45,000 --> 00:03:47,000
and how it generates code.
85
00:03:50,000 --> 00:03:51,000
With this part,
86
00:03:51,000 --> 00:03:52,000
we will have covered the compiler's front end,
87
00:03:52,000 --> 00:03:55,000
optimization layer, and backend.
88
00:03:55,000 --> 00:03:56,000
The backend
89
00:03:56,000 --> 00:03:57,000
is actually the most complex part,
90
00:03:57,000 --> 00:03:59,000
and also the most hardware-dependent.
91
00:03:59,000 --> 00:04:01,000
So every backend you see
92
00:04:01,000 --> 00:04:03,000
is tied to a real piece of hardware.
93
00:04:03,000 --> 00:04:05,000
But even though it is hardware-dependent,
94
00:04:05,000 --> 00:04:06,000
the LLVM backend
95
00:04:06,000 --> 00:04:08,000
still imposes a common structure on every target:
96
00:04:08,000 --> 00:04:09,000
instruction selection,
97
00:04:09,000 --> 00:04:10,000
register allocation and scheduling,
98
00:04:10,000 --> 00:04:11,000
code layout,
99
00:04:11,000 --> 00:04:13,000
and finally code assembly.
100
00:04:13,000 --> 00:04:14,000
This whole stage
101
00:04:14,000 --> 00:04:15,000
is usually called
102
00:04:15,000 --> 00:04:18,000
CodeGen, code generation.
103
00:04:18,000 --> 00:04:19,000
In short,
104
00:04:19,000 --> 00:04:20,000
it turns LLVM IR
105
00:04:20,000 --> 00:04:22,000
into target code
106
00:04:22,000 --> 00:04:24,000
or assembly code.
107
00:04:24,000 --> 00:04:25,000
The processing in the backend
108
00:04:25,000 --> 00:04:27,000
is actually very complex.
109
00:04:27,000 --> 00:04:29,000
The whole backend pipeline
110
00:04:29,000 --> 00:04:30,000
uses different IRs
111
00:04:30,000 --> 00:04:31,000
and different instruction forms.
112
00:04:31,000 --> 00:04:32,000
First
113
00:04:32,000 --> 00:04:33,000
there is LLVM IR,
114
00:04:33,000 --> 00:04:35,000
then the SelectionDAG,
115
00:04:35,000 --> 00:04:36,000
then Machine Instructions,
116
00:04:36,000 --> 00:04:38,000
and then MC Instructions.
117
00:04:38,000 --> 00:04:39,000
In the final stage,
118
00:04:39,000 --> 00:04:40,000
LLVM IR
119
00:04:40,000 --> 00:04:42,000
has been converted into target assembly code,
120
00:04:42,000 --> 00:04:44,000
which takes a great
121
00:04:44,000 --> 00:04:45,000
many steps,
122
00:04:45,000 --> 00:04:46,000
namely the
123
00:04:46,000 --> 00:04:48,000
pipeline shown below.
124
00:04:48,000 --> 00:04:49,000
In the end, LLVM IR
125
00:04:49,000 --> 00:04:51,000
becomes something very close
126
00:04:51,000 --> 00:04:52,000
and friendly to the backend:
127
00:04:52,000 --> 00:04:54,000
concrete instructions,
128
00:04:54,000 --> 00:04:55,000
concrete representations of functions
129
00:04:55,000 --> 00:04:56,000
and global variables,
130
00:04:56,000 --> 00:04:58,000
and concrete representations of registers.
131
00:04:58,000 --> 00:05:00,000
The further down the pipeline you go,
132
00:05:00,000 --> 00:05:01,000
the closer you get
133
00:05:01,000 --> 00:05:03,000
to the actual hardware's target instructions.
134
00:05:03,000 --> 00:05:05,000
The white passes in the figure
135
00:05:05,000 --> 00:05:07,000
are optional passes,
136
00:05:07,000 --> 00:05:08,000
while the gray passes
137
00:05:08,000 --> 00:05:10,000
are required passes,
138
00:05:10,000 --> 00:05:12,000
also called super-passes.
139
00:05:12,000 --> 00:05:13,000
As you can see below,
140
00:05:13,000 --> 00:05:15,000
there are five super-passes,
141
00:05:15,000 --> 00:05:16,000
and we'll go through them
142
00:05:16,000 --> 00:05:18,000
one by one.
143
00:05:18,000 --> 00:05:19,000
The first super-pass
144
00:05:19,000 --> 00:05:21,000
is instruction selection,
145
00:05:21,000 --> 00:05:23,000
Instruction Selection.
146
00:05:23,000 --> 00:05:25,000
LLVM IR
147
00:05:25,000 --> 00:05:26,000
serves as the input
148
00:05:26,000 --> 00:05:27,000
to instruction selection.
149
00:05:27,000 --> 00:05:28,000
On input,
150
00:05:28,000 --> 00:05:29,000
it is converted into a
151
00:05:29,000 --> 00:05:30,000
SelectionDAG.
152
00:05:30,000 --> 00:05:31,000
A DAG
153
00:05:31,000 --> 00:05:32,000
is a directed acyclic graph;
154
00:05:32,000 --> 00:05:33,000
the IR
155
00:05:33,000 --> 00:05:35,000
becomes an actual graph.
156
00:05:35,000 --> 00:05:36,000
Each DAG
157
00:05:36,000 --> 00:05:37,000
represents
158
00:05:37,000 --> 00:05:39,000
the computation of a single basic block.
159
00:05:39,000 --> 00:05:40,000
Since it's a graph,
160
00:05:40,000 --> 00:05:41,000
it has nodes and edges.
161
00:05:41,000 --> 00:05:43,000
Nodes represent
162
00:05:43,000 --> 00:05:44,000
the concrete operations to execute,
163
00:05:44,000 --> 00:05:45,000
while edges
164
00:05:45,000 --> 00:05:46,000
represent the data-flow
165
00:05:46,000 --> 00:05:48,000
dependencies between those operations.
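A minimal sketch of building such a graph for one basic block, assuming a hypothetical three-address input format `(dest, opcode, operands)`; the function name and node encoding are invented for illustration. Each distinct computation becomes one node (so identical expressions share a node, which is what makes it a DAG rather than a tree), and every operand use adds a data-flow edge. LLVM's real SelectionDAG also carries chain and glue edges for side effects and ordering, which this sketch leaves out.

```python
from typing import Dict, List, Tuple

# Hypothetical node encoding: (label, ids of operand nodes).
Node = Tuple[str, Tuple[int, ...]]


def build_dag(block: List[Tuple[str, str, Tuple[str, ...]]]):
    """Build a data-flow DAG for one basic block.
    Returns (nodes, edges); edges are (producer_id, consumer_id) pairs."""
    nodes: List[Node] = []
    node_id: Dict[Node, int] = {}   # identical computations map to one node
    value_of: Dict[str, int] = {}   # SSA name -> node that computes it

    def get_node(key: Node) -> int:
        if key not in node_id:
            node_id[key] = len(nodes)
            nodes.append(key)
        return node_id[key]

    for dest, opcode, operands in block:
        ids = []
        for name in operands:
            if name in value_of:    # defined earlier in this block
                ids.append(value_of[name])
            else:                   # block input: becomes a leaf node
                ids.append(get_node((f"input {name}", ())))
        value_of[dest] = get_node((opcode, tuple(ids)))

    edges = [(src, dst) for dst, (_, ops) in enumerate(nodes) for src in ops]
    return nodes, edges


block = [
    ("%t1", "mul", ("%a", "%b")),
    ("%t2", "mul", ("%a", "%b")),   # same computation as %t1: reuses its node
    ("%t3", "add", ("%t1", "%t2")),
]
nodes, edges = build_dag(block)
```

Here the two `mul` instructions collapse into a single node, and the `add` node has two edges from it, one per operand.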
166
00:05:48,000 --> 00:05:49,000
The goal
167
00:05:49,000 --> 00:05:50,000
is the line in bold:
168
00:05:50,000 --> 00:05:52,000
we want to take LLVM code,
169
00:05:52,000 --> 00:05:53,000
that is, LLVM IR,
170
00:05:53,000 --> 00:05:55,000
and produce a form on which the program
171
00:05:55,000 --> 00:05:56,000
can run
172
00:05:56,000 --> 00:05:57,000
a tree-based pattern-matching
173
00:05:57,000 --> 00:05:59,000
instruction selection algorithm.
174
00:05:59,000 --> 00:06:00,000
That sentence
175
00:06:00,000 --> 00:06:01,000
is a bit of a mouthful.
176
00:06:01,000 --> 00:06:02,000
In fact, up to this step,
177
00:06:02,000 --> 00:06:03,000
instruction selection
178
00:06:03,000 --> 00:06:05,000
turns LLVM IR into
179
00:06:05,000 --> 00:06:07,000
a DAG.
180
00:06:07,000 --> 00:06:08,000
This DAG
181
00:06:08,000 --> 00:06:09,000
consists of nodes
182
00:06:09,000 --> 00:06:11,000
of target machine code.
183
00:06:11,000 --> 00:06:12,000
These nodes
184
00:06:12,000 --> 00:06:13,000
represent the target
185
00:06:13,000 --> 00:06:14,000
machine's instructions,
186
00:06:14,000 --> 00:06:16,000
no longer LLVM instructions.
187
00:06:16,000 --> 00:06:17,000
LLVM instructions
188
00:06:17,000 --> 00:06:18,000
use the three-address form
189
00:06:18,000 --> 00:06:19,000
that we covered
190
00:06:19,000 --> 00:06:20,000
in the previous part,
191
00:06:20,000 --> 00:06:21,000
whereas now
192
00:06:21,000 --> 00:06:23,000
we have real machine instructions
193
00:06:23,000 --> 00:06:24,000
arranged as a DAG.
194
00:06:24,000 --> 00:06:25,000
A DAG is a graph,
195
00:06:25,000 --> 00:06:27,000
and a graph is very convenient to build trees from.
196
00:06:27,000 --> 00:06:29,000
The instruction selection algorithm
197
00:06:29,000 --> 00:06:31,000
then selects instructions over the DAG.
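The idea of tree-pattern-matching instruction selection can be illustrated with the classic "maximal munch" strategy: at each node, try the largest pattern that fits, emit one target instruction for it, and recurse into whatever the pattern did not cover. The target ISA here (`MADD`, `ADDI`, `ADD`, `MUL`, `LI`) and the tuple tree format are hypothetical. LLVM itself generates its matcher from TableGen target descriptions and matches on the SelectionDAG rather than on a plain tree, so treat this only as a sketch of the underlying idea.

```python
from itertools import count
from typing import Iterator, List

# Hypothetical target instructions, for illustration only:
#   LI   rd, imm          rd = imm
#   ADD  rd, ra, rb       rd = ra + rb        (MUL likewise)
#   ADDI rd, ra, imm      rd = ra + imm
#   MADD rd, ra, rb, rc   rd = ra * rb + rc
# Trees are tuples: ("reg", name), ("const", n), or (op, lhs, rhs).


def select(tree, out: List[str], regs: Iterator[int]) -> str:
    """Maximal munch: cover the tree top-down, trying the largest pattern first.
    Appends instructions to `out` and returns the register holding the result."""
    kind = tree[0]
    if kind == "reg":
        return tree[1]
    if kind == "const":
        rd = f"r{next(regs)}"
        out.append(f"LI {rd}, {tree[1]}")
        return rd

    op, lhs, rhs = tree
    if op == "add" and lhs[0] == "mul":       # add(mul(a, b), c) -> MADD
        a = select(lhs[1], out, regs)
        b = select(lhs[2], out, regs)
        c = select(rhs, out, regs)
        rd = f"r{next(regs)}"
        out.append(f"MADD {rd}, {a}, {b}, {c}")
        return rd
    if op == "add" and rhs[0] == "const":     # add(a, imm) -> ADDI
        a = select(lhs, out, regs)
        rd = f"r{next(regs)}"
        out.append(f"ADDI {rd}, {a}, {rhs[1]}")
        return rd

    a = select(lhs, out, regs)                # fallback: one node, one instruction
    b = select(rhs, out, regs)
    rd = f"r{next(regs)}"
    out.append(f"{op.upper()} {rd}, {a}, {b}")
    return rd


# x * y + (z + 4)
tree = ("add", ("mul", ("reg", "x"), ("reg", "y")), ("add", ("reg", "z"), ("const", 4)))
code: List[str] = []
result = select(tree, code, count())
```

Four IR operations end up as two target instructions, `ADDI r0, z, 4` followed by `MADD r1, x, y, r0`, because the larger patterns absorb the `mul` node and the constant.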
198
00:06:31,000 --> 00:06:33,000
The second step
199
00:06:33,000 --> 00:06:34,000
is instruction scheduling,
200
00:06:34,000 --> 00:06:36,000
Instruction Scheduling.
201
00:06:36,000 --> 00:06:37,000
For this second step,
202
00:06:37,000 --> 00:06:38,000
you can see
203
00:06:38,000 --> 00:06:39,000
there are actually two
204
00:06:39,000 --> 00:06:41,000
Instruction Scheduling passes.
205
00:06:41,000 --> 00:06:42,000
In the pipeline
206
00:06:42,000 --> 00:06:43,000
you can see
207
00:06:43,000 --> 00:06:45,000
two Instruction Scheduling stages,
208
00:06:45,000 --> 00:06:47,000
that is, two instruction schedulers.
209
00:06:47,000 --> 00:06:48,000
Let's look at
210
00:06:48,000 --> 00:06:50,000
what the first instruction scheduler does,
211
00:06:50,000 --> 00:06:51,000
namely the scheduling
212
00:06:51,000 --> 00:06:53,000
done before register allocation.
213
00:06:53,000 --> 00:06:54,000
The first step
214
00:06:54,000 --> 00:06:55,000
has already turned the code into
215
00:06:55,000 --> 00:06:56,000
a DAG.
216
00:06:56,000 --> 00:06:58,000
We take the instructions in this DAG
217
00:06:58,000 --> 00:06:59,000
and put them in order,
218
00:06:59,000 --> 00:07:01,000
that is, we sort the nodes,
219
00:07:01,000 --> 00:07:02,000
trying to find as many
220
00:07:02,000 --> 00:07:03,000
instructions as possible
221
00:07:03,000 --> 00:07:04,000
that can run in parallel.
222
00:07:04,000 --> 00:07:06,000
At the same time, the instructions
223
00:07:06,000 --> 00:07:08,000
are converted into another representation.
224
00:07:08,000 --> 00:07:09,000
This representation
225
00:07:09,000 --> 00:07:10,000
is also an IR,
226
00:07:10,000 --> 00:07:11,000
but this IR
227
00:07:11,000 --> 00:07:13,000
is called Machine Instruction,
228
00:07:13,000 --> 00:07:15,000
a three-address representation.
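One simple way to sort DAG nodes while exposing parallelism is to split the dependence graph into topological levels with Kahn's algorithm: nodes on the same level do not depend on each other, so a machine that can issue several instructions per cycle could run them together. This is a textbook sketch with a hypothetical function name and graph encoding; LLVM's pre-register-allocation schedulers also weigh instruction latencies and register pressure, which this ignores.

```python
from collections import defaultdict
from typing import List, Tuple


def schedule_levels(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Topologically sort a dependence DAG into levels (Kahn's algorithm).
    `edges` are (producer, consumer) pairs; nodes within one level are
    mutually independent and could be issued in parallel."""
    succs = defaultdict(list)
    indeg = [0] * num_nodes
    for src, dst in edges:
        succs[src].append(dst)
        indeg[dst] += 1

    ready = sorted(n for n in range(num_nodes) if indeg[n] == 0)
    levels: List[List[int]] = []
    while ready:
        levels.append(ready)
        nxt = []
        for n in ready:
            for s in succs[n]:
                indeg[s] -= 1          # one more operand of s is now available
                if indeg[s] == 0:
                    nxt.append(s)
        ready = sorted(nxt)

    if sum(len(level) for level in levels) != num_nodes:
        raise ValueError("dependence graph has a cycle")
    return levels


# Hypothetical block: 0 = load a, 1 = load b, 2 = mul 0,1, 3 = load c, 4 = add 2,3
levels = schedule_levels(5, [(0, 2), (1, 2), (2, 4), (3, 4)])
order = [n for level in levels for n in level]   # one valid sequential order
```

The three loads land on the first level, so the scheduler is free to start them back to back before the `mul` that waits on two of them.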
229
00:07:15,000 --> 00:07:16,000
The third step
230
00:07:16,000 --> 00:07:19,000
is register allocation,
231
00:07:19,000 --> 00:07:21,000
Register Allocation.
232
00:07:21,000 --> 00:07:23,000
In earlier sections
233
00:07:23,000 --> 00:07:24,000
we already mentioned
234
00:07:24,000 --> 00:07:26,000
that registers are very expensive,
235
00:07:26,000 --> 00:07:28,000
and every piece of hardware
236
00:07:28,000 --> 00:07:29,000
has only a limited number of them.
237
00:07:29,000 --> 00:07:30,000
LLVM IR, however,
238
00:07:30,000 --> 00:07:32,000
has two important properties.
239
00:07:32,000 --> 00:07:34,000
One is SSA;
240
00:07:34,000 --> 00:07:35,000
the second
241
00:07:35,000 --> 00:07:37,000
is that it assumes an unlimited number of registers.
242
00:07:37,000 --> 00:07:39,000
That is why LLVM IR uses the % prefix
243
00:07:39,000 --> 00:07:41,000
to denote an unlimited set of registers.
244
00:07:41,000 --> 00:07:42,000
This property
245
00:07:42,000 --> 00:07:43,000
holds up to this step
246
00:07:43,000 --> 00:07:44,000
and ends here.
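As a sketch of mapping an unlimited supply of virtual registers onto a finite register file, here is the textbook linear-scan allocator: live intervals are processed in order of their start points, registers are freed when intervals expire, and when none is free the value that stays alive longest is spilled to memory. The interval format, register names, and the `"spill"` marker are hypothetical, and LLVM's own allocators (such as its greedy allocator) are considerably more elaborate than this.

```python
from typing import Dict, List, Tuple


def linear_scan(intervals: Dict[str, Tuple[int, int]],
                phys_regs: List[str]) -> Dict[str, str]:
    """Assign each virtual register (an unbounded supply, like %0, %1, ... in
    LLVM IR) a physical register from `phys_regs`, or "spill" if none is left.
    `intervals` maps each virtual register to its live range [start, end]."""
    assignment: Dict[str, str] = {}
    active: List[Tuple[int, str]] = []  # (end, vreg) for values currently in registers
    free = list(phys_regs)

    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts; their registers free up.
        still_active = []
        for a_end, a_vreg in active:
            if a_end < start:
                free.append(assignment[a_vreg])
            else:
                still_active.append((a_end, a_vreg))
        active = still_active

        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
            continue

        # No register free: spill whichever value stays alive the longest.
        active.sort()
        if active and active[-1][0] > end:
            last_vreg = active[-1][1]
            assignment[vreg] = assignment[last_vreg]   # steal its register
            assignment[last_vreg] = "spill"            # it lives in memory instead
            active[-1] = (end, vreg)
        else:
            assignment[vreg] = "spill"
    return assignment


# Two physical registers, four overlapping live ranges.
alloc = linear_scan({"%a": (0, 5), "%b": (1, 3), "%c": (2, 8), "%d": (4, 6)},
                    ["r0", "r1"])
```

In this run `%c` is spilled because three values are live at once, and `%d` later reuses `r1` after `%b`'s interval has ended.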
247
00:07:44,000 --> 00:07:46,000
The unlimited virtual registers
248
00:07:46,000 --> 00:07:47,000
of LLVM IR
249
00:07:47,000 --> 00:07:49,000
are converted into ones that actually exist on the target
250
00:07:49,000 --> 00:07:50,000
and have real addresses