forked from chenzomi12/AISystem
06.srt · 1824 lines (1368 loc) · 27.6 KB
1
00:00:00,000 --> 00:00:04,440
Subtitle proofreading: 米哈游天下第一
2
00:00:05,560 --> 00:00:09,120
Hello everyone, I'm the balding, weight-gaining
3
00:00:09,120 --> 00:00:11,920
yet always-positive-at-the-start ZOMI
4
00:00:12,400 --> 00:00:16,680
Every lesson, I try to record for you with full energy
5
00:00:16,680 --> 00:00:18,680
otherwise you would find it boring to listen to
6
00:00:19,120 --> 00:00:21,040
ZOMI has found that after recording 100-plus episodes
7
00:00:21,040 --> 00:00:22,560
my earnings on Bilibili
8
00:00:22,560 --> 00:00:25,720
still haven't reached the withdrawal threshold
9
00:00:26,040 --> 00:00:27,800
If I ever get the chance to cash out
10
00:00:27,800 --> 00:00:31,360
I'll donate all of it to the China Children's Foundation
11
00:00:32,400 --> 00:00:35,960
Today's content is still AI chip fundamentals
12
00:00:35,960 --> 00:00:41,560
although, earlier on, we actually covered a lot of material not specific to AI chips
13
00:00:41,560 --> 00:00:43,400
for example, the general-purpose processor, the CPU
14
00:00:43,400 --> 00:00:47,840
and how to read CPU compute latency and memory bandwidth from the data
15
00:00:47,840 --> 00:00:52,200
then we took a look at the graphics processing unit, the GPU
16
00:00:52,200 --> 00:00:55,840
Today we finally arrive at dedicated AI processors
17
00:00:55,840 --> 00:00:58,120
call them NPUs or TPUs
18
00:00:58,120 --> 00:01:00,440
they are all specialized AI processors
19
00:01:00,440 --> 00:01:05,800
So today we'll briefly cover some basic concepts of dedicated AI processors
20
00:01:05,800 --> 00:01:10,320
Later, in the next session, I'll dissect the GPU in depth
21
00:01:10,320 --> 00:01:13,800
and in the session after that, go deep into the NPU
22
00:01:14,240 --> 00:01:16,040
What's really interesting here is that
23
00:01:16,040 --> 00:01:18,200
ZOMI will take some recent
24
00:01:18,200 --> 00:01:20,760
important, representative dedicated AI processors
25
00:01:20,760 --> 00:01:22,160
Huawei Ascend's NPU
26
00:01:22,160 --> 00:01:23,280
Google's TPU
27
00:01:23,280 --> 00:01:25,080
and Tesla's DOJO
28
00:01:25,240 --> 00:01:27,840
plus some other AI chips from China and abroad
29
00:01:27,840 --> 00:01:29,840
and look at their overall architectures
30
00:01:29,840 --> 00:01:32,480
and see how the processors inside them differ
31
00:01:32,480 --> 00:01:36,880
and with that, wrap up the core content on dedicated AI processors
32
00:01:36,880 --> 00:01:39,440
Coming back to today's topic, dedicated AI processors
33
00:01:39,440 --> 00:01:41,040
there are a few main parts
34
00:01:41,040 --> 00:01:44,720
First, we'll look at what an AI chip actually is
35
00:01:44,720 --> 00:01:47,320
Next, we'll look at the tasks of AI chips
36
00:01:47,320 --> 00:01:48,840
and how they are deployed
37
00:01:48,840 --> 00:01:51,200
that is, where AI chips are used
38
00:01:51,200 --> 00:01:53,440
After that, we'll review
39
00:01:53,480 --> 00:01:57,920
the available technology routes for AI chips
40
00:01:57,920 --> 00:02:00,520
And finally, the application scenarios of AI chips
41
00:02:00,520 --> 00:02:03,560
Today's material is organized into these four parts
42
00:02:04,880 --> 00:02:06,960
Let's start with the first topic
43
00:02:06,960 --> 00:02:09,200
what is an AI chip?
44
00:02:09,200 --> 00:02:10,360
AI Chip
45
00:02:10,360 --> 00:02:12,760
NPU and DPU
46
00:02:12,760 --> 00:02:14,000
First of all, an AI chip
47
00:02:14,000 --> 00:02:17,160
belongs to a special class of architecture
48
00:02:17,160 --> 00:02:19,960
called Domain Specific Architecture
49
00:02:19,960 --> 00:02:21,000
On a CPU
50
00:02:21,000 --> 00:02:24,440
you might run extremely general-purpose applications
51
00:02:24,440 --> 00:02:25,880
but certain applications
52
00:02:25,880 --> 00:02:30,440
can be accelerated by special, dedicated chips
53
00:02:30,440 --> 00:02:31,080
and that
54
00:02:31,080 --> 00:02:33,080
is what we usually call a DSA
55
00:02:33,080 --> 00:02:33,680
Suppose
56
00:02:33,680 --> 00:02:35,200
everything
57
00:02:35,200 --> 00:02:36,880
revolves around the application
58
00:02:36,880 --> 00:02:37,400
then alongside it
59
00:02:37,400 --> 00:02:41,440
a great many different specialized chips will appear
60
00:02:41,440 --> 00:02:43,840
For example, today's decoder chips
61
00:02:43,840 --> 00:02:45,640
and FPGA chips
62
00:02:45,640 --> 00:02:47,920
are all domain-specific chips
63
00:02:47,920 --> 00:02:49,120
called DSAs
64
00:02:49,120 --> 00:02:50,080
Their advantage
65
00:02:50,080 --> 00:02:54,600
is that they can accelerate one particular class of applications
66
00:02:54,600 --> 00:02:58,600
The earliest example is the graphics processor mentioned before
67
00:02:58,600 --> 00:02:59,520
the GPU
68
00:02:59,520 --> 00:02:59,960
Of course
69
00:02:59,960 --> 00:03:01,720
it's now called a GPGPU
70
00:03:01,720 --> 00:03:05,240
since it can accelerate not just graphics and imaging
71
00:03:05,240 --> 00:03:07,640
but many kinds of parallel workloads as well
72
00:03:07,640 --> 00:03:08,760
Today's protagonist
73
00:03:08,760 --> 00:03:10,640
is the AI chip
74
00:03:10,640 --> 00:03:11,880
call it an AI accelerator
75
00:03:11,880 --> 00:03:13,120
or an AI compute card, either works
76
00:03:13,120 --> 00:03:16,680
It is built specifically to handle AI applications
77
00:03:16,680 --> 00:03:17,960
neural networks
78
00:03:17,960 --> 00:03:19,800
deep learning
79
00:03:19,800 --> 00:03:23,000
accelerating exactly this class of computation
80
00:03:23,000 --> 00:03:25,320
that is what we call an AI chip
81
00:03:25,320 --> 00:03:27,280
Looking at the figure on the right
82
00:03:27,280 --> 00:03:30,600
that is the Dataflow architecture inside an AI chip
83
00:03:30,600 --> 00:03:31,680
You can see that inside
84
00:03:31,680 --> 00:03:33,560
there are many feature maps
85
00:03:33,560 --> 00:03:35,200
and Input Kernels
86
00:03:35,200 --> 00:03:36,120
all of this
87
00:03:36,120 --> 00:03:37,760
is laid out in the hardware
88
00:03:37,760 --> 00:03:38,920
taking AI
89
00:03:38,920 --> 00:03:41,760
taking the computation patterns of neural networks
90
00:03:41,760 --> 00:03:46,240
as the chip's hardware foundation and design principle
91
00:03:46,240 --> 00:03:46,720
Next
92
00:03:46,720 --> 00:03:48,120
let's look at the biggest architectural differences
93
00:03:48,160 --> 00:03:50,960
between AI chips, CPUs, and GPUs
94
00:03:50,960 --> 00:03:52,480
The three diagrams below
95
00:03:52,480 --> 00:03:54,840
are the simplest possible architecture diagrams
96
00:03:54,840 --> 00:03:55,440
In the CPU
97
00:03:55,440 --> 00:03:57,320
you can see most of the work
98
00:03:57,320 --> 00:03:59,240
goes into control
99
00:03:59,240 --> 00:03:59,760
which
100
00:03:59,760 --> 00:04:03,440
takes up most of the chip's circuit area
101
00:04:03,440 --> 00:04:05,640
while the compute units inside
102
00:04:05,640 --> 00:04:07,120
are actually not that many
103
00:04:07,120 --> 00:04:09,760
the 4-core, 8-core chips we often hear about
104
00:04:09,760 --> 00:04:11,560
up to today's 32 cores
105
00:04:11,560 --> 00:04:13,560
the core counts are still very small
106
00:04:13,560 --> 00:04:15,000
Looking at the GPU, though
107
00:04:15,000 --> 00:04:17,080
you can see from the number of SMs
108
00:04:17,080 --> 00:04:19,840
that there are some 3000 compute units inside
109
00:04:19,840 --> 00:04:21,160
which is quite staggering
110
00:04:21,160 --> 00:04:25,200
while the GPU's control units are, in contrast, very few
111
00:04:25,200 --> 00:04:26,680
The NPU, however
112
00:04:26,680 --> 00:04:29,040
relies more on AI cores
113
00:04:29,040 --> 00:04:31,200
and Tensor cores
114
00:04:31,200 --> 00:04:32,880
to do its acceleration
115
00:04:32,880 --> 00:04:33,880
These AI cores
116
00:04:33,880 --> 00:04:37,960
are dedicated to accelerating the convolutions in neural networks
117
00:04:37,960 --> 00:04:38,720
Transformers
118
00:04:38,720 --> 00:04:40,320
and MatMul-style computation
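[Editor's note: the reason a MatMul engine can cover convolutions too is that a convolution can be lowered to a matrix multiply, the classic im2col trick. A minimal NumPy sketch of that lowering follows; the function names are illustrative, not any chip's real API.]

```python
import numpy as np

def im2col(x, k):
    """Unfold every k*k patch of a 2-D input into one row of a matrix."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((oh * ow, k * k))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_as_matmul(x, kernel):
    """2-D 'valid' convolution expressed as a single matrix multiply."""
    k = kernel.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    # One big MatMul replaces the whole sliding-window loop.
    return (im2col(x, k) @ kernel.ravel()).reshape(oh, ow)

x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
out = conv2d_as_matmul(x, kernel)  # each entry is a 2x2 window sum
```

This is exactly why an AI core built around a matrix-multiply unit accelerates convolutions, Transformers, and plain MatMul alike: all three reduce to the same operation.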
119
00:04:41,920 --> 00:04:42,520
Next
120
00:04:42,520 --> 00:04:44,320
let's look at the second topic
121
00:04:44,320 --> 00:04:45,440
the tasks of AI chips
122
00:04:45,440 --> 00:04:46,760
and their deployment
123
00:04:46,800 --> 00:04:48,800
Tasks and Deployment
124
00:04:50,320 --> 00:04:53,240
AI chip tasks fall into two kinds
125
00:04:53,240 --> 00:04:53,840
The first
126
00:04:53,840 --> 00:04:55,000
is training
127
00:04:55,000 --> 00:04:55,600
The second
128
00:04:55,600 --> 00:04:56,840
is inference
129
00:04:56,840 --> 00:04:57,560
As for training
130
00:04:57,560 --> 00:04:58,600
you all know it already
131
00:04:58,600 --> 00:04:59,960
so I'll cover it briefly
132
00:04:59,960 --> 00:05:00,400
In training
133
00:05:00,400 --> 00:05:00,880
first
134
00:05:00,880 --> 00:05:03,960
you need to feed in a dataset
135
00:05:03,960 --> 00:05:04,600
Here
136
00:05:04,600 --> 00:05:05,560
it might be as mini-batches
137
00:05:05,560 --> 00:05:06,120
micro-batches
138
00:05:06,120 --> 00:05:08,600
or full batches
139
00:05:08,600 --> 00:05:09,320
the data
140
00:05:09,320 --> 00:05:10,720
goes into the neural network
141
00:05:10,720 --> 00:05:11,920
for the forward computation
142
00:05:11,920 --> 00:05:12,360
and then
143
00:05:12,360 --> 00:05:14,600
a concrete loss value is computed
144
00:05:14,640 --> 00:05:15,200
After that
145
00:05:15,200 --> 00:05:16,800
through backpropagation
146
00:05:16,800 --> 00:05:18,520
that is, computing the backward gradients
147
00:05:18,520 --> 00:05:19,400
an optimizer is used
148
00:05:19,400 --> 00:05:22,080
to update the whole network model
149
00:05:22,080 --> 00:05:23,960
so that the overall loss
150
00:05:23,960 --> 00:05:25,640
the loss, is minimized
151
00:05:25,640 --> 00:05:26,080
That
152
00:05:26,080 --> 00:05:29,160
is the essence and principle of the computation
153
00:05:29,160 --> 00:05:29,640
Next
154
00:05:29,640 --> 00:05:30,480
once training is done
155
00:05:30,480 --> 00:05:31,920
it leads into inference
156
00:05:31,920 --> 00:05:33,640
the real application deployment
157
00:05:33,640 --> 00:05:34,480
At inference time
158
00:05:34,480 --> 00:05:35,320
because by now
159
00:05:35,320 --> 00:05:37,400
the neural network
160
00:05:37,400 --> 00:05:38,360
has been frozen
161
00:05:38,360 --> 00:05:39,480
and the weight parameters
162
00:05:39,480 --> 00:05:40,920
have all been trained and learned
163
00:05:40,920 --> 00:05:41,560
you
164
00:05:41,560 --> 00:05:43,840
only need to take a small amount of data
165
00:05:43,840 --> 00:05:44,880
real data
166
00:05:44,880 --> 00:05:46,640
run a forward pass
167
00:05:46,640 --> 00:05:47,160
and finally
168
00:05:47,160 --> 00:05:49,320
get classification, prediction, detection
169
00:05:49,320 --> 00:05:51,560
or generation, for the different tasks
170
00:05:51,560 --> 00:05:52,880
and different results
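[Editor's note: the training-then-inference flow just described, forward pass, loss, backward gradients, optimizer update, then a frozen forward pass, can be sketched end to end in plain NumPy on a toy linear model. Everything here is illustrative and not tied to any particular AI chip or framework.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: learn y = 3x + 1 from clean samples.
X = rng.uniform(-1, 1, size=(64, 1))
y = 3 * X + 1

w, b = 0.0, 0.0          # model parameters, initially untrained
lr = 0.5                 # SGD learning rate

# Training: forward pass -> loss -> backward gradients -> optimizer update.
for epoch in range(200):
    for i in range(0, 64, 16):             # mini-batches of 16
        xb, yb = X[i:i + 16], y[i:i + 16]
        pred = w * xb + b                  # forward computation
        err = pred - yb
        loss = (err ** 2).mean()           # concrete loss value (MSE)
        grad_w = 2 * (err * xb).mean()     # backward gradients (by hand)
        grad_b = 2 * err.mean()
        w -= lr * grad_w                   # optimizer update, driving loss down
        b -= lr * grad_b

# Inference: the weights are now frozen; one forward pass gives the result.
test_x = np.array([[0.5]])
prediction = w * test_x + b               # should land close to 3*0.5 + 1 = 2.5
```

A training chip spends its time in the inner loop above (both forward and backward passes over large batches), while an inference chip only ever runs the final frozen forward pass.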
171
00:05:54,360 --> 00:05:55,720
As for AI chip deployment
172
00:05:55,720 --> 00:05:57,040
there are very many forms
173
00:05:57,040 --> 00:05:58,480
mainly device, edge, and cloud
174
00:05:58,480 --> 00:06:00,200
Looking at the cloud side
175
00:06:00,200 --> 00:06:02,520
it mainly hosts training chips
176
00:06:02,520 --> 00:06:03,280
training cards
177
00:06:03,280 --> 00:06:04,080
The edge
178
00:06:04,080 --> 00:06:05,320
means edge devices
179
00:06:05,320 --> 00:06:06,320
phones, earbuds
180
00:06:06,320 --> 00:06:07,000
wristbands
181
00:06:07,000 --> 00:06:10,440
the things that actually run the concrete applications
182
00:06:10,440 --> 00:06:11,040
And of course
183
00:06:11,040 --> 00:06:12,560
there is also the device side
184
00:06:12,560 --> 00:06:13,560
cameras, for example
185
00:06:13,560 --> 00:06:14,960
and small base stations
186
00:06:14,960 --> 00:06:16,520
Each deployment form
187
00:06:16,520 --> 00:06:17,760
hosts different applications
188
00:06:17,760 --> 00:06:19,920
and different chip series
189
00:06:21,240 --> 00:06:21,680
Next
190
00:06:21,680 --> 00:06:22,840
let's expand on this a bit
191
00:06:22,840 --> 00:06:24,360
In the inference engine series
192
00:06:24,360 --> 00:06:26,960
I actually already walked you through this material
193
00:06:26,960 --> 00:06:28,320
Let's take a quick look
194
00:06:28,320 --> 00:06:29,960
There are actually many arrangements
195
00:06:29,960 --> 00:06:31,080
For example, edge devices
196
00:06:31,080 --> 00:06:32,920
mainly deploy small models
197
00:06:32,920 --> 00:06:33,360
Then
198
00:06:33,360 --> 00:06:36,160
when edge devices and edge servers
199
00:06:36,160 --> 00:06:38,400
work together, in device-edge collaboration
200
00:06:38,400 --> 00:06:40,880
decisions may be made at the edge
201
00:06:40,880 --> 00:06:42,120
with the larger models
202
00:06:42,120 --> 00:06:43,280
on the edge server
203
00:06:43,280 --> 00:06:44,760
and learning on the device side
204
00:06:44,760 --> 00:06:45,520
The third kind
205
00:06:45,520 --> 00:06:49,360
is edge and cloud working in collaboration
206
00:06:49,360 --> 00:06:51,640
The fourth is the three just mentioned
207
00:06:51,640 --> 00:06:53,920
device, edge, and cloud collaborating at the same time
208
00:06:53,920 --> 00:06:55,440
The fifth is also device, edge, and cloud
209
00:06:55,440 --> 00:06:56,920
collaborating simultaneously
210
00:06:56,920 --> 00:06:58,960
So AI chip deployment
211
00:06:58,960 --> 00:07:01,040
is not only about deploying entirely in the cloud
212
00:07:01,040 --> 00:07:02,240
entirely at the edge
213
00:07:02,240 --> 00:07:04,200
or entirely on the device
214
00:07:04,200 --> 00:07:06,840
deployment can span device, edge, and cloud
215
00:07:06,840 --> 00:07:09,880
I'm actually not sure whether this is a Huawei-internal concept
216
00:07:09,880 --> 00:07:11,600
or one the whole industry uses
217
00:07:11,640 --> 00:07:13,520
Huawei does have an awful lot of devices, after all
218
00:07:14,720 --> 00:07:16,760
Next, on to the third topic
219
00:07:16,760 --> 00:07:19,800
the technology routes for AI chips
220
00:07:19,800 --> 00:07:22,280
the AI chip roadmap
221
00:07:22,280 --> 00:07:22,880
First
222
00:07:22,880 --> 00:07:24,240
for AI chips, the technology routes
223
00:07:24,240 --> 00:07:24,960
come in three kinds
224
00:07:24,960 --> 00:07:25,440
The first
225
00:07:25,440 --> 00:07:27,120
is the one everyone uses a great deal
226
00:07:27,120 --> 00:07:30,200
the GPU, which I introduced last session
227
00:07:30,200 --> 00:07:30,720
The second
228
00:07:30,720 --> 00:07:32,000
is the FPGA
229
00:07:32,000 --> 00:07:33,640
and the third is the ASIC
230
00:07:33,640 --> 00:07:34,720
You can see
231
00:07:34,720 --> 00:07:35,960
in terms of customization
232
00:07:35,960 --> 00:07:38,120
the GPU is the most general-purpose
233
00:07:38,120 --> 00:07:38,680
while the FPGA
234
00:07:38,680 --> 00:07:41,760
and the ASIC are not nearly as general
235
00:07:41,760 --> 00:07:43,360
The programming languages are interesting
236
00:07:43,360 --> 00:07:44,360
For the GPU
237
00:07:44,360 --> 00:07:46,480
there are CUDA and OpenCL
238
00:07:46,480 --> 00:07:47,360
Last session
239
00:07:47,360 --> 00:07:48,600
I actually told you
240
00:07:48,600 --> 00:07:49,960
CUDA was introduced by NVIDIA
241
00:07:49,960 --> 00:07:50,680
and OpenCL
242
00:07:50,680 --> 00:07:52,200
by Apple
243
00:07:52,200 --> 00:07:53,120
FPGAs
244
00:07:53,120 --> 00:07:54,160
each have their own hardware languages
245
00:07:54,160 --> 00:07:56,640
though they can also hook into OpenCL
246
00:07:56,640 --> 00:07:57,560
As for the ASIC
247
00:07:57,560 --> 00:07:58,200
right now
248
00:07:58,200 --> 00:08:01,160
there aren't really dedicated programming languages for it
249
00:08:01,160 --> 00:08:02,040
Though you could say TVM
250
00:08:02,040 --> 00:08:03,600
might be one of them