<!DOCTYPE html>
<html>
<head>
<title>Zhou Zhihua's "Machine Learning" Exercise Solutions (6): Ch5.5 - Implementing the BP Algorithm</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
/* GitHub stylesheet for MarkdownPad (http://markdownpad.com) */
/* Author: Nicolas Hery - http://nicolashery.com */
/* Version: b13fe65ca28d2e568c6ed5d7f06581183df8f2ff */
/* Source: https://github.com/nicolahery/markdownpad-github */
/* RESET
=============================================================================*/
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td, article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
}
/* BODY
=============================================================================*/
body {
font-family: Helvetica, arial, freesans, clean, sans-serif;
font-size: 14px;
line-height: 1.6;
color: #333;
background-color: #fff;
padding: 20px;
max-width: 960px;
margin: 0 auto;
}
body>*:first-child {
margin-top: 0 !important;
}
body>*:last-child {
margin-bottom: 0 !important;
}
/* BLOCKS
=============================================================================*/
p, blockquote, ul, ol, dl, table, pre {
margin: 15px 0;
}
/* HEADERS
=============================================================================*/
h1, h2, h3, h4, h5, h6 {
margin: 20px 0 10px;
padding: 0;
font-weight: bold;
-webkit-font-smoothing: antialiased;
}
h1 tt, h1 code, h2 tt, h2 code, h3 tt, h3 code, h4 tt, h4 code, h5 tt, h5 code, h6 tt, h6 code {
font-size: inherit;
}
h1 {
font-size: 28px;
color: #000;
}
h2 {
font-size: 24px;
border-bottom: 1px solid #ccc;
color: #000;
}
h3 {
font-size: 18px;
}
h4 {
font-size: 16px;
}
h5 {
font-size: 14px;
}
h6 {
color: #777;
font-size: 14px;
}
body>h2:first-child, body>h1:first-child, body>h1:first-child+h2, body>h3:first-child, body>h4:first-child, body>h5:first-child, body>h6:first-child {
margin-top: 0;
padding-top: 0;
}
a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
margin-top: 0;
padding-top: 0;
}
h1+p, h2+p, h3+p, h4+p, h5+p, h6+p {
margin-top: 10px;
}
/* LINKS
=============================================================================*/
a {
color: #4183C4;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
/* LISTS
=============================================================================*/
ul, ol {
padding-left: 30px;
}
ul li > :first-child,
ol li > :first-child,
ul li ul:first-of-type,
ol li ol:first-of-type,
ul li ol:first-of-type,
ol li ul:first-of-type {
margin-top: 0px;
}
ul ul, ul ol, ol ol, ol ul {
margin-bottom: 0;
}
dl {
padding: 0;
}
dl dt {
font-size: 14px;
font-weight: bold;
font-style: italic;
padding: 0;
margin: 15px 0 5px;
}
dl dt:first-child {
padding: 0;
}
dl dt>:first-child {
margin-top: 0px;
}
dl dt>:last-child {
margin-bottom: 0px;
}
dl dd {
margin: 0 0 15px;
padding: 0 15px;
}
dl dd>:first-child {
margin-top: 0px;
}
dl dd>:last-child {
margin-bottom: 0px;
}
/* CODE
=============================================================================*/
pre, code, tt {
font-size: 12px;
font-family: Consolas, "Liberation Mono", Courier, monospace;
}
code, tt {
margin: 0 0px;
padding: 0px 0px;
white-space: nowrap;
border: 1px solid #eaeaea;
background-color: #f8f8f8;
border-radius: 3px;
}
pre>code {
margin: 0;
padding: 0;
white-space: pre;
border: none;
background: transparent;
}
pre {
background-color: #f8f8f8;
border: 1px solid #ccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px;
}
pre code, pre tt {
background-color: transparent;
border: none;
}
kbd {
-moz-border-bottom-colors: none;
-moz-border-left-colors: none;
-moz-border-right-colors: none;
-moz-border-top-colors: none;
background-color: #DDDDDD;
background-image: linear-gradient(#F1F1F1, #DDDDDD);
background-repeat: repeat-x;
border-color: #DDDDDD #CCCCCC #CCCCCC #DDDDDD;
border-image: none;
border-radius: 2px 2px 2px 2px;
border-style: solid;
border-width: 1px;
font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
line-height: 10px;
padding: 1px 4px;
}
/* QUOTES
=============================================================================*/
blockquote {
border-left: 4px solid #DDD;
padding: 0 15px;
color: #777;
}
blockquote>:first-child {
margin-top: 0px;
}
blockquote>:last-child {
margin-bottom: 0px;
}
/* HORIZONTAL RULES
=============================================================================*/
hr {
clear: both;
margin: 15px 0;
height: 0px;
overflow: hidden;
border: none;
background: transparent;
border-bottom: 4px solid #ddd;
padding: 0;
}
/* TABLES
=============================================================================*/
table th {
font-weight: bold;
}
table th, table td {
border: 1px solid #ccc;
padding: 6px 13px;
}
table tr {
border-top: 1px solid #ccc;
background-color: #fff;
}
table tr:nth-child(2n) {
background-color: #f8f8f8;
}
/* IMAGES
=============================================================================*/
img {
max-width: 100%
}
</style>
</head>
<body>
<p>The code here is based on <strong>Python-PyBrain</strong>. PyBrain is a machine-learning package built around neural networks; for background see <a href="http://blog.csdn.net/snoopy_yuan/article/details/70170706">Neural Network Basics: Using the PyBrain Machine-Learning Package (Chinese)</a>.</p>
<p>The solutions and source code are hosted on my GitHub: <a href="https://github.com/PY131/Machine-Learning_ZhouZhihua">PY131/Machine-Learning_ZhouZhihua</a>.</p>
<h2>5.5 Implementing the BP Algorithm</h2>
<blockquote>
<p><img src="Ch5/5.5.png" />
<img src="Ch5/5.5.1.png" /></p>
</blockquote>
<p>The implementation is in Python; the whole experiment is described below (<a href="https://github.com/PY131/Machine-Learning_ZhouZhihua/blob/master/Ch5_neural_network/5.5_BP/src/BP_network.py">view the full code and dataset here</a>):</p>
<p>The task is to use <strong>PyBrain</strong> to train the BP network with both the standard BP and the accumulated BP algorithm, and to compare the two.</p>
<h3>1. Algorithm Analysis</h3>
<p>Following the derivation in the book and <strong>Figure 5.8</strong>, the two versions of the BP algorithm can be sketched as follows:</p>
<pre><code>Algorithm 1. Standard BP
----
Input: training set D, learning rate η.
Process:
1. Randomly initialize the connection weights and thresholds (ω, θ).
2. Repeat:
3.   for x_k, y_k in D:
4.     compute the error E_k on this sample with the current parameters.
5.     compute the stochastic gradient term g_k from the update formulas.
6.     update (ω, θ) accordingly.
7.   end for
8. until a stopping condition is met
Output: (ω, θ) - i.e. the corresponding multi-layer feedforward network.
----
Algorithm 2. Accumulated BP
----
Input: training set D, learning rate η, number of iterations n.
Process:
1. Randomly initialize the connection weights and thresholds (ω, θ).
2. Repeat:
3.   compute the accumulated error E over D with the current parameters.
4.   compute the full (accumulated) gradient term g from the update formulas.
5.   update (ω, θ) accordingly.
6.   n = n - 1
7. until n = 0 or a stopping condition is met
Output: (ω, θ) - i.e. the corresponding multi-layer feedforward network.
----
</code></pre>
<p>As can be seen, the essential difference between the two algorithms is analogous to that between <strong>stochastic gradient descent</strong> and <strong>standard (batch) gradient descent</strong>. PyBrain makes implementing both straightforward: we only need to change the constructor parameters of the pybrain.supervised.trainers trainer (e.g. learningrate, batchlearning) and set the number of passes over the dataset with trainEpochs().</p>
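<p>To make the contrast concrete, here is a rough NumPy sketch, purely for illustration (it is not part of the PyBrain-based solution), of the two update schemes applied to a single sigmoid output layer; the full BP algorithm additionally backpropagates the gradient through the hidden layer. <code>X</code> and <code>y</code> are assumed to be NumPy arrays of inputs and targets.</p>
<pre><code># Illustration only: per-sample vs. accumulated gradient updates on one sigmoid layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def standard_bp_epoch(W, b, X, y, eta=0.1):
    """One epoch of per-sample (stochastic) updates, as in Algorithm 1."""
    for x_k, y_k in zip(X, y):
        y_hat = sigmoid(W.dot(x_k) + b)
        g_k = y_hat * (1 - y_hat) * (y_k - y_hat)  # gradient term for this sample
        W += eta * np.outer(g_k, x_k)              # update immediately
        b += eta * g_k
    return W, b

def accumulated_bp_epoch(W, b, X, y, eta=0.1):
    """One epoch of accumulated (batch) updates, as in Algorithm 2."""
    dW, db = np.zeros_like(W), np.zeros_like(b)
    for x_k, y_k in zip(X, y):
        y_hat = sigmoid(W.dot(x_k) + b)
        g_k = y_hat * (1 - y_hat) * (y_k - y_hat)
        dW += np.outer(g_k, x_k)                   # accumulate over the whole set
        db += g_k
    W += eta * dW / len(X)                         # one update per epoch
    b += eta * db / len(X)
    return W, b
</code></pre>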
<h3>2. Data Preprocessing</h3>
<p>From watermelon dataset 3.0 (Table 4.3) we can see that each sample has 8 attribute variables and one output variable, mixing nominal variables (色泽 through 触感, plus 好瓜) with continuous ones (密度, 含糖率).</p>
<p>To make it easier to build the neural-network model (mainly so that the discrete attributes can be handled numerically), we first encode the nominal variables: pandas.get_dummies() performs <strong>one-hot encoding</strong> of the inputs (turning them into <strong>dummy variables</strong>), and the _convertToOneOfMany() method of pybrain.datasets.ClassificationDataSet one-hot encodes the outputs. For the idea behind one-hot encoding see <a href="https://en.wikipedia.org/wiki/One-hot">One-hot - Wikipedia</a> or <a href="http://blog.sina.com.cn/s/blog_5252f6ca0102uy47.html">One-Hot Encoding in Data Preprocessing (Chinese)</a>.</p>
<p>One-hot encoding the "watermelon dataset 3.0":</p>
<p>Before encoding:</p>
<pre><code>编号 色泽 根蒂 敲声 纹理 脐部 触感 密度 含糖率 好瓜
0 1 青绿 蜷缩 浊响 清晰 凹陷 硬滑 0.697 0.460 是
1 2 乌黑 蜷缩 沉闷 清晰 凹陷 硬滑 0.774 0.376 是
2 3 乌黑 蜷缩 浊响 清晰 凹陷 硬滑 0.634 0.264 是
...
</code></pre>
<p>At this point the dataset has shape [17, 10]: 8 inputs and 1 output.</p>
<p>After encoding:</p>
<pre><code>编号 密度 含糖率 色泽_乌黑 色泽_浅白 色泽_青绿 根蒂_硬挺 根蒂_稍蜷 根蒂_蜷缩 敲声_沉闷 ... \
0 1 0.697 0.460 0 0 1 0 0 1 0 ...
1 2 0.774 0.376 1 0 0 0 0 1 1 ...
2 3 0.634 0.264 1 0 0 0 0 1 0 ...
...
纹理_模糊 纹理_清晰 纹理_稍糊 脐部_凹陷 脐部_平坦 脐部_稍凹 触感_硬滑 触感_软粘 好瓜_否 好瓜_是
0 0 1 0 1 0 0 1 0 0 1
1 0 1 0 1 0 0 1 0 0 1
2 0 1 0 1 0 0 1 0 0 1
...
</code></pre>
<p>After encoding the dataset has shape [17, 22]: 19 inputs and 2 outputs.</p>
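<p>A minimal sketch of how this encoding might be produced (an illustration, assuming the raw table has been read into a pandas DataFrame <code>df</code> with the column names shown above):</p>
<pre><code># Illustration only: one-hot encode the inputs with pandas and wrap everything
# in a PyBrain ClassificationDataSet; df is assumed to hold watermelon dataset 3.0.
import pandas as pd
from pybrain.datasets import ClassificationDataSet

X = pd.get_dummies(df.drop(['编号', '好瓜'], axis=1))  # 19 input columns
y = (df['好瓜'] == '是').astype(int).values            # 1 = good melon, 0 = not

ds = ClassificationDataSet(19, 1, nb_classes=2)
for x_k, y_k in zip(X.values, y):
    ds.addSample(x_k, [y_k])
ds._convertToOneOfMany()  # expand the class label into 2 one-hot output units
</code></pre>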
<h3>3. Model Training and Testing</h3>
<p>Based on the data above we build a <strong>feedforward neural network</strong> (BP network) with 19 inputs and 2 outputs, then split the data into a training set and a test set for modelling and validation.</p>
<p>Implementation notes for PyBrain: splitWithProportion() splits a dataset directly; buildNetwork() builds the BP network; BackpropTrainer creates a training template and performs the training, and changing its parameters switches between the standard and the accumulated BP algorithm.</p>
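<p>For example, the encoded dataset can be split roughly 3:1 into training and test portions (a sketch, reusing the <code>ds</code> object from the preprocessing step):</p>
<pre><code># splitWithProportion returns (first portion, remainder): here 25% test, 75% training.
tstdata, trndata = ds.splitWithProportion(0.25)
</code></pre>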
<ol>
<li>
<p>Build the model. PyBrain's default activation is the sigmoid function, which is well suited to binary classification; another activation, the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a>, works well for multi-class classification (including the binary case). Since the outputs are one-hot encoded here, we use softmax as the output-layer activation and determine the predicted class with the <strong>winner-takes-all</strong> rule.</p>
<p>Sample code for building the model:</p>
<pre><code>from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import SoftmaxLayer

n_h = 5  # number of hidden-layer nodes
net = buildNetwork(19, n_h, 2, outclass=SoftmaxLayer)
</code></pre>
</li>
<li>
<p>Training the network with the standard BP algorithm:</p>
<p>Sample code:</p>
<pre><code>from pybrain.supervised.trainers import BackpropTrainer

# per-sample (online) updates are the default, i.e. batchlearning=False
trainer = BackpropTrainer(net, trndata)
trainer.trainEpochs(1)  # one pass over the training set
</code></pre>
</li>
<li>
<p>Training the network with the accumulated BP algorithm (50 iterations):</p>
<p>Sample code:</p>
<pre><code># batchlearning=True accumulates the gradient over the whole training set
trainer = BackpropTrainer(net, trndata, batchlearning=True)
trainer.trainEpochs(50)  # 50 passes, one weight update per pass
</code></pre>
<p>We can also plot the convergence curve of the parameter learning process under accumulated BP (<a href="https://github.com/PY131/Machine-Learning_ZhouZhihua/blob/master/Ch5_neural_network/5.5_BP/src/BP_network.py">see the full code</a>):</p>
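<p>A minimal plotting sketch (an illustration, assuming <code>net</code> and <code>trndata</code> are available as above; BackpropTrainer.train() runs a single epoch and returns its error):</p>
<pre><code># Illustration only: record the per-epoch training error of accumulated BP
# and plot the convergence curve with matplotlib.
import matplotlib.pyplot as plt
from pybrain.supervised.trainers import BackpropTrainer

trainer = BackpropTrainer(net, trndata, batchlearning=True)
errors = [trainer.train() for _ in range(50)]  # train() = one epoch, returns its error

plt.plot(range(1, 51), errors)
plt.xlabel('epoch')
plt.ylabel('training error')
plt.title('Accumulated BP convergence')
plt.show()
</code></pre>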
</li>
<li>
<p>Comparing the two algorithms:</p>
<p>For the code-level difference between the two BP variants, see <a href="http://pybrain.org/docs/api/supervised/trainers.html?highlight=testonclassdata#pybrain.supervised.trainers.BackpropTrainer.testOnClassData">the PyBrain docs: trainers - Supervised Training for Networks and other Modules</a>.</p>
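<p>The test error itself can be computed with testOnClassData() (winner-takes-all predictions) and percentError(); a sketch, assuming a trained <code>trainer</code> and the test set <code>tstdata</code>:</p>
<pre><code># Illustration only: evaluate winner-takes-all predictions on the test set.
from pybrain.utilities import percentError

pred = trainer.testOnClassData(dataset=tstdata)  # index of the largest output unit per sample
err = percentError(pred, tstdata['class'])       # percentage of misclassified samples
print('test error: %.2f%%' % err)
</code></pre>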
<p>After one training run and prediction on the test set, the two methods give the following test errors:</p>
<pre><code>Standard BP:    epoch: 1   test error: 50.00%
Accumulated BP: epoch: 50  test error: 25.00%
</code></pre>
<p>In this run the accumulated BP algorithm beats the standard one, but a single run is not very convincing, so we repeat the experiment several times and compare the average test error rates:</p>
<pre><code>Standard BP:
25.00% 75.00% 75.00% 75.00% 50.00% 50.00% ...
average error rate: 47.50%
Accumulated BP:
25.00% 75.00% 50.00% 50.00% 25.00% 50.00% ...
average error rate: 38.75%
</code></pre>
<p>The results show that accumulated BP is, on the whole, more accurate than standard BP. During the experiments, however, we also noticed that accumulated BP takes far longer to run than standard BP.</p>
<p>Furthermore, because of the dataset's limitations (far too few samples), the model's accuracy is poor in absolute terms.</p>
</li>
</ol>
<h3>4. References</h3>
<p>The main references for this article are:</p>
<ul>
<li><a href="http://pybrain.org/docs/index.html">Pybrain官网</a></li>
<li><a href="http://blog.csdn.net/snoopy_yuan/article/details/70170706">神经网络基础 - PyBrain机器学习包的使用</a></li>
</ul>
<hr />
</body>
</html>
<!-- This document was created with MarkdownPad, the Markdown editor for Windows (http://markdownpad.com) -->