<!DOCTYPE html>
<html>
<head>
<title>Zhou Zhihua's "Machine Learning" Exercise Solutions (6): Ch5.5 - Implementing the BP Algorithm</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
/* GitHub stylesheet for MarkdownPad (http://markdownpad.com) */
/* Author: Nicolas Hery - http://nicolashery.com */
/* Version: b13fe65ca28d2e568c6ed5d7f06581183df8f2ff */
/* Source: https://github.com/nicolahery/markdownpad-github */
/* RESET
=============================================================================*/
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td, article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
}
/* BODY
=============================================================================*/
body {
font-family: Helvetica, arial, freesans, clean, sans-serif;
font-size: 14px;
line-height: 1.6;
color: #333;
background-color: #fff;
padding: 20px;
max-width: 960px;
margin: 0 auto;
}
body>*:first-child {
margin-top: 0 !important;
}
body>*:last-child {
margin-bottom: 0 !important;
}
/* BLOCKS
=============================================================================*/
p, blockquote, ul, ol, dl, table, pre {
margin: 15px 0;
}
/* HEADERS
=============================================================================*/
h1, h2, h3, h4, h5, h6 {
margin: 20px 0 10px;
padding: 0;
font-weight: bold;
-webkit-font-smoothing: antialiased;
}
h1 tt, h1 code, h2 tt, h2 code, h3 tt, h3 code, h4 tt, h4 code, h5 tt, h5 code, h6 tt, h6 code {
font-size: inherit;
}
h1 {
font-size: 28px;
color: #000;
}
h2 {
font-size: 24px;
border-bottom: 1px solid #ccc;
color: #000;
}
h3 {
font-size: 18px;
}
h4 {
font-size: 16px;
}
h5 {
font-size: 14px;
}
h6 {
color: #777;
font-size: 14px;
}
body>h2:first-child, body>h1:first-child, body>h1:first-child+h2, body>h3:first-child, body>h4:first-child, body>h5:first-child, body>h6:first-child {
margin-top: 0;
padding-top: 0;
}
a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
margin-top: 0;
padding-top: 0;
}
h1+p, h2+p, h3+p, h4+p, h5+p, h6+p {
margin-top: 10px;
}
/* LINKS
=============================================================================*/
a {
color: #4183C4;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
/* LISTS
=============================================================================*/
ul, ol {
padding-left: 30px;
}
ul li > :first-child,
ol li > :first-child,
ul li ul:first-of-type,
ol li ol:first-of-type,
ul li ol:first-of-type,
ol li ul:first-of-type {
margin-top: 0px;
}
ul ul, ul ol, ol ol, ol ul {
margin-bottom: 0;
}
dl {
padding: 0;
}
dl dt {
font-size: 14px;
font-weight: bold;
font-style: italic;
padding: 0;
margin: 15px 0 5px;
}
dl dt:first-child {
padding: 0;
}
dl dt>:first-child {
margin-top: 0px;
}
dl dt>:last-child {
margin-bottom: 0px;
}
dl dd {
margin: 0 0 15px;
padding: 0 15px;
}
dl dd>:first-child {
margin-top: 0px;
}
dl dd>:last-child {
margin-bottom: 0px;
}
/* CODE
=============================================================================*/
pre, code, tt {
font-size: 12px;
font-family: Consolas, "Liberation Mono", Courier, monospace;
}
code, tt {
margin: 0 0px;
padding: 0px 0px;
white-space: nowrap;
border: 1px solid #eaeaea;
background-color: #f8f8f8;
border-radius: 3px;
}
pre>code {
margin: 0;
padding: 0;
white-space: pre;
border: none;
background: transparent;
}
pre {
background-color: #f8f8f8;
border: 1px solid #ccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px;
}
pre code, pre tt {
background-color: transparent;
border: none;
}
kbd {
-moz-border-bottom-colors: none;
-moz-border-left-colors: none;
-moz-border-right-colors: none;
-moz-border-top-colors: none;
background-color: #DDDDDD;
background-image: linear-gradient(#F1F1F1, #DDDDDD);
background-repeat: repeat-x;
border-color: #DDDDDD #CCCCCC #CCCCCC #DDDDDD;
border-image: none;
border-radius: 2px 2px 2px 2px;
border-style: solid;
border-width: 1px;
font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
line-height: 10px;
padding: 1px 4px;
}
/* QUOTES
=============================================================================*/
blockquote {
border-left: 4px solid #DDD;
padding: 0 15px;
color: #777;
}
blockquote>:first-child {
margin-top: 0px;
}
blockquote>:last-child {
margin-bottom: 0px;
}
/* HORIZONTAL RULES
=============================================================================*/
hr {
clear: both;
margin: 15px 0;
height: 0px;
overflow: hidden;
border: none;
background: transparent;
border-bottom: 4px solid #ddd;
padding: 0;
}
/* TABLES
=============================================================================*/
table th {
font-weight: bold;
}
table th, table td {
border: 1px solid #ccc;
padding: 6px 13px;
}
table tr {
border-top: 1px solid #ccc;
background-color: #fff;
}
table tr:nth-child(2n) {
background-color: #f8f8f8;
}
/* IMAGES
=============================================================================*/
img {
max-width: 100%
}
</style>
</head>
<body>
<p>The code here is based on <strong>Python-PyBrain</strong>. PyBrain is a machine-learning package built around neural networks; for background see <a href="http://blog.csdn.net/snoopy_yuan/article/details/70170706">Neural Network Basics: Using the PyBrain Machine-Learning Package (Chinese)</a>.</p>
<p>The solutions and source code are hosted on my GitHub: <a href="https://github.com/PY131/Machine-Learning_ZhouZhihua">PY131/Machine-Learning_ZhouZhihua</a>.</p>
<h2>5.5 Implementing the BP Algorithm</h2>
<blockquote>
<p><img src="Ch5/5.5.png" />
<img src="Ch5/5.5.1.png" /></p>
</blockquote>
<p>The implementation is in Python; the whole experiment is described below (<a href="https://github.com/PY131/Machine-Learning_ZhouZhihua/blob/master/Ch5_neural_network/5.5_BP/src/BP_network.py">view the full code and dataset here</a>):</p>
<p>The task is to use <strong>PyBrain</strong> to train the BP network with both the standard BP and the accumulated BP algorithm, and to compare the two.</p>
<h3>1. Algorithm Analysis</h3>
<p>Following the derivation in the book and <strong>Figure 5.8</strong>, the two versions of the BP algorithm can be sketched as follows:</p>
<pre><code>Algorithm 1. Standard BP
----
Input: training set D, learning rate η.
Process:
1. Randomly initialize the connection weights and thresholds (ω, θ).
2. Repeat:
3.   for x_k, y_k in D:
4.     compute the error E_k on this sample with the current parameters.
5.     compute the stochastic gradient term g_k from the update formulas.
6.     update (ω, θ) accordingly.
7.   end for
8. until a stopping condition is met
Output: (ω, θ) - i.e. the corresponding multi-layer feedforward network.
----
Algorithm 2. Accumulated BP
----
Input: training set D, learning rate η, number of iterations n.
Process:
1. Randomly initialize the connection weights and thresholds (ω, θ).
2. Repeat:
3.   compute the accumulated error E over D with the current parameters.
4.   compute the full (accumulated) gradient term g from the update formulas.
5.   update (ω, θ) accordingly.
6.   n = n - 1
7. until n = 0 or a stopping condition is met
Output: (ω, θ) - i.e. the corresponding multi-layer feedforward network.
----
</code></pre>
<p>As can be seen, the essential difference between the two algorithms is analogous to that between <strong>stochastic gradient descent</strong> and <strong>standard (batch) gradient descent</strong>. PyBrain makes implementing both straightforward: we only need to change the constructor parameters of the pybrain.supervised.trainers trainer (e.g. learningrate, batchlearning) and set the number of passes over the dataset with trainEpochs().</p>
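<p>To make the contrast concrete, here is a rough NumPy sketch, purely for illustration (it is not part of the PyBrain-based solution), of the two update schemes applied to a single sigmoid output layer; the full BP algorithm additionally backpropagates the gradient through the hidden layer. <code>X</code> and <code>y</code> are assumed to be NumPy arrays of inputs and targets.</p>
<pre><code># Illustration only: per-sample vs. accumulated gradient updates on one sigmoid layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def standard_bp_epoch(W, b, X, y, eta=0.1):
    """One epoch of per-sample (stochastic) updates, as in Algorithm 1."""
    for x_k, y_k in zip(X, y):
        y_hat = sigmoid(W.dot(x_k) + b)
        g_k = y_hat * (1 - y_hat) * (y_k - y_hat)  # gradient term for this sample
        W += eta * np.outer(g_k, x_k)              # update immediately
        b += eta * g_k
    return W, b

def accumulated_bp_epoch(W, b, X, y, eta=0.1):
    """One epoch of accumulated (batch) updates, as in Algorithm 2."""
    dW, db = np.zeros_like(W), np.zeros_like(b)
    for x_k, y_k in zip(X, y):
        y_hat = sigmoid(W.dot(x_k) + b)
        g_k = y_hat * (1 - y_hat) * (y_k - y_hat)
        dW += np.outer(g_k, x_k)                   # accumulate over the whole set
        db += g_k
    W += eta * dW / len(X)                         # one update per epoch
    b += eta * db / len(X)
    return W, b
</code></pre>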
<h3>2. Data Preprocessing</h3>
<p>From watermelon dataset 3.0 (Table 4.3) we can see that each sample has 8 attribute variables and one output variable, mixing nominal variables (色泽 through 触感, plus 好瓜) with continuous ones (密度, 含糖率).</p>
<p>To make it easier to build the neural-network model (mainly so that the discrete attributes can be handled numerically), we first encode the nominal variables: pandas.get_dummies() performs <strong>one-hot encoding</strong> of the inputs (turning them into <strong>dummy variables</strong>), and the _convertToOneOfMany() method of pybrain.datasets.ClassificationDataSet one-hot encodes the outputs. For the idea behind one-hot encoding see <a href="https://en.wikipedia.org/wiki/One-hot">One-hot - Wikipedia</a> or <a href="http://blog.sina.com.cn/s/blog_5252f6ca0102uy47.html">One-Hot Encoding in Data Preprocessing (Chinese)</a>.</p>
<p>One-hot encoding the "watermelon dataset 3.0":</p>
<p>Before encoding:</p>
<pre><code>编号 色泽 根蒂 敲声 纹理 脐部 触感 密度 含糖率 好瓜
0 1 青绿 蜷缩 浊响 清晰 凹陷 硬滑 0.697 0.460 是
1 2 乌黑 蜷缩 沉闷 清晰 凹陷 硬滑 0.774 0.376 是
2 3 乌黑 蜷缩 浊响 清晰 凹陷 硬滑 0.634 0.264 是
...
</code></pre>
<p>At this point the dataset has shape [17, 10]: 8 inputs and 1 output.</p>
<p>After encoding:</p>
<pre><code>编号 密度 含糖率 色泽_乌黑 色泽_浅白 色泽_青绿 根蒂_硬挺 根蒂_稍蜷 根蒂_蜷缩 敲声_沉闷 ... \
0 1 0.697 0.460 0 0 1 0 0 1 0 ...
1 2 0.774 0.376 1 0 0 0 0 1 1 ...
2 3 0.634 0.264 1 0 0 0 0 1 0 ...
...
纹理_模糊 纹理_清晰 纹理_稍糊 脐部_凹陷 脐部_平坦 脐部_稍凹 触感_硬滑 触感_软粘 好瓜_否 好瓜_是
0 0 1 0 1 0 0 1 0 0 1
1 0 1 0 1 0 0 1 0 0 1
2 0 1 0 1 0 0 1 0 0 1
...
</code></pre>
<p>After encoding the dataset has shape [17, 22]: 19 inputs and 2 outputs.</p>
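<p>A minimal sketch of how this encoding might be produced (an illustration, assuming the raw table has been read into a pandas DataFrame <code>df</code> with the column names shown above):</p>
<pre><code># Illustration only: one-hot encode the inputs with pandas and wrap everything
# in a PyBrain ClassificationDataSet; df is assumed to hold watermelon dataset 3.0.
import pandas as pd
from pybrain.datasets import ClassificationDataSet

X = pd.get_dummies(df.drop(['编号', '好瓜'], axis=1))  # 19 input columns
y = (df['好瓜'] == '是').astype(int).values            # 1 = good melon, 0 = not

ds = ClassificationDataSet(19, 1, nb_classes=2)
for x_k, y_k in zip(X.values, y):
    ds.addSample(x_k, [y_k])
ds._convertToOneOfMany()  # expand the class label into 2 one-hot output units
</code></pre>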
<h3>3. Model Training and Testing</h3>
<p>Based on the data above we build a <strong>feedforward neural network</strong> (BP network) with 19 inputs and 2 outputs, then split the data into a training set and a test set for modelling and validation.</p>
<p>Implementation notes for PyBrain: splitWithProportion() splits a dataset directly; buildNetwork() builds the BP network; BackpropTrainer creates a training template and performs the training, and changing its parameters switches between the standard and the accumulated BP algorithm.</p>
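<p>For example, the encoded dataset can be split roughly 3:1 into training and test portions (a sketch, reusing the <code>ds</code> object from the preprocessing step):</p>
<pre><code># splitWithProportion returns (first portion, remainder): here 25% test, 75% training.
tstdata, trndata = ds.splitWithProportion(0.25)
</code></pre>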
<ol>
<li>
<p>Build the model. PyBrain's default activation is the sigmoid function, which is well suited to binary classification; another activation, the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a>, works well for multi-class classification (including the binary case). Since the outputs are one-hot encoded here, we use softmax as the output-layer activation and determine the predicted class with the <strong>winner-takes-all</strong> rule.</p>
<p>Sample code for building the model:</p>
<pre><code>from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import SoftmaxLayer

n_h = 5  # number of hidden-layer nodes
net = buildNetwork(19, n_h, 2, outclass=SoftmaxLayer)
</code></pre>
</li>
<li>
<p>Training the network with the standard BP algorithm:</p>
<p>Sample code:</p>
<pre><code>from pybrain.supervised.trainers import BackpropTrainer

# per-sample (online) updates are the default, i.e. batchlearning=False
trainer = BackpropTrainer(net, trndata)
trainer.trainEpochs(1)  # one pass over the training set
</code></pre>
</li>
<li>
<p>Training the network with the accumulated BP algorithm (50 iterations):</p>
<p>Sample code:</p>
<pre><code># batchlearning=True accumulates the gradient over the whole training set
trainer = BackpropTrainer(net, trndata, batchlearning=True)
trainer.trainEpochs(50)  # 50 passes, one weight update per pass
</code></pre>
<p>We can also plot the convergence curve of the parameter learning process under accumulated BP (<a href="https://github.com/PY131/Machine-Learning_ZhouZhihua/blob/master/Ch5_neural_network/5.5_BP/src/BP_network.py">see the full code</a>):</p>
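<p>A minimal plotting sketch (an illustration, assuming <code>net</code> and <code>trndata</code> are available as above; BackpropTrainer.train() runs a single epoch and returns its error):</p>
<pre><code># Illustration only: record the per-epoch training error of accumulated BP
# and plot the convergence curve with matplotlib.
import matplotlib.pyplot as plt
from pybrain.supervised.trainers import BackpropTrainer

trainer = BackpropTrainer(net, trndata, batchlearning=True)
errors = [trainer.train() for _ in range(50)]  # train() = one epoch, returns its error

plt.plot(range(1, 51), errors)
plt.xlabel('epoch')
plt.ylabel('training error')
plt.title('Accumulated BP convergence')
plt.show()
</code></pre>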
</li>
<li>
<p>Comparing the two algorithms:</p>
<p>For the code-level difference between the two BP variants, see <a href="http://pybrain.org/docs/api/supervised/trainers.html?highlight=testonclassdata#pybrain.supervised.trainers.BackpropTrainer.testOnClassData">the PyBrain docs: trainers - Supervised Training for Networks and other Modules</a>.</p>
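<p>The test error itself can be computed with testOnClassData() (winner-takes-all predictions) and percentError(); a sketch, assuming a trained <code>trainer</code> and the test set <code>tstdata</code>:</p>
<pre><code># Illustration only: evaluate winner-takes-all predictions on the test set.
from pybrain.utilities import percentError

pred = trainer.testOnClassData(dataset=tstdata)  # index of the largest output unit per sample
err = percentError(pred, tstdata['class'])       # percentage of misclassified samples
print('test error: %.2f%%' % err)
</code></pre>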
<p>After one training run and prediction on the test set, the two methods give the following test errors:</p>
<pre><code>Standard BP:    epoch: 1   test error: 50.00%
Accumulated BP: epoch: 50  test error: 25.00%
</code></pre>
<p>In this run the accumulated BP algorithm beats the standard one, but a single run is not very convincing, so we repeat the experiment several times and compare the average test error rates:</p>
<pre><code>Standard BP:
25.00% 75.00% 75.00% 75.00% 50.00% 50.00% ...
average error rate: 47.50%
Accumulated BP:
25.00% 75.00% 50.00% 50.00% 25.00% 50.00% ...
average error rate: 38.75%
</code></pre>
<p>The results show that accumulated BP is, on the whole, more accurate than standard BP. During the experiments, however, we also noticed that accumulated BP takes far longer to run than standard BP.</p>
<p>Furthermore, because of the dataset's limitations (far too few samples), the model's accuracy is poor in absolute terms.</p>
</li>
</ol>
<h3>4. References</h3>
<p>The main references for this article are:</p>
<ul>
<li><a href="http://pybrain.org/docs/index.html">Pybrain官网</a></li>
<li><a href="http://blog.csdn.net/snoopy_yuan/article/details/70170706">神经网络基础 - PyBrain机器学习包的使用</a></li>
</ul>
<hr />
</body>
</html>
<!-- This document was created with MarkdownPad, the Markdown editor for Windows (http://markdownpad.com) -->