<!DOCTYPE html>
<html>
<head><meta name="generator" content="Hexo 3.8.0">
<meta charset="utf-8">
<title>machine-learning</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<meta property="og:type" content="website">
<meta property="og:title" content="machine-learning">
<meta property="og:url" content="https://cxchen100.github.io/index.html">
<meta property="og:site_name" content="machine-learning">
<meta property="og:locale" content="default">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="machine-learning">
<link rel="alternate" href="/machine-learning.github.io/atom.xml" title="machine-learning" type="application/atom+xml">
<link rel="icon" href="/favicon.png">
<link href="//fonts.googleapis.com/css?family=Source+Code+Pro" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="/machine-learning.github.io/css/style.css">
</head>
<body>
<div id="container">
<div id="wrap">
<header id="header">
<div id="banner"></div>
<div id="header-outer" class="outer">
<div id="header-title" class="inner">
<h1 id="logo-wrap">
<a href="/machine-learning.github.io/" id="logo">machine-learning</a>
</h1>
</div>
<div id="header-inner" class="inner">
<nav id="main-nav">
<a id="main-nav-toggle" class="nav-icon"></a>
<a class="main-nav-link" href="/machine-learning.github.io/">Home</a>
<a class="main-nav-link" href="/machine-learning.github.io/archives">Archives</a>
</nav>
<nav id="sub-nav">
<a id="nav-rss-link" class="nav-icon" href="/machine-learning.github.io/atom.xml" title="RSS Feed"></a>
<a id="nav-search-btn" class="nav-icon" title="Search"></a>
</nav>
<div id="search-form-wrap">
<form action="//google.com/search" method="get" accept-charset="UTF-8" class="search-form"><input type="search" name="q" class="search-form-input" placeholder="Search"><button type="submit" class="search-form-submit"></button><input type="hidden" name="sitesearch" value="https://cxchen100.github.io"></form>
</div>
</div>
</div>
</header>
<div class="outer">
<section id="main">
<article id="post-embedding" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/machine-learning.github.io/2019/10/25/embedding/" class="article-date">
<time datetime="2019-10-25T08:37:47.196Z" itemprop="datePublished">2019-10-25</time>
</a>
</div>
<div class="article-inner">
<div class="article-entry" itemprop="articleBody">
<h2 id="Embedding"><a href="#Embedding" class="headerlink" title="Embedding"></a>Embedding</h2>
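<h3 id="Concept"><a href="#Concept" class="headerlink" title="Concept"></a>Concept</h3><p>word2vec ==&gt; item2vec: the word2vec embedding technique applied to items instead of words.</p>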
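<p>A minimal sketch of how the word2vec idea carries over to items, assuming an item2vec setup trained like skip-gram word2vec: each user's ordered item history (hypothetical data below) is treated as a sentence, and (target, context) training pairs are drawn from a sliding window. item2vec variants differ in how they define the context; the fixed <code>window</code> and the helper name <code>skipgram_pairs</code> are illustrative assumptions, not a library API.</p>

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for items at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(sequence):
        # Context = neighbours on both sides of position i, clipped to the sequence.
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((target, sequence[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical session: items a user viewed, in order.
    session = ["item_a", "item_b", "item_c"]
    print(skipgram_pairs(session, window=1))
```

<p>The pairs would then feed the same training objective word2vec uses for (word, context) pairs, so items that often co-occur in sessions tend to end up with nearby vectors.</p>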
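<h3 id="Playing-with-vectors"><a href="#Playing-with-vectors" class="headerlink" title="Playing with vectors"></a>Playing with vectors</h3><p><a href="https://cloud.tencent.com/developer/article/1120735" target="_blank" rel="noopener">https://cloud.tencent.com/developer/article/1120735</a></p>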
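<p>A common way to explore trained embeddings is to rank other entries by cosine similarity to a query vector. A minimal pure-Python sketch, assuming the embeddings are already trained; the three-dimensional vectors are made-up toy values, not real word2vec output, and <code>most_similar</code> is a hypothetical helper.</p>

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query: str, vectors: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k entries closest to `query` by cosine similarity, excluding the query itself."""
    q = vectors[query]
    scored = [(name, cosine(q, vec)) for name, vec in vectors.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings (hypothetical values).
    toy = {
        "phone": [0.9, 0.1, 0.0],
        "tablet": [0.8, 0.2, 0.1],
        "banana": [0.0, 0.1, 0.9],
    }
    print(most_similar("phone", toy, k=1))
```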
</div>
<footer class="article-footer">
<a data-url="https://cxchen100.github.io/2019/10/25/embedding/" data-id="ckodyizw600020s9k15vnpx6m" class="article-share-link">Share</a>
</footer>
</div>
</article>
<article id="post-machine-learning" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/machine-learning.github.io/2019/03/13/machine-learning/" class="article-date">
<time datetime="2019-03-13T12:13:30.888Z" itemprop="datePublished">2019-03-13</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/machine-learning.github.io/2019/03/13/machine-learning/">Machine-learning</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
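<h2 id="机器学习"><a href="#机器学习" class="headerlink" title="机器学习"></a>Machine Learning</h2><p>Machine learning (ML) is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other subjects. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so that their own performance keeps improving.<br>It is the core of artificial intelligence and the fundamental route to making computers intelligent. Its applications span every area of AI, and it relies mainly on induction and synthesis rather than deduction. [Source: Baidu Baike]</p>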
<p>My understanding: machine learning comes down to three things:<br>data<br>a model<br>a method</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">* Data: business data, labeled data, etc. (big data)</span><br><span class="line">* Model: traditional machine learning models, neural-network (deep learning) models</span><br><span class="line">* Method: feature extraction, hyperparameter tuning, metric evaluation</span><br></pre></td></tr></table></figure>
<h2 id="特征工程"><a href="#特征工程" class="headerlink" title="特征工程"></a><a href="http://www.huaxiaozhuan.com/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0/chapters/8_feature_selection.html" target="_blank" rel="noopener">Feature Engineering</a></h2><h3 id="机器学习指标"><a href="#机器学习指标" class="headerlink" title="机器学习指标"></a>Machine Learning Metrics</h3><table><thead><tr><th></th><th>Actual positive</th><th>Actual negative</th></tr></thead><tbody><tr><td>Predicted positive</td><td>TP</td><td>FP</td></tr><tr><td>Predicted negative</td><td>FN</td><td>TN</td></tr></tbody></table>
<ul>
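<li>Recall (TPR): TP / (TP + FN)</li>
<li>Precision: TP / (TP + FP)</li>
<li>FPR: FP / (FP + TN)</li>
<li>AUC: the area under the ROC curve, i.e. the curve traced by the (FPR, TPR) points as the classification threshold varies</li>
<li>Accuracy: (TP + TN) / (TP + FP + FN + TN)</li>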
</ul>
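<p>A minimal pure-Python sketch of the metrics listed above, assuming raw confusion-matrix counts are available and every denominator is non-zero. <code>roc_auc</code> uses the rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half), which equals the area under the (FPR, TPR) curve; it is O(P·N) and meant only to illustrate the definition. Both function names are hypothetical.</p>

```python
from typing import Dict, List


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recall, precision, FPR and accuracy from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),  # also called TPR
        "precision": tp / (tp + fp),
        "fpr": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC = P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(confusion_metrics(tp=40, fp=10, fn=20, tn=30))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 3 of 4 pairs ranked correctly: 0.75
```

<p>In practice a library routine such as scikit-learn's <code>roc_auc_score</code> would normally be used instead of the quadratic loop.</p>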
<p><a href="https://www.cnblogs.com/sddai/p/5696870.html" target="_blank" rel="noopener">Metrics primer</a></p>
<p><a href="https://blog.csdn.net/u013006675/article/details/81589782" target="_blank" rel="noopener">Balancing precision and recall</a></p>
<h2 id="模型"><a href="#模型" class="headerlink" title="模型"></a>Models</h2><h3 id="Logical-Regression"><a href="#Logical-Regression" class="headerlink" title="Logical Regression"></a>Logistic Regression</h3><h3 id="Tensorflow"><a href="#Tensorflow" class="headerlink" title="Tensorflow"></a>TensorFlow</h3><h4 id="关键函数参数理解"><a href="#关键函数参数理解" class="headerlink" title="关键函数参数理解"></a>Understanding key function parameters</h4><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">tf.estimator.EvalSpec(</span><br><span class="line">    input_fn,</span><br><span class="line">    steps=100,</span><br><span class="line">    name=None,</span><br><span class="line">    hooks=None,</span><br><span class="line">    exporters=None,</span><br><span class="line">    start_delay_secs=120,</span><br><span class="line">    throttle_secs=600</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"># steps: int. Positive number of steps for which to evaluate the model. If None, evaluates until input_fn raises an end-of-input exception. See Estimator.evaluate for details.</span><br><span class="line"># exporters: Iterable of Exporters, a single Exporter, or None. Exporters are invoked after each evaluation.</span><br><span class="line"># start_delay_secs: int. Start evaluating after waiting this many seconds.</span><br><span class="line"># throttle_secs: int. Do not re-evaluate unless the last evaluation started at least this many seconds ago. No evaluation happens if no new checkpoint is available, so this is a minimum interval.</span><br></pre></td></tr></table></figure>
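<p>To make the interaction of <code>start_delay_secs</code> and <code>throttle_secs</code> concrete, here is a toy simulation of the rule described in the parameter notes above. It is a simplified model, not TensorFlow's implementation: it assumes evaluation takes no time, that the evaluator makes one attempt every <code>throttle_secs</code>, and that checkpoint timestamps are known in advance. The function <code>simulate_eval_starts</code> is hypothetical.</p>

```python
from typing import List, Optional


def simulate_eval_starts(checkpoint_times: List[float], start_delay_secs: float = 120,
                         throttle_secs: float = 600, horizon: float = 3600) -> List[float]:
    """Return the times (seconds) at which an evaluation would start.

    Toy model: wait start_delay_secs, then attempt once every throttle_secs and
    evaluate only if a checkpoint newer than the last evaluated one exists.
    """
    starts = []
    last_evaluated: Optional[float] = None  # timestamp of the last checkpoint evaluated
    t = start_delay_secs
    while t <= horizon:
        available = [c for c in checkpoint_times if c <= t]
        latest = max(available) if available else None
        if latest is not None and latest != last_evaluated:
            starts.append(t)  # a new checkpoint exists: evaluate it
            last_evaluated = latest
        t += throttle_secs  # next attempt no earlier than throttle_secs later
    return starts


if __name__ == "__main__":
    print(simulate_eval_starts([100, 700, 800, 2000], horizon=2500))
```

<p>With the defaults and checkpoints written at 100 s, 700 s, 800 s and 2000 s, this model evaluates at 120 s, 720 s and 1320 s within a 2500 s horizon: the attempt at 1920 s is skipped because no newer checkpoint exists yet, and the 2000 s checkpoint waits for an attempt after the horizon.</p>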
<h3 id="模型调优"><a href="#模型调优" class="headerlink" title="模型调优"></a>Model Tuning</h3><blockquote>
<p>Guidelines</p>
<blockquote>
<ul>
<li>High training loss + high test loss => high bias => increase model complexity (polynomial features, more layers, more nodes per layer)</li>
</ul>
</blockquote>
</blockquote>
<blockquote>
<blockquote>
<ul>
<li>Low training loss + high test loss => high variance => collect more data, add regularization</li>
</ul>
</blockquote>
</blockquote>
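<p>The two rules above can be written as a small decision function. This is a sketch only: <code>target_loss</code> (the training loss you would accept) and <code>gap_tol</code> (the train/test gap you tolerate) are hypothetical, problem-specific thresholds, and <code>diagnose_fit</code> is an illustrative name.</p>

```python
def diagnose_fit(train_loss: float, test_loss: float, target_loss: float, gap_tol: float) -> str:
    """Apply the bias/variance rules of thumb; thresholds are chosen by the user."""
    if train_loss > target_loss:
        # Cannot even fit the training set well (test loss is then usually high too): underfitting.
        return "high bias: increase model complexity (features, layers, nodes per layer)"
    if test_loss - train_loss > gap_tol:
        # Fits the training set but generalizes poorly: overfitting.
        return "high variance: collect more data or add regularization"
    return "ok"


if __name__ == "__main__":
    print(diagnose_fit(train_loss=0.1, test_loss=0.6, target_loss=0.3, gap_tol=0.1))
```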
<blockquote>
<p>Hyperparameters</p>
<blockquote>
<ul>
<li>Number of iterations (epochs)</li>
<li>Choice of activation function</li>
<li>Number of layers</li>
<li>Nodes per layer</li>
</ul>
</blockquote>
</blockquote>
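<p>The hyperparameters listed above are often tuned with an exhaustive grid search. A minimal sketch: <code>grid_search</code> and <code>toy_evaluate</code> are hypothetical; in practice the evaluation function would train a model with the given settings and return a validation metric such as AUC. The number of runs is the product of the grid sizes, so it grows quickly with each added hyperparameter.</p>

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(grid: Dict[str, List[Any]],
                evaluate: Callable[[Dict[str, Any]], float]) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every combination in `grid`; return the best params and score (higher is better)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. validation AUC after training with `params`
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params: Dict[str, Any]) -> float:
    # Hypothetical stand-in for "train a model with params, return validation AUC".
    return 0.7 + 0.01 * params["num_layers"] + (0.02 if params["activation"] == "relu" else 0.0)


if __name__ == "__main__":
    grid = {
        "epochs": [5, 10],
        "activation": ["relu", "tanh"],
        "num_layers": [1, 2],
        "nodes_per_layer": [32, 64],
    }
    print(grid_search(grid, toy_evaluate))  # 2 * 2 * 2 * 2 = 16 runs
```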
<blockquote>
<p>Factors that affect convergence speed</p>
<blockquote>
<ul>
<li>Learning rate</li>
<li>Optimizer: BGD, SGD, mini-batch GD (MBGD)</li>
</ul>
</blockquote>
</blockquote>
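<p>The effect of the learning rate is easy to see on f(x) = x², whose gradient is 2x: one gradient step maps x to (1 − 2·lr)·x, so the iteration converges only when 0 &lt; lr &lt; 1. It is slow for small lr and diverges by overshooting once lr exceeds 1. A minimal sketch; <code>gd_on_square</code> is a hypothetical helper.</p>

```python
from typing import List


def gd_on_square(lr: float, x0: float = 1.0, steps: int = 10) -> List[float]:
    """Gradient descent on f(x) = x**2 (gradient 2x); returns x after each step."""
    xs = []
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
        xs.append(x)
    return xs


if __name__ == "__main__":
    for lr in (0.01, 0.1, 0.5, 1.1):
        print(lr, gd_on_square(lr)[-1])
```

<p>With lr = 0.01 the iterate is still above 0.8 after 10 steps, lr = 0.5 reaches the minimum in a single step, and lr = 1.1 grows in magnitude at every step.</p>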
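<h2 id="个人总结"><a href="#个人总结" class="headerlink" title="个人总结"></a>Personal Notes</h2><ul>
<li>batch_size should be neither too small nor too large; either extreme can cause convergence problems.<br>Within a reasonable range, changing batch_size should not noticeably increase AUC.</li>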
</ul>
<h2 id="笔记"><a href="#笔记" class="headerlink" title="笔记"></a>Notes</h2><p><a href="https://blog.csdn.net/dearwind153/article/details/69483190" target="_blank" rel="noopener">Choosing the batch size and its effects</a></p>
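<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">Optimization in deep learning is, at bottom, gradient descent. There are two basic ways to update the parameters.</span><br><span class="line"></span><br><span class="line">The first computes the loss over the entire dataset, then the gradient of that loss with respect to each parameter, and updates the parameters. Every update requires a pass over all samples in the dataset, so it is computationally expensive and slow, and it does not support online learning. This is called batch gradient descent.</span><br><span class="line"></span><br><span class="line">The second computes the loss on a single sample, then the gradient, and updates the parameters. This is called stochastic gradient descent. It is faster but converges less well: it may wander around the optimum without ever hitting it, and successive updates may cancel each other out, making the objective oscillate sharply.</span><br><span class="line"></span><br><span class="line">To overcome the drawbacks of both, the usual compromise is mini-batch gradient descent: the data is split into batches and the parameters are updated once per batch. The samples in a batch jointly determine the gradient direction, so the descent is less likely to go astray and is less random. And because a batch is much smaller than the whole dataset, each update remains cheap to compute.</span><br></pre></td></tr></table></figure>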
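<p>The three variants described above differ only in how many samples feed each gradient step. A minimal sketch on a hypothetical noiseless linear-regression problem (fit y ≈ w·x + b by mean squared error): one function covers all three through <code>batch_size</code>, which is the full dataset size for batch GD, 1 for SGD, and anything in between for mini-batch GD. The learning rate and epoch count are chosen for this toy data only, and <code>minibatch_gd</code> is an illustrative name.</p>

```python
import random
from typing import List, Tuple


def minibatch_gd(xs: List[float], ys: List[float], batch_size: int,
                 lr: float = 0.5, epochs: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w * x + b by minimizing mean squared error with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the batch-average squared error (w * x + b - y) ** 2.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w  # one parameter update per batch
            b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [i / 10 for i in range(10)]
    ys = [3 * x + 1 for x in xs]  # hypothetical data with true w = 3, b = 1
    for name, bs in (("batch GD", len(xs)), ("SGD", 1), ("mini-batch GD", 4)):
        print(name, minibatch_gd(xs, ys, batch_size=bs))
```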
<p>Why logistic regression uses log loss</p>
<ul>
<li><p><a href="https://www.cnblogs.com/stAr-1/p/9020537.html" target="_blank" rel="noopener">1</a></p>
</li>
<li><p><a href="https://blog.csdn.net/aliceyangxi1987/article/details/80532586" target="_blank" rel="noopener">2</a></p>
</li>
</ul>
<p><a href="https://blog.csdn.net/young951023/article/details/78702479" target="_blank" rel="noopener">Maximum likelihood estimation when deriving the loss function</a></p>
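<p>For the log-loss question above: logistic regression models P(y = 1 | x) = σ(w·x + b). Maximizing the likelihood ∏ p^y (1 − p)^(1 − y) of labels y ∈ {0, 1} is equivalent to minimizing the average log loss −[y log p + (1 − y) log(1 − p)], whose gradient with respect to w is simply (p − y)·x. A minimal pure-Python sketch with batch gradient descent on made-up one-dimensional data; the function names and hyperparameters are illustrative.</p>

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(ys: List[int], ps: List[float], eps: float = 1e-12) -> float:
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(ys, ps):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(ys)


def fit_logistic(xs: List[float], ys: List[int], lr: float = 0.5, epochs: int = 500) -> Tuple[float, float]:
    """Batch gradient descent on the log loss; per-sample gradient is (p - y) * x for w, (p - y) for b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ps = [sigmoid(w * x + b) for x in xs]
        grad_w = sum((p - y) * x for p, x, y in zip(ps, xs, ys)) / n
        grad_b = sum(p - y for p, y in zip(ps, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [0, 0, 0, 1, 1, 1]  # hypothetical labels
    w, b = fit_logistic(xs, ys)
    print(w, b, log_loss(ys, [sigmoid(w * x + b) for x in xs]))
```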
<h2 id="资料"><a href="#资料" class="headerlink" title="资料"></a>Resources</h2><p><a href="http://www.cnblogs.com/xing901022/p/9692348.html" target="_blank" rel="noopener">Meituan: Machine Learning in Practice</a></p>
<p><a href="http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_tf.html" target="_blank" rel="noopener">Getting started with TensorFlow: MNIST</a></p>
<p><a href="https://me.csdn.net/xs_211314" target="_blank" rel="noopener">Neural network blog</a></p>
<p><a href="http://www.deeplearningbook.org/" target="_blank" rel="noopener">http://www.deeplearningbook.org/</a></p>
<p><a href="https://github.com/huaxz1986?tab=repositories" target="_blank" rel="noopener">An expert's GitHub repositories</a></p>
<p><a href="https://www.kaggle.com/" target="_blank" rel="noopener">https://www.kaggle.com/</a></p>
<p><a href="https://zhuanlan.zhihu.com/p/35083779" target="_blank" rel="noopener">Introduction to distributed training</a></p>
<p><a href="https://jinzequn.github.io/2017/12/01/tensorflow-mulit-gpus/" target="_blank" rel="noopener">Single machine, multiple GPUs</a></p>
<p><a href="https://www.twblogs.net/a/5c64fdc8bd9eee06ef3768b9/zh-cn" target="_blank" rel="noopener">Model tuning</a></p>
<p><a href="http://crafet.github.io/2015/11/26/understand-auc-and-ctr/" target="_blank" rel="noopener">AUC for ad CTR prediction</a></p>
<p><a href="https://blog.csdn.net/heyongluoyao8/article/details/49408131" target="_blank" rel="noopener">Ways to handle imbalanced samples</a></p>
<p><a href="http://projector.tensorflow.org" target="_blank" rel="noopener">Embedding demo (TensorFlow Embedding Projector)</a></p>
<h2 id="其他"><a href="#其他" class="headerlink" title="其他"></a>Other</h2><p><a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/go.html" target="_blank" rel="noopener">https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/go.html</a></p>
<p><a href="https://zhuanlan.zhihu.com/p/51819048" target="_blank" rel="noopener">spark tensorflow</a></p>
<p><a href="https://blog.csdn.net/v_JULY_v/article/details/81410574" target="_blank" rel="noopener">xgboost</a></p>
<p><a href="https://pypi.org/project/apache-beam/" target="_blank" rel="noopener">PyPI package lookup</a></p>
<h2 id="gcloud"><a href="#gcloud" class="headerlink" title="gcloud"></a>gcloud</h2><p><a href="https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt2?hl=zh-cn" target="_blank" rel="noopener">Feature processing, TFRecord generation, and training</a></p>
<p><a href="https://cloud.google.com/storage/docs/composite-objects" target="_blank" rel="noopener">Parallel uploads with gsutil</a></p>
<h2 id="Random"><a href="#Random" class="headerlink" title="Random"></a>Random</h2><p><a href="https://blog.csdn.net/sfdev/article/list/2?" target="_blank" rel="noopener">https://blog.csdn.net/sfdev/article/list/2?</a></p>
</div>
<footer class="article-footer">
<a data-url="https://cxchen100.github.io/2019/03/13/machine-learning/" data-id="ckodyizw400010s9khsz2btgp" class="article-share-link">Share</a>
</footer>
</div>
</article>
<article id="post-hello-world" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="/machine-learning.github.io/2019/03/13/hello-world/" class="article-date">
<time datetime="2019-03-13T08:02:21.079Z" itemprop="datePublished">2019-03-13</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="/machine-learning.github.io/2019/03/13/hello-world/">Hello World</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<p>Welcome to <a href="https://hexo.io/" target="_blank" rel="noopener">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/" target="_blank" rel="noopener">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html" target="_blank" rel="noopener">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues" target="_blank" rel="noopener">GitHub</a>.</p>
<h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/writing.html" target="_blank" rel="noopener">Writing</a></p>
<h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/server.html" target="_blank" rel="noopener">Server</a></p>
<h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/generating.html" target="_blank" rel="noopener">Generating</a></p>
<h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure>
<p>More info: <a href="https://hexo.io/docs/deployment.html" target="_blank" rel="noopener">Deployment</a></p>
</div>
<footer class="article-footer">
<a data-url="https://cxchen100.github.io/2019/03/13/hello-world/" data-id="ckodyizvv00000s9kvbw710ug" class="article-share-link">Share</a>
</footer>
</div>
</article>
</section>
<aside id="sidebar">
<div class="widget-wrap">
<h3 class="widget-title">Archives</h3>
<div class="widget">
<ul class="archive-list"><li class="archive-list-item"><a class="archive-list-link" href="/machine-learning.github.io/archives/2019/10/">October 2019</a></li><li class="archive-list-item"><a class="archive-list-link" href="/machine-learning.github.io/archives/2019/03/">March 2019</a></li></ul>
</div>
</div>
<div class="widget-wrap">
<h3 class="widget-title">Recent Posts</h3>
<div class="widget">
<ul>
<li>
<a href="/machine-learning.github.io/2019/10/25/embedding/">(no title)</a>
</li>
<li>
<a href="/machine-learning.github.io/2019/03/13/machine-learning/">Machine-learning</a>
</li>
<li>
<a href="/machine-learning.github.io/2019/03/13/hello-world/">Hello World</a>
</li>
</ul>
</div>
</div>
</aside>
</div>
<footer id="footer">
<div class="outer">
<div id="footer-info" class="inner">
© 2021 John Doe<br>
Powered by <a href="http://hexo.io/" target="_blank">Hexo</a>
</div>
</div>
</footer>
</div>
<nav id="mobile-nav">
<a href="/machine-learning.github.io/" class="mobile-nav-link">Home</a>
<a href="/machine-learning.github.io/archives" class="mobile-nav-link">Archives</a>
</nav>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
<link rel="stylesheet" href="/machine-learning.github.io/fancybox/jquery.fancybox.css">
<script src="/machine-learning.github.io/fancybox/jquery.fancybox.pack.js"></script>
<script src="/machine-learning.github.io/js/script.js"></script>
</div>
</body>
</html>