<!DOCTYPE html>
<html lang="en">
<head>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-112480279-1');
</script>
<meta charset="utf-8">
<title>channelCS - Build our first Neural Network for Audio Processing</title>
<meta name="description" content="">
<meta name="author" content="channelCS">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
<!--[if lt IE 9]>
<script src="/theme/html5.js"></script>
<![endif]-->
<!-- Le styles -->
<link href="/theme/bootstrap.min.css" rel="stylesheet">
<link href="/theme/bootstrap.min.responsive.css" rel="stylesheet">
<link href="/theme/local.css" rel="stylesheet">
<link href="/theme/pygments.css" rel="stylesheet">
<!-- So Firefox can bookmark->"abo this site" -->
<link href="/feeds/all.atom.xml" rel="alternate" title="channelCS" type="application/atom+xml">
</head>
<body>
<div class="navbar">
<div class="navbar-inner">
<div class="container">
<a class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</a>
<a class="brand" href="">channelCS</a>
<div class="nav-collapse">
<ul class="nav">
</ul>
</div>
</div>
</div>
</div>
<div class="container">
<div class="content">
<div class="row">
<div class="span9">
<div class='article'>
<div class="content-title">
<h1>Build our first Neural Network for Audio Processing</h1>
Mon 15 January 2018
by <a class="url fn" href="/author/aditya-arora.html">Aditya Arora</a>
</div>
<div><p>Welcome to this post, which walks you through how a Deep Neural Network is used for audio processing.
Prerequisites:</p>
<ul>
<li>Up and Running with Keras</li>
<li>Implementing DNN using Keras</li>
</ul>
<h2>What is different in Audio?</h2>
<p>In a standard deep learning pipeline, arrays of text or image data are passed directly to a Deep Neural Network or Convolutional Neural Network, and the model learns the rest by itself.</p>
<p>With audio, we first extract features from the signal and then pass those features to the model for training.</p>
<p>In this tutorial, you will discover how to develop a multichannel convolutional neural network for Acoustic Scene Classification on the NAR dataset.</p>
<h2>1. Understanding the Dataset</h2>
<p>We shall be using the NAR dataset which can be downloaded from this <a href="https://team.inria.fr/perception/nard/">Link</a> to the dataset. The data are freely accessible for scientific research purposes and for non-commercial applications.</p>
<p>NAR is a dataset of audio recordings made with the humanoid robot Nao in real-world conditions for sound recognition benchmarking.</p>
<h3>1.a. Audio Characteristics</h3>
<p>Certain properties of the audio must be taken into account. They tell us how, and under what conditions, the recordings in the dataset were made. The audio in the NAR dataset has the following characteristics:</p>
<ul>
<li>recorded with low-quality sensors (300 Hz – 18 kHz bandpass)</li>
<li>suffering from typical fan noise from the robot’s internal hardware</li>
<li>recorded in multiple real domestic environments (no special acoustic characteristics, reverberations, the presence of multiple sound sources and unknown locations)</li>
</ul>
<h3>1.b. Dataset Characteristics</h3>
<p>Next come the details of the dataset files. These matter because we have to convert everything into arrays before passing it to the model.</p>
<p>The dataset is organized as follows:</p>
<ul>
<li>Each class is represented by a folder containing all the audio files labeled with the class.</li>
<li>The name of a folder is the name of the class attached. The name of an audio file is “foldername$id.wav” where $id is an incremental identifier starting at 1.</li>
<li>Each audio file is provided in a WAV format (mono signal, 48kHz sampling rate and 16 bits per sample).</li>
<li>852 sounds across 42 different classes have been recorded and organized.</li>
<li>We shall consider four labels <strong>Kitchen</strong>, <strong>Office</strong>, <strong>Nonverbal</strong>, and <strong>Speech</strong>.</li>
</ul>
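<p>The file properties listed above (mono, 48 kHz, 16 bits per sample) can be checked with Python's standard <code>wave</code> module. The sketch below is only an illustration written for Python 3: the helper names <code>describe_wav</code> and <code>write_silent_wav</code> are hypothetical, and a silent file is generated so that the code runs without the dataset.</p>

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        bits = wav_file.getsampwidth() * 8
        duration = wav_file.getnframes() / float(sample_rate)
    return channels, sample_rate, bits, duration


def write_silent_wav(path, seconds=1, sample_rate=48000):
    """Write a silent mono 16-bit file (stand-in for a real NAR recording)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Hypothetical file name that follows the "foldername$id.wav" pattern
demo_path = os.path.join(tempfile.mkdtemp(), "alarmfridge1.wav")
write_silent_wav(demo_path)
wav_info = describe_wav(demo_path)
print(wav_info)
```

<p>Running <code>describe_wav</code> on a few real files from the dataset is a quick way to confirm that they match the description before extracting features.</p>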
<h2>2. Dataset Manipulation</h2>
<p>After downloading, we are going to extract it in a folder named <code>NAR_dataset</code>. The <code>tree</code> looks something like</p>
<div class="highlight"><pre><span></span>└───NAR_dataset
├───alarmfridge
├───alarmmicrowave
├───...
├───zipone
└───ziptwo
</pre></div>
<p>We are going to write a script that regroups these class folders under four scenario labels:</p>
<table>
<thead>
<tr>
<th>Scenarios</th>
<th>Classes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Kitchen</td>
<td>Eating, Choking, Cutlery, Fill a glass, Running the tap, Open/close a drawer, Move a chair, Open microwave, Close microwave, Microwave, Fridge, Toaster</td>
</tr>
<tr>
<td>Office</td>
<td>Door Close, Open, Key, Knock, Ripped Paper, Zip, (another) Zip</td>
</tr>
<tr>
<td>Nonverbal</td>
<td>Fingerclap, Handclap, Tongue Click</td>
</tr>
<tr>
<td>Speech</td>
<td>1,2,3,4,5,6,7,8,9,10, Hello, Left, Right, Turn, Move, Stop, Nao, Yes, No, What</td>
</tr>
</tbody>
</table>
<h3>2.a. Making the config file</h3>
<p>We now create a config file for feature extraction that holds all the details about the dataset. We call it <strong><em>config.py</em></strong>.
There is no rule that requires a separate config file, since everything could live in a single script, but that quickly becomes confusing when code is shared between multiple developers.</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
<span class="k">def</span> <span class="nf">CreateFolder</span><span class="p">(</span> <span class="n">fd</span> <span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">fd</span><span class="p">):</span>
        <span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span>
</pre></div>
<p>The <code>CreateFolder</code> function comes in handy when creating multiple folders. It checks whether the folder already exists and creates it if it does not.</p>
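<p>On Python 3 the same check-then-create behaviour is available directly through the <code>exist_ok</code> argument of <code>os.makedirs</code>. The following is a minimal sketch of that alternative, not the helper used in this post; <code>create_folder</code> and the temporary directory are assumptions made for the example.</p>

```python
import os
import tempfile


def create_folder(folder):
    """Create folder and any missing parents; do nothing if it already exists."""
    os.makedirs(folder, exist_ok=True)  # exist_ok needs Python 3.2 or newer


base_dir = tempfile.mkdtemp()
target = os.path.join(base_dir, "audios", "Kitchen")

create_folder(target)
create_folder(target)  # calling it again does not raise an error

folder_exists = os.path.isdir(target)
print(folder_exists)
```

<p>Letting <code>os.makedirs</code> handle the existing-folder case also avoids the small window between the existence check and the creation call.</p>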
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">shutil</span> <span class="kn">import</span> <span class="n">copytree</span>
<span class="k">def</span> <span class="nf">MoveFolder</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="n">destination</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">copytree</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="n">destination</span><span class="p">)</span>
    <span class="k">except</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s2">"Oops! Folder already exists..."</span><span class="p">)</span>
    <span class="k">return</span>
</pre></div>
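<p>The <code>MoveFolder</code> function will be used when we want to place certain folders from one directory into another. Note that it uses <code>copytree</code>, so the folder is copied and the original stays where it was.</p>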
<div class="highlight"><pre><span></span><span class="n">dir_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">realpath</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
<span class="n">dir_path</span><span class="o">=</span><span class="n">dir_path</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">)</span>
<span class="n">orig_dataset_path</span><span class="o">=</span><span class="n">dir_path</span><span class="o">+</span><span class="s1">'/NAR_dataset/*'</span>
</pre></div>
<p>We define where all the audio files reside. In our case, we have put all our files under the <code>NAR_dataset</code> directory.</p>
<div class="highlight"><pre><span></span><span class="n">kitchen_array</span><span class="o">=</span><span class="p">[</span><span class="s1">'alarmfridge'</span><span class="p">,</span> <span class="s1">'alarmmicrowave'</span><span class="p">,</span> <span class="s1">'chair'</span><span class="p">,</span> <span class="s1">'closemicrowave'</span><span class="p">,</span> <span class="s1">'cuttlery'</span><span class="p">,</span> <span class="s1">'drawer'</span><span class="p">,</span> <span class="s1">'eat'</span><span class="p">,</span> <span class="s1">'openmicrowave'</span><span class="p">,</span> <span class="s1">'strugling'</span><span class="p">,</span> <span class="s1">'tap'</span><span class="p">,</span> <span class="s1">'toaster'</span><span class="p">,</span> <span class="s1">'water'</span><span class="p">]</span>
<span class="n">nonverbal_array</span><span class="o">=</span><span class="p">[</span><span class="s1">'fingerclap'</span><span class="p">,</span> <span class="s1">'handclap'</span><span class="p">,</span> <span class="s1">'tongue'</span><span class="p">]</span>
<span class="n">office_array</span><span class="o">=</span><span class="p">[</span><span class="s1">'doorclose'</span><span class="p">,</span> <span class="s1">'doorkey'</span><span class="p">,</span> <span class="s1">'doorknock'</span><span class="p">,</span> <span class="s1">'dooropen'</span><span class="p">,</span> <span class="s1">'paper'</span><span class="p">,</span> <span class="s1">'zipone'</span><span class="p">,</span> <span class="s1">'ziptwo'</span><span class="p">]</span>
<span class="n">speech_array</span><span class="o">=</span><span class="p">[</span><span class="s1">'eight'</span><span class="p">,</span> <span class="s1">'five'</span><span class="p">,</span> <span class="s1">'four'</span><span class="p">,</span> <span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'left'</span><span class="p">,</span> <span class="s1">'move'</span><span class="p">,</span> <span class="s1">'nao'</span><span class="p">,</span> <span class="s1">'nine'</span><span class="p">,</span> <span class="s1">'no'</span><span class="p">,</span> <span class="s1">'one'</span><span class="p">,</span> <span class="s1">'right'</span><span class="p">,</span> <span class="s1">'seven'</span><span class="p">,</span> <span class="s1">'six'</span><span class="p">,</span> <span class="s1">'stop'</span><span class="p">,</span> <span class="s1">'ten'</span><span class="p">,</span> <span class="s1">'three'</span><span class="p">,</span> <span class="s1">'turn'</span><span class="p">,</span> <span class="s1">'two'</span><span class="p">,</span> <span class="s1">'what'</span><span class="p">,</span> <span class="s1">'yes'</span><span class="p">]</span>
</pre></div>
<p>We now define which folder goes under which <code>Class Label</code>. This tells the script where to place each folder.</p>
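<p>The four lists can also be turned into a single lookup table from folder name to class label. This is a small sketch of that idea, assuming the same folder names as in <code>config.py</code>; the names <code>label_groups</code> and <code>label_of</code> are hypothetical and do not appear in the original code.</p>

```python
kitchen_array = ['alarmfridge', 'alarmmicrowave', 'chair', 'closemicrowave',
                 'cuttlery', 'drawer', 'eat', 'openmicrowave', 'strugling',
                 'tap', 'toaster', 'water']
nonverbal_array = ['fingerclap', 'handclap', 'tongue']
office_array = ['doorclose', 'doorkey', 'doorknock', 'dooropen', 'paper',
                'zipone', 'ziptwo']
speech_array = ['eight', 'five', 'four', 'hello', 'left', 'move', 'nao',
                'nine', 'no', 'one', 'right', 'seven', 'six', 'stop', 'ten',
                'three', 'turn', 'two', 'what', 'yes']

# Hypothetical helper structure: class label -> folder names
label_groups = {
    'Kitchen': kitchen_array,
    'Nonverbal': nonverbal_array,
    'Office': office_array,
    'Speech': speech_array,
}

# Invert it so that every folder name points at its class label
label_of = {}
for label, folders in label_groups.items():
    for folder in folders:
        label_of[folder] = label

print(len(label_of))
print(label_of['zipone'])
```

<p>The lookup covers all 42 class folders of the dataset, so a folder that is missing from the table is a sign of a typo in one of the lists.</p>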
<div class="highlight"><pre><span></span><span class="n">audio_folder</span><span class="o">=</span><span class="s1">'audios'</span>
<span class="n">kitchen_folder</span> <span class="o">=</span> <span class="n">audio_folder</span> <span class="o">+</span> <span class="s1">'/Kitchen'</span>
<span class="n">nonverbal_folder</span> <span class="o">=</span> <span class="n">audio_folder</span> <span class="o">+</span> <span class="s1">'/Nonverbal'</span>
<span class="n">office_folder</span> <span class="o">=</span> <span class="n">audio_folder</span> <span class="o">+</span> <span class="s1">'/Office'</span>
<span class="n">speech_folder</span> <span class="o">=</span> <span class="n">audio_folder</span> <span class="o">+</span> <span class="s1">'/Speech'</span>
</pre></div>
<p>We now define the folder names for each Class Label. <strong>Remember</strong>, we have not created these folders yet.
<code>audios</code> is the main folder, and all the audio files will end up inside it.</p>
<h3>2.b. Making the file manipulator file</h3>
<p>We are now going to create the file in which the actual moving of files takes place. We call it <strong><em>file_manipulator.py</em></strong>.</p>
<h4>2.b.i. Moving folders under Class Labels</h4>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">glob</span>
<span class="kn">import</span> <span class="nn">config</span> <span class="kn">as</span> <span class="nn">cfg</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">string</span> <span class="kn">import</span> <span class="n">digits</span>
</pre></div>
<p>The <code>glob</code> module is used to list all files inside a specified folder.
We have also imported our <code>config</code> file here.</p>
<div class="highlight"><pre><span></span><span class="n">path</span><span class="o">=</span><span class="n">cfg</span><span class="o">.</span><span class="n">audio_folder</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">audio_folder</span><span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">kitchen_folder</span><span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">nonverbal_folder</span><span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">office_folder</span><span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">speech_folder</span><span class="p">)</span>
</pre></div>
<p>We define the path of our audio files, which is taken directly from the config file.
We then create the folders for our audio and for all the class labels.</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">glob</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">orig_dataset_path</span><span class="p">):</span>
    <span class="n">g</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">cfg</span><span class="o">.</span><span class="n">kitchen_array</span><span class="p">:</span>
        <span class="n">cfg</span><span class="o">.</span><span class="n">MoveFolder</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">kitchen_folder</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">g</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">cfg</span><span class="o">.</span><span class="n">nonverbal_array</span><span class="p">:</span>
        <span class="n">cfg</span><span class="o">.</span><span class="n">MoveFolder</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">nonverbal_folder</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">g</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">cfg</span><span class="o">.</span><span class="n">office_array</span><span class="p">:</span>
        <span class="n">cfg</span><span class="o">.</span><span class="n">MoveFolder</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">office_folder</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">g</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">cfg</span><span class="o">.</span><span class="n">speech_array</span><span class="p">:</span>
        <span class="n">cfg</span><span class="o">.</span><span class="n">MoveFolder</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">speech_folder</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">g</span><span class="p">)</span>
</pre></div>
<p>For each audio folder, this checks which class label it belongs to according to the config file and places it under that label.</p>
<p>The current directory <strong>structure</strong> looks something like:</p>
<div class="highlight"><pre><span></span>└───audios
├───Kitchen
├───Nonverbal
├───Speech
└───Office
</pre></div>
<h4>2.b.ii. Renaming Wav Files</h4>
<p>Great work! We have now moved all our subfolders under their class labels.
Next, we are going to move and rename all our <code>wav files</code> so that they look something like:</p>
<div class="highlight"><pre><span></span>ClassLabel_subtype_filename.wav
</pre></div>
<p>Example:</p>
<div class="highlight"><pre><span></span>Kitchen_alarmfridge_alarmfridge1.wav
</pre></div>
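<p>Because the class label and the subtype are now encoded in the file name, they can be recovered later by splitting on the underscores. The sketch below assumes that neither the label nor the subtype contains an underscore, which holds for the folder names used here; <code>parse_name</code> is a hypothetical helper.</p>

```python
import os
import re


def parse_name(filename):
    """Split 'ClassLabel_subtype_filename.wav' into (label, subtype, numeric id)."""
    stem = os.path.splitext(filename)[0]           # drop the .wav extension
    label, subtype, original = stem.split('_', 2)  # at most three parts
    match = re.search(r'(\d+)$', original)         # trailing identifier, if any
    file_id = int(match.group(1)) if match else None
    return label, subtype, file_id


parsed = parse_name('Kitchen_alarmfridge_alarmfridge1.wav')
print(parsed)
```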
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">move_files</span><span class="p">():</span>
    <span class="n">x</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
    <span class="k">print</span> <span class="s1">'The folder has {} subfolders'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">folder</span> <span class="ow">in</span> <span class="n">x</span><span class="p">:</span>
        <span class="n">new_path</span><span class="o">=</span><span class="n">path</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">folder</span>
        <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isdir</span><span class="p">(</span><span class="n">new_path</span><span class="p">):</span>
            <span class="n">y</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">new_path</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">y</span> <span class="o">==</span> <span class="p">[]:</span>
                <span class="k">print</span> <span class="s1">'Empty subfolder:'</span><span class="p">,</span><span class="n">folder</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">for</span> <span class="n">file_</span> <span class="ow">in</span> <span class="n">y</span><span class="p">:</span>
                    <span class="n">os</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span><span class="n">new_path</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">file_</span><span class="p">,</span><span class="n">path</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">folder</span><span class="o">+</span><span class="s1">'_'</span><span class="o">+</span><span class="n">file_</span><span class="p">)</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">new_path</span><span class="p">):</span>
                <span class="n">os</span><span class="o">.</span><span class="n">rmdir</span><span class="p">(</span><span class="n">new_path</span><span class="p">)</span>
</pre></div>
<p>The function moves the contents of every non-empty subfolder up one level, prefixing each name with the name of its folder, and then removes the emptied folder.
We have to run the function twice to bring the files up to the root directory level.</p>
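<p>To see why two passes are needed, the sketch below applies the same idea to a tiny throwaway directory tree. It is written for Python 3 and is not the function from this post: <code>flatten_once</code> and the temporary folder are assumptions made for the illustration.</p>

```python
import os
import tempfile


def flatten_once(root):
    """Move every entry one level up, prefixing it with its folder name."""
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue  # plain files at the root are already in place
        for name in os.listdir(folder_path):
            os.rename(os.path.join(folder_path, name),
                      os.path.join(root, folder + '_' + name))
        os.rmdir(folder_path)  # the folder is empty now


# Build audios-like layout: root/Kitchen/alarmfridge/alarmfridge1.wav
root = tempfile.mkdtemp()
nested = os.path.join(root, 'Kitchen', 'alarmfridge')
os.makedirs(nested)
open(os.path.join(nested, 'alarmfridge1.wav'), 'wb').close()

flatten_once(root)
after_first = sorted(os.listdir(root))   # class folders merged into the name
flatten_once(root)
after_second = sorted(os.listdir(root))  # wav files now sit at the root

print(after_first)
print(after_second)
```

<p>The first pass only lifts the subtype folders out of the class-label folders; the second pass lifts the audio files themselves.</p>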
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">glob</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="n">path</span><span class="o">+</span><span class="s1">'/*'</span><span class="p">):</span>
    <span class="n">x</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">x</span><span class="p">[</span><span class="o">-</span><span class="mi">4</span><span class="p">:]</span><span class="o">!=</span><span class="s1">'.wav'</span><span class="p">:</span>
        <span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">path</span><span class="o">+</span><span class="s1">'/'</span><span class="o">+</span><span class="n">x</span><span class="p">)</span>
</pre></div>
<p>This deletes every file that is not a <code>.wav</code> file, such as the <code>.DS_Store</code> files, which are not required.</p>
<h4>2.b.iii. Generating the metafile</h4>
<div class="highlight"><pre><span></span>str1=''
arr1=[]
for f in glob.glob(path+'/*'):
    x=os.path.basename(f)  # file name without the directory part
    res = x.translate(None, digits).split('.')[0].split('_')[0]  # leading class label, e.g. Kitchen
    arr1.append(res)
    str1+='audio/'+x+'\t'+res+'\n'
file1 = open("meta.txt","w")
file1.write(str1)
file1.close()
</pre></div>
<p>This collects all the files and writes them to <code>meta.txt</code>, one file per line, each followed by its class label.</p>
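<p>Since every line of the metafile is a path and a label separated by a tab, it can be read back with the standard <code>csv</code> module. The sketch below writes a three-line sample file first so that it runs on its own (Python 3); the sample entries and the temporary location are made up for the example.</p>

```python
import csv
import os
import tempfile
from collections import Counter

# Write a tiny sample in the same "path<TAB>label" layout as meta.txt
meta_path = os.path.join(tempfile.mkdtemp(), 'meta.txt')
with open(meta_path, 'w') as handle:
    handle.write('audio/Kitchen_alarmfridge_alarmfridge1.wav\tKitchen\n')
    handle.write('audio/Speech_hello_hello1.wav\tSpeech\n')
    handle.write('audio/Speech_hello_hello2.wav\tSpeech\n')

# Read it back as (path, label) pairs
with open(meta_path, newline='') as handle:
    rows = [tuple(row) for row in csv.reader(handle, delimiter='\t')]

# Count how many files each class label has
counts = Counter(label for _, label in rows)
print(rows[0])
print(counts)
```

<p>Counting the labels this way is a cheap check that every class ended up with the expected number of files.</p>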
<h2>3. Let's Code</h2>
<h3>3.a. Feature Extraction</h3>
<p>In the case of audio, we flatten the data and pass it to the layers. We need to capture the statistics of the sound so that our model learns faster. In our case we use features such as:</p>
<ul>
<li><strong>MEL filterbanks</strong>: Create a filterbank matrix to combine FFT bins into Mel-frequency bins.</li>
<li><strong>CQT (Constant-Q Transform)</strong>: The Constant-Q Transform (CQT) is a time-frequency representation where the frequency bins are geometrically spaced and the so-called Q-factors (ratios of the center frequencies to bandwidths) of all bins are equal. The CQT is essentially a wavelet transform, which means that the frequency resolution is better for low frequencies and the time resolution is better for high frequencies. [4]</li>
<li><strong>LOG-MEL (Logarithm - Mel)</strong>: We take the logarithm of the filterbank matrix.</li>
<li><strong>LOG-MFCC (Logarithm - MFCC)</strong>: A widely used metric for describing timbral characteristics based on the Mel scale. Implemented according to Huang [1], Davis [2], Grierson [3] and the librosa library.</li>
</ul>
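<p>The two defining properties of the CQT mentioned above, geometric spacing and a constant Q-factor, can be checked with a few lines of arithmetic. This sketch only computes bin centre frequencies, not the transform itself. It assumes 12 bins per octave, 84 bins and a lowest frequency of 32.70 Hz as illustrative values, and it takes the bandwidth of a bin to be the distance to the next bin.</p>

```python
def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / float(bins_per_octave)) for k in range(n_bins)]


def q_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative values (assumptions, not taken from this post)
freqs = cqt_center_frequencies(f_min=32.70, n_bins=84, bins_per_octave=12)

# Neighbouring bins always differ by the same factor, 2**(1/12)
ratios = [upper / lower for lower, upper in zip(freqs, freqs[1:])]

# Per-bin Q computed from the actual spacing, to compare with q_factor()
per_bin_q = [lower / (upper - lower) for lower, upper in zip(freqs, freqs[1:])]

print(freqs[0], freqs[12])  # one octave apart
print(q_factor(12))
```

<p>Because the bandwidth grows with the centre frequency, low bins are narrow (fine frequency resolution) and high bins are wide (fine time resolution), which is the trade-off described in the list.</p>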
<h4>3.a.i. CQT</h4>
<div class="highlight"><pre><span></span><span class="c1">#Define all features to be extracted</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">cPickle</span>
<span class="kn">import</span> <span class="nn">librosa</span>
<span class="kn">import</span> <span class="nn">config</span> <span class="kn">as</span> <span class="nn">cfg</span>
<span class="k">def</span> <span class="nf">cqt_lib</span><span class="p">(</span><span class="n">wav_fd</span><span class="p">,</span><span class="n">fe_fd</span><span class="p">):</span>
    <span class="n">names</span> <span class="o">=</span> <span class="p">[</span> <span class="n">na</span> <span class="k">for</span> <span class="n">na</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">wav_fd</span><span class="p">)</span> <span class="k">if</span> <span class="n">na</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'.wav'</span><span class="p">)</span> <span class="p">]</span>
    <span class="n">names</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">names</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">na</span> <span class="ow">in</span> <span class="n">names</span><span class="p">:</span>
        <span class="n">path</span> <span class="o">=</span> <span class="n">wav_fd</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="n">na</span>
        <span class="n">wav</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">librosa</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">path</span><span class="p">,</span><span class="n">sr</span><span class="o">=</span><span class="mi">44100</span><span class="p">)</span>
        <span class="c1">#Pass sr so the CQT uses the same sampling rate as the loaded audio</span>
        <span class="n">cqt</span><span class="o">=</span><span class="n">librosa</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">cqt</span><span class="p">(</span><span class="n">y</span><span class="o">=</span><span class="n">wav</span><span class="p">,</span><span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">)</span>
        <span class="n">out_path</span> <span class="o">=</span> <span class="n">fe_fd</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="n">na</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span> <span class="o">+</span> <span class="s1">'.f'</span>
        <span class="n">cPickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span> <span class="n">cqt</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="n">out_path</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">),</span> <span class="n">protocol</span><span class="o">=</span><span class="n">cPickle</span><span class="o">.</span><span class="n">HIGHEST_PROTOCOL</span> <span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="s1">'Fe'</span><span class="p">)</span>
    <span class="n">cfg</span><span class="o">.</span><span class="n">CreateFolder</span><span class="p">(</span><span class="s1">'Fe/cqt'</span><span class="p">)</span>
    <span class="n">cqt_lib</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">wav_fd</span><span class="p">,</span><span class="n">cfg</span><span class="o">.</span><span class="n">fe_cqt_fd</span><span class="p">)</span>
</pre></div>
<h3>3.b. Simple DNN Model</h3>
<p>We are going to build a simple DNN model and pass it the parameters it requires.</p>
<div class="highlight"><pre><span></span><span class="c1">#Our files</span>
<span class="kn">import</span> <span class="nn">config</span> <span class="kn">as</span> <span class="nn">cfg</span>
<span class="kn">import</span> <span class="nn">features</span> <span class="kn">as</span> <span class="nn">F</span>
<span class="kn">import</span> <span class="nn">apnahat</span> <span class="kn">as</span> <span class="nn">H</span>
<span class="c1">#Python modules</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">cPickle</span>
<span class="c1">#Data managing modules</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">sklearn.cross_validation</span> <span class="kn">import</span> <span class="n">KFold</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">classification_report</span><span class="p">,</span> <span class="n">accuracy_score</span>
<span class="c1">#Deep Learning Modules</span>
<span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span>
<span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span><span class="n">Dropout</span>
<span class="kn">from</span> <span class="nn">keras.layers.advanced_activations</span> <span class="kn">import</span> <span class="n">LeakyReLU</span>
<span class="kn">from</span> <span class="nn">keras.utils</span> <span class="kn">import</span> <span class="n">to_categorical</span>
</pre></div>
<p>Whenever we work with machine learning algorithms that use a stochastic process (e.g. random numbers), it is a good idea to set the random number seed.</p>
<p>Seeding lets you run the same code again and again and get the same result.
You can initialize the random number generator with any seed you like.</p>
<div class="highlight"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1234</span><span class="p">)</span>
</pre></div>
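<p>As a quick check of what seeding buys us, the sketch below draws random numbers twice with the same seed and once with a different one. The helper name <code>draw</code> and the second seed are illustrative assumptions, not part of the original code.</p>

```python
import numpy as np

def draw(seed, n=5):
    # Re-seed NumPy's global generator, then draw n uniform samples.
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(1234)
second = draw(1234)
other = draw(4321)   # hypothetical second seed, for contrast only

print(np.array_equal(first, second))  # same seed, same numbers
print(np.array_equal(first, other))   # different seed, different numbers
```

<p>Note that this only fixes NumPy's generator. Depending on the Keras backend, the backend's own random seed may also need to be set before training runs become repeatable.</p>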
<p>Each feature type has its own dimensionality. We are using CQT features, which have 80 values per time frame.</p>
<p>We first set the parameters that depend on the chosen feature.</p>
<div class="highlight"><pre><span></span><span class="n">dimension1</span> <span class="o">=</span> <span class="mi">80</span>
<span class="n">dimension2</span> <span class="o">=</span> <span class="n">dimension1</span><span class="o">*</span><span class="mi">10</span>
<span class="n">agg_num</span><span class="o">=</span><span class="mi">10</span>
<span class="n">hop</span><span class="o">=</span><span class="mi">10</span>
<span class="n">feature_text</span><span class="o">=</span><span class="s2">"cqt"</span>
<span class="n">fe_fd</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">fe_cqt_fd</span>
<span class="k">print</span> <span class="s2">"Feature"</span><span class="p">,</span><span class="n">feature_text</span>
</pre></div>
<p>We define all our <code>hyperparameters</code>. Configuring neural networks is difficult because there is no good theory on how to do it.</p>
<p>We must therefore explore different configurations systematically and understand how each one behaves on the predictive modeling problem at hand.</p>
<div class="highlight"><pre><span></span><span class="n">input_neurons</span><span class="o">=</span><span class="mi">200</span>
<span class="n">dropout</span><span class="o">=</span><span class="mf">0.1</span>
<span class="n">act1</span><span class="o">=</span><span class="s1">'linear'</span>
<span class="n">act2</span><span class="o">=</span><span class="s1">'relu'</span>
<span class="n">act3</span><span class="o">=</span><span class="s1">'sigmoid'</span>
<span class="n">act4</span><span class="o">=</span><span class="s1">'softmax'</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">20</span>
<span class="n">batchsize</span><span class="o">=</span><span class="mi">20</span>
<span class="n">num_classes</span><span class="o">=</span><span class="mi">4</span>
</pre></div>
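<p>One way to explore configurations systematically is to enumerate them from a small search space. The sketch below is only an illustration: the dictionary <code>search_space</code>, its values, and the helper <code>iter_configs</code> are assumptions and are not used elsewhere in this post.</p>

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not tuned.
search_space = {
    'input_neurons': [100, 200, 400],
    'dropout': [0.1, 0.3],
    'batchsize': [20, 64],
}

def iter_configs(space):
    # Yield one dict per combination of the listed values.
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))   # 3 * 2 * 2 = 12 combinations
print(configs[0])
```

<p>Each dictionary could then be used to set the hyperparameters above before training and evaluating one model, so that results can be compared configuration by configuration.</p>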
<p>We now write a function that reads the <code>meta</code> file and loads the features of every <strong>audio</strong> file listed in it. The function returns a 3d array, which comes in handy when working with Convolutional Neural Networks. </p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">GetAllData</span><span class="p">(</span><span class="n">fe_fd</span><span class="p">,</span> <span class="n">csv_file</span><span class="p">,</span> <span class="n">agg_num</span><span class="p">,</span> <span class="n">hop</span><span class="p">):</span>
    <span class="c1"># read csv</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span> <span class="n">csv_file</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
        <span class="n">lis</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span>
    <span class="c1"># init list</span>
    <span class="n">X3d_all</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">y_all</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">i</span><span class="o">=</span><span class="mi">0</span>
    <span class="k">for</span> <span class="n">li</span> <span class="ow">in</span> <span class="n">lis</span><span class="p">:</span>
        <span class="c1"># load data</span>
        <span class="p">[</span><span class="n">na</span><span class="p">,</span> <span class="n">lb</span><span class="p">]</span> <span class="o">=</span> <span class="n">li</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">)</span>
        <span class="n">na</span> <span class="o">=</span> <span class="n">na</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span>
        <span class="n">path</span> <span class="o">=</span> <span class="n">fe_fd</span> <span class="o">+</span> <span class="s1">'/'</span> <span class="o">+</span> <span class="n">na</span> <span class="o">+</span> <span class="s1">'.f'</span>
        <span class="c1">#i+=1</span>
        <span class="c1">#print i</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">X</span> <span class="o">=</span> <span class="n">cPickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span> <span class="nb">open</span><span class="p">(</span> <span class="n">path</span><span class="p">,</span> <span class="s1">'rb'</span> <span class="p">)</span> <span class="p">)</span>
        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="k">print</span> <span class="s1">'Error while parsing'</span><span class="p">,</span><span class="n">path</span>
            <span class="k">continue</span>
        <span class="c1"># reshape data to (n_block, n_time, n_freq)</span>
        <span class="n">i</span><span class="o">+=</span><span class="mi">1</span>
        <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">100</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
            <span class="k">print</span> <span class="s2">"Files Loaded"</span><span class="p">,</span><span class="n">i</span>
        <span class="n">X3d</span> <span class="o">=</span> <span class="n">H</span><span class="o">.</span><span class="n">mat_2d_to_3d</span><span class="p">(</span> <span class="n">X</span><span class="p">,</span> <span class="n">agg_num</span><span class="p">,</span> <span class="n">hop</span> <span class="p">)</span>
        <span class="n">X3d_all</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">X3d</span> <span class="p">)</span>
        <span class="n">y_all</span> <span class="o">+=</span> <span class="p">[</span> <span class="n">cfg</span><span class="o">.</span><span class="n">lb_to_id</span><span class="p">[</span><span class="n">lb</span><span class="p">]</span> <span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span> <span class="n">X3d</span> <span class="p">)</span>
    <span class="k">print</span> <span class="s1">'All files loaded successfully'</span>
    <span class="c1"># concatenate list to array</span>
    <span class="n">X3d_all</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">(</span> <span class="n">X3d_all</span> <span class="p">)</span>
    <span class="n">y_all</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span> <span class="n">y_all</span> <span class="p">)</span>
    <span class="k">return</span> <span class="n">X3d_all</span><span class="p">,</span> <span class="n">y_all</span>
</pre></div>
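<p>The two <code>split</code> calls inside the loop are easy to misread, so here is a small sketch of what they do to one row of the meta file. The row content (<code>audio/a001_0_30.wav</code> and the label <code>beach</code>) is a made-up example, and <code>parse_meta_row</code> is a hypothetical helper that only mirrors the lines above.</p>

```python
def parse_meta_row(row):
    # row is one csv.reader row; the whole line sits in the first cell
    # because the fields are tab-separated rather than comma-separated.
    name, label = row[0].split('\t')
    # Drop the leading folder and the 4-character extension (e.g. ".wav").
    stem = name.split('/')[1][0:-4]
    return stem, label

# Hypothetical row; the real file names and labels come from the meta file.
example_row = ['audio/a001_0_30.wav\tbeach']
stem, label = parse_meta_row(example_row)
print(stem)
print(label)
```

<p>The stem is then joined with <code>fe_fd</code> and the <code>.f</code> suffix to locate the pickled feature file. This parsing assumes exactly one folder level in the path and no commas in the line, since <code>csv.reader</code> would otherwise split the row into several cells.</p>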
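<p>We call the function to get a 3d array of train X and a 1d array of train y. We then reshape the 3d array into a 2d array, so that each block becomes one flattened row of 10 × 80 = 800 values, which matches <code>dimension2</code>.</p>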
<div class="highlight"><pre><span></span><span class="n">tr_X</span><span class="p">,</span> <span class="n">tr_y</span> <span class="o">=</span> <span class="n">GetAllData</span><span class="p">(</span> <span class="n">fe_fd</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">meta_csv</span><span class="p">,</span> <span class="n">agg_num</span><span class="p">,</span> <span class="n">hop</span> <span class="p">)</span>
<span class="n">tr_X</span><span class="o">=</span><span class="n">tr_X</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">tr_X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">tr_X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">tr_X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</pre></div>
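<p>To see what the reshape does to the shapes, here is a sketch on a synthetic array. The number of blocks is an arbitrary assumption and the values are just a counter, used so that the ordering of the flattened row can be checked.</p>

```python
import numpy as np

agg_num, n_freq = 10, 80  # 10 frames per block, 80 CQT bins per frame
n_blocks = 6              # hypothetical number of blocks

# Stand-in for the array returned by GetAllData: (n_blocks, agg_num, n_freq).
X3d = np.arange(n_blocks * agg_num * n_freq, dtype=float)
X3d = X3d.reshape(n_blocks, agg_num, n_freq)

# Same reshape as in the post: keep the block axis, merge the other two.
X2d = X3d.reshape(X3d.shape[0], X3d.shape[1] * X3d.shape[2])

print(X2d.shape)  # (6, 800)
# Row-major flattening puts frame 0 first, then frame 1, and so on.
print(np.array_equal(X2d[0, :n_freq], X3d[0, 0]))
print(np.array_equal(X2d[0, n_freq:2 * n_freq], X3d[0, 1]))
```

<p>Each row of the 2d array is therefore the 10 frames of one block laid end to end, which is why the first <code>Dense</code> layer needs an input dimension of 800.</p>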
<p><strong>Altering the feature arrays</strong></p>
<p>We are using a single function from the <a href="https://github.com/qiuqiangkong/Hat">hat</a> module. We are going to put it in a separate module and call it <strong><em>apnahat.py</em></strong>. The function takes a 2d array as input and returns a 3d array of fixed-length blocks. We use it to prepare the input for our model.</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="k">def</span> <span class="nf">mat_2d_to_3d</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">agg_num</span><span class="p">,</span> <span class="n">hop</span><span class="p">):</span>
    <span class="c1"># pad to at least one block</span>
    <span class="n">len_X</span><span class="p">,</span> <span class="n">n_in</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">len_X</span> <span class="o">&lt;</span> <span class="n">agg_num</span><span class="p">):</span>
        <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">((</span><span class="n">X</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">agg_num</span><span class="o">-</span><span class="n">len_X</span><span class="p">,</span> <span class="n">n_in</span><span class="p">))))</span>
    <span class="c1"># agg 2d to 3d</span>
    <span class="n">len_X</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
    <span class="n">i1</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">X3d</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">i1</span><span class="o">+</span><span class="n">agg_num</span> <span class="o">&lt;=</span> <span class="n">len_X</span><span class="p">):</span>
        <span class="n">X3d</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i1</span><span class="p">:</span><span class="n">i1</span><span class="o">+</span><span class="n">agg_num</span><span class="p">])</span>
        <span class="n">i1</span> <span class="o">+=</span> <span class="n">hop</span>
    <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">X3d</span><span class="p">)</span>
</pre></div>
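<p>The effect of <code>agg_num</code> and <code>hop</code> is easiest to see on shapes. The sketch below repeats the function so that it runs on its own and applies it to a made-up clip of 35 frames; the clip length is an assumption chosen to show the dropped tail and the zero padding.</p>

```python
import numpy as np

def mat_2d_to_3d(X, agg_num, hop):
    # Same logic as the function above, repeated so the sketch is standalone.
    len_X, n_in = X.shape
    if len_X < agg_num:
        X = np.concatenate((X, np.zeros((agg_num - len_X, n_in))))
    len_X = len(X)
    i1 = 0
    X3d = []
    while i1 + agg_num <= len_X:
        X3d.append(X[i1:i1 + agg_num])
        i1 += hop
    return np.array(X3d)

X = np.random.rand(35, 80)  # hypothetical clip: 35 frames, 80 bins

no_overlap = mat_2d_to_3d(X, agg_num=10, hop=10)
half_overlap = mat_2d_to_3d(X, agg_num=10, hop=5)
short_clip = mat_2d_to_3d(X[:4], agg_num=10, hop=10)

print(no_overlap.shape)    # (3, 10, 80): frames 30-34 do not fill a block
print(half_overlap.shape)  # (6, 10, 80): blocks start at 0, 5, ..., 25
print(short_clip.shape)    # (1, 10, 80): 4 frames padded with 6 zero rows
```

<p>With <code>hop</code> equal to <code>agg_num</code>, as in this post, the blocks do not overlap and any incomplete block at the end of a clip is discarded. A smaller <code>hop</code> yields overlapping blocks and therefore more training rows per clip.</p>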
<h4>3.b.ii. The DNN Model</h4>
<p>We are now going to make a function for our model which returns a compiled model.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">prepare_model</span><span class="p">():</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">input_neurons</span><span class="p">,</span> <span class="n">input_dim</span> <span class="o">=</span> <span class="n">dimension2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">act1</span><span class="p">))</span>
    <span class="n">lr</span><span class="o">=</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="n">alpha</span><span class="o">=.</span><span class="mo">001</span><span class="p">)</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">lr</span><span class="p">)</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">input_neurons</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">act2</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">input_neurons</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">act3</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">num_classes</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">act4</span><span class="p">))</span>
    <span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s1">'categorical_crossentropy'</span><span class="p">,</span>
                  <span class="n">optimizer</span><span class="o">=</span><span class="s1">'adam'</span><span class="p">,</span>
                  <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'accuracy'</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">model</span>
</pre></div>
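<p>As a sanity check on the architecture, the number of trainable parameters can be worked out by hand from the layer sizes used above (800 inputs, three hidden layers of 200 units, 4 output classes). The helper <code>dense_params</code> is a hypothetical name; the arithmetic assumes that only the <code>Dense</code> layers carry weights, since <code>Dropout</code> and a <code>LeakyReLU</code> with a fixed <code>alpha</code> have none.</p>

```python
def dense_params(n_in, n_out):
    # A Dense layer stores an (n_in x n_out) weight matrix plus n_out biases.
    return n_in * n_out + n_out

# input_dim, three hidden layers of input_neurons units, num_classes outputs
layer_sizes = [800, 200, 200, 200, 4]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [160200, 40200, 40200, 804]
print(sum(counts))  # 241404
```

<p>These figures should agree with the per-layer and total parameter counts reported by the model summary. If they do not, the input dimension or layer sizes are probably not what was intended.</p>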
<p>We are going to evaluate the model with k-fold cross-validation. The code below uses 2 folds (the second argument to <code>KFold</code>); raise it to 10 for a 10-fold cross-validation.</p>
<div class="highlight"><pre><span></span><span class="n">kf</span> <span class="o">=</span> <span class="n">KFold</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">tr_X</span><span class="p">),</span><span class="mi">2</span><span class="p">,</span><span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
<span class="n">results</span><span class="o">=</span><span class="p">[]</span>
<span class="k">for</span> <span class="n">train_indices</span><span class="p">,</span> <span class="n">test_indices</span> <span class="ow">in</span> <span class="n">kf</span><span class="p">:</span>
    <span class="n">train_x</span> <span class="o">=</span> <span class="p">[</span><span class="n">tr_X</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">train_indices</span><span class="p">]</span>
    <span class="n">train_y</span> <span class="o">=</span> <span class="p">[</span><span class="n">tr_y</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">train_indices</span><span class="p">]</span>
    <span class="n">test_x</span> <span class="o">=</span> <span class="p">[</span><span class="n">tr_X</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">test_indices</span><span class="p">]</span>
    <span class="n">test_y</span> <span class="o">=</span> <span class="p">[</span><span class="n">tr_y</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">test_indices</span><span class="p">]</span>
    <span class="c1">#one-hot encode the integer labels</span>
    <span class="n">train_y</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">train_y</span><span class="p">,</span><span class="n">num_classes</span><span class="o">=</span><span class="n">num_classes</span><span class="p">)</span>
    <span class="n">test_y</span> <span class="o">=</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">test_y</span><span class="p">,</span><span class="n">num_classes</span><span class="o">=</span><span class="n">num_classes</span><span class="p">)</span>
    <span class="n">train_x</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">train_x</span><span class="p">)</span>
    <span class="n">train_y</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">train_y</span><span class="p">)</span>
    <span class="n">test_x</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">test_x</span><span class="p">)</span>
    <span class="n">test_y</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">test_y</span><span class="p">)</span>
    <span class="k">print</span> <span class="s2">"All arrays loaded"</span>
    <span class="c1">#get compiled model</span>
    <span class="n">lrmodel</span><span class="o">=</span><span class="n">prepare_model</span><span class="p">()</span>
    <span class="c1">#see the model (summary() prints the table itself)</span>
    <span class="n">lrmodel</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
    <span class="c1">#fit the model</span>
    <span class="n">lrmodel</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_x</span><span class="p">,</span><span class="n">train_y</span><span class="p">,</span><span class="n">batch_size</span><span class="o">=</span><span class="n">batchsize</span><span class="p">,</span><span class="n">epochs</span><span class="o">=</span><span class="n">epochs</span><span class="p">,</span><span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="c1">#make prediction</span>
    <span class="n">pred</span><span class="o">=</span><span class="n">lrmodel</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_x</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="c1">#turn class probabilities and one-hot labels back into class ids</span>
    <span class="n">pred</span> <span class="o">=</span> <span class="p">[</span><span class="n">ii</span><span class="o">.</span><span class="n">argmax</span><span class="p">()</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">pred</span><span class="p">]</span>
    <span class="n">test_y</span> <span class="o">=</span> <span class="p">[</span><span class="n">ii</span><span class="o">.</span><span class="n">argmax</span><span class="p">()</span> <span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="n">test_y</span><span class="p">]</span>
    <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">accuracy_score</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span><span class="n">test_y</span><span class="p">))</span>
    <span class="k">print</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">pred</span><span class="p">,</span><span class="n">test_y</span><span class="p">)</span>
    <span class="n">jj</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">test_y</span><span class="p">)))</span>
    <span class="k">print</span> <span class="s2">"Unique in test_y"</span><span class="p">,</span><span class="n">jj</span>
<span class="k">print</span> <span class="s2">"Results: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">results</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> <span class="p">)</span>
<span class="k">print</span> <span class="n">classification_report</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">test_y</span><span class="p">),</span><span class="n">pred</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span>
</pre></div>
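<p>To make the splitting step concrete, here is a NumPy-only sketch of how shuffled k-fold indices can be produced. It is not scikit-learn's implementation and will not reproduce its exact shuffling; <code>kfold_indices</code> and the random stand-in for <code>tr_X</code> are assumptions made for illustration.</p>

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=42):
    # Shuffle the sample indices once, then cut them into n_folds chunks.
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        # Fold k is held out for testing; the rest is used for training.
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

X = np.random.rand(10, 800)  # hypothetical stand-in for tr_X
splits = list(kfold_indices(len(X), n_folds=2))
for train_idx, test_idx in splits:
    # Indexing with an index array replaces the per-index list comprehensions.
    print(X[train_idx].shape)
    print(X[test_idx].shape)
```

<p>Every sample lands in the test set of exactly one fold, so averaging the per-fold accuracies uses each sample for evaluation once. Note also that newer scikit-learn releases no longer ship <code>sklearn.cross_validation</code>; <code>KFold</code> now lives in <code>sklearn.model_selection</code>, takes <code>n_splits</code> instead of the sample count, and is iterated through its <code>split</code> method.</p>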
<h2>4. References</h2>
<ol>
<li>X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1st ed., 2001.</li>
<li>S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 28, pp. 357–366, Aug 1980.</li>
<li>M. Grierson, “Maximilian: A cross platform C++ audio synthesis library for artists learning to program,” in Proceedings of International Computer Music Conference, 2010.</li>
<li>T. Lidy and A. Schindler, “CQT-based convolutional neural networks for audio scene classification and domestic audio tagging,” in IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), Budapest, Hungary, Tech. Rep., 2016.</li>
</ol></div>
</div>
</div>
</div> </div>
<footer>
<br />
<p><a href="">channelCS</a> © channelCS 2018</p>
</footer>
</div> <!-- /container -->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/theme/bootstrap-collapse.js"></script>
</body>
</html>