First contact with TensorFlow
get started with Deep Learning programming
Jordi Torres
Universitat Politecnica de Catalunya - UPC Barcelona Tech
Barcelona Supercomputing Center - Centro Nacional de
Supercomputacion (BSC-CNS)
April, 2016
Cover illustration: Supercomputer Marenostrum - Torre Girona chapel
WATCH THIS SPACE Collection
First English edition: March 2016.
Jordi Torres
www.JordiTorres.eu
Universitat Politecnica de Catalunya - UPC Barcelona Tech
UPC Campus Nord, modul C6 desp. 217
Jordi Girona 1-3
08034 Barcelona
Cover design: Jordi Torres
Illustrations: Jordi Torres
Orthographic and typographic proofreader: Laura Juan Merino
Editor: Ferran Julia Masso
Publisher: Jordi Torres, BSC-CNS
Citation:
First contact with TensorFlow,
get started with Deep Learning programming
Jordi Torres,
Ed. BSC-CNS, Barcelona, 2016
ISBN 978-1-326-56933-4
This book is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (CC BY-NC-SA 3.0). In short: Jordi Torres retains the copyright, but you are free to reproduce, reblog, remix and modify the content, only under a license identical to this one. You may not use this work for commercial purposes, but permission to use this material in nonprofit teaching is still granted, provided the authorship and licensing information here is displayed.
This book is devoted to the open-source community,
whose work we consume every day without knowing
Contents
Foreword
Preface
A practical approach
1. TensorFlow basics
   An Open Source Package
   TensorFlow Serving
   TensorFlow Installation
   My first code in TensorFlow
   Display panel TensorBoard
2. Linear Regression in TensorFlow
   Model of relationship between variables
   Cost function and gradient descent algorithm
   Running the algorithm
3. Clustering in TensorFlow
   Basic data structure: tensor
   Data Storage in TensorFlow
   K-means algorithm
   New groups
   Computation of the new centroids
   Graph Execution
4. Single Layer Neural Network in TensorFlow
   The MNIST Data-set
   An artificial neuron
   An easy example to start: Softmax
   Programming in TensorFlow
   Model evaluation
5. Multi-layer Neural Networks in TensorFlow
   Convolutional Neural Networks
   Implementation of the model
   Training and evaluation of the model
6. Parallelism
   Execution environment with GPUs
   Parallelism with several GPUs
   Code example with GPUs
   Distributed version of TensorFlow
Closing
Acknowledgments
About the Author
About BSC
About UPC
Foreword
The area of Machine Learning has expanded greatly thanks to the co-development of key areas such as computing, massive data storage and Internet technologies. Many of the technologies and events in people's everyday lives are directly or indirectly influenced by machine learning. Technologies such as speech recognition, image classification on our phones or detection of spam emails have enabled apps that a decade ago would have sounded possible only in science fiction. The use of learning in stock market models or medical models has impacted our society massively. In addition, cars with cruise control, drones and robots of all types will impact society in the not too distant future.
Deep Learning, a subtype of Machine Learning, has undoubtedly been one of the fields that has expanded explosively since it was rediscovered in 2006. Indeed, many of the startups in Silicon Valley specialize in it, and big technology companies like Google, Facebook, Microsoft or IBM have both development and research teams. Deep Learning has generated interest even outside the university and research areas: a lot of specialized magazines (like Wired) and even generic ones (such as New York Times, Bloomberg or BBC) have written many articles about this subject.
This interest has led many students, entrepreneurs and investors to join Deep Learning. Thanks to all the interest generated, several packages have been released as open source. As one of the main promoters of the library we developed at Berkeley (Caffe) in 2012 as a PhD student, I can say that TensorFlow, presented in this book and also designed by Google (California), where I have been researching since 2013, will be one of the main tools that researchers and SME companies will use to develop their ideas about Deep Learning and Machine Learning. A guarantee of this is the number of engineers and top researchers who have participated in this project, which culminated in its release as open source.
I hope this introductory book will help the reader interested in starting their adventure in this very interesting field. I would like to thank the author, whom I have the pleasure of knowing, for his effort to disseminate this technology. He wrote this book (the first Spanish version) in record time, two months after the open source project release was announced. This is another example of the vitality of Barcelona and its interest in being one of the actors in this technological scenario that will undoubtedly impact our future.
Oriol Vinyals
Research Scientist at Google Brain
Preface
Education is the most powerful
weapon which you can use to change
the world.
Nelson Mandela
The purpose of this book is to help spread this knowledge among engineers who want to expand their wisdom in the exciting world of Machine Learning. I believe that anyone with an engineering background may find applications of Deep Learning, and Machine Learning in general, valuable to their work.
Given my background, the reader will probably wonder why I have taken on the challenge of writing about this new Deep Learning technology. My research focus is gradually moving from supercomputing architectures and runtimes to execution middleware for big data workloads, and more recently to platforms for Machine Learning on massive data. Precisely by being an engineer, not a data scientist, I think I can contribute with this introductory approach to the subject, and that it can be helpful for many engineers in the early stages; then it will be their choice to go deeper into what they need.
I hope this book adds some value to this world of education that I love so much. I think that knowledge is liberation and should be accessible to all. For this reason, the content of this book is available completely free on the website www.JordiTorres.eu/TensorFlow. If the reader finds the content useful and considers it appropriate to compensate the effort of the author in writing it, there is a tab on the website to make a donation. On the other hand, if the reader prefers a paper copy, the book can be purchased through the Amazon.com portal.
A Spanish version is also available. Indeed, this book is the translation of the Spanish one, which was finished last January and presented at the GEMLeB Meetup (Grup d'Estudi de Machine Learning de Barcelona), of which I am one of the co-organizers.
Let me thank you for reading this book! It comforts me and justifies my effort in writing it. Those who know me know that technological diffusion is one of my passions. It energizes and motivates me to keep learning.
Jordi Torres, February 2016
A practical approach
Tell me and I forget. Teach me and I
remember. Involve me and I learn.
Benjamin Franklin
One of the common applications of Deep Learning is pattern recognition. Therefore, in the same way that there is a tradition of starting to program by printing "Hello World", in Deep Learning a model for the recognition of handwritten digits is usually constructed first. The first example of a neural network that I will provide will also allow me to introduce this new technology called TensorFlow.
However, I do not intend to write a research book on Machine Learning or Deep Learning; I only want to make this new Machine Learning package, TensorFlow, available to everybody as soon as possible. Therefore I apologise in advance to my fellow data scientists for certain simplifications that I have allowed myself in order to share this knowledge with the general reader.
The reader will find here the regular structure that I use in my classes; that is, inviting you to use your computer's keyboard while you learn. We call it "learn by doing", and my experience as a professor at UPC tells me that it is an approach that works very well with engineers who are trying to start a new topic.
For this reason, the book is of a practical nature, and therefore I have reduced the theoretical part as much as possible. However, certain mathematical details have been included in the text when they are necessary for the learning process. I assume that the reader has some basic understanding of Machine Learning, so I will use some popular algorithms to gradually organize the reader's training in TensorFlow.
In the first chapter, in addition to an introduction to the scenario in which TensorFlow will have an important role, I take the opportunity to explain the basic structure of a TensorFlow program, and explain briefly the data it maintains internally.
In chapter two, through an example of linear regression, I will present some code basics and, at the same time, how to call various important components in the learning process, such as the cost function or the gradient descent optimization algorithm.
In chapter three, where I present a clustering algorithm, I go into detail to present the basic data structure of TensorFlow, called tensor, and the different classes and functions that the TensorFlow package offers to create and manage tensors.
In chapter four, how to build a neural network with a single layer to recognize handwritten digits is presented in detail. This will allow us to sort all the concepts presented above, as well as see the entire process of creating and testing a model.
The next chapter begins with an explanation based on the neural network concepts seen in the previous chapter and introduces how to construct a multilayer neural network to get a better result in the recognition of handwritten digits. What is known as a convolutional neural network will be presented in more detail.
In chapter six we look at a more specific issue, probably not of interest to all readers: harnessing the computing power of GPUs. As introduced in chapter 1, GPUs play an important role in the training process of neural networks.
The book ends with closing remarks, in which I highlight some conclusions. I would like to emphasize that the examples of code in this book can be downloaded from the github repository of the book.
1. TensorFlow basics
In this chapter I will present very briefly what TensorFlow code looks like and what its programming model is. By the end of this chapter, the reader should be able to install the TensorFlow package on their personal computer.
An Open Source Package
Machine Learning has been investigated by academia for decades, but it is only in recent years that its penetration has also increased in corporations. This has happened thanks to the large volumes of data they already hold and the unprecedented computing capacity available nowadays.
In this scenario, there is no doubt that Google, under the holding of Alphabet, is one of the largest corporations where Machine Learning technology plays a key role in all of its virtual initiatives and products.
Last October, when Alphabet announced its quarterly Google results, with considerable increases in sales and profits, CEO Sundar Pichai said clearly: "Machine learning is a core, transformative way by which we're rethinking everything we're doing".
Technologically speaking, we are facing a change of era in which Google is not the only big player. Other technology companies such as Microsoft, Facebook, Amazon and Apple, among many other corporations, are also increasing their investment in these areas.
In this context, a few months ago Google released its TensorFlow engine under an open source license (Apache 2.0). TensorFlow can be used by developers and researchers who want to incorporate Machine Learning in their projects and products, in the same way that Google is doing internally with different commercial products like Gmail, Google Photos, Search, voice recognition, etc.
TensorFlow was originally developed by the Google Brain Team, with the purpose of conducting Machine Learning and deep neural networks research, but the system is general enough to be applied to a wide variety of other Machine Learning problems.
Since I am an engineer and I am speaking to engineers, the book will look under the hood to see how the algorithms are represented by a data flow graph. TensorFlow can be seen as a library for numerical computation using data flow graphs. The nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) which interconnect the nodes.
TensorFlow is constructed around the basic idea of building and manipulating a computational graph, representing symbolically the numerical operations to be performed. This allows TensorFlow to take advantage of both CPUs and GPUs right now on 64-bit Linux and Mac OS X platforms, as well as mobile platforms such as Android or iOS.
Another strength of this new package is its visual TensorBoard module, which allows a lot of information about how the algorithm is running to be monitored and displayed. Being able to measure and display the behavior of algorithms is extremely important in the process of creating better models. I have a feeling that currently many models are refined through a somewhat blind process of trial and error, with the obvious waste of resources and, above all, time.
TensorFlow Serving
Recently Google launched TensorFlow Serving, which helps developers take their TensorFlow machine learning models (and, even so, it can be extended to serve other types of models) into production. TensorFlow Serving is an open source serving system (written in C++) now available on GitHub under the Apache 2.0 license.
What is the difference between TensorFlow and TensorFlow Serving? While TensorFlow makes it easier for developers to build machine learning algorithms and train them for certain types of data inputs, TensorFlow Serving specializes in making these models usable in production environments. The idea is that developers train their models using TensorFlow and then use TensorFlow Serving's APIs to react to input from a client.
This allows developers to experiment with different models on a large scale that change over time, based on real-world data, while maintaining a stable architecture and API in place.
The typical pipeline is that training data is fed to the learner, which outputs a model, which, after being validated, is ready to be deployed to the TensorFlow Serving system. It is quite common to launch and iterate on our model over time, as new data becomes available, or as you improve the model. In fact, in the Google post they mention that at Google, many pipelines are running continuously, producing new model versions as new data becomes available.
To communicate with TensorFlow Serving, developers use a front-end implementation based on gRPC, a high performance, open source RPC framework from Google.
If you are interested in learning more about TensorFlow Serving, I suggest you start by reading the Serving architecture overview section, setting up your environment and working through a basic tutorial.
TensorFlow Installation
It is time to get your hands dirty. From now on, I recommend that you interleave the reading with practice on your computer.
TensorFlow has a Python API (plus a C/C++ one) that requires the installation of Python 2.7 (I assume that any engineer who reads this book knows how to do it).
In general, when you are working in Python, you should use the virtual environment tool virtualenv. Virtualenv keeps the Python dependencies required by different projects in separate parts of the same computer. If we use virtualenv to install TensorFlow, it will not overwrite existing versions of Python packages required by other projects. First, you should install pip and virtualenv if they are not already installed, as the following script shows:
# Ubuntu/Linux 64-bit
$ sudo apt-get install python-pip python-dev python-virtualenv
# Mac OS X
$ sudo easy_install pip
$ sudo pip install --upgrade virtualenv
Then you must create a virtual environment. The following command creates one in the ~/tensorflow directory:
$ virtualenv --system-site-packages ~/tensorflow
The next step is to activate the virtualenv. This can be done
as follows:
$ source ~/tensorflow/bin/activate     # if using bash
$ source ~/tensorflow/bin/activate.csh # if using csh
(tensorflow)$
The name of the virtual environment in which we are working will appear at the beginning of each command line from now on. Once the virtualenv is activated, you can use pip to install TensorFlow inside it:
# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl

# Mac OS X, CPU only:
(tensorflow)$ sudo easy_install --upgrade six
(tensorflow)$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.1-cp27-none-any.whl
I recommend that you visit the official documentation indicated here, to be sure that you are installing the latest available version.
If the platform where you are running your code has a GPU, the package to use will be different. I recommend that you visit the official documentation to see if your GPU meets the specifications required to support TensorFlow. Installing additional software is required to run TensorFlow on a GPU, and all the information can be found on the Download and Setup TensorFlow web page. For more information on the use of GPUs, I suggest reading chapter 6.
Finally, when you have finished, you should deactivate the virtual environment as follows:
(tensorflow)$ deactivate
Given the introductory nature of this book, we suggest that the reader visits the mentioned official documentation page to find more information about other ways to install TensorFlow.
My first code in TensorFlow
As I mentioned at the beginning, we will move through this exploration of the planet TensorFlow with little theory and lots of practice. Let's start!
From now on, it is best to use any text editor to write Python code and save it with the ".py" extension (e.g. test.py). To run the code, the command python test.py is enough.
To get a first impression of what a TensorFlow's program is,
I suggest doing a simple multiplication program; the code
looks like this:
import tensorflow as tf

a = tf.placeholder("float")   # symbolic variable, filled in at run time
b = tf.placeholder("float")
y = tf.mul(a, b)              # symbolic multiplication; nothing runs yet

sess = tf.Session()
print sess.run(y, feed_dict={a: 3, b: 3})   # prints 9.0
In this code, after importing the Python module tensorflow, we define "symbolic" variables, called placeholders, in order to manipulate them during the program execution. Then, we pass these variables as parameters in the call to the multiplication function that TensorFlow offers. tf.mul is one of the many mathematical operations that TensorFlow offers to manipulate tensors. For the moment, tensors can be considered as dynamically-sized, multidimensional data arrays. The main operations are shown in the following table:
Operation     Description
tf.add        sum
tf.sub        subtraction
tf.mul        multiplication
tf.div        division
tf.mod        modulo
tf.abs        returns the absolute value
tf.neg        returns the negative value
tf.sign       returns the sign
tf.inv        returns the inverse
tf.square     calculates the square
tf.round      returns the nearest integer
tf.sqrt       calculates the square root
tf.pow        calculates the power
tf.exp        calculates the exponential
tf.log        calculates the logarithm
tf.maximum    returns the maximum
tf.minimum    returns the minimum
tf.cos        calculates the cosine
tf.sin        calculates the sine
TensorFlow also offers the programmer a number of functions to perform mathematical operations on matrices. Some are listed below:
Operation               Description
tf.diag                 returns a diagonal tensor with the given diagonal values
tf.transpose            returns the transpose of the argument
tf.matmul               returns the tensor resulting from multiplying the two tensors passed as arguments
tf.matrix_determinant   returns the determinant of the square matrix passed as an argument
tf.matrix_inverse       returns the inverse of the square matrix passed as an argument
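As a quick illustration of these matrix functions, here is a minimal sketch (my own example, not from the original text, using the session mechanism described next) that multiplies a small matrix by its own transpose:

import tensorflow as tf

m = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
# Multiply the matrix by its own transpose with tf.matmul and tf.transpose
product = tf.matmul(m, tf.transpose(m))

sess = tf.Session()
print sess.run(product)   # prints [[ 5. 11.] [ 11. 25.]]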
The next step, and one of the most important ones, is to create a session to evaluate the specified symbolic expression. Indeed, until now nothing has been executed in this TensorFlow code. Let me emphasize that TensorFlow is both an interface to express Machine Learning algorithms and an implementation to run them, and this is a good example.
Programs interact with the TensorFlow library by creating a session with Session(); it is only after the creation of this session that we can call the run() method, and that is when the specified code really starts to run. In this particular example, the values of the variables are introduced into the run() method with the feed_dict argument. That is when the associated code solves the expression and displays 9 as the result of the multiplication.
With this simple example, I tried to introduce the idea that the normal way to program in TensorFlow is to specify the whole problem first, and eventually create a session to run the associated computation.
Sometimes, however, we are interested in having more flexibility in how we structure the code, interleaving operations that build the graph with operations that run parts of it. This happens when we are, for example, using interactive environments of Python such as IPython. For this purpose, TensorFlow offers the tf.InteractiveSession() class.
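As a minimal sketch of this interactive style (my own example, assuming the 0.x API used throughout this book), the earlier multiplication could be run without an explicit call to sess.run():

import tensorflow as tf

sess = tf.InteractiveSession()

a = tf.constant(3.0)
b = tf.constant(3.0)
y = tf.mul(a, b)

# With an interactive session installed as the default,
# eval() runs an operation directly
print y.eval()   # prints 9.0

sess.close()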
The motivation for this programming model is beyond the scope of this book. However, to continue with the next chapter, we only need to know that all the information is saved internally in a graph structure that contains all the information about operations and data.
This graph describes mathematical computations. The nodes typically implement mathematical operations, but they can also represent points of data entry, output results, or read/write persistent variables. The edges describe the relationships between nodes with their inputs and outputs, and at the same time carry tensors, the basic data structure of TensorFlow.
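To make the graph tangible, the following short sketch (my own, assuming the Graph class methods of the 0.x API) lists the nodes that the multiplication example adds to the default graph:

import tensorflow as tf

a = tf.placeholder("float")
b = tf.placeholder("float")
y = tf.mul(a, b)

# Each call above added one node to the default graph
for op in tf.get_default_graph().get_operations():
    print op.name   # e.g. Placeholder, Placeholder_1, Mul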
The representation of the information as a graph allows TensorFlow to know the dependencies between operations and to assign operations to devices asynchronously, and in parallel, as soon as each operation has its associated input tensors (indicated on the incoming edges) available.
Parallelism is therefore one of the factors that enables us to speed up the execution of some computationally expensive algorithms, along with the fact that TensorFlow already includes efficient implementations of a set of complex operations. In addition, most of these operations have associated kernels, which are implementations of operations tailored to specific devices such as GPUs. The following table summarizes the most important operations/kernels:
Operations groups            Operations
Maths                        Add, Sub, Mul, Div, Exp, Log, Greater, Less, Equal
Array                        Concat, Slice, Split, Constant, Rank, Shape, Shuffle
Matrix                       MatMul, MatrixInverse, MatrixDeterminant
Neural Network               SoftMax, Sigmoid, ReLU, Convolution2D, MaxPool
Checkpointing                Save, Restore
Queues and synchronization   Enqueue, Dequeue, MutexAcquire, MutexRelease
Flow control                 Merge, Switch, Enter, Leave, NextIteration
Display panel TensorBoard
To make things easier to understand, TensorFlow includes functions to debug and optimize programs through a visualization tool called TensorBoard. TensorBoard can graphically display different types of statistics about the parameters and details of any part of the computation graph.
The data displayed by the TensorBoard module is generated during the execution of TensorFlow and stored in trace files whose data is obtained from summary operations. On the documentation page of TensorFlow, you can find a detailed explanation of the Python API.
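As a hedged sketch of how such trace files can be produced (my own example, assuming the tf.train.SummaryWriter class of the 0.x API; the log directory /tmp/tensorflow_logs is an arbitrary choice):

import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(3.0)
y = tf.mul(a, b)

sess = tf.Session()
# Write the graph definition as a trace file that TensorBoard can read
writer = tf.train.SummaryWriter("/tmp/tensorflow_logs", sess.graph_def)
print sess.run(y)
writer.close()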
The way we invoke it is very simple: start the service from the command line with the tensorboard command, passing the location of the trace files as an argument:
(tensorflow)$ tensorboard --logdir=<trace file>
You simply need to access local port 6006 from your browser at http://localhost:6006/.
A full description of TensorBoard is beyond the scope of this book. For more details about how TensorBoard works, the reader can visit the TensorBoard Graph Visualization section of the TensorFlow tutorial page.
2. Linear Regression in TensorFlow
In this chapter, I will begin exploring TensorFlow coding with a simple model: linear regression. Based on this example, I will present some code basics and, at the same time, show how to call various important components of the learning process, such as the cost function or the gradient descent algorithm.
Model of relationship between variables
Linear regression is a statistical technique used to measure the relationship between variables. Its appeal is that the algorithm that implements it is not conceptually complex, and it can also be adapted to a wide variety of situations. For these reasons, I have found it interesting to start delving into TensorFlow with an example of linear regression.
Remember that, both in the case of two variables (simple regression) and in the case of more than two variables (multiple regression), linear regression models the relationship between a dependent variable y, independent variables x_i and a random term b.
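In standard notation (a generic formulation added for reference, not text from the book), the multiple regression model can be written as

y = b + \sum_{i=1}^{n} W_i x_i

which, for the single-variable case used in this chapter, reduces to y = W x + b.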
In this section I will create a simple example to explain how TensorFlow works, assuming that our data model corresponds to a simple linear regression y = W * x + b. For this, I use a simple Python program that creates data in a two-dimensional space, and then I will ask TensorFlow to look for the line that best fits these points.
The first thing to do is to import the NumPy package that we will use to generate the points. The code we have created is as follows:
import numpy as np

num_points = 1000
vectors_set = []
for i in xrange(num_points):
    # Points scattered around the line y = 0.1 * x + 0.3
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])

x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]
As you can see from the code, we have generated points following the relationship y = 0.1 * x + 0.3, albeit with some variation using a normal distribution, so that the points do not fully correspond to a line, allowing us to make a more interesting example.
In our case, the figure showing the resulting cloud of points is omitted in this text version; the reader can view it with the following code (we need to import the matplotlib package, installed by running pip install matplotlib):
import matplotlib.pyplot as plt
plt.plot(x_data, y_data, 'ro', label='Original data')
plt.legend()
plt.show()
These points are the data that we will consider as the training dataset for our model.
Cost function and gradient descent algorithm
The next step is to train our learning algorithm to be able to obtain output values y estimated from the input data x_data. In this case, as we know in advance that it is a linear regression, we can represent our model with only two parameters: W and b.
The objective is to generate TensorFlow code that finds the best parameters W and b so that, from the input data x_data, they fit the output data y_data; in our case this will be a straight line defined by y_data = W * x_data + b. The reader knows that W should be close to 0.1 and b to 0.3, but TensorFlow does not know this and must work it out for itself.
A standard way to solve such problems is to iterate through each value of the data set and modify the parameters W and b in order to get a more precise answer every time. To find out if we are improving in these iterations, we will define a cost function (also called "error function") that measures how "good" (actually, how "bad") a certain line is.
This function receives the pair W and b as parameters and returns an error value based on how well the line fits the data. In our example we can use the mean squared error as a cost function. With the mean squared error we get the average of the "errors" based on the distance between the real values and the estimated ones on each iteration of the algorithm.
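Written out explicitly (this is the standard mean squared error formula, stated here for reference rather than taken from the original text), the cost over our N points is

\mathrm{loss} = \frac{1}{N} \sum_{i=1}^{N} \bigl( (W x_i + b) - y_i \bigr)^2

where (x_i, y_i) are the entries of x_data and y_data.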
Later, I will go into more detail about the cost function and its alternatives, but for this introductory example the mean squared error helps us to move forward step by step.
Now it is time to program everything that I have explained with TensorFlow. To do this, first we will create three variables with the following sentences:
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))  # W initialized randomly in [-1, 1]
b = tf.Variable(tf.zeros([1]))                      # b initialized to 0
y = W * x_data + b                                  # our model of the data
For now, we can move forward knowing only that the call to the Variable method defines a variable that resides in the internal graph data structure of TensorFlow, of which I have spoken above. We will return to the method parameters later, but for now I think it is better to move forward to facilitate this first approach.
Now, with these variables defined, we can express the cost function that we discussed earlier, based on the distance between each point and the point calculated with the function y = W * x + b. After that, we can calculate its square and average the sum. In TensorFlow this cost function is expressed as follows:
loss = tf.reduce_mean(tf.square(y - y_data))
As we can see, this expression calculates the average of the squared distances between the y_data point that we know and the point y calculated from the input x_data.
At this point, the reader might already suspect that the line that best fits our data is the one that obtains the lowest error value. Therefore, if we minimize the error function, we will find the best model for our data.
Without going into too much detail at the moment, this is what the optimization algorithm known as gradient descent achieves. At a theoretical level, gradient descent is an algorithm that, given a function defined by a set of parameters, starts with an initial set of parameter values and iteratively moves toward a set of values that minimize the function. This iterative minimization is achieved by taking steps in the direction of the negative of the function's gradient. It is conventional to square the distance to ensure that it is positive and to make the error function differentiable in order to compute the gradient.
The algorithm begins with the initial values of a set of parameters (in our case W and b), and then iteratively adjusts the values of those variables in such a way that, at the end of the process, the values of the variables minimize the cost function.
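In symbols (standard notation added for reference, not from the book's text), each iteration moves the parameters a small step against the gradient of the loss, scaled by a learning rate \eta:

W \leftarrow W - \eta \frac{\partial \, \mathrm{loss}}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial \, \mathrm{loss}}{\partial b}

In the TensorFlow code below, \eta corresponds to the value 0.5 passed to the optimizer.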
To use this algorithm in TensorFlow, we just have to execute
the following two statements:
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
Right now, this is enough to give the idea that TensorFlow has created the relevant data in its internal data structure, and it has also implemented in this structure an optimizer that may be invoked by train: a gradient descent algorithm applied to the cost function defined. Later on, we will discuss the function parameter called the learning rate (in our example with value 0.5).
Running the algorithm
As we have seen before, at this point in the code the calls to the TensorFlow library have only added information to its internal graph, and the runtime of TensorFlow has not yet run any of the algorithms. Therefore, as in the example of the previous chapter, we must create a session and call the run() method, passing train as a parameter. Also, because in the code we have specified variables, we must initialize them first with the following calls:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
Now we can start the iterative process that will allow us to find the values of W and b, defining the model line that best fits the points of entry. The training process continues until the model achieves a desired level of accuracy on the training data. In our particular example, if we assume that only 8 iterations are sufficient, the code could be:
for step in xrange(8):
    sess.run(train)

print(sess.run(W), sess.run(b))
The result of running this code shows that the values of W and b are close to the values that we know beforehand. In my case, the result of the print is:
(array([ 0.09150752], dtype=float32), array([ 0.30007562],
dtype=float32))
And, if we graphically display the result with the following
code:
plt.plot(x_data, y_data, 'ro')
plt.plot(x_data, sess.run(W) * x_data + sess.run(b))
plt.legend()
plt.show()
we can see the line defined by the parameters W = 0.0854 and b = 0.299 achieved with only 8 iterations (figure omitted in this text version).
Note that we have only executed eight iterations to simplify the explanation, but if we run more, the values of the parameters get even closer to the expected values. We can use the following sentence inside the loop to print the values of W and b at each step:
In our case the print outputs are:
(0, array([-0.04841119], dtype=float32), array([ 0.29720169], dtype=float32))
(1, array([-0.00449257], dtype=float32), array([ 0.29804006], dtype=float32))
(2, array([ 0.02618564], dtype=float32), array([ 0.29869056], dtype=float32))
(3, array([ 0.04761609], dtype=float32), array([ 0.29914495], dtype=float32))
(4, array([ 0.06258646], dtype=float32), array([ 0.29946238], dtype=float32))
(5, array([ 0.07304412], dtype=float32), array([ 0.29968411], dtype=float32))
(6, array([ 0.08034936], dtype=float32), array([ 0.29983902], dtype=float32))
(7, array([ 0.08545248], dtype=float32), array([ 0.29994723], dtype=float32))
You can observe that the algorithm begins with the initial