CHAPTER FOUR: NumPy
==============Your First NumPy Array
In this chapter, we're going to dive into the world of baseball. Along the way,
you'll get comfortable with the basics of numpy, a powerful package to do data
science.
A list baseball has already been defined in the Python script, representing the
height of some baseball players in centimeters. Can you add some code here and
there to create a numpy array from it?
Instructions
Import the numpy package as np, so that you can refer to numpy with np.
Use np.array() to create a numpy array from baseball. Name this array np_baseball.
Print out the type of np_baseball to check that you got it right.
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
# Import the numpy package as np
import numpy as np
# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out type of np_baseball
print(type(np_baseball))
=============Baseball players' height
You are a huge baseball fan. You decide to call the MLB (Major League Baseball)
and ask around for some more statistics on the height of the main players. They
pass along data on more than a thousand players, which is stored as a regular
Python list: height_in. The height is expressed in inches. Can you make a numpy
array out of it and convert the units to meters?
height_in is already available and the numpy package is loaded, so you can start
straight away (Source: stat.ucla.edu).
Instructions
Create a numpy array from height_in. Name this new array np_height_in.
Print np_height_in.
Multiply np_height_in with 0.0254 to convert all height measurements from inches
to meters. Store the new values in a new array, np_height_m.
Print out np_height_m and check if the output makes sense.
# height_in is available as a regular list
# Import numpy
import numpy as np
# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)
# Print out np_height_in
print(np_height_in)
# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254
# Print np_height_m
print(np_height_m)
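Since height_in only exists in the exercise workspace, here is a minimal sketch of the same conversion on a short made-up list. The name sample_height_in and its values are hypothetical stand-ins, not the MLB data.

```python
import numpy as np

# Hypothetical sample heights in inches (not the real MLB data)
sample_height_in = [74, 72, 75]

# Convert the regular list to a numpy array
np_sample_in = np.array(sample_height_in)

# Multiplying an array by a single number scales every element
np_sample_m = np_sample_in * 0.0254

print(np_sample_m)
```

A regular list would behave differently here: multiplying a list by a float raises a TypeError, and multiplying it by an integer repeats the list instead of scaling its values.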
================Baseball player's BMI
The MLB also offers to let you analyze their weight data. Both data sets are
again available as regular Python lists: height_in and weight_lb. height_in is
in inches and weight_lb is in pounds.
It's now possible to calculate the BMI of each baseball player. Python code to
convert height_in to a numpy array with the correct units is already available
in the workspace. Follow the instructions step by step and finish the game!
height_in and weight_lb are available as regular lists.
Instructions
Create a numpy array from the weight_lb list with the correct units. Multiply
by 0.453592 to go from pounds to kilograms. Store the resulting numpy array as
np_weight_kg.
Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the
following equation:
BMI = weight(kg) / height(m)^2
Save the resulting numpy array as bmi.
Print out bmi.
# height_in and weight_lb are available as regular lists
# Import numpy
import numpy as np
# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254
# Create array from weight_lb with metric units: np_weight_kg
np_weight_kg = np.array(weight_lb) * 0.453592
# Calculate the BMI: bmi
bmi = np_weight_kg/np_height_m**2
# Print out bmi
print(bmi)
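To see the element-wise arithmetic at work without the workspace data, the sketch below runs the same calculation on three made-up players. sample_height_in and sample_weight_lb are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical heights (inches) and weights (pounds) for three players
sample_height_in = [74, 72, 75]
sample_weight_lb = [180, 215, 210]

# Convert both lists to numpy arrays in metric units
np_h_m = np.array(sample_height_in) * 0.0254
np_w_kg = np.array(sample_weight_lb) * 0.453592

# Element-wise: each weight is divided by the square of the matching height
sample_bmi = np_w_kg / np_h_m ** 2

print(sample_bmi)
```

Note that ** binds more tightly than /, so the height is squared before the division takes place.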
==============Lightweight baseball players
To subset both regular Python lists and numpy arrays, you can use square
brackets:
x = [4 , 9 , 6, 3, 1]
x[1]
import numpy as np
y = np.array(x)
y[1]
For numpy specifically, you can also use boolean numpy arrays:
high = y > 5
y[high]
The code that calculates the BMI of all baseball players is already included.
Follow the instructions and reveal interesting things from the data! height_in
and weight_lb are available as regular lists.
Instructions
Create a boolean numpy array: the element of the array should be True if the
corresponding baseball player's BMI is below 21. You can use the < operator for
this. Name the array light.
Print the array light.
Print out a numpy array with the BMIs of all baseball players whose BMI is
below 21. Use light inside square brackets to do a selection on the bmi array.
# height_in and weight_lb are available as regular lists
# Import numpy
import numpy as np
# Calculate the BMI: bmi
np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592
bmi = np_weight_kg / np_height_m ** 2
# Create the light array
light = bmi < 21
# Print out light
print(light)
# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[light])
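The boolean selection can also be tried on a tiny array. In this sketch, sample_bmi holds made-up BMI values (an assumption for illustration only), so the mask and the selection are easy to check by eye.

```python
import numpy as np

# Hypothetical BMI values for four players
sample_bmi = np.array([23.1, 20.5, 26.2, 19.8])

# The comparison is applied to every element and yields a boolean array
light = sample_bmi < 21
print(light)

# Using the boolean array as an index keeps only the True positions
print(sample_bmi[light])

# True counts as 1, so the sum is the number of matching players
print(light.sum())
```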
==============NumPy Side Effects
As Hugo explained before, numpy is great for doing vector arithmetic. If you
compare its functionality with regular Python lists, however, some things have
changed.
First of all, numpy arrays cannot contain elements with different types. If you
try to build such an array, some of the elements' types are changed to end up
with a homogeneous array. This is known as type coercion.
Second, the typical arithmetic operators, such as +, -, * and / have a
different meaning for regular Python lists and numpy arrays.
Have a look at this line of code:
np.array([True, 1, 2]) + np.array([3, 4, False])
Can you tell which code chunk builds the exact same Python object? The numpy
package is already imported as np, so you can start experimenting in the
IPython Shell straight away!
Instructions
np.array([4, 3, 0]) + np.array([0, 2, 2])
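Both side effects described above can be checked in a few lines. This is a small sketch with made-up values. The variable names (mixed, list_sum, array_sum, bool_sum) are illustrative only.

```python
import numpy as np

# Mixed element types are coerced to one common type (here: strings)
mixed = np.array([1.0, "is", True])
print(mixed.dtype)

# With regular lists, + concatenates
list_sum = [1, 2, 3] + [4, 5, 6]
print(list_sum)

# With numpy arrays, + adds element by element
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
print(array_sum)

# In an integer array, True becomes 1 and False becomes 0
bool_sum = np.array([True, 1, 2]) + np.array([3, 4, False])
print(bool_sum)
```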
==============Subsetting NumPy Arrays
You've seen it with your own eyes: Python lists and numpy arrays sometimes
behave differently. Luckily, there are still certainties in this world. For
example, subsetting (using the square bracket notation on lists or arrays)
works exactly the same. To see this for yourself, try the following lines of
code in the IPython Shell:
x = ["a", "b", "c"]
x[1]
np_x = np.array(x)
np_x[1]
The script in the editor already contains code that imports numpy as np, and
stores both the height and weight of the MLB players as numpy arrays.
height_in and weight_lb are available as regular lists.
Instructions
Subset np_weight_lb by printing out the element at index 50.
Print out a sub-array of np_height_in that contains the elements at index 100
up to and including index 110.
# height_in and weight_lb are available as regular lists
# Import numpy
import numpy as np
# Store weight and height lists as numpy arrays
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
# Print out the weight at index 50
print(np_weight_lb[50])
# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])
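The end index of a slice is exclusive, which is why "up to and including index 110" is written as 100:111. A quick sketch on a stand-in array (values is hypothetical and simply holds the numbers 0 to 199) makes this visible:

```python
import numpy as np

# Hypothetical stand-in: each element equals its own index
values = np.arange(200)

# The slice starts at index 100 and stops just before index 111
sub = values[100:111]

print(sub)
print(len(sub))
```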
===============Your First 2D NumPy Array
Before working on the actual MLB data, let's try to create a 2D numpy array
from a small list of lists.
In this exercise, baseball is a list of lists. The main list contains 4
elements, one per baseball player. Each of these elements is a list containing
the height and the weight of that player, in this order. baseball is already
coded for you in the script.
Instructions
Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.
Print out the type of np_baseball.
Print out the shape attribute of np_baseball. Use np_baseball.shape.
# Create baseball, a list of lists
baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]
# Import numpy
import numpy as np
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))
# Print out the shape of np_baseball
print(np_baseball.shape)
===============Baseball data in 2D form
You have another look at the MLB data and realize that it makes more sense to
restructure all this information in a 2D numpy array. This array should have
1015 rows, corresponding to the 1015 baseball players you have information on,
and 2 columns (for height and weight).
The MLB was, again, very helpful and passed you the data in a different
structure, a Python list of lists. In this list of lists, each sublist
represents the height and weight of a single baseball player. The name of this
embedded list is baseball.
Can you store the data as a 2D array to unlock numpy's extra functionality?
baseball is available as a regular list of lists.
Instructions
Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.
Print out the shape attribute of np_baseball.
# baseball is available as a regular list of lists
# Import numpy package
import numpy as np
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the shape of np_baseball
print(np_baseball.shape)
==============Subsetting 2D NumPy Arrays
If your 2D numpy array has a regular structure, i.e. each row and column has a
fixed number of values, complicated ways of subsetting become very easy. Have a
look at the code below where the elements "a" and "c" are extracted from a list
of lists.
# regular list of lists
x = [["a", "b"], ["c", "d"]]
[x[0][0], x[1][0]]
# numpy
import numpy as np
np_x = np.array(x)
np_x[:, 0]
For regular Python lists, this is a real pain. For 2D numpy arrays, however,
it's pretty intuitive! The indexes before the comma refer to the rows, while
those after the comma refer to the columns. The : is for slicing; in this
example, it tells Python to include all rows.
The code that converts the pre-loaded baseball list to a 2D numpy array is
already in the script. The first column contains the players' height in inches
and the second column holds player weight, in pounds. Add some lines to make
the correct selections. Remember that in Python, the first element is at index
0! baseball is available as a regular list of lists.
Instructions
Print out the 50th row of np_baseball.
Make a new variable, np_weight_lb, containing the entire second column of
np_baseball.
Select the height (first column) of the 124th baseball player in np_baseball
and print it out.
# baseball is available as a regular list of lists
# Import numpy package
import numpy as np
# Create np_baseball (2 cols)
np_baseball = np.array(baseball)
# Print out the 50th row of np_baseball
print(np_baseball[49, :])
# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:,1]
# Print out height of 124th player
print(np_baseball[123, 0])
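The row, column indexing can be tried on a small array as well. The sketch below uses a four-row stand-in named small (a hypothetical name, with made-up height and weight values) rather than the full data set.

```python
import numpy as np

# Small stand-in array: one row per player, columns are height and weight
small = np.array([[180, 78.4],
                  [215, 102.7],
                  [210, 98.5],
                  [188, 75.2]])

# First row, all columns
first_row = small[0, :]

# All rows, second column
second_col = small[:, 1]

# Single value: third row, first column
one_value = small[2, 0]

# Rows at index 1 and 2, all columns
block = small[1:3, :]

print(first_row, second_col, one_value, block.shape)
```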
============2D Arithmetic
Remember how you calculated the Body Mass Index for all baseball players? numpy
was able to perform all calculations element-wise (i.e. element by element).
For 2D numpy arrays this isn't any different! You can combine matrices with
single numbers, with vectors, and with other matrices.
Execute the code below in the IPython shell and see if you understand:
import numpy as np
np_mat = np.array([[1, 2],
[3, 4],
[5, 6]])
np_mat * 2
np_mat + np.array([10, 10])
np_mat + np_mat
np_baseball is coded for you; it's again a 2D numpy array with 3 columns
representing height (in inches), weight (in pounds) and age (in years).
baseball is available as a regular list of lists and updated is available as a
2D numpy array.
Instructions
You managed to get hold of the changes in height, weight and age of all
baseball players. It is available as a 2D numpy array, updated. Add np_baseball
and updated and print out the result.
You want to convert the units of height and weight to metric (meters and
kilograms, respectively). As a first step, create a numpy array with three
values: 0.0254, 0.453592 and 1. Name this array conversion.
Multiply np_baseball with conversion and print out the result.
# baseball is available as a regular list of lists
# updated is available as a 2D numpy array
# Import numpy package
import numpy as np
# Create np_baseball (3 cols)
np_baseball = np.array(baseball)
# Print out addition of np_baseball and updated
print(np_baseball + updated)
# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1 ])
# Print out product of np_baseball and conversion
print(np_baseball*conversion)
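Multiplying a 2D array by a 1D array with one value per column applies each factor to its own column (numpy calls this broadcasting). A minimal sketch, assuming a made-up two-player array named sample:

```python
import numpy as np

# Hypothetical data: height (in), weight (lb), age (years) for two players
sample = np.array([[74, 180, 23],
                   [72, 215, 35]])

# One conversion factor per column; age is left unchanged
conversion = np.array([0.0254, 0.453592, 1])

# The 1D array is applied to every row, column by column
metric = sample * conversion

print(metric)
```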
==============Average versus median
You now know how to use numpy functions to get a better feeling for your data.
It basically comes down to importing numpy and then calling several simple
functions on the numpy arrays:
import numpy as np
x = [1, 4, 8, 10, 12]
np.mean(x)
np.median(x)
The baseball data is available as a 2D numpy array with 3 columns (height,
weight, age) and 1015 rows. The name of this numpy array is np_baseball. After
restructuring the data, however, you notice that some height values are
abnormally high. Follow the instructions and discover which summary statistic
is best suited if you're dealing with so-called outliers. np_baseball is
available.
Instructions
Create a numpy array np_height_in that is equal to the first column of np_baseball.
Print out the mean of np_height_in.
Print out the median of np_height_in.
# np_baseball is available
# Import numpy
import numpy as np
# Create np_height_in from np_baseball
np_height_in = np.array(np_baseball[:,0])
# Print out the mean of np_height_in
print(np.mean(np_height_in))
# Print out the median of np_height_in
print(np.median(np_height_in))
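The effect of an outlier on the two statistics is easy to reproduce. In this sketch, both arrays are made up, and the last value of with_outlier stands in for a data entry error.

```python
import numpy as np

# Hypothetical clean heights in inches
clean = np.array([72, 73, 74, 75, 76])

# Same data, but the last value is a (hypothetical) data entry error
with_outlier = np.array([72, 73, 74, 75, 760])

# The mean is pulled strongly toward the outlier
print(np.mean(clean), np.mean(with_outlier))

# The median stays at the middle value
print(np.median(clean), np.median(with_outlier))
```

The median only depends on the middle of the sorted data, so a single extreme value does not move it.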
==============Blend it all together
In the last few exercises you've learned everything there is to know about
heights and weights of baseball players. Now it's time to dive into another
sport: soccer.
You've contacted FIFA for some data and they handed you two lists. The lists
are the following:
positions = ['GK', 'M', 'A', 'D', ...]
heights = [191, 184, 185, 180, ...]
Each element in the lists corresponds to a player. The first list, positions,
contains strings representing each player's position. The possible positions
are: 'GK' (goalkeeper), 'M' (midfield), 'A' (attack) and 'D' (defense). The
second list, heights, contains integers representing the height of the player
in cm. The first player in the lists is a goalkeeper and is pretty tall
(191 cm).
You're fairly confident that the median height of goalkeepers is higher than
that of other players on the soccer field. Some of your friends don't believe
you, so you are determined to show them using the data you received from FIFA
and your newly acquired Python skills. heights and positions are available as
lists.
Instructions
Convert heights and positions, which are regular lists, to numpy arrays. Call
them np_heights and np_positions.
Extract all the heights of the goalkeepers. You can use a little trick here:
use np_positions == 'GK' as an index for np_heights. Assign the result
to gk_heights.
Extract all the heights of all the other players. This time use
np_positions != 'GK' as an index for np_heights. Assign the result to
other_heights.
Print out the median height of the goalkeepers using np.median(). Replace None
with the correct code.
Do the same for the other players. Print out their median height. Replace None
with the correct code.
# heights and positions are available as lists
# Import numpy
import numpy as np
# Convert positions and heights to numpy arrays: np_positions, np_heights
np_heights = np.array(heights)
np_positions = np.array(positions)
# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']
# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']
# Print out the median height of goalkeepers. Replace 'None'
print("Median height of goalkeepers: " + str(np.median(gk_heights)))
# Print out the median height of other players. Replace 'None'
print("Median height of other players: " + str(np.median(other_heights)))
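The full lists are only available in the exercise workspace, so here is a sketch on six players. The first four values follow the excerpt shown above. The last two, and the names sample_positions and sample_heights, are made up for illustration.

```python
import numpy as np

# Short stand-in lists; the last two players are hypothetical
sample_positions = np.array(['GK', 'M', 'A', 'D', 'GK', 'M'])
sample_heights = np.array([191, 184, 185, 180, 189, 178])

# Boolean mask: True where the player is a goalkeeper
is_gk = sample_positions == 'GK'

# Select goalkeepers with the mask, all others with its negation
gk = sample_heights[is_gk]
others = sample_heights[~is_gk]

print("Median height of goalkeepers: " + str(np.median(gk)))
print("Median height of other players: " + str(np.median(others)))
```

Negating the mask with ~ gives the same selection as comparing with != 'GK'.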
===============