# Functions
This text has already covered how to *use* functions that come to us pre-made.\index{functions} At least we have discussed how to use them in a one-off way--just write the name of the function, write some parentheses after that name, and then plug in any requisite arguments by writing them in a comma-separated way between those two parentheses. This is how it works in both R and Python.
In this section we take a look at how to *define* our own functions. This will not only help us to understand pre-made functions, but it will also be useful if we need some extra functionality that isn't already provided to us.
Writing our own functions is also useful for "packaging up" computations. The utility of this will become apparent very soon. Consider the task of estimating a regression model. If you have a function that performs all of the required calculations, then
* you can estimate models without having to think about lower-level details or write any code yourself, and
* you can re-use this function every time you fit any model on any data set for any project.
## Defining R Functions
To create a function in R, we need another function called `function()`.\index{functions!creating functions in R} We give the output of `function()` a name in the same way we give names to any other variable in R, by using the assignment operator `<-` \index{assignment operator!assignment operator in R}. Here's an example of a toy function called `addOne()`. Here `myInput` is a placeholder that refers to whatever the user of the function ends up plugging in.
```{r, collapse = TRUE}
addOne <- function(myInput){ # define the function
  myOutput <- myInput + 1
  return(myOutput)
}
addOne(41) # call/invoke/use the function
```
Below the definition, the function is called with an input of `41`. When this happens, the following sequence of events occurs:
- The value `41` is assigned to `myInput`.
- `myOutput` is given the value `42`.
- `myOutput`, which is `42`, is returned from the function.
- The temporary variables `myInput` and `myOutput` are destroyed.
We get the desired answer, and all the unnecessary intermediate variables are cleaned up and thrown away after they are no longer needed.
::: {.rmd-caution data-latex=""}
If you are interested in writing a function, I recommend that you first write the logic outside of a function. This initial code will be easier to debug because your temporary variables will not be destroyed after the final result has been obtained. Once you are happy with the working code, you can copy and paste the logic into a function definition, and replace permanent variables with function inputs like `myInput` above.
:::
## Defining Python Functions
To create a function in Python, we use the `def` statement (instead of the `function()` function in R).\index{functions!creating functions in Python} The desired name of the function comes next. After that, the formal parameters come, comma-separated inside parentheses, just like in R.
Defining a function in Python is a little more concise. There is no assignment operator like there is in R, there are no curly braces, and `return` isn't a function like it is in R, so there is no need to use parentheses after it. There is one syntactic addition, though--we need a colon (`:`) at the end of the first line of the definition.
Here is an example of a toy function called `add_one()`.
```{python, collapse = TRUE}
def add_one(my_input): # define the function
    my_output = my_input + 1
    return my_output
add_one(41) # call/invoke/use the function
```
Below the definition, the function is called with an input of `41`. When this happens, the following sequence of events occurs:
- The value `41` is assigned to `my_input`.
- `my_output` is given the value `42`.
- `my_output`, which is `42`, is returned from the function.
- The temporary variables `my_input` and `my_output` are destroyed.
We get the desired answer, and all the unnecessary intermediate variables are cleaned up and thrown away after they are no longer needed.
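To see that the temporary variables really are gone once the call finishes, we can try to use `my_output` afterwards. This is only a minimal sketch: it redefines `add_one()` so that it is self-contained, and the names `result` and `leaked` are made up for illustration.

```python
def add_one(my_input):
    my_output = my_input + 1   # my_output only exists while the call runs
    return my_output

result = add_one(41)

try:
    my_output                  # this name was never defined out here
    leaked = True
except NameError:
    leaked = False             # the local name was thrown away after the call

print(result, leaked)
```

Only the returned value survives the call; the name `my_output` cannot be used outside of the function.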
## More Details On R's User-Defined Functions
Technically, in R, functions are [defined as three things bundled together](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Function-objects):
1. a **formal argument list** (also known as *formals*),
2. a **body**, and
3. a **parent environment**.
The *formal argument list* is exactly what it sounds like. It is the list of arguments a function takes. You can access a function's formal argument list using the `formals()` function. Note that it is not the *actual* arguments a user will plug in--that isn't knowable at the time the function is created in the first place.
Here is another function that takes a parameter called `whichNumber` that comes with a **default argument** of `1`. If the caller of the function does not specify what she wants to add to `myInput`, `addNumber()` will use `1` as the default. This default value shows up in the output of `formals(addNumber)`.
```{r, collapse = TRUE}
addNumber <- function(myInput, whichNumber = 1){
  myOutput <- myInput + whichNumber
  return(myOutput)
}
addNumber(3) # no second argument being provided by the user here
formals(addNumber)
```
The function's *body* is also exactly what it sounds like. It is the work that a function performs. You can access a function's body using the `body()` function.
```{r, collapse = TRUE}
addNumber <- function(myInput, whichNumber = 1){
  myOutput <- myInput + whichNumber
  return(myOutput)
}
body(addNumber)
```
Every function you create also has a *parent environment*^[Primitive functions are functions that contain no R code and are internally implemented in C. These are the only type of function in R that don't have a parent environment.]. You can get/set this using the `environment()` function. Environments help a function know which variables it is allowed to use and how to use them. The parent environment of a function is where the function was *created*, and it contains variables outside of the body that the function can also use. The rules of which variables a function can use are called *scoping*. When you create functions in R, you are primarily using **lexical scoping**. This is discussed in more detail in section \@ref(function-scope-in-r).
::: {.rmd-details data-latex=""}
There is a lot more information about environments that isn't provided in this text. For instance, a user-defined function also has [binding, execution, and calling environments associated with it](http://adv-r.had.co.nz/Environments.html#function-envs), and environments are used in creating package namespaces, which are important when two packages each have a function with the same name.\index{environments in R}
:::
## More Details On Python's User-Defined Functions
Roughly, Python functions have the same things R functions have. They have a **formal parameter list**, a body, and there are [namespaces](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces) created that help organize which variables the function can access, as well as which pieces of code can call this new function. A namespace is just a "mapping from names to objects."
These three concepts are analogous to those in R. The names are just a bit different sometimes, and it isn't organized in the same way. To access these bits of information, you need to access the *special attributes* of a function. User-defined functions in Python have a lot of pieces of information attached to them. If you'd like to see all of them, you can visit [this page of documentation](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types).
<!-- Below is a table, taken straight from [the documentation](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types), of all each user-defined function's . -->
<!-- | Attribute | Meaning | -->
<!-- |-----------|-------------------| -->
<!-- `__doc__` | The function’s documentation string, or `None` if unavailable; not inherited by subclasses. -->
<!-- `__name__` | The function’s name. -->
<!-- `__qualname__` | The function’s qualified name. -->
<!-- `__module__` | The name of the module the function was defined in, or None if unavailable. -->
<!-- `__defaults__` | A tuple containing default argument values for those arguments that have defaults, or None if no arguments have a default value. -->
<!-- `__code__` | The code object representing the compiled function body. -->
<!-- `__globals__` | A reference to the dictionary that holds the function’s global variables — the global namespace of the module in which the function was defined. -->
<!-- `__dict__` | The namespace supporting arbitrary function attributes. -->
<!-- `__closure__` | `None` or a tuple of cells that contain bindings for the function’s free variables. See below for information on the `cell_contents` attribute. -->
<!-- `__annotations__` | A dict containing annotations of parameters. The keys of the dict are the parameter names, and 'return' for the return annotation, if provided. -->
<!-- `__kwdefaults__` | A dict containing defaults for keyword-only parameters. -->
So, for instance, let's try to find the *formal parameter list* of a user-defined function below. This is, again, the collection of inputs a function takes. Just like in R, this is not the *actual* arguments a user will plug in--that isn't knowable at the time the function is created.^[You might have noticed that Python uses two different words to prevent confusion. Unlike R, Python uses the word "parameter" (instead of "argument") to refer to the inputs a function takes, and "arguments" to the specific values a user plugs in.] Here we have another function called `add_number()` that takes a parameter `which_number` that is accompanied by a default argument of `1`.
```{python, collapse = TRUE}
def add_number(my_input, which_number = 1): # define a function
    my_output = my_input + which_number
    return my_output
add_number(3) # no second argument being provided by the user here
add_number.__code__.co_varnames # note this also contains *my_output*
add_number.__defaults__
```
The `__code__` attribute has much more to offer. To see a list of names of all its contents, you can use `dir(add_number.__code__)`.
::: {.rmd-details data-latex=""}
Don't worry if the notation `add_number.__code__` looks strange. The dot (`.`) operator will become more clear in the future chapter on *object-oriented programming*. For now, just think of `__code__` as being an object *belonging to* `add_number`. Objects that belong to other objects are called **attributes** in Python. The dot operator helps us access attributes *inside* other objects. It also helps us access objects belonging to modules that we `import` into our scripts.
:::
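If digging through `__code__` feels too low-level, the standard library's `inspect` module offers another way to look at a function's formal parameter list. The sketch below is one possible approach, not the only one; it redefines `add_number()` so that it runs on its own, and the names `sig`, `param_names`, `defaults` and `from_code` are made up for illustration.

```python
import inspect

def add_number(my_input, which_number=1):
    my_output = my_input + which_number
    return my_output

sig = inspect.signature(add_number)

# only the formal parameters show up here, not the local variable my_output
param_names = list(sig.parameters)

# keep only the parameters that come with a default argument
defaults = {name: p.default
            for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}

# the same names can be recovered from __code__ by slicing off the locals
code = add_number.__code__
from_code = code.co_varnames[:code.co_argcount]

print(param_names)
print(defaults)
print(from_code)
```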
## Function Scope in R
R uses **lexical scoping**. This means, in R, \index{scope!function scope in R}
1. functions can use *local* variables that are defined inside themselves,
2. functions can use *global* variables defined in the environment where the function itself was *defined*,
3. functions *cannot* necessarily use *global* variables defined in the environment where the function was *called*, and
4. functions will prefer *local* variables to *global* variables if there is a name clash.
The first characteristic is obvious. The second and third are important to distinguish between. Consider the code below. `sillyFunction()` can access `a` because `sillyFunction()` and `a` are defined in the same place.
```{r, collapse = TRUE}
a <- 3
sillyFunction <- function(){
  return(a + 20)
}
environment(sillyFunction) # the env. it was defined in contains a
sillyFunction()
```
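On the other hand, the following example does not behave the way a newcomer might hope. `anotherSillyFunc()` never sees the `a <- 99` assigned inside `highLevelFunc()`, because that `a` lives in the environment where `anotherSillyFunc()` is *called*, not the one where it was *defined*. It looks for `a` in the global environment instead, and it throws an error if no such variable exists there. Calling a function is not the same as defining a function.
```{r, collapse = TRUE}
anotherSillyFunc <- function(){
  return(a + 20)
}
highLevelFunc <- function(){
  a <- 99
  # this isn't the global environment anotherSillyFunc() was defined in
  cat("environment inside highLevelFunc(): ")
  print(environment()) # cat() cannot print an environment, print() can
  anotherSillyFunc()
}
highLevelFunc() # looks up the global a, never the 99 assigned above
```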
Finally, here is a demonstration of a function preferring one `a` over another. When `sillyFunction()` attempts to access `a`, it first looks in its own body, and so the innermost one gets used. On the other hand, `print(a)` shows `3`, the global variable.
```{r, collapse = TRUE}
a <- 3
sillyFunction <- function(){
  a <- 20
  return(a + 20)
}
sillyFunction()
print(a)
```
The same concept applies if you create functions within functions. The inner function `innerFunc()` looks "inside-out" for variables, but only in the place it was defined.
Below we call `outerFunc()`, which then calls `innerFunc()`. `innerFunc()` can refer to the variable `b`, because it lies in the same environment in which `innerFunc()` was created. Interestingly, `innerFunc()` can also refer to the variable `a`, because `outerFunc()` was created in the global environment where `a` lives, and `innerFunc()` inherits that access through `outerFunc()`.
```{r, collapse = TRUE}
a <- "outside both"
outerFunc <- function(){
  b <- "inside one"
  innerFunc <- function(){
    print(a)
    print(b)
  }
  return(innerFunc())
}
outerFunc()
```
Here's another interesting example. If we ask `outerFunc()` to return the function `innerFunc()` (not the return object of `innerFunc()`...functions are objects, too!), then we might be surprised to see that `innerFunc()` can still successfully refer to `b`, even though it doesn't exist inside the *calling environment.* But don't be surprised! What matters is what was available when the function was *created*.
```{r, collapse = TRUE}
outerFuncV2 <- function(){
  b <- "inside one"
  innerFunc <- function(){
    print(b)
  }
  return(innerFunc) # note the missing inner parentheses!
}
myFunc <- outerFuncV2() # get a new function
ls(environment(myFunc)) # list all data attached to this function
myFunc()
```
We use this property all the time when we create functions that return other functions. This is discussed in more detail in chapter \@ref(an-introduction-to-functional-programming). In the above example, `outerFuncV2()`, the function that returned another function, is called a *function factory*.\index{functions!function factory}
::: {.rmd-details data-latex=""}
Sometimes people will refer to R's functions as **closures** to emphasize that they capture variables from the parent environment in which they were created, that is, that they come bundled with data.\index{closures in R}
:::
## Function Scope in Python
Python uses **lexical scoping** just like R.\index{scope!function scope in Python} This means, in Python,
1. functions can use *local* variables that are defined inside themselves,
2. functions have an order of preference for which variable to prefer in the case of a name clash, and
3. functions can sometimes use variables defined outside themselves, but that ability depends on where the function and variable were *defined*, not where the function was *called*.
Regarding characteristics (2) and (3), there is a famous acronym that describes the rules Python follows when finding and choosing variables: **LEGB**.\index{scope!LEGB in Python}
- L: Local,
- E: Enclosing,
- G: Global, and
- B: Built-in.
A Python function will search for a variable in these namespaces in this order.^[Functions aren't the only thing that get their own namespace. [Classes do, too](https://docs.python.org/3/tutorial/classes.html#a-first-look-at-classes). More information on classes is provided in Chapter \@ref(an-introduction-to-object-oriented-programming)].
"*Local*" refers to variables that are defined inside of the function's block. The function below uses the local `a` over the global one.
```{python, collapse = TRUE}
a = 3
def silly_function():
    a = 22 # local a
    print("local variables are ", locals())
    return a + 20
silly_function()
silly_function.__code__.co_nlocals # number of local variables
silly_function.__code__.co_varnames # names of local variables
```
"*Enclosing*" refers to variables that were defined in the enclosing namespace, but not the global namespace. These variables are sometimes called **free variables.** In the example below, there is no local `a` variable for `inner_func()`, but there is a global one, and one in the enclosing namespace. `inner_func()` chooses the one in the enclosing namespace. Moreover, `inner_func()` keeps access to that `a`, even after the call to `outer_func()` has completed and its local variables would otherwise have been destroyed.
```{python, collapse = TRUE}
a = "outside both"
def outer_func():
    a = "inside one"
    def inner_func():
        print(a)
    return inner_func
my_new_func = outer_func()
my_new_func()
my_new_func.__code__.co_freevars
```
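The captured value itself can also be inspected. The sketch below pairs each name in `co_freevars` with the contents of the matching cell in the function's `__closure__` attribute. It redefines the functions so that it is self-contained, has `inner_func()` return `a` instead of printing it, and uses the made-up names `free_names` and `captured`.

```python
a = "outside both"

def outer_func():
    a = "inside one"
    def inner_func():
        return a               # a is a free variable of inner_func()
    return inner_func

my_new_func = outer_func()

free_names = my_new_func.__code__.co_freevars
# each cell in __closure__ holds the value of one free variable
captured = [cell.cell_contents for cell in my_new_func.__closure__]

print(dict(zip(free_names, captured)))
print(my_new_func())
```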
"*Global*" scope contains variables defined in the module-level namespace. If the code in the below example was the entirety of your script, then `a` would be a global variable.
```{python, collapse = TRUE}
a = "outside both"
def outer_func():
    b = "inside one"
    def inner_func():
        print(a)
    inner_func()
outer_func()
```
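The last letter of LEGB, "*Built-in*", refers to names such as `len` and `print` that Python provides without any `import`. The following sketch suggests how the search order plays out for such a name: a function finds `len` in the built-in namespace unless a name closer to it in the LEGB order hides it. The function names `use_builtin()` and `shadow_builtin()` are made up for illustration.

```python
import builtins

def use_builtin():
    # no local, enclosing or global len, so the built-in one is found
    return len([1, 2, 3])

def shadow_builtin():
    def len(x):                # a local name wins over the built-in name
        return "local len"
    return len([1, 2, 3])

found_builtin = use_builtin()
found_local = shadow_builtin()

print(found_builtin)
print(found_local)
print(len is builtins.len)     # outside shadow_builtin(), len is untouched
```

Hiding a built-in name like this is legal, but it is usually a source of confusion rather than a feature.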
Just like in R, Python functions **cannot** necessarily find variables where the function was *called*. For example, here is some code that mimics the above R example. Both `a` and `b` are accessible from within `inner_func()`. That is due to LEGB.
However, if we start using `outer_func()` inside another function, *calling* it in another function when it was *defined* somewhere else, then it doesn't have access to variables at the call site. You might be surprised at how the following code functions. Does this print the string `"this is the a I want to use now!"`? No!
```{python}
a = "outside both"
def outer_func():
    b = "inside one"
    def inner_func():
        print(a)
        print(b)
    return inner_func()
def third_func():
    a = "this is the a I want to use now!"
    outer_func()
third_func()
```
If you feel like you understand lexical scoping, great! You should be ready to take on chapter \@ref(an-introduction-to-functional-programming), then. If not, keep playing around with examples. Without understanding the scoping rules R and Python share, writing your own functions will persistently feel more difficult than it really is.
## Modifying a Function's Arguments
Can/should we modify a function's argument? The flexibility to do this sounds empowering; however, not doing it is recommended because it makes programs easier to reason about.
### Passing By Value In R
In R, it is *difficult* for a function to modify one of its arguments.^[There are some exceptions to this, but it's generally true.] Consider the following code.
```{r, collapse=TRUE}
a <- 1
f <- function(arg){
  arg <- 2 # modifying a temporary variable, not a
  return(arg)
}
print(f(a))
print(a)
```
The function `f` has an argument called `arg`. When `f(a)` is performed, changes are made to a *copy* of `a`. When a function constructs a copy of all input variables inside its body, this is called **pass-by-value** semantics.\index{pass-by-value} This copy is a temporary intermediate value that only serves as a starting point for the function to produce a return value of `2`.
`arg` could have been called `a`, and the same behavior will take place. However, giving these two things different names is helpful to remind you and others that R copies its arguments.
It is still possible to modify `a`, but I don't recommend doing this either. I will discuss this more in subsection \@ref(modifying-captured-variables-in-r).
### Passing By Assignment In Python
The story is more complicated in Python. Python functions have **pass-by-assignment** semantics.\index{pass-by-assignment} This terminology is fairly specific to Python. What this means is that your ability to modify the arguments of a function depends on
- what the type of the argument is, and
- what you're trying to do to it.
We will go through some examples first, and then explain why this works the way it does. Here is some code that is analogous to the example above.
```{python, collapse=TRUE}
a = 1
def f(arg):
    arg = 2
    return arg
print(f(a))
print(a)
```
In this case, `a` is not modified. That is because `a` is an `int`. `int`s are **immutable** in Python, which means that their [value](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types) cannot be changed after they are created, either inside or outside of the function's scope. However, consider the case when `a` is a `list`, which is a **mutable** type. A mutable type is one that can have its value changed after it's created.\index{mutable in Python}
```{python, collapse=TRUE}
a = [999]
def f(arg):
    arg[0] = 2
    return arg
print(f(a))
print(a) # not [999] anymore!
```
In this case `a` *is* modified. Changing the value of the argument *inside* the function effects changes to that variable outside of the function.
Ready to be confused? Here is a tricky third example. What happens if we take in a list, but try to do something else with it?
```{python, collapse=TRUE}
a = [999]
def f(arg):
    arg = [2]
    return arg
print(f(a))
print(a) # didn't change this time :(
```
That time `a` did not permanently change in the global scope. Why does this happen? I thought `list`s were mutable!
The reason behind all of this doesn't even have anything to do with functions, per se. Rather, it has to do with how Python manages [objects, values, and types](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types). It also has to do with what happens during [assignment](https://docs.python.org/3/reference/executionmodel.html#naming-and-binding).
Let's revisit the above code, but bring everything out of a function. Python is pass-by-assignment, so all we have to do is understand how assignment works. Starting with the immutable `int` example, we have the following.
```{python, collapse=TRUE}
# old code:
# a = 1
# def f(arg):
# arg = 2
# return arg
a = 1 # still done in global scope
arg = a # arg is a name that is bound to the object a refers to
arg = 2 # arg is a name that is bound to the object 2
print(arg is a)
print(id(a), id(arg)) # different!
print(a)
```
::: {.rmd-details data-latex=""}
The [`id()`](https://docs.python.org/3/library/functions.html#id) function returns the **identity** of an object, which is kind of like its memory address. Identities of objects are unique and constant. If two variables, `a` and `b` say, have the same identity, `a is b` will evaluate to `True`. Otherwise, it will evaluate to `False`.
:::
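Identity should not be confused with equality. As a small illustration, two lists can hold equal values while still being separate objects. The variable names below are made up for this sketch.

```python
x = [1, 2]
y = [1, 2]                 # equal in value, but a separate object
z = x                      # a second name bound to the same object as x

same_value = (x == y)      # == compares values
same_object_xy = (x is y)  # is compares identities
same_object_xz = (x is z)
ids_match = (id(x) == id(z))

print(same_value, same_object_xy, same_object_xz, ids_match)
```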
In the first line, the *name* `a` is bound to the *object* `1`. In the second line, the name `arg` is bound to the *object* that is referred to by the *name* `a`. After the second line finishes, `arg` and `a` are two names for the same object (a fact that you can confirm by inserting `arg is a` immediately after this line).
In the third line, `arg` is bound to `2`. The variable `arg` can be changed, but only by re-binding it with a separate object. Re-binding `arg` does not change the value referred to by `a` because `a` still refers to `1`, an object separate from `2`. There is no reason to re-bind `a` because it wasn't mentioned at all in the third line.
If we go back to the first function example, it's basically the same idea. The only difference, however, is that `arg` is in its own scope. Let's look at a simplified version of our second code chunk that uses a mutable list.
```{python, collapse=TRUE}
a = [999]
# old code:
# def f(arg):
# arg[0] = 2
# return arg
arg = a
arg[0] = 2
print(arg)
print(a)
print(arg is a)
```
In this example, when we run `arg = a`, the name `arg` is bound to the same object that is bound to `a`. This much is the same. The only difference here, though, is that because lists are mutable, changing the first element of `arg` is done "in place", and all variables can access the mutated object.
Why did the third example produce unexpected results? The difference is in the line `arg = [2]`. This rebinds the name `arg` to a different object. `list`s are still mutable, but this has nothing to do with re-binding--re-binding a name works no matter what type of object you're binding it to. In this case we are re-binding `arg` to a completely different list.
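The same ideas can be checked from inside a function. The sketch below uses three made-up helper functions: one confirms that no copy is made when an argument is passed in, one mutates the shared object in place, and one re-binds its local name to a new list.

```python
def same_object(arg, original):
    return arg is original     # True when no copy was made on the way in

def mutate(arg):
    arg.append(2)              # changes the shared object in place

def rebind(arg):
    arg = arg + [3]            # builds a new list and re-binds the local name
    return arg

a = [999]
shared = same_object(a, a)
mutate(a)
after_mutate = list(a)         # snapshot of a after the in-place change
returned = rebind(a)

print(shared, after_mutate, returned, a)
```

Mutation is visible through every name bound to the object, while re-binding only affects the name that was re-bound.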
## Accessing and Modifying Captured Variables
In the last section, we were talking about variables that were passed in as function arguments. Here we are talking about variables that are **captured**.\index{capturing variables} They are not passed in as arguments, but they are still used inside a function. In general, even though it is possible to access and modify non-local captured variables in both languages, it is not a good idea.
### Accessing Captured Variables in R
As Hadley Wickham writes in [his book](https://adv-r.hadley.nz/functions.html#dynamic-lookup), "[l]exical scoping determines where, but not when to look for values." R has **dynamic lookup**,\index{dynamic lookup} meaning code inside a function will only try to access a referred-to variable when the function is *running*, not when it is defined.
Consider the R code below. The `dataReadyForModeling()` function is created in the global environment, and the global environment contains a Boolean variable called `dataAreClean`.
```{r, collapse=TRUE}
# R
dataAreClean <- TRUE
dataReadyForModeling <- function(){
  return(dataAreClean)
}
dataAreClean <- FALSE
# dataReadyForModeling() # what happens if we call it now?
```
Now imagine sharing some code with a collaborator. Imagine, further, that your collaborator is the subject-matter expert, and knows little about R programming. Suppose that he changes `dataAreClean`, a global variable in the script, after he is done. Shouldn't this induce a relatively trivial change to the overall program?
Let's explore this hypothetical further. Consider what could happen if any of the following (very typical) conditions are true:
- you or your collaborators aren't sure what `dataReadyForModeling()` will return because you don't understand dynamic lookup, or
- it's difficult to visually keep track of all assignments to `dataAreClean` (e.g. your script is quite long or it changes often), or
- you are not running code sequentially (e.g. you are repeatedly testing chunks at a time instead of clearing out your memory and `source()`ing from scratch, over and over again).
In each of these situations, understanding of the program would be compromised. However, if you follow the above principle of never referring to non-local variables in function code, all members of the group could do their own work separately, minimizing the dependence on one another.
Another reason violating this could be troublesome is if you define a function that refers to a nonexistent variable. *Defining* the function will never throw an error because R will assume that variable is defined in the global environment. *Calling* the function might throw an error, unless you accidentally defined the variable, or if you forgot to delete a variable whose name you no longer want to use. Defining `myFunc()` with the code below will not throw an error, even if you think it should!
```{r, collapse = TRUE}
# R
myFunc <- function(){
  return(varigbleNameWithTypo) #varigble?
}
```
### Accessing Captured Variables in Python
It is the same exact situation in Python. Consider `everything_is_safe()`, a function that is analogous to `dataReadyForModeling()`.
```{python, collapse=TRUE}
# python
missile_launch_codes_set = True
def everything_is_safe():
    return not missile_launch_codes_set
missile_launch_codes_set = False
everything_is_safe()
```
We can also define `my_func()`, which is analogous to `myFunc()`. Defining this function doesn't throw an error either.
```{python, collapse = TRUE}
# python
def my_func():
    return varigble_name_with_typo
```
So stay away from referring to variables outside the body of your function!
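One way to follow this advice is to hand the function everything it needs as an argument. The sketch below rewrites `everything_is_safe()` along those lines; the parameter name `launch_codes_set` and the variables `before` and `after` are made up for illustration.

```python
def everything_is_safe(launch_codes_set):
    # the result depends only on the argument, not on any global variable
    return not launch_codes_set

missile_launch_codes_set = True
before = everything_is_safe(missile_launch_codes_set)

missile_launch_codes_set = False
after = everything_is_safe(missile_launch_codes_set)

print(before, after)
```

Now a reader can tell what the function depends on just by looking at the call site.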
### Modifying Captured Variables In R
Now what if we want to be extra bad, and in addition to *accessing* global variables, we *modify* them, too?
```{r, collapse=TRUE}
a <- 1
makeATwo <- function(arg){
  arg <- 2
  a <<- arg
}
print(makeATwo(a))
print(a)
```
In the program above, `makeATwo()` copies `a` into `arg`. It then assigns `2` to that copy. **Then it takes that `2` and writes it to the global `a` variable in the parent environment.** It does this using R's super assignment operator `<<-`\index{assignment operator!super assignment operator in R}. Regardless of the inputs passed in to this function, it will always assign exactly `2` to `a`.
This is problematic because you are pre-occupying your mind with one function: `makeATwo()`. Whenever you write code that depends on `a` (or on things that depend on `a`, or on things that depended on things that depend on `a`, or ...), you'll have to repeatedly interrupt your train of thought to *try* and remember if what you're doing is going to be okay with the current and future `makeATwo()` call sites.
### Modifying Captured Variables In Python
There is something in Python that is similar to R's super assignment operator (`<<-`). It is the `global` keyword. This keyword will let you modify global variables from inside a function.
::: {.rmd-details data-latex=""}
The upside to the `global` keyword is that it makes hunting for **side effects**\index{side effects} relatively easy (A function's side effects are changes it makes to non-local variables). Yes, this keyword should be used sparingly, even more sparingly than merely referring to global variables, but if you are ever debugging, and you want to hunt down places where variables are surprisingly being changed, you can hit `Ctrl-F` and search for the phrase "global."
:::
```{python, collapse = TRUE}
a = 1
def increment_a():
    global a
    a += 1
[increment_a() for _ in range(10)]
print(a)
```
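For contrast, here is a small sketch of what happens when the `global` line is left out, along with a side-effect-free alternative that returns a new value instead of modifying anything. The names `increment_without_global()`, `incremented()`, `result`, and `b` are hypothetical and chosen only for this illustration.

```python
# python
a = 1

def increment_without_global():
    # without "global a", assigning to a makes it a local variable,
    # so reading it before it has a local value fails
    a += 1

try:
    increment_without_global()
    result = "no error"
except UnboundLocalError:
    result = "UnboundLocalError"

def incremented(value):
    # no side effects: the input is read, and a new value is returned
    return value + 1

b = incremented(a)
print(result, a, b)
```

The second function is easier to reason about because everything it uses comes in through its argument, and everything it changes comes out through its return value.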
## Exercises
### R Questions
1.
Suppose you have a matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ and a column vector $\mathbf{y} \in \mathbb{R}^{n}$. To estimate the linear regression model
\begin{equation}
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},
\end{equation}
where $\boldsymbol{\beta} \in \mathbb{R}^p$ is a column vector of coefficients and $\boldsymbol{\epsilon} \in \mathbb{R}^n$ is a column vector of errors, you can use calculus instead of numerical optimization. The formula for the least squares estimate of $\boldsymbol{\beta}$ is
\begin{equation}
\hat{\boldsymbol{\beta}} = (\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \mathbf{y}.
\end{equation}
Once this $p$-dimensional vector is found, you can also obtain the *predicted (or fitted) values*
\begin{equation}
\hat{\mathbf{y}} := \mathbf{X}\hat{\boldsymbol{\beta}},
\end{equation}
and the *residuals (or errors)*
\begin{equation}
\mathbf{y} - \hat{\mathbf{y}}.
\end{equation}
Write a function called `getLinModEstimates()` that takes in two arguments in the following order:
+ the `vector` of response data $\mathbf{y}$
+ the `matrix` of predictors $\mathbf{X}$.
Have it return a named `list` with three outputs inside:
+ the coefficient estimates as a `vector`,
+ a `vector` of fitted values, and
+ a `vector` of residuals.
The three elements of the returned list should have the names `coefficients`, `fitVals`, and `residuals`.
2.
Write a function called `monteCarlo` that
+ takes as an input a function `sim(n)` that simulates `n` scalar variables,
+ takes as an input a function that evaluates $f(x)$ on each random sample, ideally one that accepts all of the samples at once as a `vector`, and
+ returns a function that takes one integer-valued argument (`num_sims`) and outputs a length one `vector`.
Assume `sim(n)` only has one argument: `n`, which is the number of simulations desired. `sim(n)`'s output should be a length `n` `vector`.
The output of this returned function should be a Monte Carlo estimate of the expectation: $\mathbb{E}[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X^i)$, where $n$ is `num_sims`.
3.
Write a function called `myDFT()` that computes the **Discrete Fourier Transform** of a `vector` and returns another `vector`. Feel free to check your work against `spec.pgram()`, `fft()`, or `astsa::mvspec()`, but do not include calls to those functions in your submission. Also, you should be aware that different functions transform and scale the answer differently, so be sure to read the documentation of any function you use to test against.
Given data $x_1,x_2,\ldots,x_n$, $i = \sqrt{-1}$, and the **Fourier/fundamental frequencies** $\omega_j= j/n$ for $j=0,1,\ldots,n-1$, we define the discrete Fourier transform (DFT) as:
\begin{equation} \label{eq:DFT}
d(\omega_j)= n^{-1/2} \sum_{t=1}^n x_t e^{-2 \pi i \omega_j t}
\end{equation}
### Python Questions
1.
Estimating statistical models often involves some form of optimization, and optimization is often performed numerically. One of the most famous optimization algorithms is **Newton's method**.
Suppose you have a function $f(x)$ that takes a scalar-valued input and returns a scalar as well. Also, suppose you have the function's derivative $f'(x)$, its second derivative $f''(x)$, and a starting point guess for what the minimizing input of $f(x)$ is: $x_0$.
The algorithm repeatedly applies the following recursion:
\begin{equation}
x_{n+1} = x_{n} - \frac{f'(x_n)}{f''(x_{n})}.
\end{equation}
Under appropriate regularity conditions for $f$, after many iterations of the above recursion, when $\tilde{n}$ is very large, $x_{\tilde{n}}$ will be nearly the same as $x_{\tilde{n}-1}$, and $x_{\tilde{n}}$ will be pretty close to $\text{argmin}_x f(x)$. In other words, $x_{\tilde{n}}$ is approximately the minimizer of $f$, and approximately a root of $f'$.
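To make the recursion concrete, here is a rough sketch of a few steps applied to a *different* objective than the one in the questions below: $g(x) = e^x - 2x$, whose minimizer is $\log 2$. The function `g` and the helper names `newton_step()`, `g_prime()`, and `g_dub_prime()` are assumptions made up for this illustration.

```python
# python
import math

def newton_step(f_prime, f_dub_prime, x):
    # one application of the recursion: x - f'(x) / f''(x)
    return x - f_prime(x) / f_dub_prime(x)

def g_prime(x):
    # first derivative of the illustrative objective g(x) = exp(x) - 2x
    return math.exp(x) - 2.0

def g_dub_prime(x):
    # second derivative of g
    return math.exp(x)

x0 = 0.0                                    # starting guess
x1 = newton_step(g_prime, g_dub_prime, x0)
x2 = newton_step(g_prime, g_dub_prime, x1)
x3 = newton_step(g_prime, g_dub_prime, x2)
x4 = newton_step(g_prime, g_dub_prime, x3)

# the iterates should settle down near the true minimizer, log(2)
print(x1, x2, x3, x4, math.log(2.0))
```

Notice that each step only needs the two derivative functions and the previous iterate; the objective function itself is never evaluated.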
a) Write a function called `f` that takes a `float` `x` and returns $(x-42)^2 - 33$.
b) Write a function called `f_prime` that takes a `float` and returns the derivative of the above.
c) Write a function called `f_dub_prime` that takes a `float` and returns an evaluation of the second derivative of $f$.
d) Theoretically, what is the minimizer of $f$? Assign your answer to the variable `best_x`.
e) Write a function called `minimize()` that takes three arguments, and performs **ten iterations** of Newton's algorithm, after which it returns $x_{10}$. Don't be afraid of copy/pasting ten or so lines of code. We haven't learned loops yet, so that's fine. The ordered arguments are:
* the function that evaluates the derivative of the function you're interested in,
* the function that evaluates the second derivative of your objective function,
* an initial guess of the minimizer.
f) Test your function by plugging in the above functions, and use a starting point of $10$. Assign the output to a variable called `x_ten`.
2.
Write a function called `smw_inverse(A,U,C,V)` that returns the inverse of a matrix using the **Sherman-Morrison-Woodbury formula** [@woodbury]. Have it take the arguments $A$, $U$, $C$, and $V$ in that order and as NumPy `ndarray`s. Assume that `A` is a diagonal matrix.
\begin{equation}
(A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}V A^{-1}
\end{equation}
Despite being difficult to remember, this formula can be quite handy for speeding up matrix inversions when $A$ and $C$ are easier to invert (e.g. if $A$ is diagonal and $C$ is a scalar). The formula shows up often in applications where you multiply matrices together (there are many such examples).
To check your work, pick some inputs, and make sure the output of your function agrees with the naive, left-hand-side approach.
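As a sanity check on the formula itself, here is a small sketch of the special case where all four inputs are $1 \times 1$, so every "matrix" is just a number and every inverse is a reciprocal. The particular values and the variable names are arbitrary choices for this illustration, and this is not a substitute for testing your own function on real matrices.

```python
# python
import math

# arbitrary scalar stand-ins for A, U, C, and V
a, u, c, v = 2.0, 3.0, 4.0, 5.0

# naive left-hand side: invert the sum directly
lhs = 1.0 / (a + u * c * v)

# right-hand side: same formula, with reciprocals in place of inverses
a_inv = 1.0 / a
c_inv = 1.0 / c
rhs = a_inv - a_inv * u * (1.0 / (c_inv + v * a_inv * u)) * v * a_inv

# compare with a tolerance, because floating point arithmetic is inexact
print(lhs, rhs, math.isclose(lhs, rhs))
```

Comparing with a tolerance rather than with `==` is the same habit you will want when you check your matrix version against the naive approach.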