06-functions.Rmd

# Functions

This text has already covered how to *use* functions that come to us pre-made.\index{functions} At least we have discussed how to use them in a one-off way--just write the name of the function, write some parentheses after that name, and then plug in any requisite arguments by writing them in a comma-separated way between those two parentheses. This is how it works in both R and Python. 

In this section we take a look at how to *define* our own functions. This will not only help us to understand pre-made functions, but it will also be useful if we need some extra functionality that isn't already provided to us. 

Writing our own functions is also useful for "packaging up" computations. The utility of this will become apparent very soon. Consider the task of estimating a regression model. If you have a function that performs all of the required calculations, then

  * you can estimate models without having to think about lower-level details or write any code yourself, and
  * you can re-use this function every time you fit any model on any data set for any project.


## Defining R Functions

To create a function in R, we need another function called `function()`.\index{functions!creating functions in R} We give the output of `function()` a name in the same way we give names to any other variable in R, by using the assignment operator `<-` \index{assignment operator!assignment operator in R}. Here's an example of a toy function called `addOne()`. Here `myInput` is a placeholder that refers to whatever the user of the function ends up plugging in. 

```{r, collapse = TRUE}
addOne <- function(myInput){  # define the function
  myOutput <- myInput + 1
  return(myOutput)
}
addOne(41) # call/invoke/use the function 
```

Below the definition, the function is called with an input of `41`. When this happens, the following sequence of events occurs

- The value `41` is assigned to `myInput`
- `myOutput` is given the value `42`
- `myOutput`, which is `42`, is returned from the function
- the temporary variables `myInput` and `myOutput` are destroyed. 

We get the desired answer, and all the unnecessary intermediate variables are cleaned up and thrown away after they are no longer needed. 


::: {.rmd-caution data-latex=""}
If you are interested in writing a function, I recommend that you first write the logic outside of a function. This initial code will be easier to debug because your temporary variables will not be destroyed after the final result has been obtained. Once you are happy with the working code, you can copy and paste the logic into a function definition, and replace permanent variables with function inputs like `myInput` above. 
:::


## Defining Python Functions

To create a function in Python, we use the `def` statement (instead of the `function()` function in R).\index{functions!creating functions in Python} The desired name of the function comes next. After that, the formal parameters come, comma-separated inside parentheses, just like in R. 

Defining a function in Python is a little more concise. There is no assignment operator like there is in R, there are no curly braces, and `return` isn't a function like it is in R, so there is no need to use parentheses after it. There is one syntactic addition, though--we need a colon (`:`) at the end of the first line of the definition. 

Here is an example of a toy function called `add_one()`.

```{python, collapse = TRUE}
def add_one(my_input):  # define the function
    my_output = my_input + 1
    return my_output
add_one(41) # call/invoke/use the function 
```

Below the definition, the function is called with an input of `41`. When this happens, the following sequence of events occurs

- The value `41` is assigned to `my_input`
- `my_output` is given the value `42`
- `my_output`, which is `42`, is returned from the function
- the temporary variables `my_input` and `my_output` are destroyed. 

We get the desired answer, and all the unnecessary intermediate variables are cleaned up and thrown away after they are no longer needed. 

## More Details On R's User-Defined Functions

Technically, in R, functions are [defined as three things bundled together](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Function-objects): 

 1. a **formal argument list** (also known as *formals*), 
 2. a **body**, and 
 3. a **parent environment**.

The *formal argument list* is exactly what it sounds like. It is the list of arguments a function takes. You can access a function's formal argument list using the `formals()` function. Note that it is not the *actual* arguments a user will plug in--that isn't knowable at the time the function is created in the first place.

Here is another function that takes a parameter called `whichNumber` that comes with a **default argument** of `1`. If the caller of the function does not specify what she wants to add to `myInput`, `addNumber()` will use `1` as the default. This default value shows up in the output of `formals(addNumber)`.

```{r, collapse = TRUE}
addNumber <- function(myInput, whichNumber = 1){  
  myOutput <- myInput + whichNumber
  return(myOutput)
}
addNumber(3) # no second argument being provided by the user here
formals(addNumber)
```

The function's *body* is also exactly what it sounds like. It is the work that a function performs. You can access a function's body using the the `body()` function. 

```{r, collapse = TRUE}
addNumber <- function(myInput, whichNumber = 1){  
  myOutput <- myInput + whichNumber
  return(myOutput)
}
body(addNumber)
```

Every function you create also has a *parent environment*^[Primitive functions are functions that contain no R code and are internally implemented in C. These are the only type of function in R that don't have a parent environment.]. You can get/set this using the `environment()` function. Environments help a function know which variables it is allowed to use and how to use them. The parent environment of a function is where the function was *created*, and it contains variables outside of the body that the function can also use. The rules of which variables a function can use are called *scoping*. When you create functions in R, you are primarily using **lexical scoping**. This is discussed in more detail in section \@ref(function-scope-in-r).

::: {.rmd-details data-latex=""}
There is a lot more information about environments that isn't provided in this text. For instance, a user-defined function also has [binding, execution, and calling environments associated with it](http://adv-r.had.co.nz/Environments.html#function-envs), and environments are used in creating package namespaces, which are important when two packages each have a function with the same name.\index{environments in R}
:::


## More details on Python's user-defined functions


Roughly, Python functions have the same things R functions have. They have a **formal parameter list**, a body, and there are [namespaces](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces) created that help organize which variables the function can access, as well as which pieces of code can call this new function. A namespace is just a "mapping from names to objects." 

These three concepts are analogous to those in R. The names are just a bit different sometimes, and it isn't organized in the same way. To access these bits of information, you need to access the *special attributes* of a function. User-defined functions in Python have a lot of pieces of information attached to them. If you'd like to see all of them, you can visit [this page of documentation](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types).


<!-- Below is a table, taken straight from [the documentation](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types), of all each user-defined function's .  -->

<!-- | Attribute |	Meaning | -->
<!-- |-----------|-------------------| -->
<!-- `__doc__`	| The function’s documentation string, or `None` if unavailable; not inherited by subclasses. -->
<!-- `__name__`	| The function’s name.	 -->
<!-- `__qualname__`	| The function’s qualified name.	 -->
<!-- `__module__` |	The name of the module the function was defined in, or None if unavailable. -->
<!-- `__defaults__`	| A tuple containing default argument values for those arguments that have defaults, or None if no arguments have a default value. -->
<!-- `__code__` |	The code object representing the compiled function body. -->
<!-- `__globals__`	| A reference to the dictionary that holds the function’s global variables — the global namespace of the module in which the function was defined. -->
<!-- `__dict__` |	The namespace supporting arbitrary function attributes. -->
<!-- `__closure__` | `None` or a tuple of cells that contain bindings for the function’s free variables. See below for information on the `cell_contents` attribute. -->
<!-- `__annotations__` | A dict containing annotations of parameters. The keys of the dict are the parameter names, and 'return' for the return annotation, if provided. -->
<!-- `__kwdefaults__` | A dict containing defaults for keyword-only parameters. -->


So, for instance, let's try to find the  *formal parameter list* of a user-defined function below. This is, again, the collection of inputs a function takes. Just like in R, this is not the *actual* arguments a user will plug in--that isn't knowable at the time the function is created.^[You might have noticed that Python uses two different words to prevent confusion. Unlike R, Python uses the word "parameter" (instead of "argument") to refer to the inputs a function takes, and "arguments" to the specific values a user plugs in.] Here we have another function called `add_number()` that takes a parameter `which_number` that is accompanied by a default argument of `1`. 

```{python, collapse = TRUE}
def add_number(my_input, which_number = 1): # define a function
    my_output = my_input + which_number
    return my_output
add_number(3) # no second argument being provided by the user here
add_number.__code__.co_varnames # note this also contains *my_output*
add_number.__defaults__
```

The `__code__` attribute has much more to offer. To see a list of names of all its contents, you can use `dir(add_number.__code__)`.

::: {.rmd-details data-latex=""}
Don't worry if the notation `add_number.__code__` looks strange. The dot (`.`) operator will become more clear in the future chapter on *object-oriented programming*. For now, just think of `__code__` as being an object *belonging to* `add_number`. Objects that belong to other objects are called **attributes** in Python. The dot operator helps us access attributes *inside* other objects. It also helps us access objects belonging to modules that we `import` into our scripts.
:::


## Function Scope in R

R uses **lexical scoping**. This means, in R, \index{scope!function scope in R}

  1. functions can use *local* variables that are defined inside themselves,  
  2. functions can use *global* variables defined in the environment where the function itself was *defined* in, and
  3. functions *cannot* necessarily use *global* variables defined in the environment where the function was *called* in, and
  4. functions will prefer *local* variables to *global* variables if there is a name clash. 
  
The first characteristic is obvious. The second and third are import to distinguish between. Consider the following code below. `sillyFunction()` can access `a` because `sillyFunction()` and `a` are defined in the same place. 

```{r, collapse = TRUE}
a <- 3
sillyFunction <- function(){
  return(a + 20) 
}
environment(sillyFunction) # the env. it was defined in contains a
sillyFunction()
```

On the other hand, the following example will not work because `a` and `anotherSillyFunc()` are not defined in the same place. Calling the function is not the same as defining a function.

```{r, collapse = TRUE}
anotherSillyFunc <- function(){
  return(a + 20) 
}
highLevelFunc <- function(){
  a <- 99
  # this isn't the global environment anotherSillyFunc() was defined in
  cat("environment inside highLevelFunc(): ", environment())
  anotherSillyFunc()
}
```

Finally, here is a demonstration of a function preferring one `a` over another. When `sillyFunction()` attempts to access `a`, it first looks in its own body, and so the innermost one gets used. On the other hand, `print(a)` shows `3`, the global variable.

```{r, collapse = TRUE}
a <- 3
sillyFunction <- function(){
  a <- 20
  return(a + 20) 
}
sillyFunction()
print(a)
```

The same concept applies if you create functions within functions. The inner function `innerFunc()` looks "inside-out" for variables, but only in the place it was defined. 

Below we call `outerFunc()`, which then calls `innerFunc()`. `innerFunc()` can refer to the variable `b`, because it lies in the same environment in which `innerFunc()` was created. Interestingly, `innerFunc()` can also refer to the variable `a`, because that variable was captured by `outerFunc()`, which provides access to `innerFunc()`. 

```{r, collapse = TRUE}
a <- "outside both"
outerFunc <- function(){
  b <- "inside one"
  innerFunc <- function(){
    print(a) 
    print(b)
  }
  return(innerFunc())
}
outerFunc()
```


Here's another interesting example. If we ask `outerFunc()` to return the function `innerFunc()` (not the return object of `innerFunct()`...functions are objects, too!), then we might be surprised to see that `innerFunc()` can still successfully refer to `b`, even though it doesn't exist inside the *calling environment.* But don't be surprised! What matters is what was available when the function was *created*. 


```{r, collapse = TRUE}
outerFuncV2 <- function(){
  b <- "inside one"
  innerFunc <- function(){
    print(b)
  }
  return(innerFunc) # note the missing inner parentheses!
}
myFunc <- outerFuncV2() # get a new function
ls(environment(myFunc)) # list all data attached to this function
myFunc()
```

We use this property all the time when we create functions that return other functions. This is discussed in more detail in chapter \@ref(an-introduction-to-functional-programming). In the above example, `outerFuncV2()`, the function that returned another function, is called a *function factory*.\index{functions!function factory}


::: {.rmd-details data-latex=""}
Sometimes people will refer to R's functions as **closures** to emphasize that they are capturing variables from the parent environment in which they were created, to emphasize the data that they are bundled with.\index{closures in R} 
:::


## Function Scope in Python

Python uses **lexical scoping** just like R.\index{scope!function scope in Python} This means, in Python, 

  1. functions can use *local* variables that are defined inside themselves,  
  2. functions have an order of preference for which variable to prefer in the case of a name clash, and 
  3. functions can sometimes use variables defined outside itself, but that ability depends on where the function and variable were *defined*, not where the function was *called*.

Regarding characteristics (2) and (3), there is a famous acronym that describes the rules Python follows when finding and choosing variables: **LEGB**.\index{scope!LEGB in Python}

- L: Local, 
- E: Enclosing, 
- G: Global, and 
- B: Built-in.

A Python function will search for a variable in these namespaces in this order.^[Functions aren't the only thing that get their own namespace. [Classes do, too](https://docs.python.org/3/tutorial/classes.html#a-first-look-at-classes). More information on classes is provided in Chapter \@ref(an-introduction-to-object-oriented-programming)]. 

"*Local*" refers to variables that are defined inside of the function's block. The function below uses the local `a` over the global one. 

```{python, collapse = TRUE}
a = 3
def silly_function():
    a = 22 # local a
    print("local variables are ", locals())
    return a + 20
silly_function()
silly_function.__code__.co_nlocals # number of local variables
silly_function.__code__.co_varnames # names of local variables
```

"*Enclosing*" refers to variables that were defined in the enclosing namespace, but not the global namespace. These variables are sometimes called **free variables.** In the example below, there is no local `a` variable for `inner_func()`, but there is a global one, and one in the enclosing namespace. `inner_func()` chooses the one in the enclosing namespace. Moreover, `inner_func()` has its own copy of `a` to use, even after `a` was initially destroyed upon the completion of the call to `outer_func()`. 

```{python, collapse = TRUE}
a = "outside both"
def outer_func():
    a = "inside one"
    def inner_func():
        print(a)
    return inner_func
my_new_func = outer_func()
my_new_func()
my_new_func.__code__.co_freevars
```

"*Global*" scope contains variables defined in the module-level namespace. If the code in the below example was the entirety of your script, then `a` would be a global variable.

```{python, collapse = TRUE}
a = "outside both"
def outer_func():
    b = "inside one"
    def inner_func():
        print(a) 
    inner_func()
outer_func()
```


Just like in R, Python functions **cannot** necessarily find variables where the function was *called*. For example, here is some code that mimics the above R example. Both `a` and `b` are accessible from within `inner_func()`. That is due to LEGB.

However, if we start using `outer_func()` inside another function, *calling* it in another function, when it was *defined* somewhere else, well then it doesn't have access to variables in the call site. You might be surprised at how the following code functions. Does this print the right string: `"this is the a I want to use now!"` No!


```{python}
a = "outside both"
def outer_func():
    b = "inside one"
    def inner_func():
        print(a) 
        print(b)
    return inner_func() 
def third_func():
    a = "this is the a I want to use now!"
    outer_func()
third_func() 
```


If you feel like you understand lexical scoping, great! You should be ready to take on chapter \@ref(an-introduction-to-functional-programming), then. If not, keep playing around with examples. Without understanding the scoping rules R and Python share, writing your own functions will persistently feel more difficult than it really is. 


## Modifying a Function's Arguments

Can/should we modify a function's argument? The flexibility to do this sounds empowering; however, not doing it is recommended because it makes programs easier to reason about. 

### Passing By Value In R

In R, it is *difficult* for a function to modify one of its argument.^[There are some exceptions to this, but it's generally true.] Consider the following code.

```{r, collapse=TRUE}
a <- 1
f <- function(arg){
  arg <- 2 # modifying a temporary variable, not a
  return(arg)
}
print(f(a))
print(a)
```

The function `f` has an argument called `arg`. When `f(a)` is performed, changes are made to a *copy* of `a`. When a function constructs a copy of all input variables inside its body, this is called **pass-by-value** semantics.\index{pass-by-value} This copy is a temporary intermediate value that only serves as a starting point for the function to produce a return value of `2`.

`arg` could have been called `a`, and the same behavior will take place. However, giving these two things different names is helpful to remind you and others that R copies its arguments.

It is still possible to modify `a`, but I don't recommend doing this either. I will discuss this more in subsection \@ref(modifying-a-functions-arguments).


### Passing By Assignment In Python

The story is more complicated in Python. Python functions have **pass-by-assignment** semantics.\index{pass-by-assignment} This is something that is very unique to Python. What this means is that your ability to modify the arguments of a function depends on 

- what the type of the argument is, and
- what you're trying to do to it. 

We will go throw some examples first, and then explain why this works the way it does. Here is some code that is analogous to the example above. 

```{python, collapse=TRUE}
a = 1
def f(arg):
    arg = 2
    return arg
print(f(a))
print(a)
```

In this case, `a` is not modified. That is because `a` is an `int`. `int`s are **immutable** in Python, which means that their [value](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types) cannot be changed after they are created, either inside or outside of the function's scope. However, consider the case when `a` is a `list`, which is a **mutable** type. A mutable type is one that can have its value changed after its created.\index{mutable in Python}

```{python, collapse=TRUE}
a = [999]
def f(arg):
    arg[0] = 2
    return arg

print(f(a))
print(a) # not [999] anymore!
```

In this case `a` *is* modified. Changing the value of the argument *inside* the function effects changes to that variable outside of the function. 

Ready to be confused? Here is a tricky third example. What happens if we take in a list, but try to do something else with it. 

```{python, collapse=TRUE}
a = [999]
def f(arg):
    arg = [2]
    return arg

print(f(a))
print(a) # didn't change this time :(
```

That time `a` did not permanently change in the global scope. Why does this happen? I thought `list`s were mutable!

The reason behind all of this doesn't even have anything to do with functions, per se. Rather, it has to do with how Python manages, [objects, values, and types](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types). It also has to do with what happens during [assignment](https://docs.python.org/3/reference/executionmodel.html#naming-and-binding).

Let's revisit the above code, but bring everything out of a function. Python is pass-by-assignment, so all we have to do is understand how assignment works. Starting with the immutable `int` example, we have the following.

```{python, collapse=TRUE}
# old code: 
# a = 1
# def f(arg):
#     arg = 2
#     return arg
a = 1    # still done in global scope
arg = a  # arg is a name that is bound to the object a refers to
arg = 2  # arg is a name that is bound to the object 2
print(arg is a)
print(id(a), id(arg)) # different!`
print(a)
```

::: {.rmd-details data-latex=""}
The [`id()`](https://docs.python.org/3/library/functions.html#id) function returns the **identity** of an object, which is kind of like its memory address. Identities of objects are unique and constant. If two variables, `a` and `b` say, have the same identity, `a is b` will evaluate to `True`. Otherwise, it will evaluate to `False`.
:::

In the first line, the *name* `a` is bound to the *object* `1`. In the second line, the name `arg` is bound to the *object* that is referred to by the *name* `a`. After the second line finishes, `arg` and `a` are two names for the same object (a fact that you can confirm by inserting `arg is a` immediately after this line). 

In the third line, `arg` is bound to `2`. The variable `arg` can be changed, but only by re-binding it with a separate object. Re-binding `arg` does not change the value referred to by `a` because `a` still refers to `1`, an object separate from `2`. There is no reason to re-bind `a` because it wasn't mentioned at all in the third line. 

If we go back to the first function example, it's basically the same idea. The only difference, however, is that `arg` is in its own scope. Let's look at a simplified version of our second code chunk that uses a mutable list.

```{python, collapse=TRUE}
a = [999]
# old code:
# def f(arg):
#     arg[0] = 2
#     return arg
arg = a
arg[0] = 2
print(arg)
print(a)
print(arg is a)
```

In this example, when we run `arg = a`, the name `arg` is bound to the same object that is bound to `a`. This much is the same. The only difference here, though, is that because lists are mutable, changing the first element of `arg` is done "in place", and all variables can access the mutated object.

Why did the third example produce unexpected results? The difference is in the line `arg = [2]`. This rebinds the name `arg` to a different variable. `list`s are still mutable, but this has nothing to do with re-binding--re-binding a name works no matter what type of object you're binding it to. In this case we are re-binding `arg` to a completely different list.  


## Accessing and Modifying Captured Variables

In the last section, we were talking about variables that were passed in as function arguments. Here we are talking about variables that are **captured**.\index{capturing variables} They are not passed in as variables, but they are still used inside a function. In general, even though it is possible to access and modify non-local captured variables in both languages, it is not a good idea. 

### Accessing Captured Variables in R

As Hadley Wickham writes in [his book](https://adv-r.hadley.nz/functions.html#dynamic-lookup), "[l]exical scoping determines where, but not when to look for values." R has **dynamic lookup**,\index{dynamic lookup} meaning code inside a function will only try to access a referred-to variable when the function is *running*, not when it is defined.

Consider the R code below. The `dataReadyForModeling()` function is created in the global environment, and the global environment contains a Boolean variable called `dataAreClean`. 

```{r, collapse=TRUE}
# R
dataAreClean <- TRUE
dataReadyForModeling <- function(){
  return(dataAreClean)
}
dataAreClean <- FALSE
# readyToDoSecondPart() # what happens if we call it now?
```


Now imagine sharing some code with a collaborator. Imagine, further, that your collaborator is the subject-matter expert, and knows little about R programming. Suppose that he changes `dataAreClean`, a global variable in the script, after he is done . Shouldn't this induce a relatively trivial change to the overall program? 

Let's explore this hypothetical further. Consider what could happen if any of the following (very typical) conditions are true:

- you or your collaborators aren't sure what `dataReadyForModeling()` will return because you don't understand dynamic lookup, or 
- it's difficult to visually keep track of all assignments to `dataAreClean` (e.g. your script is quite long or it changes often), or
- you are not running code sequentially (e.g. you are repeatedly testing chunks at a time instead of clearing out your memory and `source()`ing from scratch, over and over again).

In each of these situations, understanding of the program would be compromised. However, if you follow the above principle of never referring to non-local variables in function code, all members of the group could do their own work separately, minimizing the dependence on one another. 

Another reason violating this could be troublesome is if you define a function that refers to a nonexistent variable. *Defining* the function will never throw an error because R will assume that variable is defined in the global environment. *Calling* the function might throw an error, unless you accidentally defined the variable, or if you forgot to delete a variable whose name you no longer want to use. Defining `myFunc()` with the code below will not throw an error, even if you think it should!


```{r, collapse = TRUE}
# R
myFunc <- function(){
  return(varigbleNameWithTypo) #varigble?
}
```

### Accessing Captured Variables in Python

It is the same exact situation in Python. Consider `everything_is_safe()`, a function that is analogous to `dataReadyForModeling()`.

```{python, collapse=TRUE}
# python
missile_launch_codes_set = True
def everything_is_safe():
    return not missile_launch_codes_set

missile_launch_codes_set = False
everything_is_safe()
```

We can also define `my_func()`, which is analogous to `myFunc()`. Defining this function doesn't throw an error either.

```{python, collapse = TRUE}
# python
def my_func():
    return varigble_name_with_typo
```

So stay away from referring to variables outside the body of your function! 


### Modifying Captured Variables In R

Now what if we want to be extra bad, and in addition to *accessing* global variables, we *modify* them, too?

```{r, collapse=TRUE}
a <- 1
makeATwo <- function(arg){
  arg <- 2
  a <<- arg
}
print(makeATwo(a))
print(a)
```

In the program above, `makeATwo()` copies `a` into `arg`. It then assigns `2` to that copy. **Then it takes that `2` and writes it to the global `a` variable in the parent environment.** It does this using R's super assignment operator `<<-`\index{assignment operator!super assignment operator in R}. Regardless of the inputs passed in to this function, it will always assign exactly `2` to `a`, no matter what. 

This is problematic because you are pre-occupying your mind with one function: `makeATwo()`. Whenever you write code that depends on `a` (or on things that depend on `a`, or on things that depended on things that depend on `a`, or ...), you'll have to repeatedly interrupt your train of thought to *try* and remember if what you're doing is going to be okay with the current and future `makeATwo()` call sites. 

### Modifying Captured Variables In Python


There is something in Python that is similar to R's super assignment operator (`<<-`). It is the `global` keyword. This keyword will let you modify global variables from inside a function. 

::: {.rmd-details data-latex=""}
The upside to the `global` keyword is that it makes hunting for **side effects**\index{side effects} relatively easy (A function's side effects are changes it makes to non-local variables). Yes, this keyword should be used sparingly, even more sparingly than merely referring to global variables, but if you are ever debugging, and you want to hunt down places where variables are surprisingly being changed, you can hit `Ctrl-F` and search for the phrase "global."
:::

```{python, collapse = TRUE}
a = 1
def increment_a():
    global a
    a += 1
[increment_a() for _ in range(10)]
print(a)
```


## Exercises

### R Questions

1. 

Suppose you have a matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ and a column vector $\mathbf{y} \in \mathbb{R}^{n}$. To estimate the linear regression model 
\begin{equation} 
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \epsilon,
\end{equation}
where $\boldsymbol{\beta} \in \mathbb{R}^p$ is a column vector of errors, you can use calculus instead of numerical optimization. The formula for the least squares estimate of $\boldsymbol{\beta}$ is 
\begin{equation} 
\hat{\boldsymbol{\beta}} = (\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \mathbf{y}.
\end{equation}

Once this $p$-dimensional vector is found, you can also obtain the *predicted (or fitted) values*

\begin{equation} 
\hat{\mathbf{y}} := \mathbf{X}\hat{\boldsymbol{\beta}},
\end{equation}
and the *residuals (or errors)*

\begin{equation} 
\mathbf{y} - \hat{\mathbf{y}}
\end{equation}

Write a function called `getLinModEstimates()` that takes in two arguments in the following order:

  + the `vector` of response data $\mathbf{y}$
  + the `matrix` of predictors $\mathbf{X}$.

Have it return a named `list` with three outputs inside:
  
  + the coefficient estimates as a `vector`, 
  + a `vector` of fitted values, and
  + a `vector` of residuals. 

The three elements of the returned list should have the names `coefficients`, `fitVals`, and `residuals`.


2. 

Write a function called `monteCarlo` that

  + takes as an input a function `sim(n)` that simulates `n` scalar variables,   
  + takes as an input a function that evaluates $f(x)$ on each random variable sample and that ideally takes in all of the random variables as a `vector`, and
  + returns a function that takes one integer-valued argument (`num_sims`) and outputs a length one `vector`. 

Assume `sim(n)` only has one argument: `n`, which is the number of simulations desired. `sim(n)`'s output should be a length `n` `vector`. 

The output of this returned function should be a Monte Carlo estimate of the expectation: $\mathbb{E}[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X^i)$.
  

3. 

Write a function called `myDFT()` that computes the **Discrete Fourier Transform** of a `vector` and returns another `vector`. Feel free to check your work against `spec.pgram()`, `fft()`, or `astsa::mvspec()`, but do not include calls to those functions in your submission. Also, you should be aware that different functions transform and scale the answer differently, so be sure to read the documentation of any function you use to test against.

Given data $x_1,x_2,\ldots,x_n$, $i = \sqrt{-1}$, and the **Fourier/fundamental frequencies** $\omega_j= j/n$ for $j=0,1,\ldots,n-1$, we define the discrete Fourier transform (DFT) as:

\begin{equation} \label{eq:DFT}
d(\omega_j)= n^{-1/2} \sum_{t=1}^n x_t e^{-2 \pi i \omega_j t}
\end{equation}


### Python Questions

1. 

Estimating statistical models often involves some form of optimization, and often times, optimization is performed numerically. One of the most famous optimization algorithms is **Newton's method**. 

Suppose you have a function $f(x)$ that takes a scalar-valued input and returns a scalar as well. Also, suppose you have the function's derivative $f'(x)$, its second derivative $f''(x)$,  and a starting point guess for what the minimizing input of $f(x)$ is: $x_0$.

The algorithm repeatedly applies the following recursion:

\begin{equation} 
x_{n+1} = x_{n} - \frac{f'(x_n)}{f''(x_{n})}.
\end{equation}
Under appropriate regularity conditions for $f$, after many iterations of the above recursion, when $\tilde{n}$ is very large, $x_{\tilde{n}}$ will be nearly the same as $x_{\tilde{n}-1}$, and $x_{\tilde{n}}$ is pretty close to  $\text{argmin}_x f(x)$. In other words, $x_{\tilde{n}}$ is the minimizer of $f$, and a root of $f'$.


  a) Write a function called `f` that takes a `float` `x` and returns $(x-42)^2 - 33$. 
  b) Write a function called `f_prime` that takes a `float` and returns the derivative of the above.
  c) Write a function called `f_dub_prime` that takes a `float` and returns an evaluation of the second derivative of $f$.
  d) Theoretically, what is the minimizer of $f$? Assign your answer to the variable `best_x`.
  e) Write a function called `minimize()` that takes three arguments, and performs **ten iterations** of Newton's algorithm, after which it returns $x_{10}$. Don't be afraid of copy/pasting ten or so lines of code. We haven't learned loops yet, so that's fine. The ordered arguments are:
      * the function that evaluates the derivative of the function you're interested in,
      * the function that evaluates the second derivative of your objective function,
      * an initial guess of the minimizer.
  f) Test your function by plugging in the above functions, and use a starting point of $10$. Assign the output to a variable called `x_ten`. 
  
  
2.

Write a function called `smw_inverse(A,U,C,V)` that returns the inverse of a matrix using the **Sherman-Morrison-Woodbury formula** [@woodbury]. Have it take the arguments $A$, $U$, $C$, and $V$ in that order and as Numpy `ndarray`s. Assume that `A` is a diagonal matrix.

\begin{equation} 
(A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}V A^{-1}
\end{equation}
Despite being difficult to remember, this formula can be quite handy for speeding up matrix inversions when $A$ and $C$ are easier to invert (e.g. if $A$ is diagonal and $C$ is a scalar). The formula often shows up a lot in applications where you multiply matrices together (there are many such examples).

To check your work, pick certain inputs, and make sure your formula corresponds with the naive, left-hand-side approach.