# R's `list`s Versus Python's `list`s and `dict`s
When you need to store elements in a container, but you can't guarantee that these elements all have the same type, or you can't guarantee that they all have the same size, then you need a `list` in R. In Python, you might need a `list` or `dict` \index{dicts} (short for dictionary) [@Lutz13].
## `list`s In R
`list`s are one of the most flexible data types in R.\index{lists!lists in R} You can access individual elements in many different ways, each element can be of different size, and each element can be of a different type.
```{r, collapse = TRUE}
myList <- list(c(1,2,3), "May 5th, 2021", c(TRUE, TRUE, FALSE))
myList[1] # length-1 list; first element is length 3 vector
myList[[1]] # length-3 vector
```
If you want to extract an element, you need to decide between using single square brackets or double square brackets. The former returns a `list`, while the latter returns the element itself, with whatever type that element has.
You can also name the elements of a list. This can lead to more readable code. To see why, examine the example below that makes use of some data about cars [@sas_cars]. The `lm()` function estimates a linear regression model. It returns a `list` with plenty of components.
```{r, collapse = TRUE}
dataSet <- read.csv("data/cars.csv")
results <- lm(log(Horsepower) ~ Type, data = dataSet)
length(results)
# names(results) # try this <-
results$contrasts
results['rank']
```
::: {.rmd-caution data-latex=""}
`results` is a `list` (because `is.list(results)` returns `TRUE`), but to be more specific, it is an S3 object of class `lm`. If you do not know what this means, do not worry! S3 classes are discussed more in a later chapter. Why is this important? For one, I mention it so that you aren't confused if you type `class(results)` and see `lm` instead of `list`. Second, the fact that the authors of `lm()` wrote code that returns `results` as a "fancy list" suggests that they are encouraging another way to access elements of `results`: to use specialized functions! For example, you can use `residuals(results)`, `coefficients(results)`, and `fitted.values(results)`. These functions do not work for all lists in R, but when they do work (for example, on `lm` and `glm` objects), you can be sure you are writing the kind of code that is encouraged by the authors of `lm()`.
:::
## `list`s In Python
[Python `list`s](https://docs.python.org/3/library/stdtypes.html#lists) are very flexible, too.\index{lists!lists in Python} There are fewer choices for accessing and modifying elements of lists in Python--you'll most likely end up using the square bracket operator. Elements can be different sizes and types, just like they were with R's lists.
Unlike in R, however, you cannot name elements of lists. If you want a container that allows you to access elements by name, look into Python [dictionaries](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) (see section \@ref(dictionaries-in-python)) or Pandas\index{Pandas}' `Series` objects (see section \@ref(overview-of-python)).
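To make the contrast concrete, here is a minimal sketch of positional access in a list versus access by name in a `dict`. The variable names and values are hypothetical and chosen only for illustration.

```python
# Hypothetical data: the same three values stored two ways.
measurements = [1.5, "May 5th, 2021", True]   # a list: positions only
named_measurements = {                        # a dict: names (keys) attached
    "height": 1.5,
    "date": "May 5th, 2021",
    "verified": True,
}

# Lists are indexed by integer position, starting from zero.
first_by_position = measurements[0]

# Dictionaries are indexed by key.
first_by_name = named_measurements["height"]

# Indexing a list with a string is an error.
try:
    measurements["height"]
except TypeError as err:
    error_message = str(err)

print(first_by_position, first_by_name)
print(error_message)
```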
From the example below, you can see that we've been introduced to lists already. We have been constructing Numpy arrays from them.
```{python, collapse = TRUE}
import numpy as np
another_list = [np.array([1,2,3]), "May 5th, 2021", True, [42,42]]
another_list[2]
another_list[2] = 100
another_list
```
Python lists have [methods attached to them](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists), which can come in handy.
```{python, collapse = TRUE}
another_list
another_list.append('new element')
another_list
```
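As a rough sketch of a few other commonly used list methods, consider the following. The list `scores` is a hypothetical example that does not appear elsewhere in this text.

```python
# A small, hypothetical list to experiment with.
scores = [3, 1, 2]

scores.append(10)        # add one element to the end
scores.extend([7, 8])    # add several elements to the end
scores.insert(0, 99)     # insert 99 at position 0
last = scores.pop()      # remove and return the last element
scores.sort()            # sort in place; the method itself returns None

print(scores)            # [1, 2, 3, 7, 10, 99]
print(last)              # 8
```

Notice that these methods modify the list in place. Writing `scores = scores.sort()` would replace the list with `None`, which is a common mistake.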
Lists can be created as above, with square brackets. They can also be created with the `list()` function or with a *list comprehension*. List comprehensions are discussed more in section \@ref(loops).
```{python, collapse = TRUE}
my_list = list(('a','b','c')) # converting a tuple to a list
your_list = [i**2 for i in range(3)] # list comprehension
my_list
your_list
```
The code above makes reference to a type that is not extensively discussed in this text: [`tuple`s](https://docs.python.org/3.3/library/stdtypes.html?highlight=tuple#tuple).\index{tuples in Python}
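Since tuples do not get much attention here, a brief sketch of the main practical difference may help: a tuple cannot be changed after it is created, whereas a list can. The names below are hypothetical.

```python
point = (1, 2, 3)          # a tuple, written with parentheses
same_values = [1, 2, 3]    # a list with the same elements

same_values[0] = 100       # fine: lists can be changed in place

try:
    point[0] = 100         # not allowed: tuples are immutable
except TypeError as err:
    tuple_error = str(err)

as_list = list(point)      # converting to a list gives a changeable copy
as_list[0] = 100

print(point)
print(as_list)
print(tuple_error)
```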
## Dictionaries In Python
[**Dictionaries**](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) in Python provide a container of key-value pairs. The keys are *unique*, and they must be *immutable* (more precisely, hashable). `str`s are the most common key type, but `int`s can be used as well.
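The two requirements on keys can be seen in a small sketch. The dictionaries `grid` and `counts` are hypothetical examples.

```python
# Tuples are immutable, so they can serve as keys.
grid = {(0, 0): "origin", (1, 0): "east"}

# Lists are mutable, so using one as a key raises a TypeError.
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    key_error_message = str(err)

# Keys are unique: assigning to an existing key overwrites its value
# instead of creating a second entry.
counts = {"a": 1}
counts["a"] = 2

print(grid[(0, 0)])
print(key_error_message)
print(counts, len(counts))
```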
Here is an example of creating a `dict` with curly braces (i.e. `{}`). This `dict` stores the current price of a few popular cryptocurrencies. Accessing an individual element's value using its key is done with the square bracket operator (i.e. `[]`), and deleting elements is done with the `del` keyword.
```{python, collapse = TRUE}
crypto_prices = {'BTC': 38657.14, 'ETH': 2386.54, 'DOGE': .308122}
crypto_prices['DOGE'] # get the current price of Dogecoin
del crypto_prices['BTC'] # remove the current price of Bitcoin
crypto_prices.keys()
crypto_prices.values()
```
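A few other everyday operations are sketched below: adding and updating entries, checking whether a key is present, and looking up a key that might be missing. The dictionary `fruit_prices` and its values are made up for illustration.

```python
fruit_prices = {"apple": 0.50, "banana": 0.25}

fruit_prices["cherry"] = 3.00          # add a new key-value pair
fruit_prices["apple"] = 0.55           # update an existing value

has_banana = "banana" in fruit_prices  # the `in` operator checks keys
has_durian = "durian" in fruit_prices

# Square brackets raise a KeyError for a missing key...
missing_key_raised = False
try:
    fruit_prices["durian"]
except KeyError:
    missing_key_raised = True

# ...whereas .get() returns a default (None unless you supply one).
durian_price = fruit_prices.get("durian", 0.0)

print(fruit_prices)
print(has_banana, has_durian, durian_price)
```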
You can also create `dict`s using **dictionary comprehensions**. Just like list comprehensions, these are discussed more in \@ref(loops).
```{python, collapse= TRUE}
incr_cryptos = {key:val*1.1 for (key,val) in crypto_prices.items()}
incr_cryptos
```
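A dictionary comprehension can also include a condition. The sketch below, which uses a hypothetical `prices` dictionary, shows one next to an explicit loop that builds the same result.

```python
prices = {"x": 10.0, "y": 20.0, "z": 30.0}

# Dictionary comprehension: keep only the pairs whose value exceeds 15,
# and double each value that is kept.
doubled = {key: val * 2 for key, val in prices.items() if val > 15}

# The same result built with an explicit loop.
doubled_loop = {}
for key, val in prices.items():
    if val > 15:
        doubled_loop[key] = val * 2

print(doubled)
print(doubled == doubled_loop)
```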
Personally, I don't use dictionaries as much as lists. If I have a dictionary, I usually convert it to a Pandas\index{Pandas} data frame (more information on those in \@ref(data-frames-in-python)).
```{python, collapse= TRUE}
import pandas as pd
a_dict = { 'col1': [1,2,3], 'col2' : ['a','b','c']}
df_from_dict = pd.DataFrame(a_dict)
df_from_dict
```
## Exercises
### R Questions
1.
Consider the data sets `"adult.data"`, `"car.data"`, `"hungarian.data"`, `"iris.data"`, `"long-beach-va.data"` and `"switzerland.data"` [@misc_heart_disease_45], [@misc_iris_53], [@misc_adult_2] and [@misc_car_evaluation_19] hosted by [@uci_data]. Read all of these in and store them all as a `list` of `data.frame`s. Call the list `listDfs`.
2.
Here are two lists in R:
```{R, collapse = TRUE}
l1 <- list(first="a", second=1)
l2 <- list(first=c(1,2,3), second = "statistics")
```
a) Make a new `list` that is these two lists above "squished together." It has to be length $4$, and each element is one of the elements of `l1` and `l2`. Call this list `l3`. Make sure to delete all the "tags" or "names" of these four elements.
b) Extract the third element of `l3` as a length one `list` and assign it to the name `l4`.
c) Extract the third element of `l3` as a `vector` and assign it to the name `v1`.
### Python Questions
1.
Read in `car.data` with `pd.read_csv()`, and use a `DataFrame` method to convert that to a `dict`. Store your answer as `car_dict`.
2.
Here are two `dict`s in Python:
```{python, collapse = TRUE}
d1 = { "first" : "a", "second" : 1}
d2 = { "first" : [1,2,3], "second" : "statistics"}
```
a) Make a new `list` that is these two `dict`s above "squished together" (why can't it be another `dict`?). It has to be length $4$, and each element is one of the values of `d1` and `d2`. Call this list `my_list`.
b) Use a list comprehension to create a list called `special_list` of all integers starting from zero, up to (and including) one million, but don't include integers that are divisible by any prime number less than seven.
c) Assign the average of all elements in the above list to the variable `special_ave`.