Commit

more typos
alexnones committed Dec 23, 2023
1 parent 46acc8e commit 89c6f24
Showing 10 changed files with 76 additions and 76 deletions.
10 changes: 5 additions & 5 deletions highdim/dimension-reduction.qmd
@@ -440,7 +440,7 @@ plot(dist(x), dist(z[,1]))
abline(0,1, col = "red")
```
-We also notice that the two groups, adults and children, can be clearly observed with the one number summary, better than with any of the two orginal dimesions.
+We also notice that the two groups, adults and children, can be clearly observed with the one number summary, better than with any of the two original dimensions.
```{r}
#| echo: false
@@ -515,7 +515,7 @@ $$
\mathbf{Z} = \mathbf{X}\mathbf{V}
$$
-The ideas of distance preservation extends to higher dimensions. For a multidimensional matrix with $p$ columns, the $\mathbf{A}$ transformation preserves the distance between rows, but with the variance exaplined by the columns in decreasing order.
+The ideas of distance preservation extends to higher dimensions. For a multidimensional matrix with $p$ columns, the $\mathbf{A}$ transformation preserves the distance between rows, but with the variance explained by the columns in decreasing order.
If the variances of the columns $\mathbf{Z}_j$, $j>k$ are very small, these dimensions have little to contribute to the distance calculation and we can approximate the distance between any two points with just $k$ dimensions. If $k$ is much smaller than $p$, then we can achieve a very efficient summary of our data.
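To make this concrete, here is a minimal sketch of the idea (assuming `x` is a centered numeric matrix; the choice `k = 2` is arbitrary) comparing the full pairwise distances to distances computed from only the first $k$ columns of $\mathbf{Z} = \mathbf{X}\mathbf{V}$:

```r
# Sketch: approximate pairwise distances using only the first k columns of Z = XV.
# Assumes `x` is a centered numeric matrix; k = 2 is an arbitrary choice.
v <- svd(x)$v                        # columns of V give the rotation
z <- x %*% v                         # Z = XV; distances between rows are preserved
k <- 2
d_full <- dist(x)                    # distances using all p columns
d_k <- dist(z[, 1:k, drop = FALSE])  # distances using only the first k columns
plot(d_full, d_k)
abline(0, 1, col = "red")            # points near the line indicate a good approximation
```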
@@ -539,7 +539,7 @@ We can see that columns of the `pca$rotation` are indeed the rotation obtained w
pca$rotation
```
-The sqaure root of the variation of each column is included in the `pca$sdev` component. This implies we can compute the variance explained by each PC using:
+The square root of the variation of each column is included in the `pca$sdev` component. This implies we can compute the variance explained by each PC using:
```{r}
pca$sdev^2/sum(pca$sdev^2)
@@ -600,7 +600,7 @@ pca <- prcomp(x)
summary(pca)
```
-The first two dimensions account for almot 98% of the variability. Thus, we should be able to approximate the distance very well with two dimensions. We confirm this by computing the distance from first two dimensions and comparing to the original:
+The first two dimensions account for almost 98% of the variability. Thus, we should be able to approximate the distance very well with two dimensions. We confirm this by computing the distance from first two dimensions and comparing to the original:
```{r, eval = FALSE}
d_approx <- dist(pca$x[, 1:2])
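# One possible way to finish the comparison (a sketch; it assumes the original
# distances were stored earlier as `d <- dist(x)`):
plot(as.numeric(d), as.numeric(d_approx))
abline(0, 1, col = "red")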
Expand Down Expand Up @@ -694,7 +694,7 @@ tmp |>
facet_wrap(~label, nrow = 1)
```
-We can clearly see that first PC appears to be separating the 1s (red) from the 0s (blue). We can vaguely discern numbers in the other three PCs as well. By looking at the PCs stratified by digits, we get futher insights. For example, we see that the second PC separates 4s, 7s, and 9s from the rest:
+We can clearly see that first PC appears to be separating the 1s (red) from the 0s (blue). We can vaguely discern numbers in the other three PCs as well. By looking at the PCs stratified by digits, we get further insights. For example, we see that the second PC separates 4s, 7s, and 9s from the rest:
```{r digit-pc-boxplot}
#| echo: false
2 changes: 1 addition & 1 deletion highdim/matrices-in-R.qmd
@@ -1,6 +1,6 @@
# Matrices in R

-When the number of variables associated with each observation is large and they can all be represented as a number, it is often more convenient to store them in a matrix and perform the analysis with linear algebra operations, rather than storing them in a data frame and performing the analysis with **tidyverse** or **data.table** functions. With matrices, variables for each observation are stored in a row, resulting in a matrix with as many columns as variables. In statistics, we refer to values represented in the rows of the matrix as the *covariates* or *pedictors* and, in machine learning, we refer to them as the *features*.
+When the number of variables associated with each observation is large and they can all be represented as a number, it is often more convenient to store them in a matrix and perform the analysis with linear algebra operations, rather than storing them in a data frame and performing the analysis with **tidyverse** or **data.table** functions. With matrices, variables for each observation are stored in a row, resulting in a matrix with as many columns as variables. In statistics, we refer to values represented in the rows of the matrix as the *covariates* or *predictors* and, in machine learning, we refer to them as the *features*.

In linear algebra, we have three types of objects: scalars, vectors, and matrices. We have already learned about vectors in R, and, although there is no data type for scalars, we can represent them as vectors of length 1. In this chapter, we learn how to work with matrices in R and relate them to linear algebra notation and concepts.
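As a minimal sketch of this layout (the values and variable names below are made up for illustration), rows hold observations and columns hold features, and a scalar is simply a length-1 vector:

```r
# Sketch: store 3 observations of 2 features (height, weight) as a matrix.
# The values and column names are made up for illustration.
x <- matrix(c(170, 62,
              180, 80,
              165, 59),
            nrow = 3, byrow = TRUE)
colnames(x) <- c("height", "weight")
dim(x)   # 3 rows (observations) by 2 columns (features)
a <- 2   # a "scalar" in R is just a vector of length 1
a * x    # scalar multiplication works elementwise
```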

2 changes: 1 addition & 1 deletion highdim/regularization.qmd
@@ -445,7 +445,7 @@ if (knitr::is_html_output()) {
1\. For the `movielens` data, compute the number of ratings for each movie and then plot it against the year the movie was released. Use the square root transformation on the counts.
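One possible sketch, assuming the `movielens` data frame from **dslabs** with `movieId`, `year`, and `rating` columns:

```r
# Sketch (assumes dslabs::movielens with columns movieId, year, and rating):
library(dslabs)
library(dplyr)
library(ggplot2)
movielens |>
  group_by(movieId) |>
  summarize(n = n(), year = first(year)) |>
  ggplot(aes(year, n)) +
  geom_point(alpha = 0.3) +
  scale_y_sqrt()  # square root transformation on the counts
```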


-2\. We see that, on average, movies that were releaed after 1993 get more ratings. We also see that with newer movies, starting in 1993, the number of ratings decreases with year: the more recent a movie is, the less time users have had to rate it.
+2\. We see that, on average, movies that were released after 1993 get more ratings. We also see that with newer movies, starting in 1993, the number of ratings decreases with year: the more recent a movie is, the less time users have had to rate it.

Among movies that came out in 1993 or later, what are the 25 movies with the most ratings per year? Also, report their average rating.
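A rough sketch of one way to approach this, assuming the same `movielens` columns plus `title`, and treating 2018 as the end of the rating period (an assumption, not stated in the exercise):

```r
# Sketch (assumes dslabs::movielens; using 2018 as the end of the rating period is an assumption):
library(dslabs)
library(dplyr)
movielens |>
  filter(year >= 1993) |>
  group_by(movieId) |>
  summarize(title = first(title), year = first(year),
            n = n(), avg_rating = mean(rating)) |>
  mutate(rate = n / (2018 - year)) |>  # ratings per year since release
  slice_max(rate, n = 25)
```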
