Correlation causation #148

ben-herbst · 2023-10-25T18:30:01Z

Incomplete

* On gravity
* On athletic/IQ scores

…esampling-with into correlation_causation

* Need to iterate on this, not add to it.

* Lightly edit the rest.

stefanv · 2023-11-02T16:45:05Z

source/correlation_causation.Rmd

+(column) vector. Note that we want to find a solution for any such system - there are no conditions other than that the
+number of rows of $\mathbf{x}$ must be the same as the number of columns of $A$, and the number of rows of $\mathbf{y}$
+must be the same as the number of rows of $A$. Note in particular that $m$ need not equal $n$, and if $m=n$ we don't
+require that the determinant of $A$ be non-zero. Let's give examples of the typical situations that one encounters in general,


the determinant is sure to confuse

stefanv · 2023-11-02T16:46:22Z

source/correlation_causation.Rmd

+
+
+1.  The first example represents all systems of equations where $m=n$ with non-zero determinant. In all these cases the equation is *solvable* by perhaps using
+    Gaussian elimination with partial pivoting. (Not Cramer's rule!)


Those new to linear algebra won't know about Gaussian elimination or pivoting
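
One option might be to show the idea on a tiny example before naming the method. The sketch below assumes a made-up 2×2 system (not taken from the chapter) and solves it by eliminating one unknown and then substituting back, which is the core of Gaussian elimination without any of the vocabulary.

```python
# Hypothetical system, chosen only for illustration:
#   2*x + 1*y = 5
#   1*x - 1*y = 1
a11, a12, b1 = 2.0, 1.0, 5.0
a21, a22, b2 = 1.0, -1.0, 1.0

# Step 1: remove x from the second equation by subtracting
# a multiple of the first equation from it.
factor = a21 / a11
a22_new = a22 - factor * a12
b2_new = b2 - factor * b1

# Step 2: the second equation now contains only y.
y = b2_new / a22_new

# Step 3: substitute y back into the first equation to get x.
x = (b1 - a12 * y) / a11

print(x, y)
```

Substituting the printed values back into both equations is a quick check that the system really is solved.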

stefanv · 2023-11-02T16:49:24Z

source/correlation_causation.Rmd

+    $$
+    This equation represents all systems with more equations than unknowns, i.e. all *over-determined* systems.
+    Since this system cannot be solved, we look
+    instead for a solution that best fits the system in a sense that we'll explain later. Please take our word for it, for now,


"take our word" is my least favorite expression! Perhaps we can give a quick intuitive version of the answer, e.g., we have to pick a value, and it looks like that value will have to be somewhere between 1 and 1.2. A best guess turns out to be , and we'll soon learn why.

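A rough sketch of how that intuition could be shown in code. It assumes a made-up system of two conflicting equations in one unknown, `x = 1.0` and `x = 1.2`; the chapter's actual system and its best-guess value are not reproduced here. The code tries many candidate values between the two and keeps the one with the smallest sum of squared errors.

```python
# Hypothetical over-determined system with one unknown:
#   x = 1.0
#   x = 1.2
# No single x satisfies both, so we look for the least-bad compromise.
targets = [1.0, 1.2]


def sum_of_squared_errors(x):
    """Total squared miss of a candidate x against every equation."""
    return sum((x - t) ** 2 for t in targets)


# Candidate values from 1.000 to 1.200 in steps of 0.001.
candidates = [1.0 + i / 1000 for i in range(201)]

# Keep the candidate with the smallest total squared error.
best = min(candidates, key=sum_of_squared_errors)
print(round(best, 3))
```

For this made-up system the search settles on the midpoint, 1.1, which matches the sense that the answer has to lie somewhere between the two conflicting values.
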
stefanv · 2023-11-02T16:49:49Z

source/correlation_causation.Rmd

+     that the solution is given by the *normal equations*,
+    $$
+    A^T A \mathbf{x} = A^T \mathbf{y}.
+    $$
+    Here $A^T$ is the transpose of $A$, $A^TA$ is an $n\times n$, square, symmetric matrix and $A^T \mathbf{y}$ is an $n\times 1$ (column)
+    vector. Moreover, if the columns of $A$ are linearly independent, it can be shown that $A^TA$ has an inverse,
+    a situation that is almost always true.


I'm not sure this is helpful until the reader can comprehend what it says.
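
A small numerical sketch might help the reader see what the normal equations say before meeting the notation. It assumes made-up data and a straight-line model `y ≈ intercept + slope * x`, so each row of $A$ is `[1, x_i]`; none of these names or numbers come from the chapter. In that case $A^TA$ and $A^T\mathbf{y}$ reduce to a handful of ordinary sums.

```python
# Made-up data points; they do not lie on a single straight line,
# so the three equations intercept + slope * x_i = y_i cannot all hold.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 4.0]
m = len(xs)

# With rows [1, x_i], the entries of A^T A and A^T y are plain sums.
sum_x = sum(xs)
sum_xx = sum(x * x for x in xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Normal equations for this model:
#   [ m      sum_x  ] [intercept]   [ sum_y  ]
#   [ sum_x  sum_xx ] [slope    ] = [ sum_xy ]
# Solve the 2x2 system in closed form.
det = m * sum_xx - sum_x * sum_x
intercept = (sum_xx * sum_y - sum_x * sum_xy) / det
slope = (m * sum_xy - sum_x * sum_y) / det

print(intercept, slope)
```

The closed-form 2×2 solve is used only to keep the sketch free of library calls; it is not meant as a recommendation for larger systems.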

stefanv · 2023-11-02T16:51:40Z

source/correlation_causation.Rmd

+    to identify a natural solution among the infinity of available solutions. To find this solution one has to calculate
+    the generalized inverse, which would take us too far from our core focus. But it turns out that it can again be cast as an


Again, I think simply mentioning these terms will lead to confusion. We'd need to find an accessible way to explain that a solution is possible, but that you need to go about it carefully.
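
One accessible framing might be a single equation with two unknowns. The sketch below assumes the made-up equation `x1 + x2 = 2` (not from the chapter): every pair `(t, 2 - t)` solves it, so a rule is needed to pick one, and "shortest length" is such a rule.

```python
# Hypothetical under-determined system: one equation, two unknowns.
#   x1 + x2 = 2
# Every pair (t, 2 - t) is a solution, so there are infinitely many.


def length(solution):
    """Distance of a solution from the origin."""
    x1, x2 = solution
    return (x1 ** 2 + x2 ** 2) ** 0.5


# A sample of the solutions, for t from -3.00 to 5.00 in steps of 0.01.
solutions = [(t / 100, 2 - t / 100) for t in range(-300, 501)]

# Every sampled pair really does satisfy the equation.
assert all(abs(x1 + x2 - 2) < 1e-12 for x1, x2 in solutions)

# Among them, pick the one closest to the origin.
shortest = min(solutions, key=length)
print(shortest)
```

Among the sampled solutions the shortest is `(1.0, 1.0)`, which gives a concrete picture of what "natural solution" could mean without introducing the generalized inverse by name.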

stefanv · 2023-11-02T16:52:08Z

source/correlation_causation.Rmd

+    It should be obvious that this is indeed a solution of the equation. What is more, it is the solution
+    that is the closest to the origin, i.e. out of the infinite number of solutions, this is the one with the
+    shortest length.


This type of language, e.g., is good for beginner level.

stefanv · 2023-11-02T16:56:38Z

source/correlation_causation.Rmd

+Returning to our question above, we want to identify that value of $\mathbf{x}$ that will minimize the errors,
+$e_1, \ldots, e_m$. We are back at the question, minimize in what sense? A generally used measure for the error is,
+$$
+\mathbf{e}^T\mathbf{e} = e_1^2 + \cdots + e_m^2,


I'd swap these around, since the student is more likely to understand the latter.

stefanv · 2023-11-02T16:56:51Z

source/correlation_causation.Rmd

+
+Armed with the normal equations we can explain the linear correlation between variables.
+
+:::{.callout-note}


I like this callout a lot.

stefanv · 2023-11-02T16:57:49Z

source/correlation_causation.Rmd

+\mathbf{y}^T \mathbf{y}.
+$$
+In order to find the values of $\mathbf{x}$ that will minimize the sum of the squares of the errors, we need to set the
+partial derivatives with respect to all the components, $x_1, \ldots, x_n$, to zero. The detailed calculations are messy


Is it also messy to derive it from e_1^2 + e_2^2 ...?

I.e., can we get a sense of least squares without matrix formulation, and in the end just state that the solution can also be written as ... using matrices?
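
A possible route, sketched here for the simplest case of a single unknown $x$, needs only ordinary calculus. The coefficients $a_i$ and the sign convention $e_i = y_i - a_i x$ are assumptions for illustration, not taken from the chapter.

$$
S(x) = e_1^2 + \cdots + e_m^2 = \sum_{i=1}^{m} (y_i - a_i x)^2,
$$

$$
\frac{dS}{dx} = -2 \sum_{i=1}^{m} a_i (y_i - a_i x) = 0
\quad\Longrightarrow\quad
x \sum_{i=1}^{m} a_i^2 = \sum_{i=1}^{m} a_i y_i .
$$

With $A$ taken as the single column $(a_1, \ldots, a_m)^T$, the last equality is exactly $A^T A x = A^T \mathbf{y}$, so the matrix form could be stated at the end as a compact way of writing the same result.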

matthew-brett · 2023-11-02T18:00:13Z

Here's the slope with assumed intercept of 0: https://lisds.github.io/textbook/mean-slopes/mean_and_slopes.html
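
As a rough illustration of the zero-intercept case (a sketch with made-up data, not code from the linked page): for a line through the origin, `y ≈ slope * x`, the least-squares slope is the sum of `x * y` divided by the sum of `x * x`, and a quick comparison with nearby slopes shows that it does give a smaller sum of squared errors.

```python
# Made-up data, roughly following y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]


def sum_of_squared_errors(slope):
    """Total squared error of the line y = slope * x (intercept fixed at 0)."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))


# Least-squares slope for a line through the origin.
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Nudging the slope either way should make the fit worse.
for candidate in (slope - 0.01, slope, slope + 0.01):
    print(round(candidate, 4), sum_of_squared_errors(candidate))
```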

matthew-brett · 2023-11-02T18:01:14Z

Source at: https://github.com/lisds/textbook/ including datasets.

matthew-brett and others added 4 commits August 4, 2023 15:32

Updating correlation_causation

8c487ba

Start code for correlation causation

748f46a

Working through correlation / causation

fe9ef78

Add code and more sections"

980ade5

ben-herbst requested review from stefanv and matthew-brett October 25, 2023 18:30

stefanv and others added 6 commits October 25, 2023 11:46

Remove draft banner; fix inline math

11d1abe

* Slight improvement on formulation

cfc29a3

* On gravity
* On athletic/IQ scores

Merge branch 'correlation_causation' of github.com:resampling-stats/r…

91f13ab

…esampling-with into correlation_causation

* Add to causality

628c2f6

* Need to iterate on this, not add to it.

* Add more linear algebra explanation

cfe4b35

* Lightly edit the rest.

Light editing

f789701

stefanv reviewed Nov 2, 2023
