
Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Jun 9, 2024
1 parent 03c6afa commit f41f01d
Showing 5 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-7b077c52
+39cba3b6
2 changes: 1 addition & 1 deletion ols_properties.html
@@ -373,7 +373,7 @@ <h1 class="title"><span id="sec-ols-statistics" class="quarto-section-identifier
<section id="large-sample-properties-of-ols" class="level2" data-number="7.1">
<h2 data-number="7.1" class="anchored" data-anchor-id="large-sample-properties-of-ols"><span class="header-section-number">7.1</span> Large-sample properties of OLS</h2>
<p>As we saw in <a href="asymptotics.html" class="quarto-xref"><span>Chapter 3</span></a>, we need two key ingredients to conduct statistical inference with the OLS estimator: (1) a consistent estimate of the variance of <span class="math inline">\(\bhat\)</span> and (2) the approximate distribution of <span class="math inline">\(\bhat\)</span> in large samples. Remember that, since <span class="math inline">\(\bhat\)</span> is a vector, the variance of that estimator will actually be a variance-covariance matrix. To obtain the two key ingredients, we first establish the consistency of OLS and then use the central limit theorem to derive its asymptotic distribution, which includes its variance.</p>
-<p>We begin by setting out the assumptions needed for establishing the large-sample properties of OLS, which are the same as the assumptions needed to ensure that the best linear predictor, <span class="math inline">\(\bhat = \E[\X_{i}\X_{i}']^{-1}\E[\X_{i}Y_{i}]\)</span>, is well-defined and unique.</p>
+<p>We begin by setting out the assumptions needed for establishing the large-sample properties of OLS, which are the same as the assumptions needed to ensure that the best linear predictor, <span class="math inline">\(\bfbeta = \E[\X_{i}\X_{i}']^{-1}\E[\X_{i}Y_{i}]\)</span>, is well-defined and unique.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
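The paragraph edited in this hunk defines the population coefficient \(\bfbeta = \E[\X_i\X_i']^{-1}\E[\X_iY_i]\) that OLS targets. As a minimal simulation sketch of the consistency claim in the surrounding section (the code and names below are illustrative, not part of the built site or its source), the OLS estimates settle down to the BLP coefficients as \(n\) grows, even when the CEF is nonlinear:

```python
import numpy as np

# Illustrative sketch: OLS converges in probability to the BLP coefficient
# beta = E[X X']^{-1} E[X Y], even though the CEF here is nonlinear (exp).
rng = np.random.default_rng(0)

def ols_fit(n):
    x = rng.uniform(-2, 2, size=n)
    X = np.column_stack([np.ones(n), x])       # design matrix with intercept
    y = np.exp(x) + rng.normal(size=n)         # E[Y | X = x] = exp(x), not linear
    return np.linalg.solve(X.T @ X, X.T @ y)   # bhat = (X'X)^{-1} X'y

for n in (100, 10_000, 1_000_000):
    print(n, ols_fit(n))   # estimates stabilize near the BLP coefficients
```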
Binary file modified ols_properties_files/figure-pdf/fig-wald-1.pdf
2 changes: 1 addition & 1 deletion search.json
@@ -792,7 +792,7 @@
"href": "ols_properties.html",
"title": "7  The statistics of least squares",
"section": "",
"text": "7.1 Large-sample properties of OLS\nAs we saw in Chapter 3, we need two key ingredients to conduct statistical inference with the OLS estimator: (1) a consistent estimate of the variance of \\(\\bhat\\) and (2) the approximate distribution of \\(\\bhat\\) in large samples. Remember that, since \\(\\bhat\\) is a vector, the variance of that estimator will actually be a variance-covariance matrix. To obtain the two key ingredients, we first establish the consistency of OLS and then use the central limit theorem to derive its asymptotic distribution, which includes its variance.\nWe begin by setting out the assumptions needed for establishing the large-sample properties of OLS, which are the same as the assumptions needed to ensure that the best linear predictor, \\(\\bhat = \\E[\\X_{i}\\X_{i}']^{-1}\\E[\\X_{i}Y_{i}]\\), is well-defined and unique.\nRecall that these are mild conditions on the joint distribution of \\((Y_{i}, \\X_{i})\\) and in particular, we are not assuming linearity of the CEF, \\(\\E[Y_{i} \\mid \\X_{i}]\\), nor are we assuming any specific distribution for the data.\nWe can helpfully decompose the OLS estimator into the actual BLP coefficient plus estimation error as \\[\n\\bhat = \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_iY_i \\right) = \\bfbeta + \\underbrace{\\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_ie_i \\right)}_{\\text{estimation error}}.\n\\]\nThis decomposition will help us quickly establish the consistency of \\(\\bhat\\). By the law of large numbers, we know that sample means will converge in probability to population expectations, so we have \\[\n\\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\inprob \\E[\\X_i\\X_i'] \\equiv \\mb{Q}_{\\X\\X} \\qquad \\frac{1}{n} \\sum_{i=1}^n \\X_ie_i \\inprob \\E[\\X_{i} e_{i}] = \\mb{0},\n\\] which implies by the continuous mapping theorem (the inverse is a continuous function) that \\[\n\\bhat \\inprob \\bfbeta + \\mb{Q}_{\\X\\X}^{-1}\\E[\\X_ie_i] = \\bfbeta,\n\\] The linear projection assumptions ensure that the LLN applies to these sample means and that \\(\\E[\\X_{i}\\X_{i}']\\) is invertible.\nThus, OLS should be close to the population linear regression in large samples under relatively mild conditions. Remember that this may not equal the conditional expectation if the CEF is nonlinear. What we can say is that OLS converges to the best linear approximation to the CEF. Of course, this also means that, if the CEF is linear, then OLS will consistently estimate the coefficients of the CEF.\nTo emphasize, the only assumptions made about the dependent variable are that it (1) has finite variance and (2) is iid. Under this assumption, the outcome could be continuous, categorical, binary, or event count.\nNext, we would like to establish an asymptotic normality result for the OLS coefficients. We first review some key ideas about the Central Limit Theorem.\nWe now manipulate our decomposition to arrive at the stabilized version of the estimator, \\[\n\\sqrt{n}\\left( \\bhat - \\bfbeta\\right) = \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{\\sqrt{n}} \\sum_{i=1}^n \\X_ie_i \\right).\n\\] Recall that we stabilize an estimator to ensure it has a fixed variance as the sample size grows, allowing it to have a non-degenerate asymptotic distribution. 
The stabilization works by asymptotically centering it (that is, subtracting the value to which it converges) and multiplying by the square root of the sample size. We have already established that the first term on the right-hand side will converge in probability to \\(\\mb{Q}_{\\X\\X}^{-1}\\). Notice that \\(\\E[\\X_{i}e_{i}] = 0\\), so we can apply Equation 7.1 to the second term. The covariance matrix of \\(\\X_ie_{i}\\) is \\[\n\\mb{\\Omega} = \\V[\\X_{i}e_{i}] = \\E[\\X_{i}e_{i}(\\X_{i}e_{i})'] = \\E[e_{i}^{2}\\X_{i}\\X_{i}'].\n\\] The CLT will imply that \\[\n\\frac{1}{\\sqrt{n}} \\sum_{i=1}^n \\X_ie_i \\indist \\N(0, \\mb{\\Omega}).\n\\] Combining these facts with Slutsky’s Theorem implies the following theorem.\nThus, with a large enough sample size we can approximate the distribution of \\(\\bhat\\) with a multivariate normal distribution with mean \\(\\bfbeta\\) and covariance matrix \\(\\mb{V}_{\\bfbeta}/n\\). In particular, the square root of the \\(j\\)th diagonals of this matrix will be standard errors for \\(\\widehat{\\beta}_j\\). Knowing the shape of the OLS estimator’s multivariate distribution will allow us to conduct hypothesis tests and generate confidence intervals for both individual coefficients and groups of coefficients. But, first, we need an estimate of the covariance matrix.",
"text": "7.1 Large-sample properties of OLS\nAs we saw in Chapter 3, we need two key ingredients to conduct statistical inference with the OLS estimator: (1) a consistent estimate of the variance of \\(\\bhat\\) and (2) the approximate distribution of \\(\\bhat\\) in large samples. Remember that, since \\(\\bhat\\) is a vector, the variance of that estimator will actually be a variance-covariance matrix. To obtain the two key ingredients, we first establish the consistency of OLS and then use the central limit theorem to derive its asymptotic distribution, which includes its variance.\nWe begin by setting out the assumptions needed for establishing the large-sample properties of OLS, which are the same as the assumptions needed to ensure that the best linear predictor, \\(\\bfbeta = \\E[\\X_{i}\\X_{i}']^{-1}\\E[\\X_{i}Y_{i}]\\), is well-defined and unique.\nRecall that these are mild conditions on the joint distribution of \\((Y_{i}, \\X_{i})\\) and in particular, we are not assuming linearity of the CEF, \\(\\E[Y_{i} \\mid \\X_{i}]\\), nor are we assuming any specific distribution for the data.\nWe can helpfully decompose the OLS estimator into the actual BLP coefficient plus estimation error as \\[\n\\bhat = \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_iY_i \\right) = \\bfbeta + \\underbrace{\\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_ie_i \\right)}_{\\text{estimation error}}.\n\\]\nThis decomposition will help us quickly establish the consistency of \\(\\bhat\\). By the law of large numbers, we know that sample means will converge in probability to population expectations, so we have \\[\n\\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\inprob \\E[\\X_i\\X_i'] \\equiv \\mb{Q}_{\\X\\X} \\qquad \\frac{1}{n} \\sum_{i=1}^n \\X_ie_i \\inprob \\E[\\X_{i} e_{i}] = \\mb{0},\n\\] which implies by the continuous mapping theorem (the inverse is a continuous function) that \\[\n\\bhat \\inprob \\bfbeta + \\mb{Q}_{\\X\\X}^{-1}\\E[\\X_ie_i] = \\bfbeta,\n\\] The linear projection assumptions ensure that the LLN applies to these sample means and that \\(\\E[\\X_{i}\\X_{i}']\\) is invertible.\nThus, OLS should be close to the population linear regression in large samples under relatively mild conditions. Remember that this may not equal the conditional expectation if the CEF is nonlinear. What we can say is that OLS converges to the best linear approximation to the CEF. Of course, this also means that, if the CEF is linear, then OLS will consistently estimate the coefficients of the CEF.\nTo emphasize, the only assumptions made about the dependent variable are that it (1) has finite variance and (2) is iid. Under this assumption, the outcome could be continuous, categorical, binary, or event count.\nNext, we would like to establish an asymptotic normality result for the OLS coefficients. We first review some key ideas about the Central Limit Theorem.\nWe now manipulate our decomposition to arrive at the stabilized version of the estimator, \\[\n\\sqrt{n}\\left( \\bhat - \\bfbeta\\right) = \\left( \\frac{1}{n} \\sum_{i=1}^n \\X_i\\X_i' \\right)^{-1} \\left( \\frac{1}{\\sqrt{n}} \\sum_{i=1}^n \\X_ie_i \\right).\n\\] Recall that we stabilize an estimator to ensure it has a fixed variance as the sample size grows, allowing it to have a non-degenerate asymptotic distribution. 
The stabilization works by asymptotically centering it (that is, subtracting the value to which it converges) and multiplying by the square root of the sample size. We have already established that the first term on the right-hand side will converge in probability to \\(\\mb{Q}_{\\X\\X}^{-1}\\). Notice that \\(\\E[\\X_{i}e_{i}] = 0\\), so we can apply Equation 7.1 to the second term. The covariance matrix of \\(\\X_ie_{i}\\) is \\[\n\\mb{\\Omega} = \\V[\\X_{i}e_{i}] = \\E[\\X_{i}e_{i}(\\X_{i}e_{i})'] = \\E[e_{i}^{2}\\X_{i}\\X_{i}'].\n\\] The CLT will imply that \\[\n\\frac{1}{\\sqrt{n}} \\sum_{i=1}^n \\X_ie_i \\indist \\N(0, \\mb{\\Omega}).\n\\] Combining these facts with Slutsky’s Theorem implies the following theorem.\nThus, with a large enough sample size we can approximate the distribution of \\(\\bhat\\) with a multivariate normal distribution with mean \\(\\bfbeta\\) and covariance matrix \\(\\mb{V}_{\\bfbeta}/n\\). In particular, the square root of the \\(j\\)th diagonals of this matrix will be standard errors for \\(\\widehat{\\beta}_j\\). Knowing the shape of the OLS estimator’s multivariate distribution will allow us to conduct hypothesis tests and generate confidence intervals for both individual coefficients and groups of coefficients. But, first, we need an estimate of the covariance matrix.",
"crumbs": [
"Regression",
"<span class='chapter-number'>7</span>  <span class='chapter-title'>The statistics of least squares</span>"
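The section quoted in this hunk ends by asking for an estimate of the covariance matrix. Combining the two displayed limits via Slutsky’s Theorem gives the sandwich form \(\mb{Q}_{\X\X}^{-1}\mb{\Omega}\mb{Q}_{\X\X}^{-1}\), and a plug-in sketch of that estimate might look as follows (illustrative only; this is not the book’s code, and it assumes the standard HC0-style residual plug-in for \(\mb{\Omega}\)):

```python
import numpy as np

# Illustrative plug-in estimate of the asymptotic covariance
# Vhat = Qhat^{-1} Omegahat Qhat^{-1}, with
#   Qhat     = (1/n) sum_i X_i X_i'
#   Omegahat = (1/n) sum_i ehat_i^2 X_i X_i'   (residual plug-in for E[e^2 X X'])
rng = np.random.default_rng(1)

n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + (1 + np.abs(x)) * rng.normal(size=n)  # heteroskedastic errors

bhat = np.linalg.solve(X.T @ X, X.T @ y)        # OLS coefficients
ehat = y - X @ bhat                             # residuals
Qhat = X.T @ X / n
Omegahat = (X * ehat[:, None] ** 2).T @ X / n
Qinv = np.linalg.inv(Qhat)
Vhat = Qinv @ Omegahat @ Qinv                   # covariance of sqrt(n)(bhat - beta)
se = np.sqrt(np.diag(Vhat) / n)                 # standard errors for bhat
print(bhat, se)
```

The division by n in the last step mirrors the \(\mb{V}_{\bfbeta}/n\) scaling in the section text.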
Binary file modified users-guide.pdf
