MathStat355Notes/consistency.qmd at main · Mac-STAT/MathStat355Notes · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
# Consistency

The properties of estimators that we've covered thus far (with the exception of asymptotic unbiasedness) have all been what are called *finite sample* properties. All we mean by this is that our sample size ($n$) is less than infinity (hence: finite). In practice of course, our sample size will always be less than infinity; however, it is still nice to know what happens to our estimators (in terms of bias, variance, etc.) as our sample size gets *really really large*. Much of Frequentist statistics relies on asymptotic (read: infinite sample size) theory to quantify our uncertainty via confidence intervals, for example.

Consistency is one such asymptotic property that we care about. As opposed to asymptotic unbiasedness, consistency tells us something about the *entire shape* of a distribution, as opposed to just the *center* of our distribution. Prof. Brianna Heggeseth and Prof. Kelsey Grinde have a great visual for this below:

```{r echo = FALSE, message = FALSE, warning = FALSE}
library(latex2exp)
library(ggplot2)
library(dplyr)
x <- rep(seq(-4, 4, by = 0.001),2)
y2 <- dnorm(x, 0, 1.2) + 2
y1 <- dnorm(x, 0, 0.5) + 3
group <- rep(c("a","b"), each = length(x)/2)

df <- data.frame(x = x, y = c(y2, y1), group = group)

data.frame(x = x, y = c(y2, y1), group = group) %>%
  ggplot(aes(x, y)) +
  geom_point(size = 0.3) +
  geom_vline(xintercept = 2, linetype = "longdash") +
  geom_vline(xintercept = -2, linetype = "longdash") +
  theme_classic() +
  scale_x_continuous(breaks = c(-2,0,2),
                     labels = c(TeX("$\\theta - \\epsilon$"), TeX("$\\theta$"), TeX("$\\theta + \\epsilon$"))) +
  xlab("Parameter value") +
  ylab("Sample size") +
  scale_y_continuous(breaks = c(2, 3), labels = c("2", "n")) +
  geom_ribbon(data=subset(data.frame(x = x, y = y2), x < 2 & x > -2), aes(x=x, ymax=y), ymin=2, alpha=0.3, fill = "blue") +
  geom_ribbon(data=subset(data.frame(x = x, y = y1), x < 2 & x > -2), aes(x=x, ymax=y), ymin=3, alpha=0.3, fill = "blue") +
  coord_flip(xlim = c(-4.5, 4.5)) +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 16),
        title = element_text(size = 12)) +
  annotate("text", x = 2.5, y = 2.3, label = TeX("$\\Pr(| \\hat{\\theta}_2 - \\theta | < \\epsilon)$")) +
  annotate("text", x = 2.5, y = 3.3, label = TeX("$\\Pr(| \\hat{\\theta}_n - \\theta | < \\epsilon)$")) +
  ggtitle(TeX("Sampling Distribution for $\\hat{\\theta}$ at Different Sample Sizes"))
```

The idea here is that as our sample size gets large (as we move to the right on the x-axis), consistency tells us something about the entire *distribution* of our estimator being within some boundary. Asymptotic unbiasedness, on the other hand, only tells us about whether the center of our distribution (a single point!) lies where it "should" (at the truth).

Why do we care about consistency? Because we care about uncertainty! It would be really unfortunate if, in collecting more and more data, we didn't get any more certain about the true parameter we're trying to estimate. Intuitively, we want to be *more confident* (less uncertain) in our estimators when we have larger sample sizes. This is exactly what consistency is concerned with. How do we prove whether or not an estimator is consistent? (Typically) Chebyshev's Inequality, which we state and prove below.

## Learning Objectives

By the end of this chapter, you should be able to...

-   Distinguish between finite sample properties and asymptotic properties of estimators
-   Prove (using Chebyshev's inequality) whether or not an estimator is consistent

## Concept Questions

1.  What is the distinction between a fixed sample property and an asymptotic property of an estimator?

2.  Describe, in your own words, what it means for an estimator to be consistent.

3.  How can we use Chebyshev's inequality to show that an estimator is consistent?

4.  Which of the estimation techniques we've seen so far yield consistent estimators?

## Definitions

You are expected to know the following definitions:

**Asymptotically Unbiased**

An estimator $\hat{\theta} = g(X_1, \dots, X_n)$ is an *asymptotically* unbiased estimator for $\theta$ if $\underset{n \to \infty}{\text{lim}} E[\hat{\theta}] = \theta$.

**Consistent**

An estimator $\hat{\theta}_n = h(X_1, \dots, X_n)$ is consistent for $\theta$ if it converges in probability to $\theta$. That is, for all $\epsilon > 0$,

$$
\underset{n \to \infty}{\text{lim}} \Pr(| \hat{\theta}_n - \theta | < \epsilon) = 1
$$

Note that we write our estimator with a subscript $n$ here to clarify that our estimator depends on our sample size. There is an alternative $\epsilon-\delta$ definition of consistency, but we won't focus on it for this course.

**Weak Law of Large Numbers**

For independent and identically distributed random variables $X_1, \dots, X_n$ with finite expectation $\mu < \infty$,

$$
\underset{n \to \infty}{\text{lim}} \Pr(| \overline{X} - \mu | < \epsilon) = 1
$$

Alternatively, we can write that as $n \to \infty$, $\overline{X} \overset{p}{\to} \mu$, where "$\overset{p}{\to}$" denotes convergence in probability.

## Theorems

**Chebyshev's Inequality**

Let $W$ be a random variable with mean $\mu$ and variance $\sigma^2$. Then for any $\epsilon > 0$,

$$
\Pr(|W - \mu| < \epsilon) \geq 1 - \frac{\sigma^2}{\epsilon^2},
$$

or, equivalently,

$$
\Pr(|W - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.
$$

<details>

<summary>Proof.</summary>

Let $\epsilon > 0$. Then

\begin{align*}
\sigma^2 & = \text{Var}(W) \\
& = \int_{-\infty}^\infty (w-\mu)^2 f_W(w)dw \\
& = \int_{-\infty}^{\mu - \epsilon} (w-\mu)^2 f_W(w)dw + \int_{\mu - \epsilon}^{\mu + \epsilon} (w-\mu)^2 f_W(w)dw + \int_{\mu + \epsilon}^\infty (w-\mu)^2 f_W(w)dw \\
& \ge \int_{-\infty}^{\mu - \epsilon} (w-\mu)^2 f_W(w)dw + 0  + \int_{\mu + \epsilon}^\infty (w-\mu)^2 f_W(w)dw \\
&= \int_{|w-\mu|\ge \epsilon} (w-\mu)^2 f_W(w)dw \\
& \ge \int_{|w-\mu|\ge \epsilon} \epsilon^2 f_W(w)dw \\
& = \epsilon^2 \int_{|w-\mu|\ge \epsilon} f_W(w)dw\\
& = \epsilon^2 P(|W-\mu| \ge \epsilon)
\end{align*}

and rearranging yields

\begin{align*}
\sigma^2 & \geq \epsilon^2 P(|W-\mu| \ge \epsilon) \\
P(|W-\mu| \geq \epsilon) & \leq \frac{\sigma^2}{\epsilon^2}
\end{align*}

as desired.

</details>

**Corollary 1:** If $\hat{\theta}_n$ is an unbiased estimator for $\theta$ and $\underset{n \to \infty}{\text{lim}} Var(\hat{\theta}_n) = 0$, then $\hat{\theta}_n$ is consistent for $\theta$. (You'll prove this corollary on a problem set!)

**Corollary 2:** If $\hat{\theta}_n$ is an asymptotically unbiased estimator for $\theta$ and $\underset{n \to \infty}{\text{lim}} Var(\hat{\theta}_n) = 0$, then $\hat{\theta}_n$ is consistent for $\theta$.

Note that the second corollary is a bit stronger than the first one, in that the first corollary actually *implies* the second. If an estimator is unbiased, then it is certainly asymptotically unbiased as well.

## Worked Examples

**Problem 1:** Suppose $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$. Show that the MLE for $\sigma^2$ is *asymptotically* unbiased.

<details>

<summary>Solution:</summary>

The MLE for $\sigma^2$ is given by $\frac{1}{n} \sum_{i = 1}^n (X_i - \overline{X})^2$ (see the MLE section of the course notes, worked example problem 2), and has expectation $\left( \frac{n-1}{n} \right)\sigma^2$ (see the Properties section of the course notes, worked example problem 6). To show that this estimator is asymptotically unbiased, note that we have

\begin{align*}
    \underset{n\to \infty}{\text{lim}} E[\hat{\sigma^2}_{MLE}] & = \underset{n\to \infty}{\text{lim}} \left( \frac{n-1}{n} \right) \sigma^2 \\
    & = \left( 1 \right) \sigma^2 \\
    & = \sigma^2
\end{align*}

and therefore, the MLE for $\sigma^2$ is asymptotically unbiased.

</details>

**Problem 2:** Suppose $Y_1, \dots, Y_n \overset{iid}{\sim} Uniform(0, \theta)$, and recall that $\hat{\theta}_{MLE} = Y_{(n)}$ with $f_{Y_{(n)}}(y \mid \theta) = \frac{n}{\theta^n} y^{n-1}$, $0 \leq y \leq \theta$. Prove that $\hat{\theta}_{MLE}$ is a consistent estimator for $\theta$.

<details>

<summary>Solution:</summary>

To prove that $\hat{\theta}_{MLE}$ is consistent, we must first show that $\hat{\theta}_{MLE}$ is (either) unbiased or asymptotically unbiased, and then we must show that the variance of $\hat{\theta}_{MLE}$ tends to zero as $n \to \infty$. To begin, note that

\begin{align*}
    E\left[\hat{\theta}_{MLE}\right] & = \int_{0}^\theta y f_{Y_{(n)}} (y \mid \theta) dy \\
    & = \int_{0}^\theta y \left( \frac{n}{\theta^n} y^{n-1} \right) dy \\
    & = \frac{n}{\theta^n} \int_0^\theta y^n dy \\
    & = \frac{n}{\theta^n} \left( \frac{y^{n + 1}}{n + 1} \bigg|_0^\theta \right) \\
    & = \frac{n}{\theta^n} \left( \frac{\theta^{n + 1}}{n + 1} \right) \\
    & = \left( \frac{n}{n + 1} \right) \theta
\end{align*}

and so $\hat{\theta}_{MLE}$ is biased. It is, however, asymptotically *unbiased*. Note that $\left( \frac{n}{n + 1} \right) \overset{n \to \infty}{\to} 1$, and therefore $E\left[\hat{\theta}_{MLE}\right] \overset{n \to \infty}{\to} \theta$.

All that's left is to show that $Var\left[\hat{\theta}_{MLE}\right] \overset{n \to \infty}{\to} 0$. We can write

\begin{align*}
        E\left[Y_{(n)}^2\right] & = \int_0^\theta y^2 \frac{ny^{n-1}}{\theta^n} dy \\
        & = \frac{n}{\theta^n} \int_0^\theta y^{n + 1} dy \\
        & = \frac{n}{\theta^n} \left( \frac{y^{n + 2}}{n + 2} \bigg|_0^\theta \right) \\
        & = \frac{n}{\theta^n} \left( \frac{\theta^{n + 2}}{n + 2}\right) \\
        & = \left( \frac{n}{n + 2} \right) \theta^2
    \end{align*}

and therefore

\begin{align*}
    \underset{n \to \infty}{\text{lim}} Var\left[\hat{\theta}_{MLE}\right] & = \underset{n \to \infty}{\text{lim}} \left[ E\left[Y_{(n)}^2\right] - E\left[Y_{(n)}\right]^2 \right]\\
    & = \underset{n \to \infty}{\text{lim}} \left[ \left( \frac{n}{n + 2} \right) \theta^2 - \left( \frac{n}{n + 1} \right)^2 \theta^2 \right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[ \left( \frac{n}{n + 2} \right)  - \left( \frac{n}{n + 1} \right)^2\right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[  \frac{n}{n + 2}   - \frac{n^2}{(n + 1)^2} \right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[  \frac{n(n + 1)^2 - n^2 (n + 2)}{(n + 2)(n + 1)^2} \right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[  \frac{n(n^2 + 2n + 1) - n^3 -2n^2}{(n + 2)(n^2 + 2n + 1)} \right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[  \frac{n^3 + 2n^2 + n - n^3 -2n^2}{n^3 + 2n^2 + 2n^2 + 5n + 2} \right] \\
    & = \theta^2 \underset{n \to \infty}{\text{lim}} \left[  \frac{n}{n^3 + 4n^2  + 5n + 2} \right] \\
    & = 0
\end{align*}

where the last term goes to zero because $\frac{n}{n^3} \to 0$ as $n \to \infty$. Therefore, $\hat{\theta}_{MLE}$ is a consistent estimator for $\theta$.

</details>