Bias

The bias of an estimate is the difference between the expectation value of the point estimate and the value of the parameter.

\[\begin{aligned} \text{bias}_F(\hat{\theta}, \theta) = \langle \hat{\theta} \rangle - \theta = \int\mathrm{d}x\, \hat{\theta}f(x) - T(F). \end{aligned}\]

Note that the expectation value of \(\hat{\theta}\) is computed over the (unknown) generative distribution whose PDF is \(f(x)\).
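When the generative distribution is known, as in a simulation study, the bias can be probed numerically by averaging the estimate over many simulated data sets. The sketch below shows one way this could be set up with NumPy; the function name `estimate_bias` and its arguments are made up here for illustration and are not part of any standard library.

```python
import numpy as np


def estimate_bias(estimator, draw_dataset, theta, n_trials=100_000, rng=None):
    """Monte Carlo estimate of bias_F(theta_hat, theta) = <theta_hat> - theta.

    `draw_dataset(rng)` should return one simulated data set drawn from the
    (assumed known) generative distribution; `estimator` maps a data set to
    a point estimate.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Approximate the expectation value by averaging over many simulated data sets
    estimates = np.array([estimator(draw_dataset(rng)) for _ in range(n_trials)])

    return estimates.mean() - theta
```

The numerical checks later in this section follow the same idea, written directly in vectorized form.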

Bias of the plug-in estimate for the mean

We often want a small bias because we want to choose estimates that give us back the parameters we expect. Let’s first investigate the bias of the plug-in estimate of the mean. As a reminder, the plug-in estimate is

\[\begin{aligned} \hat{\mu} = \bar{x}, \end{aligned}\]

where \(\bar{x}\) is the arithmetic mean of the observed data. To compute the bias of the plug-in estimate, we need to compute \(\langle \hat{\mu}\rangle\) and compare it to \(\mu\).

\[\begin{aligned} \langle \hat{\mu}\rangle = \langle \bar{x}\rangle = \frac{1}{n}\left\langle\sum_i x_i\right\rangle = \frac{1}{n}\sum_i \left\langle x_i\right\rangle = \langle x\rangle = \mu. \end{aligned}\]

Because \(\langle \hat{\mu}\rangle = \mu\), the bias in the plug-in estimate for the mean is zero. It is said to be unbiased.
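We can see this numerically with a quick simulation (a sketch only; the Normal generative distribution and its parameter values are arbitrary choices made for this check):

```python
import numpy as np

rng = np.random.default_rng(3252)

# Assumed generative distribution for this check: Normal(mu=1, sigma=2)
mu, sigma = 1.0, 2.0
n = 10               # size of each simulated data set
n_trials = 200_000   # number of simulated data sets

x = rng.normal(mu, sigma, size=(n_trials, n))
mu_hat = x.mean(axis=1)   # plug-in estimate of the mean for each data set

print(mu_hat.mean() - mu)  # close to zero: the plug-in estimate is unbiased
```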

Bias of the plug-in estimate for the variance

To compute the bias of the plug-in estimate for the variance, first recall that the variance, as the second central moment, is computed as

\[\begin{aligned} \sigma^2 = \langle x^2 \rangle - \langle x\rangle^2. \end{aligned}\]

So, the expectation value of the plug-in estimate is

\[\begin{split}\begin{aligned} \left\langle \hat{\sigma}^2 \right\rangle &= \left\langle\frac{1}{n}\sum_i x_i^2 - \bar{x}^2\right\rangle \\ &= \left\langle\frac{1}{n}\sum_i x_i^2\right\rangle - \left\langle\bar{x}^2\right\rangle\\ &= \frac{1}{n}\sum_i \left\langle x_i^2\right\rangle - \left\langle\bar{x}^2\right\rangle \\ &= \langle x^2 \rangle - \left\langle\bar{x}^2\right\rangle\\ &= \mu^2 + \sigma^2 - \left\langle\bar{x}^2\right\rangle. \end{aligned}\end{split}\]

We now need to compute \(\left\langle\bar{x}^2\right\rangle\), which is a little trickier. We will use the fact that the measurements are independent, so \(\left\langle x_i x_j\right\rangle = \langle x_i \rangle \langle x_j\rangle\) for \(i\ne j\).

\[\begin{split}\begin{aligned} \left\langle\bar{x}^2\right\rangle &= \left\langle\left(\frac{1}{n}\sum_ix_i\right)^2\right\rangle \\ &= \frac{1}{n^2}\left\langle\left(\sum_ix_i\right)^2 \right\rangle \\ &= \frac{1}{n^2}\left\langle\sum_i x_i^2 + 2\sum_i\sum_{j>i}x_i x_j\right\rangle \\ &= \frac{1}{n^2}\left(\sum_i \left\langle x_i^2\right\rangle + 2\sum_i\sum_{j>i}\left\langle x_i x_j\right\rangle \right) \\ &= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + 2\sum_i\sum_{j>i}\langle x_i\rangle \langle x_j\rangle\right) \\ &= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + n(n-1)\langle x\rangle^2\right)\\ &= \frac{1}{n^2}\left(n\sigma^2 + n^2\mu^2\right) \\ &= \frac{\sigma^2}{n} + \mu^2. \end{aligned}\end{split}\]
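Since this is the trickiest step in the calculation, a quick Monte Carlo check is reassuring. The sketch below uses an assumed Normal generative distribution, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3252)

# Assumed generative distribution for this check: Normal(mu=1, sigma=2)
mu, sigma = 1.0, 2.0
n = 10
n_trials = 200_000

x = rng.normal(mu, sigma, size=(n_trials, n))
xbar_squared = x.mean(axis=1) ** 2   # square of the arithmetic mean of each data set

print(xbar_squared.mean())      # ≈ sigma**2 / n + mu**2
print(sigma**2 / n + mu**2)     # 1.4 for these parameter choices
```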

Putting this together with the expression for \(\left\langle \hat{\sigma}^2 \right\rangle\) above, we have

\[\begin{aligned} \left\langle \hat{\sigma}^2 \right\rangle = \left(1-\frac{1}{n}\right)\sigma^2. \end{aligned}\]

Therefore, the bias is

\[\begin{aligned} \text{bias} = -\frac{\sigma^2}{n}. \end{aligned}\]

If \(\hat{\sigma}^2\) is the plug-in estimate for the variance, an unbiased estimator would instead be

\[\begin{aligned} \frac{n}{n-1}\,\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2. \end{aligned}\]
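This corrected estimator is what NumPy computes when `np.var` is called with `ddof=1`, while the default `ddof=0` gives the plug-in estimate. A quick simulated comparison (again with an arbitrarily chosen Normal generative distribution) illustrates the bias and its correction:

```python
import numpy as np

rng = np.random.default_rng(3252)

mu, sigma = 1.0, 2.0   # assumed generative distribution: Normal(1, 2)
n = 10
n_trials = 200_000

x = rng.normal(mu, sigma, size=(n_trials, n))

# Plug-in estimate (divides by n) vs. corrected estimate (divides by n - 1)
plug_in = x.var(axis=1, ddof=0)
corrected = x.var(axis=1, ddof=1)

print(plug_in.mean())     # ≈ (1 - 1/n) * sigma**2 = 3.6
print(corrected.mean())   # ≈ sigma**2 = 4.0
```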

Justification of using plug-in estimates

Despite the bias in the plug-in estimate for the variance, we will normally just use plug-in estimates going forward. (We will use the hat, e.g. \(\hat{\theta}\), to denote an estimate, which may or may not be a plug-in estimate.) Note that the bootstrap procedures we lay out in what follows do not need to use plug-in estimates, but we will use them for convenience. Why do this? The bias is typically small. We just saw that the biased and unbiased estimators of the variance differ by a factor of \(n/(n-1)\), which is negligible for large \(n\). In fact, the bias of a plug-in estimate tends to be much smaller than the width of the confidence interval for the parameter estimate, which we will discuss next.
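For a sense of scale, here is the correction factor \(n/(n-1)\) for a few sample sizes (a trivial illustrative computation):

```python
# Correction factor n / (n - 1) between the unbiased and plug-in variance estimates
for n in (5, 10, 100, 1000):
    print(n, n / (n - 1))
# 5    1.25
# 10   1.111...
# 100  1.0101...
# 1000 1.001001...
```

For \(n\) beyond a hundred or so, the plug-in and unbiased estimates of the variance differ by less than about one percent.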