Bias
----

The **bias** of an estimate is the difference between the expectation value of the point estimate and the value of the parameter,

.. math::

    \begin{aligned}
    \text{bias}_F(\hat{\theta}, \theta) = \langle \hat{\theta} \rangle - \theta = \int\mathrm{d}x\, \hat{\theta}\,f(x) - T(F).
    \end{aligned}

Note that the expectation value of :math:`\hat{\theta}` is computed over the (unknown) generative distribution whose PDF is :math:`f(x)`.


Bias of the plug-in estimate for the mean
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We often want a small bias because we want to choose estimates that give us back the parameters we expect. Let's first investigate the bias of the plug-in estimate for the mean. As a reminder, the plug-in estimate is

.. math::

    \begin{aligned}
    \hat{\mu} = \bar{x},
    \end{aligned}

where :math:`\bar{x}` is the arithmetic mean of the observed data. To compute the bias of the plug-in estimate, we need to compute :math:`\langle \hat{\mu}\rangle` and compare it to :math:`\mu`.

.. math::

    \begin{aligned}
    \langle \hat{\mu}\rangle = \langle \bar{x}\rangle = \frac{1}{n}\left\langle\sum_i x_i\right\rangle = \frac{1}{n}\sum_i \left\langle x_i\right\rangle = \langle x\rangle = \mu.
    \end{aligned}

Because :math:`\langle \hat{\mu}\rangle = \mu`, the bias in the plug-in estimate for the mean is zero. It is said to be **unbiased**.


Bias of the plug-in estimate for the variance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To compute the bias of the plug-in estimate for the variance, first recall that the variance, as the second central moment, is computed as

.. math::

    \begin{aligned}
    \sigma^2 = \langle x^2 \rangle - \langle x\rangle^2.
    \end{aligned}

The plug-in estimate replaces these expectation values with averages over the observed data, giving :math:`\hat{\sigma}^2 = \frac{1}{n}\sum_i x_i^2 - \bar{x}^2`. So, the expectation value of the plug-in estimate is

.. math::

    \begin{aligned}
    \left\langle \hat{\sigma}^2 \right\rangle &= \left\langle\frac{1}{n}\sum_i x_i^2 - \bar{x}^2\right\rangle \\[1em]
    &= \left\langle\frac{1}{n}\sum_i x_i^2\right\rangle - \left\langle\bar{x}^2\right\rangle\\[1em]
    &= \frac{1}{n}\sum_i \left\langle x_i^2\right\rangle - \left\langle\bar{x}^2\right\rangle \\[1em]
    &= \langle x^2 \rangle - \left\langle\bar{x}^2\right\rangle\\[1em]
    &= \mu^2 + \sigma^2 - \left\langle\bar{x}^2\right\rangle.
    \end{aligned}

We now need to compute :math:`\left\langle\bar{x}^2\right\rangle`, which is a little trickier. We will use the fact that the measurements are independent, so :math:`\left\langle x_i x_j\right\rangle = \langle x_i \rangle \langle x_j\rangle` for :math:`i\ne j`.

.. math::

    \begin{aligned}
    \left\langle\bar{x}^2\right\rangle &= \left\langle\left(\frac{1}{n}\sum_ix_i\right)^2\right\rangle \\[1em]
    &= \frac{1}{n^2}\left\langle\left(\sum_ix_i\right)^2 \right\rangle \\[1em]
    &= \frac{1}{n^2}\left\langle\sum_i x_i^2 + 2\sum_i\sum_{j>i}x_i x_j\right\rangle \\[1em]
    &= \frac{1}{n^2}\left(\sum_i \left\langle x_i^2\right\rangle + 2\sum_i\sum_{j>i}\left\langle x_i x_j\right\rangle \right) \\[1em]
    &= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + 2\sum_i\sum_{j>i}\langle x_i\rangle \langle x_j\rangle\right) \\[1em]
    &= \frac{1}{n^2}\left(n(\sigma^2 + \mu^2) + n(n-1)\langle x\rangle^2\right)\\[1em]
    &= \frac{1}{n^2}\left(n\sigma^2 + n^2\mu^2\right) \\[1em]
    &= \frac{\sigma^2}{n} + \mu^2.
    \end{aligned}

Thus, we have

.. math::

    \begin{aligned}
    \left\langle \hat{\sigma}^2 \right\rangle = \mu^2 + \sigma^2 - \left(\frac{\sigma^2}{n} + \mu^2\right) = \left(1-\frac{1}{n}\right)\sigma^2.
    \end{aligned}

Therefore, the bias is

.. math::

    \begin{aligned}
    \text{bias} = \left\langle \hat{\sigma}^2 \right\rangle - \sigma^2 = -\frac{\sigma^2}{n}.
    \end{aligned}
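To see these results in action, here is a minimal sketch (assuming NumPy is available, and using a Normal generative distribution with arbitrary choices of :math:`\mu`, :math:`\sigma`, :math:`n`, and random seed) that simulates many repeated experiments and compares the averages of the plug-in estimates to the true parameter values.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=3252)

    # Hypothetical generative distribution (Normal, arbitrary parameters)
    mu = 1.0
    sigma = 2.0
    n = 10              # sample size per experiment
    n_trials = 100_000  # number of repeated experiments

    # Draw many data sets and compute the plug-in estimates for each
    x = rng.normal(mu, sigma, size=(n_trials, n))
    mu_hat = np.mean(x, axis=1)
    sigma2_hat = np.var(x, axis=1)  # ddof=0 by default: the plug-in estimate

    # Averaging the estimates over trials approximates their expectation values
    print("average of mu_hat:       ", np.mean(mu_hat))      # should be close to mu
    print("average of sigma2_hat:   ", np.mean(sigma2_hat))  # should fall short of sigma^2
    print("(1 - 1/n) * sigma**2:    ", (1 - 1 / n) * sigma**2)
    print("empirical bias:          ", np.mean(sigma2_hat) - sigma**2)
    print("theoretical bias -s^2/n: ", -sigma**2 / n)

The average of :math:`\hat{\mu}` should land close to :math:`\mu`, while the average of :math:`\hat{\sigma}^2` should fall short of :math:`\sigma^2` by approximately :math:`\sigma^2/n`, consistent with the calculation above.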
If :math:`\hat{\sigma}^2` is the plug-in estimate for the variance, an unbiased estimator would instead be

.. math::

    \begin{aligned}
    \frac{n}{n-1}\,\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2.
    \end{aligned}


Justification of using plug-in estimates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Despite the bias in the plug-in estimate for the variance, we will normally just use plug-in estimates going forward. (We will use the hat, e.g. :math:`\hat{\theta}`, to denote an estimate, which may or may not be a plug-in estimate.) Note that the bootstrap procedures we lay out in what follows do not *need* to use plug-in estimates, but we will use them for convenience. Why do this? The bias is typically small. We just saw that the biased and unbiased estimators of the variance differ by a factor of :math:`n/(n-1)`, which is negligible for large :math:`n`. In fact, the error in a plug-in estimate tends to be much smaller than the width of the confidence interval for the parameter estimate, which we will discuss next.
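As a practical note, NumPy's ``np.var()`` returns the plug-in (biased) estimate by default and the unbiased estimate when called with ``ddof=1``. The sketch below (using an arbitrary simulated data set) illustrates that for even moderately large :math:`n` the two differ only by the negligible factor :math:`n/(n-1)`.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=3252)

    # A single simulated data set of moderate size (arbitrary parameters)
    x = rng.normal(1.0, 2.0, size=400)
    n = len(x)

    sigma2_plugin = np.var(x)            # ddof=0 (default): plug-in estimate
    sigma2_unbiased = np.var(x, ddof=1)  # ddof=1: unbiased estimate

    print("plug-in estimate:  ", sigma2_plugin)
    print("unbiased estimate: ", sigma2_unbiased)
    print("ratio n/(n-1):     ", n / (n - 1))  # close to one for large n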