Marginalization
I mentioned that the evidence can be computed from the likelihood and the prior. To see this, we apply the sum rule to the posterior probability. Let \(\theta_j\) be a particular possible value of a parameter or hypothesis, and let \(\theta_j^c\) denote its complement, i.e., all other possible values. Then,
\[\begin{aligned}
1 &= g(\theta_j\mid y) + g(\theta_j^c\mid y) \\
&= g(\theta_j\mid y) + \sum_{i\ne j}g(\theta_i\mid y) \\
&= \sum_i g(\theta_i\mid y).
\end{aligned}\]
Now, Bayes’s theorem gives us an expression for \(g(\theta_i\mid y)\), so we can compute the sum.
\[\begin{aligned}
\sum_i g(\theta_i\mid y) = \sum_i\frac{f(y \mid \theta_i)\, g(\theta_i)}{f(y)} = \frac{1}{f(y)}\sum_i f(y \mid \theta_i)\, g(\theta_i) = 1.
\end{aligned}\]
Therefore, we can compute the evidence by summing the product of the likelihood and the prior over all possible hypotheses or parameter values.
\[\begin{aligned}
f(y) = \sum_i f(y \mid \theta_i)\, g(\theta_i).
\end{aligned}\]
Using the joint probability, \(\pi(y, \theta_i) = f(y \mid \theta_i)\, g(\theta_i)\), we can also write this as
\[\begin{aligned}
f(y) = \sum_i \pi(y, \theta_i).
\end{aligned}\]
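To make this concrete, here is a minimal sketch in Python. The coin-flip setup, the uniform prior, and all variable names are my own illustrative assumptions, not taken from the text above. It evaluates the likelihood and prior on a discrete grid of \(\theta\) values, forms the joint probability, and sums it to obtain the evidence \(f(y)\); dividing by the evidence gives a posterior that sums to one, as in the calculation above.

```python
import numpy as np
import scipy.stats

# Hypothetical data: y = 7 heads in n = 10 coin flips (made-up numbers)
n, y = 10, 7

# Discrete grid of possible values of theta (the bias of the coin)
theta = np.linspace(0, 1, 101)

# Prior g(theta_i): uniform over the discrete values, summing to one
prior = np.ones_like(theta) / len(theta)

# Likelihood f(y | theta_i): binomial probability of y heads in n flips
likelihood = scipy.stats.binom.pmf(y, n, theta)

# Joint probability pi(y, theta_i) = f(y | theta_i) * g(theta_i)
joint = likelihood * prior

# Evidence f(y): sum of the joint over all theta_i (marginalization)
evidence = np.sum(joint)

# The resulting posterior sums to one, as required by the sum rule
posterior = joint / evidence
print(evidence, np.isclose(posterior.sum(), 1.0))
```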
This process of eliminating a variable (in this case \(\theta\)) from a probability distribution by summing over its possible values is called marginalization. It will prove useful when finding the probability distribution of a single parameter among many.
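As a sketch of that use, suppose (hypothetically) that we have a joint posterior \(g(\theta_1, \theta_2 \mid y)\) tabulated on a grid of discrete values for two parameters; the array below is an arbitrary stand-in, not a real posterior. Summing over the values of \(\theta_2\) marginalizes it away and leaves the distribution of \(\theta_1\) alone.

```python
import numpy as np

# Hypothetical grids of discrete values for two parameters
theta1 = np.linspace(0, 1, 50)
theta2 = np.linspace(0, 1, 60)

# Stand-in for a joint posterior g(theta1_i, theta2_j | y) on the grid:
# an arbitrary nonnegative array normalized to sum to one
rng = np.random.default_rng(0)
joint_posterior = rng.uniform(size=(len(theta1), len(theta2)))
joint_posterior /= joint_posterior.sum()

# Marginal posterior of theta1: sum out theta2 (axis 1)
marginal_theta1 = joint_posterior.sum(axis=1)

# The marginal is itself a normalized distribution over theta1's values
print(np.isclose(marginal_theta1.sum(), 1.0))
```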