Generalization of hierarchical models
The worm reversal problem is easily generalized. You can imagine having more levels in the hierarchy; these are just additional steps in the chain of dependencies that are factored into the prior. For general parameters \(\theta\) and hyperparameters \(\phi\), we have, for data set \(y\),
\begin{align} g(\theta, \phi \mid y) = \frac{f(y\mid \theta)\, g(\theta \mid \phi)\,g(\phi)}{f(y)} \end{align}
for a two-level hierarchical model.
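To make the structure of the numerator concrete, here is a minimal sketch of the unnormalized log posterior for a two-level model. The distributional choices are purely illustrative: a Normal likelihood with known scale, a Normal conditional prior \(g(\theta \mid \phi)\), and a flat (improper) hyperprior \(g(\phi)\); the function name and all scales are hypothetical.

```python
import numpy as np
import scipy.stats as st


def log_posterior_two_level(theta, phi, y):
    """Unnormalized log posterior for a two-level hierarchical model.

    Illustrative assumptions: Normal likelihood with known scale,
    Normal conditional prior g(theta | phi), flat hyperprior g(phi).
    """
    log_like = np.sum(st.norm.logpdf(y, loc=theta, scale=1.0))  # log f(y | theta)
    log_cond_prior = st.norm.logpdf(theta, loc=phi, scale=1.0)  # log g(theta | phi)

    return log_like + log_cond_prior  # flat g(phi) adds only a constant
```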
For a three-level hierarchical model, we can consider hyperparameters \(\xi\) that condition \(\phi\), which in turn condition \(\theta\), giving
\begin{align} g(\theta, \phi, \xi \mid y) = \frac{f(y\mid \theta)\, g(\theta \mid \phi)\,g(\phi\mid \xi)\,g(\xi)}{f(y)}, \end{align}
and so on for four, five, etc., level hierarchical models.
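The pattern extends mechanically: each added level contributes one more conditional factor to the numerator. A sketch under the same illustrative Normal assumptions, with the hierarchy passed as a list `[theta, phi, xi, ...]` ordered from the parameters up to the top-level hyperparameter (names and scales hypothetical):

```python
import numpy as np
import scipy.stats as st


def log_posterior_chain(levels, y, scale=1.0):
    """Unnormalized log posterior for an n-level hierarchical chain.

    levels = [theta, phi, xi, ...]; each level's Normal location is set
    by the level above it, and the top level gets a flat hyperprior.
    """
    log_post = np.sum(st.norm.logpdf(y, loc=levels[0], scale=scale))  # log f(y | theta)

    # One conditional factor per link in the chain: g(theta | phi), g(phi | xi), ...
    for lower, upper in zip(levels[:-1], levels[1:]):
        log_post += st.norm.logpdf(lower, loc=upper, scale=scale)

    return log_post  # flat prior on the top level contributes a constant
```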
As we have seen in the course, the work is all in coming up with models for the likelihood, \(f(y\mid \theta)\), and the prior, \(g(\theta \mid \phi)\,g(\phi)\), in the case of a two-level hierarchical model. For the conditional portion of the prior, \(g(\theta \mid \phi)\), we often assume a Gaussian distribution because it often describes experiment-to-experiment variability. (The Beta distribution we used in the worm reversal example is approximately Gaussian and has the convenient feature that it is defined on the interval \([0,1]\).) Bayes’s theorem gives you the posterior, and it is then “just” a matter of computing it by sampling from it. In coming lessons, we will use Stan to sample out of hierarchical models and discuss the difficulties involved in doing so.
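To build intuition for what the Gaussian conditional prior encodes, here is a short generative sketch of a two-level model: a hyperparameter sets the typical parameter value, each experiment draws its own parameter around it, and each experiment's measurements are drawn around that parameter. All numerical values (locations, scales, sizes, seed) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Draw the hyperparameter, then per-experiment parameters, then data.
phi = rng.normal(loc=1.0, scale=1.0)                # g(phi)
theta = rng.normal(loc=phi, scale=0.5, size=5)      # g(theta | phi): experiment-to-experiment variability
y = [rng.normal(loc=t, scale=0.2, size=20) for t in theta]  # f(y | theta): measurements within each experiment
```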