Choosing a hierarchical prior


Choosing a hierarchical prior is not always as straightforward as choosing the priors we are used to working with, because we have to specify the hyperprior, \(g(\phi)\), as well as all of the conditional priors, \(g(\theta\mid \phi)\).

Exchangeability

The conditional probability, \(g(\theta\mid \phi)\), can take any reasonable form. If we have no reason to believe that we can distinguish any one \(\theta_i\) from another prior to the experiment, then the label “\(i\)” applied to an experiment may be exchanged with the label of any other experiment. That is, \(g(\theta_1, \theta_2, \ldots, \theta_k \mid \phi)\) is invariant to permutations of the indices. Parameters that behave this way are said to be exchangeable. A common (and simple) exchangeable distribution is

\begin{align} g(\theta\mid \phi) = \prod_{i=1}^k g(\theta_i\mid \phi), \end{align}

which means that each parameter is an independent draw from a distribution \(g(\theta_i\mid \phi)\), which we often take to be the same for all \(i\). This is a reasonable thing to do in the worm reversal example.
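To make exchangeability concrete, here is a small sketch (my own illustration, with arbitrary parameter values) showing that the joint density of i.i.d. draws is unchanged when the labels are permuted.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(seed=3252)

# Hypothetical conditional distribution g(theta_i | phi); the Beta
# parameters here are arbitrary stand-ins for a given phi.
cond = scipy.stats.beta(2.0, 5.0)

# Draw k parameters i.i.d. from the conditional distribution
theta = cond.rvs(size=5, random_state=rng)

# The joint density is a product of identical marginals, so permuting
# the labels i leaves it unchanged -- the theta_i are exchangeable.
joint = np.prod(cond.pdf(theta))
joint_permuted = np.prod(cond.pdf(rng.permutation(theta)))
assert np.isclose(joint, joint_permuted)
```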

Choice of the conditional distribution

We need to specify our prior, which for this hierarchical model means that we have to specify the conditional distribution, \(g(\theta_i\mid \phi)\), as well as \(g(\phi)\). We could assume a Beta prior for \(\phi\); the one we chose in our original, nonhierarchical model would be a good choice:

\begin{align} \phi \sim \text{Beta}(1.1, 1.1). \end{align}

We might also assume that the conditional distribution \(g(\theta_i\mid \phi)\) is a Beta distribution. This necessitates a second parameter, since the Beta distribution has two parameters.

The Beta distribution is typically written as

\begin{align} g(\theta\mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}, \end{align}

where it is parametrized by positive constants \(\alpha\) and \(\beta\). The Beta distribution has mean and concentration, respectively, of

\begin{align} \phi &= \frac{\alpha}{\alpha + \beta}, \\[1em] \kappa &= \alpha + \beta. \end{align}
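As a quick numerical check of these relations (a sketch with arbitrary \(\alpha\) and \(\beta\) values), we can compare against scipy.stats and see that scaling \(\alpha\) and \(\beta\) together leaves the mean fixed while increasing the concentration.

```python
import numpy as np
import scipy.stats

# Arbitrary illustrative parameter values
alpha, beta = 3.0, 7.0

phi = alpha / (alpha + beta)  # mean
kappa = alpha + beta          # concentration

# The mean matches scipy's Beta distribution
assert np.isclose(scipy.stats.beta(alpha, beta).mean(), phi)

# Scaling alpha and beta by a common factor keeps phi fixed but
# increases kappa; the standard deviation shrinks accordingly.
for c in [1.0, 5.0, 25.0]:
    dist = scipy.stats.beta(c * alpha, c * beta)
    print(f"kappa = {c * kappa:5.1f}: mean = {dist.mean():.2f}, std = {dist.std():.4f}")
```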

The concentration \(\kappa\) is a measure of how sharp the distribution is. The bigger \(\kappa\) is, the more sharply peaked the distribution is. Since we would like to parametrize our Beta distribution with its mean \(\phi\), we can use \(\kappa\) as the second parameter. So, our expression for the posterior is

\begin{align} g(\theta, \phi, \kappa \mid n, N) = \frac{f(n,N\mid \theta)\,\left( \prod_{i=1}^k g(\theta_i\mid \phi, \kappa)\right)\,g(\phi, \kappa)}{f(n, N)}. \end{align}

We are left to specify the hyperprior \(g(\phi, \kappa)\). We will take \(\phi\) to come from a Beta distribution and \(\kappa\) to come from a weakly informative Half-Normal distribution. Note that to switch from a parametrization using \(\phi\) and \(\kappa\) to one using \(\alpha\) and \(\beta\), we can use

\begin{align} &\alpha = \phi \kappa\\[1em] &\beta = (1-\phi)\kappa. \end{align}
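In code, this change of variables is a pair of one-liners. A minimal sketch (the function names are my own):

```python
def phi_kappa_to_alpha_beta(phi, kappa):
    """Convert mean/concentration parametrization to alpha, beta."""
    return phi * kappa, (1 - phi) * kappa


def alpha_beta_to_phi_kappa(alpha, beta):
    """Convert alpha, beta parametrization to mean/concentration."""
    return alpha / (alpha + beta), alpha + beta
```

The two functions are inverses of each other, so a round trip recovers the original values (up to floating point error).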

With all of this, we can now put together our model.

\begin{align} &\phi \sim \text{Beta}(1.1, 1.1), \\[1em] &\kappa \sim \text{HalfNorm}(0, 10), \\[1em] &\alpha = \phi \kappa, \\[1em] &\beta = (1-\phi)\kappa,\\[1em] &\theta_i \sim \text{Beta}(\alpha, \beta) \;\;\forall i,\\[1em] &n_i \sim \text{Binom}(N_i, \theta_i)\;\;\forall i. \end{align}
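To check that this generative story makes sense, we can sample from the model from top to bottom (prior predictive draws). This is a sketch of my own; the trial counts \(N_i\) are made-up values.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Hypothetical numbers of trials for each of k experiments
N = np.array([35, 36, 35, 36, 35])

# Draw hyperparameters from the hyperprior
phi = rng.beta(1.1, 1.1)
kappa = np.abs(rng.normal(0.0, 10.0))  # a HalfNorm(0, 10) draw

# Convert to the standard Beta parametrization
alpha, beta = phi * kappa, (1 - phi) * kappa

# Draw each theta_i conditional on the hyperparameters,
# then the data, n_i | theta_i ~ Binom(N_i, theta_i)
theta = rng.beta(alpha, beta, size=len(N))
n = rng.binomial(N, theta)
```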

The model above is a complete specification of a hierarchical model.
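Finally, the unnormalized log posterior corresponding to the expression we wrote above can be evaluated directly. Again, this is a sketch under my own naming conventions, using scipy.stats.

```python
import numpy as np
import scipy.stats

def log_posterior(phi, kappa, theta, n, N):
    """Unnormalized log posterior of the hierarchical model."""
    # Parameters outside their support have zero posterior density
    if not (0 < phi < 1) or kappa <= 0 or np.any((theta <= 0) | (theta >= 1)):
        return -np.inf

    alpha, beta = phi * kappa, (1 - phi) * kappa

    lp = scipy.stats.beta.logpdf(phi, 1.1, 1.1)                # hyperprior on phi
    lp += scipy.stats.halfnorm.logpdf(kappa, 0, 10)            # hyperprior on kappa
    lp += np.sum(scipy.stats.beta.logpdf(theta, alpha, beta))  # conditional prior
    lp += np.sum(scipy.stats.binom.logpmf(n, N, theta))        # likelihood
    return lp
```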