Lecture 9: Hierarchical models

(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained therein is licensed under an MIT license.

This lecture was generated from a Jupyter notebook. You can download the notebook here.

In [66]:
import numpy as np
import scipy.stats as st

import pymc3 as pm

import bebi103

import bokeh.io
import bokeh.palettes

colors = bokeh.palettes.d3['Category10'][10]
bokeh.io.output_notebook()

A sample calculation

We will do MCMC on the hierarchical posterior derived in class for worm reversal data. Specifically,

\begin{align} g(\alpha, \beta,\mathbf{p}\mid \mathbf{r}, \mathbf{n}) \propto \frac{1}{(\alpha+\beta)^{5/2}} \prod_{i=1}^k \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p_i^{r_i+\alpha-1}\,(1-p_i)^{n_i-r_i+\beta-1}. \end{align}
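Before sampling, it can help to see this posterior as code. Below is a minimal sketch of the unnormalized log posterior written with NumPy and `scipy.special`; the function name `log_post` and the flat parameter layout `[alpha, beta, p_1, ..., p_k]` are choices made here purely for illustration, and the constant binomial coefficients are dropped since we only work up to proportionality.

In [ ]:
import numpy as np
import scipy.special

def log_post(params, r, n):
    """Unnormalized log posterior; params = [alpha, beta, p_1, ..., p_k]."""
    alpha, beta = params[:2]
    p = params[2:]

    # Parameters out of bounds have zero posterior probability
    if alpha <= 0 or beta <= 0 or np.any(p <= 0) or np.any(p >= 1):
        return -np.inf

    # log of 1 / (alpha + beta)^(5/2)
    lp = -2.5 * np.log(alpha + beta)

    # log of the Beta normalization constant, once per experiment
    lp += len(p) * (scipy.special.gammaln(alpha + beta)
                    - scipy.special.gammaln(alpha)
                    - scipy.special.gammaln(beta))

    # Combined Beta prior and Binomial likelihood terms
    lp += np.sum((r + alpha - 1) * np.log(p)
                 + (n - r + beta - 1) * np.log(1 - p))

    return lp

PyMC3 builds the equivalent log posterior for us from the model specification below, so this function is only for orientation.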

It is perhaps more convenient to parametrize this model in terms of $q = \alpha/(\alpha + \beta)$ and $\omega = 1/\sqrt{\alpha + \beta}$. (Compared to the lecture notes, $\omega = 1/\sqrt{\kappa}$.) With this parametrization, the model is

\begin{align} q &\sim \text{Uniform}(0, 1) \\[1em] \omega &\sim \text{Uniform}(0, \infty)\\[1em] \alpha &= q/\omega^2 \\[1em] \beta &= (1-q)/\omega^2 \\[1em] p_i &\sim \text{Beta}(\alpha, \beta) \;\;\forall i \\[1em] r_i \mid n_i &\sim \text{Binom}(n_i, p_i) \;\;\forall i. \end{align}
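As a quick check on this parametrization, note that the flat priors on $q$ and $\omega$ carry the same information as the $1/(\alpha+\beta)^{5/2}$ prior above. Writing $\kappa = \alpha + \beta$, so that $\alpha = q\kappa$ and $\beta = (1-q)\kappa$, the Jacobian determinant of the transformation is $|\partial(\alpha,\beta)/\partial(q,\kappa)| = \kappa$, and $|\mathrm{d}\kappa/\mathrm{d}\omega| = 2\omega^{-3}$. So,

\begin{align} g(q, \kappa) \propto \frac{\kappa}{\kappa^{5/2}} = \kappa^{-3/2}, \qquad\text{and}\qquad g(q, \omega) \propto \kappa^{-3/2}\cdot\frac{2}{\omega^3} = \omega^3\cdot\frac{2}{\omega^3} = \text{constant}, \end{align}

which is exactly the pair of flat priors we specified.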

To demonstrate how the hierarchical model works, we will consider the data sets from 2015, 2016, and 2017, and an additional three experiments that have 14/40, 5/34, and 110/660 reversals, respectively. These three experiments were not actually performed; I am using them here to demonstrate some effects we see in hierarchical models. In particular, the last experiment has many more trials and would dominate pooled data if we were not using a hierarchical model.

In [37]:
# Data
r = np.array([9, 12, 18, 14, 5, 110])
n = np.array([35, 35, 54, 40, 34, 660])

Next, as usual, we define our model. Hierarchical models are quite easy to specify with PyMC3. We do our usual setup, taking care to specify the shape kwarg on priors that are not hyperpriors (here, the prior on the $p_i$'s).

In [39]:
with pm.Model() as model:
    # Hyperprior
    q = pm.Uniform('q', lower=0, upper=1)
    omega = pm.HalfFlat('omega')

    # alpha and beta from q and omega
    alpha = pm.Deterministic('alpha', q / omega**2)
    beta = pm.Deterministic('beta', (1-q) / omega**2)

    # Prior on p's
    p = pm.Beta('p', alpha=alpha, beta=beta, shape=len(r))
    
    # Likelihood
    r_obs = pm.Binomial('r_obs', n=n, p=p, observed=r)

Ok, now we're ready to do some sampling!

In [40]:
with model:
    trace = pm.sample(tune=50000, draws=10000, njobs=4, target_accept=0.9)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 97%|█████████▋| 58466/60000 [01:52<00:02, 521.89it/s]/Users/Justin/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py:467: UserWarning: Chain 3 contains 26 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
 98%|█████████▊| 58687/60000 [01:52<00:02, 521.92it/s]/Users/Justin/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py:467: UserWarning: Chain 2 contains 13 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
 99%|█████████▊| 59115/60000 [01:53<00:01, 522.44it/s]/Users/Justin/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py:467: UserWarning: Chain 1 contains 17 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|█████████▉| 59984/60000 [01:54<00:00, 523.86it/s]/Users/Justin/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/nuts.py:467: UserWarning: Chain 0 contains 36 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 60000/60000 [01:54<00:00, 523.84it/s]

Now, let's look at a corner plot to see how we did with the sampling.

In [41]:
g = bebi103.viz.corner(trace,
                       vars=['q', 'p__0', 'p__1', 'p__2', 'p__5'],
                       plot_ecdf=True)
bokeh.io.show(g)

We can see the results more clearly if we plot the marginalized distributions on top of each other. This also lets us see how the respective experiments serve to determine $q$, the master reversal probability. For each experiment, we plot the most probable value of $p_i$ under the independent model (i.e., all experiments treated as independent) as a colored diamond. We also plot the most probable reversal probability under the pooled model (all experiments pooled together) as a black diamond.

In [67]:
# Convert trace to DataFrame
df_mcmc = bebi103.pm.trace_to_dataframe(trace)

# Plot ECDF of q samples (only plot 1/10 of them to keep file size down)
p = bebi103.viz.ecdf(df_mcmc['q'].values[::10],
                     plot_width=700,
                     formal=True,
                     line_width=2,
                     color='black',
                     legend='q')

# Plot the ECDFs for each p_i
for i in range(len(r)):
    p = bebi103.viz.ecdf(df_mcmc['p__' + str(i)].values[::10],
                         color=colors[i],
                         legend='p'+str(i+1),
                         p=p,
                         formal=True,
                         line_width=2)

# MLE
for i in range(len(r)):
    p.diamond(r[i]/n[i], 0.5, color=colors[i], size=20)

# Pooled MLE
p.diamond(r.sum()/n.sum(), 0.5, color='black', size=20)

p.legend.location = 'bottom_right'
bokeh.io.show(p)

We see that the individual parameter values tend to "shrink" toward the hyperparameter value. The hyperparameter value, in turn, is different from what we would get if we pooled all the data together. Notably, the black diamond, which represents the most probable value of the reversal probability if we pool all of the data, is significantly different from the result under the hierarchical model. This is because the data set with many more measurements overwhelms the other data sets if we pool the results.
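To put numbers on the shrinkage, we can compare the posterior median of each $p_i$ to the plug-in estimate $r_i/n_i$. This is a small sketch using the df_mcmc DataFrame and the r and n arrays from above; the specific values will vary from run to run, so none are quoted here.

In [ ]:
# Compare plug-in estimates r_i / n_i to hierarchical posterior medians
for i in range(len(r)):
    p_med = np.median(df_mcmc['p__' + str(i)])
    print('p_{0:d}: r/n = {1:.3f}, posterior median = {2:.3f}'.format(
        i + 1, r[i] / n[i], p_med))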

We are probably most interested in the hyperparameter $q$, so let's compute its median and 95% HPD.

In [63]:
q = np.median(df_mcmc['q'])
q_hpd = bebi103.pm.hpd(df_mcmc['q'], 0.95)
print("""
q = [{0:.3f}, {1:.3f}, {2:.3f}]
""".format(q_hpd[0], q, q_hpd[1]))
q = [0.174, 0.266, 0.386]