Lecture 7: Hierarchical models

(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This homework was generated from an Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np

import bebi103

import bokeh
colors = bokeh.palettes.d3['Category10'][10]
bokeh.io.output_notebook()
Loading BokehJS ...

A sample calculation

We will do MCMC on the hierarchical posterior derived in class for worm reversal data. Specifically,

\begin{align} &\phi \sim \text{Beta}(2, 2), \\ &\kappa \sim \text{HalfNorm}(0, 10), \\ &\alpha = \phi \kappa, \\ &\beta = (1-\phi)\kappa,\\ &\theta_i \sim \text{Beta}(\alpha, \beta) \;\;\forall i,\\ &n_i \sim \text{Binom}(N_i, \theta_i)\;\;\forall i. \end{align}

To demonstrate how the hierarchical model works, we will consider the data sets from 2015, 2016, and 2017, and an additional three hypothetical experiments that have 14/40, 5/34, and 110/660 reversals, respectively. These three experiments were not actually performed; I am using them here to demonstrate some effects we see in hierarchical models. In particular, the last experiment has many more trials and would dominate pooled data if we were not using a hierarchical model.

We will not perform prior predictive checks or other diagnostics, but simple demonstrate how a hierarchical model is built using Stan and investigate some of the resulting inferences.

In [2]:
# Data
n = np.array([9, 12, 18, 14, 5, 110])
N = np.array([35, 35, 54, 40, 34, 660])

Next, as usual, we define our model using Stan.

In [3]:
model_code = """
data {
  // Number of separate experiments
  int K;
  
  int N[K];
  int n[K];
}


parameters {
  // Hyperparameters
  real<lower=0, upper=1> phi;
  real<lower=0> kappa;

  // Parameters
  real<lower=0, upper=1> theta[K];
}


transformed parameters {
  // Transformed hyperparameters
  real<lower=0> alpha = phi * kappa;
  real<lower=0> beta_ = (1 - phi) * kappa;
}


model {
  // Hyperpriors
  phi ~ beta(2.0, 2.0);
  kappa ~ normal(0, 10.0);
  
  // Prior
  theta ~ beta(alpha, beta_);
  
  // Likelihood
  n ~ binomial(N, theta);
}
"""

sm = bebi103.stan.StanModel(model_code=model_code)
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_a4313e776d2aea49f1056e6565d09bef NOW.
/Users/Justin/anaconda3/lib/python3.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /var/folders/y7/r162f1fd0zv1vzxkb5nktz300000gn/T/tmp93vs90nj/stanfit4anon_model_a4313e776d2aea49f1056e6565d09bef_8025514547376188225.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)

Ok, now we're ready to do some sampling!

In [4]:
data = dict(K=len(n),
            n=n,
            N=N)

samples = sm.sampling(data=data)

Now, let's look a corner plot to see how we did with the sampling.

In [5]:
g = bebi103.viz.corner(samples,
                       pars=['phi', 'kappa', 'theta[1]', 'theta[2]', 'theta[6]'])
bokeh.io.show(g)

We can see the results more clearly if we plot the marginalized distributions on top of each other. This also allows us to see how the respective experiments serve to determine $\phi$, the master reversal probability. I will also plot the most probable value of $\theta_i$ in the independent experiment model (i.e., all experiments are independent) as a diamond. For these, I assume a Uniform prior on $\theta_i$. We also plot the most probable reversal probability of the pooled model (all experiments pooled together) as a black diamond, again with a Uniform prior.

In [6]:
# Convert trace to DataFrame
df_mcmc = bebi103.stan.to_dataframe(samples)

# Plot ECDF of phi samples
p = bebi103.viz.ecdf(df_mcmc['phi'],
                     plot_width=700,
                     formal=True,
                     line_width=2,
                     color='black',
                     legend='φ',
                     y_axis_label='marginal posterior CDF',
                     x_axis_label='parameter value')

# Plot the ECDFs for each theta_i
for i in range(len(n)):
    p = bebi103.viz.ecdf(df_mcmc[f'theta[{i+1}]'],
                         color=colors[i],
                         legend='θ'+str(i+1),
                         p=p,
                         formal=True,
                         line_width=2)

# MLE
for i in range(len(n)):
    p.diamond(n[i]/N[i], 0.5, color=colors[i], size=20)

# Pooled MLE
p.diamond(n.sum()/N.sum(), 0.5, color='black', size=20)

p.legend.location = 'bottom_right'
bokeh.io.show(p)

We see that the individial parameter values tend to "shrink" toward the hyperparameter value. The hyperparameter value, in turn, is different than if we pooled all the data together. Notably, the black diamond, which represents the most probable value of the reversal probability if we pooled all of the data together, is significantly different than under the hierarchical model. This is because the data set with many more measurements overwhelms the other data sets if we pool the results.

We are probably most interested in the hyperparameter $\phi$, so let's compute its median and HPD.

In [9]:
phi = np.median(df_mcmc['phi'])
phi_hpd = bebi103.stan.hpd(df_mcmc['phi'], 0.95)
print("""
φ = [{0:.3f}, {1:.3f}, {2:.3f}]
""".format(phi_hpd[0], phi, phi_hpd[1]))
φ = [0.176, 0.283, 0.401]