Implementation of a hierarchical model¶
[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot colorcet bebi103 arviz cmdstanpy watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    import cmdstanpy; cmdstanpy.install_cmdstan()
else:
    data_path = "../data/"
# ------------------------------

import numpy as np

import cmdstanpy
import arviz as az

import bebi103
import iqplot

import bokeh.io
import bokeh.palettes

colors = bokeh.palettes.d3["Category10"][10]

bokeh.io.output_notebook()
We will do MCMC on the hierarchical posterior for worm reversal data. As a reminder, the model is
\begin{align}
&\phi \sim \text{Beta}(1.1, 1.1), \\[1em]
&\kappa \sim \text{HalfNorm}(0, 10), \\[1em]
&\alpha = \phi \kappa, \\[1em]
&\beta = (1-\phi)\kappa,\\[1em]
&\theta_i \sim \text{Beta}(\alpha, \beta) \;\;\forall i,\\[1em]
&n_i \sim \text{Binom}(N_i, \theta_i)\;\;\forall i.
\end{align}
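Written as code, a single draw from this generative model looks something like the minimal NumPy sketch below. This is purely illustrative and not part of the analysis that follows; the seed is arbitrary, and the trial counts are those of the six experiments we consider next.

import numpy as np

rng = np.random.default_rng(3252)

# Trial counts of the six experiments considered below
N = np.array([35, 35, 54, 40, 34, 660])

# Draw hyperparameters from the hyperpriors
phi = rng.beta(1.1, 1.1)
kappa = np.abs(rng.normal(0, 10.0))  # half-Normal by folding a Normal draw

# Transformed hyperparameters
alpha = phi * kappa
beta = (1 - phi) * kappa

# Per-experiment reversal probabilities and reversal counts
theta = rng.beta(alpha, beta, size=len(N))
n = rng.binomial(N, theta)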
To demonstrate how the hierarchical model works, we will consider the data sets from 2015, 2016, and 2017, and three additional synthetic experiments that have 14/40, 5/34, and 110/660 reversals, respectively. These three experiments were not actually performed; I am using them here to demonstrate some effects we see in hierarchical models. In particular, the last experiment has many more trials and would dominate the pooled data if we were not using a hierarchical model.
We will not perform prior predictive checks or other diagnostics, but simply demonstrate how a hierarchical model is built using Stan and investigate some of the resulting inferences.
[2]:
# Data
n = np.array([9, 12, 18, 14, 5, 110])
N = np.array([35, 35, 54, 40, 34, 660])
Next, as usual, we define our model using Stan. The code below is stored in the file worm_hier.stan, which we will compile in a moment.
data {
  // Number of separate experiments
  int K;

  // Number of trials in each experiment
  int N[K];

  // Number of reversals in each experiment
  int n[K];
}


parameters {
  // Hyperparameters
  real<lower=0, upper=1> phi;
  real<lower=0> kappa;

  // Parameters
  real<lower=0, upper=1> theta[K];
}


transformed parameters {
  // Transformed hyperparameters
  real<lower=0> alpha = phi * kappa;
  real<lower=0> beta_ = (1 - phi) * kappa;
}


model {
  // Hyperpriors
  phi ~ beta(1.1, 1.1);
  kappa ~ normal(0, 10.0);

  // Prior on per-experiment reversal probabilities
  theta ~ beta(alpha, beta_);

  // Likelihood
  n ~ binomial(N, theta);
}
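The sampling cell below compiles the model from the file worm_hier.stan. If you are working in Colab or otherwise do not have that file on disk, one way to create it is to write the model code out from Python. This is just a convenience sketch, not part of the original workflow:

# Convenience (not in the original notebook): write the Stan model shown above
# to worm_hier.stan so that the compilation cell below can find it on disk.
stan_code = """
data {
  // Number of separate experiments
  int K;

  // Number of trials in each experiment
  int N[K];

  // Number of reversals in each experiment
  int n[K];
}

parameters {
  // Hyperparameters
  real<lower=0, upper=1> phi;
  real<lower=0> kappa;

  // Parameters
  real<lower=0, upper=1> theta[K];
}

transformed parameters {
  // Transformed hyperparameters
  real<lower=0> alpha = phi * kappa;
  real<lower=0> beta_ = (1 - phi) * kappa;
}

model {
  // Hyperpriors
  phi ~ beta(1.1, 1.1);
  kappa ~ normal(0, 10.0);

  // Prior on per-experiment reversal probabilities
  theta ~ beta(alpha, beta_);

  // Likelihood
  n ~ binomial(N, theta);
}
"""

with open("worm_hier.stan", "w") as f:
    f.write(stan_code)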
We can compile it and draw samples.
[3]:
# Compile the model
sm = cmdstanpy.CmdStanModel(stan_file='worm_hier.stan')

# Package the data for Stan and draw samples
data = dict(K=len(n), n=n, N=N)
samples = sm.sample(data=data)

# Convert to ArviZ InferenceData and run diagnostic checks
samples = az.from_cmdstanpy(posterior=samples)
bebi103.stan.check_all_diagnostics(samples)
INFO:cmdstanpy:compiling stan program, exe file: /Users/bois/Dropbox/git/bebi103_course/2021/b/content/lessons/20/worm_hier
INFO:cmdstanpy:compiler options: stanc_options=None, cpp_options=None
INFO:cmdstanpy:compiled model file: /Users/bois/Dropbox/git/bebi103_course/2021/b/content/lessons/20/worm_hier
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[3]:
0
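Before making plots, we can also get a quick tabular summary of the hyperparameters with ArviZ. This is an optional extra step, not part of the original analysis:

# Optional: tabular summary of the hyperparameters
az.summary(samples, var_names=["phi", "kappa"])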
Now, let’s look at a corner plot to see how we did with the sampling.
[4]:
g = bebi103.viz.corner(
samples,
parameters=["phi", "kappa", "theta[0]", "theta[1]", "theta[5]"],
xtick_label_orientation=np.pi / 4,
)
bokeh.io.show(g)
We can see the results more clearly if we plot the marginalized distributions on top of each other. This also allows us to see how the respective experiments serve to determine \(\phi\), the master reversal probability. I will also plot the most probable value of \(\theta_i\) under the independent-experiment model (i.e., each experiment treated as independent, with a Beta(1.1, 1.1) prior on each \(\theta_i\)) as a colored diamond. We also plot the most probable reversal probability under the pooled model (all experiments pooled together, with the same prior) as a black diamond.
[5]:
# Plot ECDF of phi samples
p = iqplot.ecdf(
    samples.posterior.phi.values.flatten(),
    frame_width=650,
    style="staircase",
    line_kwargs=dict(line_color="black"),
    y_axis_label="marginal posterior CDF",
    x_axis_label="parameter value",
)

# Overlay the ECDF of the samples for each theta_i
for i in range(len(n)):
    p = iqplot.ecdf(
        samples.posterior.theta.sel(theta_dim_0=i).values.flatten(),
        style="staircase",
        line_kwargs=dict(line_color=colors[i]),
        p=p,
    )

# MAP of each independent experiment, given by (n + α - 1) / (N + α + β - 2) with α = β = 1.1
for i in range(len(n)):
    theta_map = (n[i] + 0.1) / (N[i] + 0.2)
    p.diamond(theta_map, 0.5, color=colors[i], size=20, legend_label=f"θ{i}")

# MAP of the pooled model (same prior, all data lumped together)
theta_map = (n.sum() + 0.1) / (N.sum() + 0.2)
p.diamond(theta_map, 0.5, color="black", size=20, legend_label="ϕ")

p.legend.visible = True
p.legend.location = "bottom_right"

bokeh.io.show(p)
We see that the individual parameter values tend to “shrink” toward the hyperparameter value. The hyperparameter value, in turn, differs from what we would get by pooling all the data together. Notably, the black diamond, which represents the most probable reversal probability if we pooled all of the data, sits far from the estimate under the hierarchical model. This is because the data set with many more measurements overwhelms the other data sets when we pool the results.
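To put rough numbers on the shrinkage, we can compare each experiment’s raw reversal fraction to the posterior median of its \(\theta_i\) and to the posterior median of \(\phi\). This is a small illustrative sketch, not part of the original analysis:

# Illustrative: quantify shrinkage of each theta_i toward phi
phi_median = np.median(samples.posterior.phi.values)

for i in range(len(n)):
    theta_samples = samples.posterior.theta.sel(theta_dim_0=i).values.flatten()
    raw_fraction = n[i] / N[i]
    theta_median = np.median(theta_samples)
    print(
        f"experiment {i}: raw n/N = {raw_fraction:.3f}, "
        f"posterior median θ = {theta_median:.3f}, "
        f"posterior median φ = {phi_median:.3f}"
    )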
We are probably most interested in the hyperparameter \(\phi\), so let’s compute its median and 80% credible interval.
[6]:
# Median and 80% credible interval from the 10th, 50th, and 90th percentiles of φ
print(
    """
φ = [{0:.3f}, {1:.3f}, {2:.3f}]
""".format(
        *np.percentile(samples.posterior.phi.values.flatten(), [10, 50, 90])
    )
)
φ = [0.214, 0.278, 0.353]
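As an alternative, ArviZ can report a highest-density interval for \(\phi\). This is an optional extra, and because an HDI is a different kind of interval than the central (percentile-based) one above, its endpoints need not match exactly:

# Optional: 80% highest-density interval for phi
az.hdi(samples, var_names=["phi"], hdi_prob=0.8)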
[7]:
bebi103.stan.clean_cmdstan()
Computing environment¶
[8]:
%load_ext watermark
%watermark -v -p numpy,cmdstanpy,arviz,iqplot,bebi103,bokeh,jupyterlab
CPython 3.8.5
IPython 7.19.0
numpy 1.19.2
cmdstanpy 0.9.67
arviz 0.11.0
iqplot 0.2.0
bebi103 0.1.2
bokeh 2.2.3
jupyterlab 2.2.6