Recitation 8: Review of hierarchical modeling

(c) 2018 Julian Wagner. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This tutorial was generated from an Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import pandas as pd

import bebi103

import bokeh.io
bokeh.io.output_notebook()
Loading BokehJS ...

We are working with some data for how defensive glands might effect survival of beetles against ant foes. You can download the data set here. We will start with some EDA on the data. This data was crudely hand annotated from videos of a behavioral set up nicknamed the 'hotdog' arena (pictured below). A single beetle was loaded into the chamber with three vicious ants and the time until ant-induced-beetle-death was assessed. These beetles posess a defensive gland with irritant compounds for use against their enemies; however, some beetles in this set up were 'disarmed.' Agitating the beetles before putting them in the arena causes them to use up their defensive compound, which takes about a week to regenerate.

Lets take a look at the data format.

In [2]:
df = pd.read_csv('../data/beetle_time_to_death_j_wagner.csv', index_col=0)
df.head(10)
Out[2]:
arena well trial defence time to death (min)
0 5d 1 arm 90
1 1d 1 arm 80
2 2d 1 arm 100
3 3d 1 arm 200
4 4d 1 arm 220
0 4d 2 arm 40
1 2d 2 arm 90
2 1d 2 arm 120
3 3d 2 arm 180
4 5d 2 arm 450

We can immidiately see that there are mulitple trials of the experiment. In this case, each trial was done at a different day/time of day. This lends itself to hierarchy. Let's take a look at how much variation there is from trial to trial.

In [3]:
bokeh.io.show(bebi103.viz.jitter(data=df, 
                                 cats='trial', 
                                 val='time to death (min)', 
                                 color_column='defence',
                                 show_legend=True))

There's not much data here, but it does look like there is a reasonably high level of variability from day to day, especially in the disarmed beetle case. Lets take a look at the shape of the data for the disarmed beetles.

In [4]:
bokeh.io.show(bebi103.viz.ecdf(df.loc[df['defence'] == 'disarm']['time to death (min)']))

One story for data generating process is that it is the wait time for the first encounter with an ant, after which the beetle is toast. Another is that there might need to be multiple encounters before a beetle dies, in which case this would look be more likely to look gamma disctributed. Based on the inflection points, we have a good sense that this isn't exponential, but lets try to model it this way for simplicity. Before we move on to hierarchy, it is a good idea to get a first version of the model written down and implemented. In this case, we will start by pooling the data over the different days. Since the 'arrival rate' $\beta$ is evidently rather small, it is more convenient for us to parameterize in terms of $\tau$. We let

\begin{align} \tau &\sim \mathrm{Normal}(150, 25) \\ t_i &\sim \mathrm{Exponential}(1/\tau) \, \forall i \end{align}

With this in hand, we go for a prior predictive check.

In [5]:
t = df.loc[df['defence']=='disarm', 'time to death (min)']

taus = np.random.normal(150, 25, size=50)

ppc_exponential_time = [np.random.exponential(tau, size=len(t)) for tau in taus]

p = bebi103.viz.ecdf(ppc_exponential_time[0], formal=True, line_width=0.5, x_axis_label='time to death (min)')

for i in range(1, len(ppc_exponential_time)):
    p = bebi103.viz.ecdf(ppc_exponential_time[i], formal=True, line_width=0.5, p=p)

bokeh.io.show(p)

This looks quite reasonable, so let's code up the model in Stan.

In [6]:
model_code = """

data {
    int N;
    real t[N];
}

parameters{
    real<lower=0> tau;
}

transformed parameters {
    real<lower=0> beta_ = 1/tau;
}

model {
    tau ~ normal(150, 25);
    t ~ exponential(beta_);
}

generated quantities {
  real t_ppc[N];

  for (i in 1:N) {
    t_ppc[i] = exponential_rng(beta_);
  }
}

"""
sm_exp = bebi103.stan.StanModel(model_code=model_code)
Using cached StanModel.

Now we will perform the sampling and take a look at some diagnostics.

In [7]:
data = dict(N=len(t), t=t)
samples_exp = sm_exp.sampling(data=data)
bebi103.stan.check_all_diagnostics(samples_exp)
n_eff / iter looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0.0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
Out[7]:
0

Let's take a look at the samples.

In [8]:
samples_exp
Out[8]:
Inference for Stan model: anon_model_e0698d46c147dd4cdef6a67ef517a07c.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
tau       132.54    0.62  22.11  92.09 116.42 131.96 147.81 178.07   1281    1.0
beta_     7.8e-3  3.8e-5 1.4e-3 5.6e-3 6.8e-3 7.6e-3 8.6e-3   0.01   1255    1.0
t_ppc[0]  128.49    2.21 133.04   3.07  37.19  88.35 175.61 489.44   3617    1.0
t_ppc[1]  132.64    2.42 137.97   3.48   36.8  88.59 182.43 506.85   3259    1.0
t_ppc[2]  130.63    2.17 135.66   3.32  36.63  88.34 176.13 501.29   3903    1.0
t_ppc[3]  129.54    2.09 132.45   3.45  36.35  89.22 177.56 486.15   4000    1.0
t_ppc[4]  131.46    2.22 137.12   3.24  36.46  90.54 178.87 492.45   3813    1.0
t_ppc[5]  134.19    2.19 136.44   3.57  40.54  93.89 178.03 497.93   3899    1.0
t_ppc[6]  134.72    2.26 139.42   3.23  36.61  93.34 186.46 518.18   3796    1.0
t_ppc[7]  131.97    2.07 131.01   3.54  38.74  91.78 183.87 475.22   4000    1.0
t_ppc[8]  133.66     2.4  137.6   2.99  38.49  91.85 181.75 515.09   3294    1.0
t_ppc[9]  131.51    2.12 134.01    3.1  37.61  90.79 184.22 474.29   4000    1.0
t_ppc[10] 130.02     2.3  139.8   2.74  35.86  87.46 171.49 521.84   3696    1.0
t_ppc[11] 128.82     2.1 129.17   2.68  36.06  88.54 181.86 468.45   3799    1.0
t_ppc[12] 132.16    2.21  134.1   4.23  38.89  90.03 179.22  486.2   3678    1.0
t_ppc[13] 133.44    2.23  141.0    2.8   36.7  88.32 185.91 529.56   4000    1.0
t_ppc[14] 132.03    2.15 134.88   3.63  36.92  87.74 179.54 513.09   3930    1.0
lp__      -79.72    0.02   0.67 -81.61  -79.9 -79.47 -79.28 -79.22   1563    1.0

Samples were drawn using NUTS at Fri Nov 30 00:28:16 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

We can now do a quick postereior predictive check to assess the model.

In [9]:
bokeh.io.show(bebi103.viz.predictive_ecdf(
    samples_exp, 
    name='t_ppc', 
    data=t, 
    title='Exponential',
    data_line=False,
    diff=True))

This doesn't look great, but let's also plot the ECDF of this data for good measure.

In [10]:
df_t_ppc = bebi103.stan.extract_array(samples_exp, 't_ppc')
p = bebi103.viz.ecdf(t, x_axis_label='time to death (min)', color='orange', level='overlay')

# Plot posterior predictive ECDFs
for i in df_t_ppc['chain_idx'].unique()[::40]:
    p = bebi103.viz.ecdf(df_t_ppc.loc[df_t_ppc['chain_idx']==i, 't_ppc'],
                         alpha=0.1, p=p)

bokeh.io.show(p)

Again, not great, especially considering how many of our few data points end up outside the data generated from our samples. We might want to attempt something like the gamma distribution for this instead, but in the interest of time we will see if we can get a hierarchical veresion of this model up and running. Based on some trial and error, I found that the $\tau$ parameterization is easier to interpret and set up for this model as well. On top of that, noncentering the model is a must to stave of divergences. With that in mind, we will implement the following model:

\begin{align} \theta &\sim \mathrm{HalfNormal}(150, 25) \\ \sigma &\sim \mathrm{HalfNormal}(0, 10) \\ \tilde{\tau}_{j} &\sim \mathrm{HalfNormal}(0, 1) \\ \tau_j &= \theta + \tilde{\tau}_j*\sigma \\ t_{i, j} &\sim \mathrm{Exponential}(1/\tau_j) \, \forall i, j \end{align}

With this in hand, we will perform the prior predictive checks. We will do this in Stan this time for variety.

In [11]:
model_code_prior_pc = """
data {
    int N;
    int J_1;
    int index_1[N];
}

generated quantities {
    vector<lower=0>[J_1] tau;
    real<lower=0> sigma;
    real<lower=0> theta;
    vector[N] t;
    
    theta = normal_rng(150, 25);
    sigma = fabs(normal_rng(0, 10));
    for (i in 1:J_1){ 
        tau[i] = normal_rng(theta, sigma);
    }
    
    for (i in 1:N) {
        t[i] = exponential_rng(1/tau[index_1[i]]);
    }
}
"""
sm_exp_heir_prior_pred = bebi103.stan.StanModel(model_code=model_code_prior_pc)
Using cached StanModel.

We need to make sure our data is in the right format for the model, so we use Justin's remarkably intuitive function to do this.

In [12]:
treatment = 'disarm'

# Get time column with name Stan can use
df['t'] = df['time to death (min)']

data, _ = bebi103.stan.df_to_datadict_hier(df.loc[df['defence']==treatment, :],
                                           level_cols='trial',
                                           data_cols='t')

We will now draw the samples. To match above, we will do 50 data set generations.

In [13]:
# Generate prior predictive data sets
samples_gen = sm_exp_heir_prior_pred.sampling(data=data, 
                              warmup=0, 
                              iter=50, 
                              chains=1, 
                              algorithm='Fixed_param')

We take a look.

In [14]:
samples_gen
Out[14]:
Inference for Stan model: anon_model_fe4967ec6aa0a88d94e184430140779c.
1 chains, each with iter=50; warmup=0; thin=1; 
post-warmup draws per chain=50, total post-warmup draws=50.

         mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
tau[0] 147.45    4.11  29.06  80.77 128.91 149.87 163.98 201.26     50   0.98
tau[1] 149.16    3.68  26.02   93.5 136.28 152.03  161.6  201.0     50   1.01
tau[2] 150.12    3.71  26.23  81.76 137.51 152.32 164.45 196.78     50   0.99
sigma    8.66    0.81    5.7   0.25   3.43   8.75  12.71  22.33     50   0.98
theta  150.56    3.44  24.36  93.71 141.06 153.98 162.21  198.6     50   0.99
t[0]   143.38   25.29 151.77   4.32  45.42 116.14 198.35 590.51     36    1.0
t[1]   146.42   21.54 152.31   1.91  32.71  77.75 242.34 557.41     50   1.04
t[2]   148.49   27.07 148.25   6.71  49.92  92.86 204.62  578.3     30    1.0
t[3]   134.11   18.59 131.48   7.76   45.5  87.72 198.09 548.61     50   0.98
t[4]   148.18   19.81 140.11   8.43  53.12 110.84 203.19 598.12     50   1.05
t[5]   187.28   26.83 189.73  11.46  79.49 136.25  217.5 721.09     50   0.99
t[6]   191.76   24.82 175.52   9.08  47.23 144.65 291.97 635.64     50   0.98
t[7]   120.03   15.96 112.86   3.15  36.32 104.03 179.43 480.26     50   0.98
t[8]   125.52   18.12 128.11   8.77  43.99  88.01 169.05 486.36     50   0.99
t[9]   179.35   26.72 164.69   9.16  52.66 125.31 300.23  606.0     38    1.0
t[10]  180.18   25.89 183.07   7.44   73.6 117.53  255.7 749.51     50   0.98
t[11]   170.6   23.45 165.79   6.05  49.33 112.12  228.1 632.01     50   0.99
t[12]  173.82    22.9  161.9  12.09  63.15 132.44 268.84 677.73     50   0.99
t[13]  120.01   17.34 121.38    4.9  37.58  87.86  131.0 482.15     49   0.99
t[14]  141.77   18.28 129.24   5.54   56.2  96.86 205.38 466.34     50   0.98
lp__      0.0     0.0    0.0    0.0    0.0    0.0    0.0    0.0     50    nan

Samples were drawn using Fixed_param at Fri Nov 30 00:28:31 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Let is refactor this output and make a plot!

In [15]:
t_gen = bebi103.stan.extract_array(samples_gen, 't')

trials = pd.DataFrame(np.array([t_gen.loc[t_gen['index_1'] == i]['t'].values for i in range(15)[1:]]))

p = bebi103.viz.ecdf(trials[0],
                      x_axis_label='time to death',
                      plot_width=300,
                      plot_height=300, formal=True, line_width=0.5)

[bebi103.viz.ecdf(trials[i], formal=True, p=p, line_width=0.5) for i in range(50)[1:]]

bokeh.io.show(p)

This looks pretty similar to our previous prior predictive check, so we are good to go with the model.

In [16]:
model_code = """
data {
    int N;
    real t[N];
    int J_1;
    int index_1[N];
}

parameters{
    vector[J_1] tau_tilde;
    real<lower=0> sigma;
    real theta;
}

transformed parameters {
    vector[J_1] tau = theta + tau_tilde*sigma;
    vector[J_1] beta_ = 1 ./ tau;
}

model {
    theta ~ normal(150, 50);
    sigma ~ normal(0, 10);
    tau_tilde ~ normal(0, 1);
    
    t ~ exponential(beta_[index_1]);
}

generated quantities {
  real t_ppc[N];
  real log_lik[N];

  for (i in 1:N) {
    log_lik[i] = exponential_lpdf(t[i] | beta_[index_1[i]]);
    t_ppc[i] = exponential_rng(beta_[index_1[i]]);
  }
}
"""
sm_exp_h = bebi103.stan.StanModel(model_code=model_code)
Using cached StanModel.

It compiled! Now we can perform our sampling and take a look at the diagnostics.

In [17]:
samples_heir = sm_exp_h.sampling(data=data, control=dict(adapt_delta=0.999))
bebi103.stan.check_all_diagnostics(samples_heir)
n_eff / iter looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0.0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
Out[17]:
0

Looks good so far! We will take a look at the samples before moving forward.

In [18]:
samples_heir
Out[18]:
Inference for Stan model: anon_model_a686b97ec2ae34cc221d796d25f1eaf0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

               mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
tau_tilde[0]  -0.15    0.02   0.97  -2.05  -0.83  -0.13   0.52    1.7   3233    1.0
tau_tilde[1]   0.21    0.02   0.97  -1.71  -0.46   0.21   0.91   2.07   2854    1.0
tau_tilde[2]  -0.19    0.02   1.01  -2.16  -0.86   -0.2   0.48   1.81   3257    1.0
sigma           8.0    0.12   6.22   0.21   3.05   6.69  11.76  23.12   2751    1.0
theta        115.26    0.61  29.18  66.39  94.76 111.87  132.7 180.79   2258    1.0
tau[0]       113.48    0.63  30.19  61.89  92.15 109.95 131.37 182.44   2289    1.0
tau[1]       117.94    0.58  28.91  71.09  96.98 114.26 135.65 184.45   2498    1.0
tau[2]        112.8    0.64  30.76  60.46  91.23 109.52 131.76 180.44   2276    1.0
beta_[0]     9.5e-3  5.1e-5 2.7e-3 5.5e-3 7.6e-3 9.1e-3   0.01   0.02   2788    1.0
beta_[1]     9.0e-3  3.9e-5 2.2e-3 5.4e-3 7.4e-3 8.8e-3   0.01   0.01   3257    1.0
beta_[2]     9.6e-3  5.6e-5 2.9e-3 5.5e-3 7.6e-3 9.1e-3   0.01   0.02   2729    1.0
t_ppc[0]     113.08    1.89 119.84   2.73  30.03  75.17 155.03 432.86   4000    1.0
t_ppc[1]     111.86    1.95 120.02   2.49   29.4  73.15 153.65 447.65   3793    1.0
t_ppc[2]     115.39    1.93 121.88   2.64  31.17  76.74  157.9 436.12   3980    1.0
t_ppc[3]     114.97    1.99 123.11   2.79   30.2  74.77 158.07 444.82   3839    1.0
t_ppc[4]     112.61    1.95 123.61   2.43  30.79  74.78 147.45 456.66   4000    1.0
t_ppc[5]     117.47    2.05  126.5   2.82  32.01  77.92 157.72 471.09   3822    1.0
t_ppc[6]     118.47    2.07 128.47    2.7  30.95   77.6  159.2  464.5   3838    1.0
t_ppc[7]     120.13    2.13 130.74   2.65  32.24  78.52 163.15 458.92   3777    1.0
t_ppc[8]     121.41    2.09 127.31   2.47  33.67  80.06  167.5 478.95   3727    1.0
t_ppc[9]     118.82    2.01 125.85   2.86   32.6  79.94 161.37 459.03   3910    1.0
t_ppc[10]    113.23    1.96 118.94   3.31  31.99  73.91 153.29 444.71   3684    1.0
t_ppc[11]    114.43    1.94 122.48    3.0  31.88  76.84 152.82 454.72   4000    1.0
t_ppc[12]    112.36    1.91 120.49    3.2  30.95  74.77 150.96 431.16   4000    1.0
t_ppc[13]    111.49    1.97 121.32    2.2   30.4  73.15  148.8 444.59   3798    1.0
t_ppc[14]    116.01    2.02  125.7   2.95  30.66  76.39 157.06 462.83   3859    1.0
log_lik[0]    -5.26  2.4e-3   0.12  -5.54  -5.33  -5.25  -5.17   -5.1   2238    1.0
log_lik[1]    -5.17  2.9e-3   0.14  -5.48  -5.26  -5.15  -5.07  -4.93   2332    1.0
log_lik[2]    -5.17  2.9e-3   0.14  -5.48  -5.26  -5.15  -5.07  -4.93   2332    1.0
log_lik[3]    -4.98  3.9e-3   0.19  -5.37  -5.11  -4.97  -4.85  -4.61   2457    1.0
log_lik[4]    -5.74  1.3e-3   0.06   -5.9  -5.75  -5.72   -5.7   -5.7   2446    1.0
log_lik[5]    -5.73  1.0e-3   0.04  -5.85  -5.74  -5.71   -5.7   -5.7   1644    1.0
log_lik[6]    -6.36  2.7e-3   0.17   -6.8  -6.43  -6.31  -6.24  -6.19   3685    1.0
log_lik[7]    -6.36  2.7e-3   0.17   -6.8  -6.43  -6.31  -6.24  -6.19   3685    1.0
log_lik[8]     -6.0  1.4e-3   0.08  -6.23  -6.02  -5.97  -5.95  -5.94   3641    1.0
log_lik[9]     -6.9  5.0e-3    0.3  -7.64  -7.05  -6.84  -6.68  -6.52   3522    1.0
log_lik[10]   -5.17  3.0e-3   0.14  -5.47  -5.26  -5.15  -5.06  -4.93   2311    1.0
log_lik[11]   -5.36  2.1e-3   0.09  -5.59  -5.41  -5.34  -5.28  -5.25   2078    1.0
log_lik[12]   -4.78  5.0e-3   0.25  -5.25  -4.96  -4.79  -4.62  -4.27   2524    1.0
log_lik[13]   -5.36  2.1e-3   0.09  -5.59  -5.41  -5.34  -5.28  -5.25   2078    1.0
log_lik[14]   -5.17  3.0e-3   0.14  -5.47  -5.26  -5.15  -5.06  -4.93   2311    1.0
lp__         -84.31    0.05   1.69  -88.6 -85.13 -83.94 -83.06 -82.12   1373    1.0

Samples were drawn using NUTS at Fri Nov 30 00:28:32 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

We next look at the parallel coordinate plot for our parameters.

In [19]:
df_mcmc_noncentered = bebi103.stan.to_dataframe(samples_heir)

# Which parametets to use in parallel coordinate plot
pars = (['theta', 'sigma'] +
        list(df_mcmc_noncentered.columns[df_mcmc_noncentered.columns.str.contains('tau_')]) + 
        ['lp__'])

# Transformations to get them all on the same scale
transformation = (  [lambda x: x / 10, None, None] 
                  + [lambda x: x / 10]*(len(pars)-4)
                  + [lambda x: x / np.abs(x).max()])

bokeh.io.show(bebi103.viz.parcoord_plot(samples_heir,
                                        pars=pars,
                                        transformation=transformation,
                                        xtick_label_orientation='vertical'))

Next, we take a look at the corner plot for the hyperparameters, since these are what we are really interested in.

In [20]:
bokeh.io.show(bebi103.viz.corner(samples_heir, 
                                 pars=['theta', 'sigma']))

As a last step, we will look at the posterior predictive check with an ECDF and see how this model compared to the previous exponential one.

In [21]:
df_t_ppc = bebi103.stan.extract_array(samples_heir, 't_ppc')
p = bebi103.viz.ecdf(t, x_axis_label='time',
                     color='orange',
                     level='overlay')

# Plot posterior predictive ECDFs
for i in df_t_ppc['chain_idx'].unique()[::40]:
    p = bebi103.viz.ecdf(df_t_ppc.loc[df_t_ppc['chain_idx']==i, 't_ppc'],
                         alpha=0.1, p=p)

bokeh.io.show(p)

We got quite similar results here. A next step might be to implement this whole analysis again for the Gamma distribution and see how this compares.