Recitation 9: Discrete parameters and Stan


Throughout the term, we have used Stan to great effect to sample from our bespoke statistical models. However, one caveat of Hamiltonian Monte Carlo is that it can only sample continuous parameters, since it relies on concepts from classical mechanics. Fortunately, our old friend marginalization allows us to get around this limitation of HMC and handle discrete parameters in our models. This notebook has a few examples of how to handle discrete variables in Stan: one from the ecological realm of mark and recapture experiments, and one using a gamma model like the one we applied to the microtubule catastrophe data.

[1]:
import numpy as np
import scipy.special
import pandas as pd

import bebi103

import bokeh
import bokeh.io
import bokeh_catplot

import cmdstanpy
import arviz as az

bokeh.io.output_notebook()
Loading BokehJS ...

Handling log probabilities of discrete-valued vectors in Stan


A need to quantify the size of natural populations of animals often arises in ecology. A popular approach to counting populations is to mark individuals with a label of some type, like a leg band on a bird or a fin tag on a shark. In 1970, Stradling marked ants by feeding them radioactive phosphorus (32P), which would get integrated into their tissue and persist over time without transferring to other ants.

Worker Myrmica rubra ants. Image by Gary Alpert.

After labeling a bunch of animals, Stradling released them back into the population and allowed them to mix. Then, he re-sampled and counted how many labeled animals he pulled from the population. As an introduction to the concepts used in handling discrete parameters, I will first code up how to quantify the population size from this information in Stan (a brute force type approach) before moving on to a model that mixes discrete and continuous parameters. It is quite apparent that Stradling's data match the story of the hypergeometric distribution. We will call the radioactively labeled ants the 'white balls' and the unlabeled ones the 'black balls', which are mixed together in the colony and then sampled without replacement. Hence, we have

\begin{align} n &\sim \text{hypergeometric}(N, a, b) \end{align}

where \(N\) is the number of animals sampled from the mixed labeled-unlabeled population, \(n\) is the number of labeled animals drawn, \(a\) is the number of animals labeled before release, and \(b\) is the number of unlabeled animals. With this formulation, the population size of ants in a colony is given by \(a+b\). We could get a very good approximation of this with the Binomial distribution most of the time, i.e., we estimate the probability \(\theta \approx n/N\) of drawing a marked individual, equate it with the fraction \(a/(a+b)\) of marked animals in the population, and solve for the population size. This would probably work well since ant populations are often pretty large, as are the numbers of ants labeled and sampled, but we are going to be sticklers today and insist that we use discrete quantities. Now we need to think about a prior for \(b\) to finish building the model. Since we are sticking with discrete distributions, I chose a negative binomial, since it can be shaped as a broad or peaked distribution.

\begin{align} b &\sim \text{negative binomial}(\mu=1000, \phi=2) \end{align}

We can do a quick prior predictive check in numpy to see how this looks. For this, we will assume an experimental design where 100 animals were marked to begin, and the subsequent recaptures are also of 100 individuals.

[2]:
a = 100   # number of ants marked and released
N = 100   # number of ants in the recapture sample
n = []    # number of marked ants in each simulated recapture

for i in range(1000):
    # Draw b from the negative binomial prior; numpy parametrizes it by a
    # number of successes and a success probability, so (mu, phi) = (1000, 2)
    # becomes (phi, phi / (mu + phi)).
    b = np.random.negative_binomial(2, 2 / (1000 + 2))
    # Number of marked ants in a sample of N drawn without replacement
    n.append(np.random.hypergeometric(a, b, N))

bokeh.io.show(bebi103.viz.ecdf(n))

This makes sense: the prior allows the population to be nearly all included in the initial sample of 100 animals, or it could be much bigger, in which case the number of animals re-captured goes toward 0. Looking back, this model doesn't even have any continuous parameters at all! What can Stan possibly bring to the table? Well, it would clearly be quite easy to brute force plot the posterior for \(b\), as we did for the worm reversal probability in the HW (seems like ages ago!). We can do a similar thing in Stan, taking advantage of its built-in log PMF functions and its nifty log_sum_exp function. Basically, the name of the game is to decide on a range of values for \(b\) that we will look at, evaluate the model at all of these values, and plot the result. Note that we know the value of \(b\) must be at least as large as \(N - n\) (the number of animals that were not marked in our re-sample).
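Before setting this up in Stan, here is a minimal numpy/scipy sketch of the same brute force calculation, just to make the logic concrete. The values of a, N, and n here are made up for illustration, and scipy parametrizes the negative binomial by a number of successes and a success probability, so we convert from \(\mu\) and \(\phi\):

import numpy as np
import scipy.special
import scipy.stats

# Made-up data for illustration only
a_demo, N_demo, n_demo = 100, 100, 25
mu, phi = 1000.0, 2.0

# Grid of possible numbers of unlabeled ants; b must be at least N - n.
# The upper cutoff of 4000 is an arbitrary choice for the grid.
b_vals = np.arange(N_demo - n_demo, 4000)

# Unnormalized log posterior = log prior + log likelihood at each value of b
log_prior = scipy.stats.nbinom.logpmf(b_vals, phi, phi / (mu + phi))
log_like = scipy.stats.hypergeom.logpmf(n_demo, a_demo + b_vals, a_demo, N_demo)
lp = log_prior + log_like

# Normalize in log space, exactly what log_sum_exp will do for us in Stan;
# posterior_prob[i] is the posterior probability of b_vals[i] unlabeled ants
posterior_prob = np.exp(lp - scipy.special.logsumexp(lp))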

[3]:
sm = cmdstanpy.CmdStanModel(stan_file='mark_recapture_hypergeometric.stan')
print(sm.code())
INFO:cmdstanpy:stan to c++ (/Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mark_recapture_hypergeometric.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mark_recapture_hypergeometric
data {
  // Measured data
  int a;
  int N;
  int n;
  int max_pop;
  int min_pop;
  real mu;
  real phi;

}

transformed data{
  int b[max_pop-min_pop];
  for(i in 1:max_pop-min_pop){
    b[i] = i+min_pop;
  }
}

transformed parameters{
  vector[max_pop-min_pop] lp;
  for(i in 1:max_pop-min_pop){
    lp[i] = neg_binomial_2_lpmf(b[i]|mu, phi) + hypergeometric_lpmf(n|N, a, b[i]);
  }
}


model {
  target += log_sum_exp(lp);
}

generated quantities{
  simplex[max_pop-min_pop] pState;
  int num[max_pop-min_pop];
  num = b;
  pState = exp(lp - log_sum_exp(lp));

}

The only other bit of funny business is going on in the generated quantities block. We take the unnormalized log probability lp for each value of \(b\) and normalize it to get pState, the posterior probability of each value. In a model that also had continuous parameters, we would additionally average these probabilities over posterior draws, which marginalizes the continuous parameters out; here there are none, so the normalization alone gives us the posterior for \(b\).
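In equation form, the normalization is

\begin{align} \text{pState}_i = \exp\left(\mathrm{lp}_i - \text{log\_sum\_exp}(\mathrm{lp})\right) = \frac{\mathrm{e}^{\mathrm{lp}_i}}{\sum_j \mathrm{e}^{\mathrm{lp}_j}}, \end{align}

computed in log space to avoid numerical underflow. Let's give this a spin and see what it yields! I will manually input some data from Stradling's paper here to test with.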

[4]:
a = 190
N = 346
n = 32

#a = 17
#N = 33
#n = 13

#a = 308
#N = 279
#n = 186

data = {
    'a': a,
    'N': N,
    'n': n,
    'max_pop': 4000,
    'min_pop': N-n,
    'mu': 1000.0,
    'phi': 2.0
}

Note that since we don't have any model parameters, we must run the sampler with fixed_param=True. Also, we only need one iteration (sampling_iters=1), since the calculations performed in the Stan code are deterministic (we aren't actually sampling).

[5]:
samples = az.from_cmdstanpy(sm.sample(data=data, fixed_param=True, sampling_iters=1))
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1

For plotting the results, note that I am showing \(b+a\), as \(b\) represents the number of unlabeled ants and \(a\) is the number of labeled, so \(b+a\) is the total population size.

[6]:
p = bokeh.plotting.figure(plot_height=300, x_axis_label='number of ants in colony',
                          y_axis_label='posterior probability',
                          title='Distribution of ant colony size given data')
p.scatter(samples.posterior['num'].values[0][0]+a,
          samples.posterior['pState'].values[0][0])
bokeh.io.show(p)

This seems like a lot of work for where we've gotten! Though we have not yet sampled a model with both discrete and continuous parameters, this gives us all the building blocks we need in order to do so.

Discrete parameters in Stan: integer \(\alpha\) for the gamma distribution


We will build a model for experiment 5 from HW 6.2. Recall the description:

Experiment 5: You are studying a signaling pathway that, when the ligand binds the receptor, forms a multimeric protein complex at the intracellular side of the membrane. You assume that the rate of each protein joining this complex is roughly the same. You want to estimate the number of proteins in this complex, and the average rate at which they join. You employ a similar approach as in Experiment 3, using protein fusions that produce a fluorescent signal only when the complex is fully-formed. You use time-lapse microscopy to time how long it takes the complex to fully form after you add the ligand.

Although we always prefer to use real data in this class, we will simulate this process for the sake of time in this recitation. We will assume that the process is described by 3 arrivals of a Poisson process, all with the same rate parameter. To simulate the experiment, we will generate 50 data points of complex formation times, as would be measured by time-lapse microscopy, each modeled as the sum of 3 successive, identically distributed exponential waiting times.

[7]:
al = 3    # number of proteins that must arrive to complete the complex
ra = 0.5  # arrival rate of each protein

# Each observed formation time is the sum of `al` exponential inter-arrival times
t = np.zeros(50)
for i in range(al):
    t += np.random.exponential(1 / ra, size=len(t))

We can now take a look at what the simulated data look like.

[8]:
p = bokeh_catplot.ecdf(t, x_axis_label='time fully formed complex')
bokeh.io.show(p)

With this in hand, we first apply a model that we know well: a gamma distribution as in the microtubule catastrophe data.

[9]:
data = {"t": t,
        "N": len(t),
        "a_min": 0,
        "a_max": 10}

sm = cmdstanpy.CmdStanModel(stan_file="mt_gamma.stan")
samples = az.from_cmdstanpy(sm.sample(data=data), posterior_predictive="t_ppc")
INFO:cmdstanpy:stan to c++ (/Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 4

We now check the diagnostics.

[10]:
print(f"\nDiagnostics for gamma...")
bebi103.stan.check_all_diagnostics(samples)

Diagnostics for gamma...
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[10]:
0

We take a look at the posterior predictive distribution.

[11]:
t_ppc = (
    samples.posterior_predictive["t_ppc"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "t_ppc_dim_0")
)

bokeh.io.show(
    bebi103.viz.predictive_ecdf(
        t_ppc,
        data=t,
        title="Gamma",
        diff=True,
        x_axis_label='time fully formed complex',
    )
)

We can see how the model parameters correspond to our known parameter values used to produce the data.

[12]:
bokeh.io.show(bebi103.viz.corner(samples, xtick_label_orientation=np.pi/4))

These don't look bad, but it is pretty unsatisfying that we get a value of \(\alpha\) between 2 and 3, given that we know the underlying process must have an integer \(\alpha\). We can enforce this constraint explicitly by marginalizing over discrete values of \(\alpha\).
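Concretely, the marginalized (unnormalized) posterior that Stan targets is a sum over the allowed values of \(\alpha\),

\begin{align} \pi(\beta \mid t) \propto \pi(\beta)\sum_{\alpha}\pi(\alpha)\prod_{i=1}^{N}f(t_i \mid \alpha, \beta), \end{align}

where \(f\) is the gamma PDF and \(\pi(\alpha)\) is a prior over the integer values of \(\alpha\). Taking the log of the sum over \(\alpha\) is exactly what log_sum_exp over the lp vector accomplishes. The Stan implementation is below.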

[13]:
bebi103.stan.clean_cmdstan()
sm2 = cmdstanpy.CmdStanModel(stan_file="mt_gamma_integer_alpha.stan")
print(sm2.code())
INFO:cmdstanpy:stan to c++ (/Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma_integer_alpha.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma_integer_alpha
data {
  int N;
  real t[N];
  int a_min;
  int a_max;
}


transformed data{
  int a[a_max-a_min];
  for(i in 1:a_max-a_min){
    a[i] = i+a_min;
  }
}

parameters {
  real<lower=0> beta_;
}

transformed parameters {
  real<lower=0> beta2 = beta_;
  vector[a_max-a_min] lp;
  for(i in 1:a_max-a_min){
    lp[i] = binomial_lpmf(a[i]|a_max, 0.5);
    for(j in 1:N){
      lp[i] = lp[i] + gamma_lpdf(t[j]|a[i],beta_);
    }
  }
}

model {
  beta_ ~ normal(0, 1);
  target += log_sum_exp(lp);
}


generated quantities {
  real t_ppc[N];
  simplex[a_max-a_min] pState;
  int num[a_max-a_min];
  vector[a_max-a_min] log_p_norm;
  num = a;
  log_p_norm = lp - log_sum_exp(lp);
  pState = exp(log_p_norm);

  real alpha = categorical_logit_rng(log_p_norm);
  for (i in 1:N) {
    t_ppc[i] = gamma_rng(alpha, beta_);
  }

}

One caveat of this modeling approach is that we must decide what range of values to consider for our discrete parameter. Usually, if we are planning on using a discrete model for a process, we probably have at least fairly strong opinions about roughly what the parameter value should be. In our case, I am forcing an upper limit on the parameter value through my choice of a binomial prior.
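As a quick sanity check on what this prior implies, we can evaluate the binomial PMF over the allowed values of \(\alpha\) with scipy (a scratch calculation; note that \(\alpha = 0\) is excluded by the a_min/a_max grid, so I renormalize over the remaining values).

import numpy as np
import scipy.stats

a_max = 7  # matches a_max in the data dictionary below
alpha_vals = np.arange(1, a_max + 1)
prior_weights = scipy.stats.binom.pmf(alpha_vals, a_max, 0.5)
prior_weights /= prior_weights.sum()  # renormalize; alpha = 0 is excluded
print(dict(zip(alpha_vals, prior_weights.round(3))))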

[14]:
data = {"t": t, "N": len(t), "a_min": 0, "a_max": 7}
samples_discrete = az.from_cmdstanpy(
    sm2.sample(data=data, sampling_iters=20000), posterior_predictive="t_ppc"
)
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 4

We check the diagnostics.

[15]:
bebi103.stan.check_all_diagnostics(samples_discrete, var_names=['beta_'])
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 80000 (0.0%) iterations ended with a divergence.
0 of 80000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[15]:
0

Looks good! Time to take a look at the marginalized posterior for \(\beta\), as well as our samples for \(\alpha\).

[16]:
p = bebi103.viz.corner(samples_discrete,
                        pars=['beta_', 'alpha'], xtick_label_orientation=np.pi/4, datashade=True)
bokeh.io.show(p)

Looks a bit odd! Let’s look at an ECDF of the \(\beta\) values.

[17]:
bokeh.io.show(bokeh_catplot.ecdf(samples_discrete.posterior.beta_.values.flatten(), x_axis_label="β"))

Well, that looks funky! This is clearly a multimodal posterior. If you think about the data we generated, this actually makes sense. The value of \(\beta\) depends strongly on how many arrivals we have: fewer arrivals means a slower arrival rate is needed to give a similarly centered distribution. Since the mean of a Gamma(\(\alpha\), \(\beta\)) distribution is \(\alpha/\beta\) and our simulated data have a mean of roughly \(3/0.5 = 6\) time units, each candidate \(\alpha\) pins \(\beta\) near \(\alpha/6\), which is about where the modes sit. Even so, we can see that some intermediate values of \(\beta\) are preferred here. If you zoom in on the trace plot, you can see the walker shift from peak to peak in this posterior.

[18]:
bokeh.io.show(
    bebi103.viz.trace_plot(
        samples_discrete, pars=["beta_"], line_kwargs=dict(line_width=2)
    )
)

Let us take a look at our samples for \(\alpha\).

[19]:
bokeh.io.show(bebi103.viz.ecdf(samples_discrete.posterior['alpha'].values.ravel(), x_axis_label="α"))

Next, I will compare the samples we got for our discrete model with the ones from the regular gamma model.

[20]:
p = bebi103.viz.ecdf(samples_discrete.posterior['beta_'].values[0],
                     legend_label='discrete α', x_axis_label='β')
bebi103.viz.ecdf(samples.posterior['beta_'].values.ravel(), color='red',
                 p=p, legend_label='continuous α')
p.legend.location='bottom_right'
bokeh.io.show(p)

We can see that \(\beta\) takes on a similar range of values in both models, though the discrete model favors a peaky, multimodal distribution for the arrival rate, with one mode for each plausible value of \(\alpha\). With a quick predictive ECDF we can confirm that the model offers a similar ability to generate the data.

[21]:
t_ppc = (
    samples_discrete.posterior_predictive["t_ppc"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "t_ppc_dim_0")
)


p1= bebi103.viz.predictive_ecdf(
        t_ppc,
        data=t,
        title="Gamma, discrete α",
        diff=True,
        x_axis_label='time fully formed complex',
    )

t_ppc = (
    samples.posterior_predictive["t_ppc"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "t_ppc_dim_0")
)

p2=  bebi103.viz.predictive_ecdf(
        t_ppc,
        data=t,
        title="Gamma",
        diff=True,
        x_axis_label='time fully formed complex',
    )
bokeh.io.show(bokeh.layouts.gridplot([p1, p2], ncols=1))

Nice! We can construct a model for discrete values of the arrival number in a gamma distribution. However, the story doesn't end here. Notice that there are multiple peaks in the posterior for \(\beta\) in this model. We can see that, due to the fairly small number of data points, there is quite a bit of overlap between the samples of \(\beta\) for adjacent values of \(\alpha\) (say \(\alpha=k\) and \(\alpha=k+1\)). This overlap is what allows the sampler to explore the posterior and the chains to mix. Here is the posterior again to illustrate this:

[22]:
p = bebi103.viz.corner(
    samples_discrete, pars=['beta_', 'alpha'], xtick_label_orientation=np.pi/4, datashade=True
)
bokeh.io.show(p)

This is, however, not necessarily going to be the case. If we had a larger number of data points, for example, the peakiness of the log posterior around the \(\beta\) implied by a particular \(\alpha\) increases. As a result, the sampler has a much lower chance of crossing from one peak in the posterior to another. We can simulate this by upping the number of simulated data points and seeing if our model can handle it. We go from 50 to 500 data points.

[23]:
al = 3
ra = 0.5
t = np.zeros(500)
for i in range(al):
    t+=np.random.exponential(1/ra, size=len(t))

Now we try to sample again.

[24]:
data = {"t": t, "N": len(t), "a_min": 0, "a_max": 7}
samples_discrete = az.from_cmdstanpy(
    sm2.sample(data=data, sampling_iters=20000), posterior_predictive="t_ppc"
)
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 4

Time to take a look at the diagnostics.

[25]:
bebi103.stan.check_all_diagnostics(samples_discrete, var_names=['beta_'])
ESS for parameter beta_ is 7.072851746096688.
ESS / iter for parameter beta_ is 8.84106468262086e-05.
tail-ESS for parameter beta_ is 25.13405390430261.
ESS / iter for parameter beta_ is 0.00031417567380378266.
  ESS or tail-ESS below 100 per chain indicates that expectation values
  computed from samples are unlikely to be good approximations of the
  true expectation values.
  ESS / iter or tail-ESS / iter below 0.001 indicates that the effective
  sample size has likely been overestimated.
Rhat for parameter beta_ is 1.5277981479277805.
  Rank-normalized Rhat above 1.01 indicates that the chains very likely have not mixed
0 of 80000 (0.0%) iterations ended with a divergence.
0 of 80000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[25]:
3

Oh no! These look very bad. We will look at the corner plot and trace plot to figure out what is going on.

[26]:
p = bebi103.viz.corner(samples_discrete,
                        pars=['beta_', 'alpha'], xtick_label_orientation=np.pi/4, datashade=True)
bokeh.io.show(p)

At face value, this looks kind of okay. It looks like we once again have a peaky posterior for \(\beta\). Notice, though, that the sharpness and separation of these peaks is much greater than before. Let's look at the trace plot.

[27]:
bokeh.io.show(
    bebi103.viz.trace_plot(
        samples_discrete, pars=["beta_"], line_kwargs=dict(line_width=2)
    )
)

Oof. The chains are clearly not mixing. Based on this, it seems clear that the sampler arrives at one peak for \(\beta\), determined by where the walker started, and gets stuck there. With a huge number of steps, it might eventually make it over to another peak, but the odds of that are quite low. This means that we are not actually sampling from the posterior. What to do about this? Well, we can once again take advantage of marginalization to try a different implementation of a similar approach. This time, we will sample separately from a set of gamma models, one for each allowed integer value of \(\alpha\), and then marginalize over the samples to get the probability of each \(\alpha\) given the data. In this way, we sample from the models independently, so we don't have to worry about the sampler making it from one peak to another in posterior space. We will first sample with the larger dataset, and then compare this model to our previous one with the smaller dataset.
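Schematically, here is the weighting the code below implements. For each fixed value \(\alpha_k\), we draw posterior samples \(\beta_s\), evaluate the log joint \(\mathrm{lp}_s^{(k)}\) (the log priors for \(\beta\) and \(\alpha_k\) plus the log likelihood) at each draw, and then weight the models according to

\begin{align} q_k \approx \frac{1}{S}\sum_{s=1}^{S}\mathrm{e}^{\mathrm{lp}_s^{(k)}}, \qquad g(\alpha_k \mid t) = \frac{q_k}{\sum_{k'} q_{k'}}, \end{align}

where \(S\) is the number of posterior draws. The averaging and normalization are done in log space with logsumexp, just as before.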

[28]:
sm = cmdstanpy.CmdStanModel(stan_file='mt_gamma_integer_alpha_jb.stan')
print(sm.code())
INFO:cmdstanpy:stan to c++ (/Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma_integer_alpha_jb.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/bois/Dropbox/git/bebi103_course/2020/b/content/recitations/recitation_09/mt_gamma_integer_alpha_jb
data {
  int N;
  real t[N];

  int alpha;

  int N_ppc;
}


parameters {
  real<lower=0> beta_;
}


model {
  beta_ ~ normal(0, 1);
  t ~ gamma(alpha, beta_);
}


generated quantities {
  real t_ppc[N_ppc];
  real log_lik[N];
  real lp = normal_lpdf(beta_ | 0, 1);
  real log_alpha_prior = binomial_lpmf(alpha|7, 0.5);

  for (i in 1:N_ppc) {
    t_ppc[i] = gamma_rng(alpha, beta_);
  }

  for (i in 1:N) {
    log_lik[i] = gamma_lpdf(t[i] | alpha, beta_);
  }

  lp += sum(log_lik) + log_alpha_prior;
}

Time to sample!

[29]:
ms = list(range(1, 8))
N_ppc = len(t)
datas = {m: dict(t=t, N=len(t), alpha=m, N_ppc=N_ppc) for m in ms}

with bebi103.stan.disable_logging():
    samples = {
        m: az.from_cmdstanpy(
            sm.sample(data=datas[m], sampling_iters=3000),
            log_likelihood="log_lik",
            posterior_predictive="t_ppc"
        )
        for m in ms
    }

log_q = np.array(
    [
        scipy.special.logsumexp(samp.posterior.lp.values.flatten())
        - np.log(len(samp.posterior.lp.values.flatten()))
        for m, samp in samples.items()
    ]
)

log_norm_const = scipy.special.logsumexp(log_q)
log_g = log_q - log_norm_const

g = np.exp(log_g)

We will now take a look at the posterior for \(\alpha\).

[30]:
p = bokeh.plotting.figure(
    frame_height=200,
    frame_width=400,
    x_axis_label='α',
    y_axis_label='g(α | t)'
)

p.circle(ms, g, size=10)

bokeh.io.show(p)

With the larger dataset, the model very heavily favors the known value that we used to generate the data. This is no surprise. We will use the posterior for \(\alpha\) to draw samples for \(\beta\) from the combined model: the probability of drawing \(\beta\) from the samples for a given value of \(\alpha\) is proportional to the posterior probability of \(\alpha\).

[31]:
# For each draw, choose a value of alpha with probability g(alpha | t), then
# pull a beta sample from the corresponding single-alpha model.
betas = [
    np.random.choice(samples[np.random.choice(len(g), p=g) + 1].posterior.beta_.values.flatten())
    for i in range(10000)
]
p = bebi103.viz.ecdf(betas, x_axis_label="β")
bokeh.io.show(p)

This ends up simply pulling samples from the \(\alpha=3\) case, since this is very much favored by the model. We now return to the smaller dataset and check that our models correspond with each other.

[32]:
al = 3
ra = 0.5
t = np.zeros(50)
for i in range(al):
    t+=np.random.exponential(1/ra, size=len(t))

Now we try to sample again.

[33]:
data = {"t": t, "N": len(t), "a_min": 0, "a_max": 7}
samples_discrete = az.from_cmdstanpy(
    sm2.sample(data=data, sampling_iters=20000), posterior_predictive="t_ppc"
)
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 3

Now we sample from our second model implementation.

[34]:
ms = list(range(1, 8))
N_ppc = len(t)
datas = {m: dict(t=t, N=len(t), alpha=m, N_ppc=N_ppc) for m in ms}

with bebi103.stan.disable_logging():
    samples = {
        m: az.from_cmdstanpy(
            sm.sample(data=datas[m], sampling_iters=3000),
            log_likelihood="log_lik",
            posterior_predictive="t_ppc"
        )
        for m in ms
    }

log_q = np.array(
    [
        scipy.special.logsumexp(samp.posterior.lp.values.flatten())
        - np.log(len(samp.posterior.lp.values.flatten()))
        for m, samp in samples.items()
    ]
)

log_norm_const = scipy.special.logsumexp(log_q)
log_g = log_q - log_norm_const

g = np.exp(log_g)

With these samples in hand, we can compare the posteriors for \(\alpha\), as well as for \(\beta\), between the two implementations.

[35]:
p = bokeh.plotting.figure(
    frame_height=200,
    frame_width=400,
    x_axis_label='α',
    y_axis_label='g(α | t)'
)

p.circle(ms, g, size=10)

for s in range(7):
    p.diamond(s+1, samples_discrete.posterior.pState.values[:,:,s].flatten().mean(),
              size=10, alpha=1, color='red')

bokeh.io.show(p)

Nice! These look quite similar to each other. Note that since we are getting these posteriors by marginalizing over samples, they may differ slightly due to stochasticity in the sampling. We now generate some \(\beta\) values from our second model.

[36]:
betas = [
    np.random.choice(samples[np.random.choice(len(g), p=g)+1].posterior.beta_.values.flatten())
    for i in range(10000)
]

Now we can plot the posterior for \(\beta\).

[37]:
p = bebi103.viz.ecdf(
    samples_discrete.posterior.beta_.values.flatten(),
    x_axis_label="β",
    color='red',
    legend_label='model 1'
)
bebi103.viz.ecdf(betas, x_axis_label="β", p=p, legend_label='model 2')
p.legend.location='bottom_right'
bokeh.io.show(p)

These models yield quite similar results for the posteriors! Overall, it seems that working with discrete parameters in a case like this is only useful if we are somewhat data-poor or working with an intermediate number of states. At small numbers of arrivals, we see a huge difference in \(\beta\) for a change in \(\alpha\), which makes the posterior for \(\beta\) peaky and effectively impossible to sample directly. With a large number of arrivals, we would end up with a large range of plausible arrival numbers, which makes it odd to even work with discrete parameters (the continuous approximation doesn't really lose anything), and the computation also becomes monstrous for a large number of possible states. For this type of model, the usefulness of discrete sampling seems limited.

[38]:
bebi103.stan.clean_cmdstan()

Computing environment

[39]:
%load_ext watermark
%watermark -v -p numpy,pandas,numba,cmdstanpy,arviz,bebi103,bokeh_catplot,jupyterlab
CPython 3.7.6
IPython 7.12.0

numpy 1.18.1
pandas 0.24.2
numba 0.48.0
cmdstanpy 0.8.0
arviz 0.6.1
bebi103 0.0.53
bokeh_catplot 0.1.8
jupyterlab 1.2.6