2. Plotting posteriors
[2]:
import numpy as np
import pandas as pd
import scipy.integrate
import scipy.stats as st
import tqdm
import bebi103
import bokeh.io
bokeh.io.output_notebook()
import holoviews as hv
hv.extension('bokeh')
bebi103.hv.set_defaults()
Our goal in performing parameter estimation in a Bayesian context is to obtain information about the posterior probability. If we can write down the likelihood \(f(y\mid \theta)\) and the prior \(g(\theta)\) (or equivalently the joint \(\pi(y, \theta)\)), which we always need to do anyway to specify a generative model, we automatically have an analytical expression for the posterior,
\begin{align} g(\theta \mid y) = \frac{f(y\mid \theta)\,g(\theta)}{\int \mathrm{d}\theta\,f(y\mid \theta)\,g(\theta)}. \end{align}
So it’s that simple, right? We’ve got an analytical expression for the posterior! The problem is that we can almost never make any sense of the analytical expression for the posterior. We need to either plot it or compute expectation values from it. This is the bread-and-butter of Bayesian inference: figuring out how to plot the posterior or compute expectation values from it. We will focus on expectation values for the majority of the course, since plots are usually inaccessible for models with more than a few parameters. However, for models that have few (like one, two, or three) parameters, we can directly compute and plot the posterior. This is a viable option and gives a complete picture of the posterior.
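For concreteness, by an expectation value we mean an integral of some function of the parameters over the posterior; for example, the posterior mean of \(\theta\) is
\begin{align} \langle \theta \rangle = \int \mathrm{d}\theta\,\theta\, g(\theta\mid y). \end{align}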
The data set
We will explore plotting posterior distributions using the data set of mitotic spindle length as a function of the diameter of encapsulating droplets from experiments performed by Matt Good. We visited this data set last term, and you should refamiliarize yourself with it.
As a reminder of the data set, let’s make a quick plot of the spindle length versus droplet diameter data, which is available here: https://s3.amazonaws.com/bebi103.caltech.edu/data/good_invitro_droplet_data.csv. We will pull out the data of interest as NumPy arrays, since we will need them for speed later.
[3]:
# Load in data frame directly from the URL above
df = pd.read_csv(
    'https://s3.amazonaws.com/bebi103.caltech.edu/data/good_invitro_droplet_data.csv',
    comment='#',
)

# Pull out NumPy arrays
ell = df['Spindle Length (um)'].values
d = df['Droplet Diameter (um)'].values
# Make a plot
hv.Scatter(
    data=df,
    kdims='Droplet Diameter (um)',
    vdims='Spindle Length (um)',
)
[3]:
Our objective is to use these data to obtain parameter estimates under two different models for spindle size. Remember, by “parameter estimation,” we mean that we wish to compute a posterior distribution for the parameter set \(\theta\), given the data. That is, we want to compute \(g(\theta \mid d, l)\), where \(d\) and \(l\) are respectively the set of droplet diameters and spindle lengths. Beyond computing the posterior, we wish to obtain summaries of it.
Models for spindle size
The parameter sets are defined by one of two models for spindle size. The first model, which we shall call the independent size model, posits that spindle length \(l\) is independent of droplet diameter \(d\). In this model
\begin{align} l = \phi, \end{align}
where \(l\) is the measured length of the spindle and \(\phi\) is the fixed spindle size. For our generative model, we assume the measured spindle lengths are Normally distributed about \(\phi\) with scale parameter \(\sigma\). We also need to specify priors for \(\phi\) and \(\sigma\). We will assume a Uniform prior for \(\phi\) (\(g(\phi) = \text{constant}\)) and a Jeffreys prior on \(\sigma\) (\(g(\sigma) = 1/\sigma\)). These are both improper priors, and we will discuss priors in much more depth in coming lessons. For now, we will take these priors as given and focus on how we might plot the posterior.
So, for the independent size model, we have
\begin{align} &g(\phi) = \text{constant} \\[1em] &g(\sigma) = 1/\sigma \\[1em] &l_i \sim \text{Normal}(\phi, \sigma) \;\forall i, \end{align}
where implicit in the definitions is that \(\phi\) and \(\sigma\) are both positive. We have used the notation \(g(\phi) = \text{constant}\) instead of something like \(\phi \sim \text{SomeDistribution}\) because the priors for \(\phi\) and \(\sigma\) cannot be normalized and are therefore not proper probability distributions. Such priors are called improper priors. Improper priors are widely used, and we will use them in the course, particularly at the beginning, but I generally do not advocate for their use. I will clarify this concern in coming weeks, but for now we will use these improper priors.
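As a quick illustration of what this generative model says (not part of the analysis below), we could simulate a data set of spindle lengths. The parameter values here are arbitrary, chosen only for illustration.

# Minimal sketch: simulate spindle lengths under the independent size model.
# The parameter values are arbitrary, for illustration only.
rng = np.random.default_rng()

phi_sim = 33.0    # hypothetical fixed spindle size (µm)
sigma_sim = 4.5   # hypothetical scale parameter (µm)

ell_sim = rng.normal(phi_sim, sigma_sim, size=100)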
We will use the independent size model in this part of the lesson and turn to the other model, termed the tubulin conservation model, in the next part.
Plotting the posterior for a single parameter
The independent size model has two parameters. In order to plot the posterior for a single parameter, such as \(\phi\), we need to marginalize out the other parameter \(\sigma\).
Analytically marginalizing
We can analytically compute the marginalized posterior \(g(\phi\mid \{l_i\})\) for the independent size model. The posterior is proportional to the product of the likelihood and prior. The likelihood is
\begin{align} f(\{l_i\}\mid \phi, \sigma) = \prod_{i=1}^n \frac{\mathrm{e}^{-(l_i-\phi)^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma} = \frac{1}{(2\pi)^{n/2} \sigma^n}\,\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n(l_i-\phi)^2\right], \end{align}
where \(n\) is the number of measurements. The prior is
\begin{align} g(\phi, \sigma) = g(\phi) g(\sigma) \propto 1/\sigma. \end{align}
Then, the posterior is
\begin{align} g(\phi, \sigma \mid \{l_i\}) \propto f(\{l_i\}\mid \phi, \sigma)g(\phi, \sigma) \propto \frac{1}{\sigma^{n+1}}\,\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n(l_i-\phi)^2\right] \end{align}
The marginal posterior is then
\begin{align} g(\phi\mid \{l_i\}) \propto \int_0^\infty \mathrm{d}\sigma \,\frac{1}{\sigma^{n+1}}\,\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n(l_i-\phi)^2\right]. \end{align}
This integral is difficult to evaluate, but an analytical expression can be derived. Specifically, we can use the handy result that for \(x > 0\),
\begin{align} \int_0^\infty \mathrm{d}\sigma \,\frac{1}{\sigma^{n+1}}\,\mathrm{e}^{-x/2\sigma^2} \propto x^{-n/2}. \end{align}
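To see where this result comes from, substitute \(u = x/2\sigma^2\), which converts the integral into a Gamma function:
\begin{align} \int_0^\infty \mathrm{d}\sigma \,\frac{1}{\sigma^{n+1}}\,\mathrm{e}^{-x/2\sigma^2} = 2^{n/2-1}\,x^{-n/2}\int_0^\infty \mathrm{d}u\,u^{n/2-1}\,\mathrm{e}^{-u} = 2^{n/2-1}\,\Gamma\!\left(\frac{n}{2}\right)\,x^{-n/2} \propto x^{-n/2}. \end{align}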
Substituting \(\sum_{i=1}^n(l_i-\phi)^2\) for \(x\) gives us the marginalized posterior.
\begin{align} g(\phi\mid \{l_i\}) \propto \left(\sum_{i=1}^n(l_i-\phi)^2\right)^{-n/2}. \end{align}
Computing and plotting the marginalized posterior
Even with the above analytical expression, computing the marginalized posterior is a bit tricky. First, we only know the posterior up to a multiplicative constant. Second, if we computed the expression above directly, we would hit underflow errors.
In almost any circumstance we do not work with the posterior directly, but with its logarithm. In this case, the log posterior is
\begin{align} \ln g(\phi \mid \{l_i\}) = \text{constant} - \frac{n}{2}\,\ln\left(\sum_{i=1}^n(l_i-\phi)^2\right). \end{align}
This we can compute directly. The sum inside the logarithm is well behaved, so we do not risk underflow errors. If we did, we could use the LogSumExp trick (a minimal sketch is shown below). We can then proceed to compute and plot the posterior.
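As an aside, here is a minimal sketch of the LogSumExp trick. The numbers are made up purely for illustration; scipy.special.logsumexp() does the same computation.

import scipy.special

# Hypothetical log-scale values that would underflow if exponentiated naively
log_vals = np.array([-1000.2, -1001.7, -999.8])

# LogSumExp trick: subtract the maximum before exponentiating, then add it back
m = log_vals.max()
log_sum = m + np.log(np.sum(np.exp(log_vals - m)))

# scipy.special.logsumexp() gives the same result
log_sum_scipy = scipy.special.logsumexp(log_vals)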
We first write a function for the log marginalized posterior.
[4]:
def log_marginalized_posterior(phi, ell):
    """The log of the marginalized posterior (up to
    additive constant) evaluated at a single value of
    parameter phi for measured data ell.
    """
    # Zero probability of negative phi
    if phi < 0:
        return -np.inf

    return -len(ell) * np.log(np.sum((ell - phi)**2)) / 2
To compute the log marginalized posterior, we first make an array of \(\phi\) values to get a smooth curve and then evaluate the log marginalized posterior for each of those \(\phi\) values. We make sure the range of \(\phi\) is broad enough to capture interesting features. In fact, we can show (with a little algebraic grunge) that the posterior has a single peak; this is the dominant feature, and it is what we want to be sure to capture.
[5]:
phi = np.linspace(0, 100, 200)
log_marg_post = np.array([log_marginalized_posterior(phi_val, ell) for phi_val in phi])
Now that we have our values, we can make the plot!
[6]:
hv.Curve(
    (phi, log_marg_post),
    kdims=['φ (µm)'],
    vdims=['ln g(φ | {lᵢ})'],
).opts(
    xlim=(0, 100)
)
[6]:
It is often useful to plot the log posterior before plotting the posterior itself because it is easier to see where the features are. The peak is around \(\phi = 33\) µm, and the posterior has decayed away by several orders of magnitude within about three µm on either side. We will therefore consider \(30\) µm \(\le \phi \le 36\) µm in our plot.
We can exponentiate the log marginal posterior to plot the posterior. The problem is that the log marginal posterior as we have calculated it has a maximum that is a large negative number, so exponentiating it directly would result in underflow. We should therefore scale the log marginal posterior so that its maximum is of order zero. We do this by subtracting its maximal value.
[7]:
# Compute scaled log marg posterior over new range of φ
phi = np.linspace(30, 36, 200)
log_marg_post = np.array([log_marginalized_posterior(phi_val, ell) for phi_val in phi])
# Compute maximum value of log marg posterior and subtract
log_marg_post_max = log_marg_post.max()
log_marg_post -= log_marg_post_max
# Exponentiate to get marginal posterior
marg_post = np.exp(log_marg_post)
Now, we can plot the marginal posterior as a curve.
[8]:
hv.Curve(
    (phi, marg_post),
    kdims=['φ (µm)'],
    vdims=['∝ g(φ | {lᵢ})'],
).opts(
    xlim=(30, 36)
)
[8]:
Normalizing by numerical quadrature
This plot is sufficient for visualizing the posterior, but if we wish to compare it to other posteriors, we should normalize it so that the y-axis is actually the probability density function and not just proportional to it. To properly scale the y-axis, we use the fact that we currently have something proportional to the marginal posterior. That is, we computed
\begin{align} g_\propto(\phi \mid \{l_i\}) = \left(\sum_{i=1}^n(l_i-\phi)^2\right)^{-n/2}, \end{align}
which is proportional to the true posterior, say with constant of proportionality \(1/a\).
\begin{align} g(\phi \mid \{l_i\}) = \frac{1}{a}\, g_\propto(\phi \mid \{l_i\}). \end{align}
To find the constant of proportionality, we use the fact that the marginal posterior, being a probability density function, is normalized.
\begin{align} \int_0^\infty \mathrm{d}\phi\;g(\phi \mid \{l_i\}) = \frac{1}{a}\, \int_0^\infty \mathrm{d}\phi\;g_\propto(\phi \mid \{l_i\}) = 1. \end{align}
Therefore, the constant of proportionality is
\begin{align} a = \int_0^\infty \mathrm{d}\phi\;g_\propto(\phi \mid \{l_i\}). \end{align}
We can analytically compute the integral in this case (it’s gnarly, but an analytical expression exists), but we cannot in general do that. We will therefore perform numerical quadrature to obtain \(a\).
We can use the scipy.integrate.quad() function to perform the quadrature. We need to provide a function that takes \(\phi\) (the variable we are integrating over) as its first argument and returns the value of the function we are integrating. We also need to make sure we avoid underflow errors by subtracting the maximum value of the log marginalized posterior we obtained.
[9]:
def marginalized_posterior(phi, log_marg_post_max, ell):
    """Unnormalized marginal posterior."""
    return np.exp(log_marginalized_posterior(phi, ell) - log_marg_post_max)
Now, we can call scipy.integrate.quad(). The first argument is the function that returns the integrand, in our case marginalized_posterior(). The second and third arguments are the bounds of integration. You want to choose these to be wide enough that any peaks in the probability density function you are integrating have decayed away to nearly zero, but narrow enough that the peak can be resolved. We will integrate over a domain of [30, 36] µm. We can pass additional arguments to the integrand function using the args kwarg.
The scipy.integrate.quad() function returns two values, the value of the integral and an estimate of the error.
[10]:
a, err = scipy.integrate.quad(
    marginalized_posterior, 30, 36, args=(log_marg_post_max, ell)
)
# Take a look
a, err
[10]:
(0.46386461926837497, 5.981239131261995e-12)
The integration appears to be successful, and we can use the result to plot the properly normalized marginalized posterior.
[11]:
hv.Curve(
    (phi, marg_post / a),
    kdims=['φ (µm)'],
    vdims=['g(φ | {lᵢ}) (1/µm)'],
).opts(
    xlim=(30, 36)
)
[11]:
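As a quick sanity check (our own, not part of the original workflow), we can verify the normalization numerically with the trapezoidal rule on the gridded values; the result should be very close to one.

# The normalized marginal posterior should integrate to one over the
# plotted domain (trapezoidal rule on the grid of φ values)
np.trapz(marg_post / a, phi)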
A prescription for plotting 1D posteriors
Let us review the steps we took to generate the plot of the properly normalized marginal posterior. Starting with an expression for a one-dimensional posterior, we take the following steps to plot it (a sketch wrapping the steps into a helper function follows the list).
1. Write a function to compute the unnormalized log posterior.
2. Plot the unnormalized log posterior on a domain large enough to capture any peaks.
3. From the calculation of the unnormalized log posterior, store the maximum value encountered.
4. Write a function to compute the unnormalized posterior from the log posterior function written in step 1, making sure to subtract the maximum stored in step 3 to avoid underflow.
5. Perform numerical quadrature using bounds of integration that capture any peaks, but are not so broad that the peaks cannot be resolved.
6. Normalize the posterior computed in step 4 with the result of step 5 and plot.
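Here is one way such a helper might look. This is a sketch of our own (the function name and signature are not from any package); it assumes a log posterior with call signature log_post(x, *args), a grid of parameter values for plotting, and quadrature bounds.

def normalized_posterior_1d(log_post, x, bounds, args=()):
    """Sketch: normalized 1D posterior evaluated on grid x.

    log_post : function returning the unnormalized log posterior,
               called as log_post(x_val, *args)
    x        : array of parameter values for plotting
    bounds   : (low, high) bounds for numerical quadrature
    args     : extra arguments passed to log_post
    """
    # Steps 1–3: evaluate the unnormalized log posterior and record its maximum
    log_post_vals = np.array([log_post(x_val, *args) for x_val in x])
    log_post_max = log_post_vals.max()

    # Step 4: exponentiate after subtracting the maximum to avoid underflow
    post = np.exp(log_post_vals - log_post_max)

    # Step 5: numerical quadrature for the normalization constant
    integrand = lambda x_val: np.exp(log_post(x_val, *args) - log_post_max)
    norm_const, _ = scipy.integrate.quad(integrand, *bounds)

    # Step 6: normalize
    return post / norm_const

# For example, for the marginal posterior of φ computed above:
# g_phi = normalized_posterior_1d(log_marginalized_posterior, phi, (30, 36), args=(ell,))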
Plotting a 2D posterior
Plotting a 2D posterior takes a similar approach. Here, though, the normalization is not important because we do not really compare heights of distributions in 2D. Rather, we present them as images and/or contours.
As an example of plotting a 2D posterior, we will again use the posterior from the independent size model. We derived that to be
\begin{align} g(\phi, \sigma \mid \{l_i\}) \propto \frac{1}{\sigma^{n+1}}\,\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n(l_i-\phi)^2\right]. \end{align}
Therefore, the log posterior is
\begin{align} \ln g(\phi, \sigma \mid \{l_i\}) =\text{constant} - (n+1)\ln \sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n(l_i-\phi)^2. \end{align}
We can write a function to compute it for a given pair of values, \(\phi\) and \(\sigma\). We will pass \(\phi\) and \(\sigma\) in as an array of parameters, since this is what is expected by scipy.optimize.minimize(), which we will use later in the course when we do parameter estimation by optimization. Furthermore, we will explicitly code up the likelihood and prior separately, since this will help keep the notation clean.
Note that we make sure all parameters are positive. This is necessary in the case of \(\sigma\) because \(\sigma\) must be positive in any Normal distribution. For \(\phi\), there is no mathematical reason why it must be positive, but physically we cannot have negative spindle lengths.
[12]:
def log_prior_indep_size(params):
    """Log prior for independent size model"""
    if (params <= 0).any():
        return -np.inf

    phi, sigma = params

    return -np.log(sigma)


def log_likelihood_indep_size(params, ell):
    """Log likelihood of independent size model"""
    phi, sigma = params

    return np.sum(st.norm.logpdf(ell, loc=phi, scale=sigma))


def log_posterior_indep_size(params, ell):
    """Log posterior for independent size model"""
    lp = log_prior_indep_size(params)
    if lp == -np.inf:
        return lp

    return lp + log_likelihood_indep_size(params, ell)
Aside: Speed of likelihood calculation
Note that we have made use of SciPy’s built-in probability distributions for the log likelihood via the scipy.stats.norm.logpdf() function. This means you have to write fewer lines of code and therefore introduce fewer bugs. It does come with a speed trade-off, though, since SciPy’s functions have a lot of overhead. We can compare the speed of evaluation against a version in which we hand-code the log of the Normal PDF.
[13]:
def log_likelihood_indep_size_numpy_only(params, ell):
    """Log likelihood of independent size model, hand-coded
    (up to an additive constant)."""
    phi, sigma = params

    return -len(ell) * np.log(sigma) - np.sum((ell - phi)**2) / 2 / sigma**2
print("Time per likelihood using SciPy:")
%timeit log_likelihood_indep_size(np.array([33.0, 1.0]), ell)
print("\nTime per likelihood using only NumPy:")
%timeit log_likelihood_indep_size_numpy_only(np.array([33.0, 1.0]), ell)
Time per likelihood using SciPy:
102 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Time per likelihood using only NumPy:
13.5 µs ± 229 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The calculation is several times faster (here, roughly seven-fold) with the hand-coded version. This is worth considering if your calculations are slow and you need to speed them up. For now, though, we do not really need the speed.
Computing the log of a 2D posterior
As the dimensionality of the posterior increases, it becomes increasingly difficult to find peaks and other features. The fraction of the volume of parameter space for which the posterior is nonnegligible shrinks rapidly as dimensionality grows. This is an instance of the curse of dimensionality. Just going from one to two dimensions can make things more difficult.
One issue is that variations in the log posterior are more difficult to visualize in two dimensions than in one. This can be seen by plotting the log posterior and the posterior as image/contour plots. To construct these plots, we start by setting up the ranges of \(\phi\) and \(\sigma\) we are interested in. We will choose broad ranges.
[14]:
phi = np.linspace(10, 40, 200)
sigma = np.linspace(1, 8, 200)
Now we can loop over all \((\phi, \sigma)\) pairs to compute the log posterior.
[15]:
LOG_POST = np.zeros((len(sigma), len(phi)))

for j, phi_val in tqdm.tqdm(enumerate(phi)):
    for i, sigma_val in enumerate(sigma):
        LOG_POST[i, j] = log_posterior_indep_size(
            np.array([phi_val, sigma_val]), ell
        )
200it [00:04, 43.71it/s]
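As an aside, for this particular model the double loop can be avoided. The sketch below is our own and uses only NumPy broadcasting; it gives the same log posterior up to an additive constant by expanding the sum of squares.

# Sketch: vectorized evaluation on the whole grid at once.
# Expand sum_i (ell_i - phi)^2 = n*phi^2 - 2*phi*sum(ell) + sum(ell^2),
# so no loop over data points or grid points is needed.
PHI, SIGMA = np.meshgrid(phi, sigma)
n = len(ell)

sum_sq = n * PHI**2 - 2 * PHI * ell.sum() + np.sum(ell**2)
LOG_POST_vectorized = -(n + 1) * np.log(SIGMA) - sum_sq / (2 * SIGMA**2)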
Since we will also plot the posterior, we can compute it, taking care to avoid under/overflow.
[16]:
POST = np.exp(LOG_POST - LOG_POST.max())
Now, we can plot the log posterior and the posterior, starting with the log posterior. We can use HoloViews to make a contour plot with the hv.operation.contours() function, as follows.
[17]:
# Create HoloViews image
im = hv.Image((phi, sigma, LOG_POST), kdims=["φ (µm)", "σ (µm)"])

# Generate contours from image
hv.operation.contours(im, levels=5, overlaid=True).opts(
    hv.opts.Contours(
        cmap=["black"],
        frame_height=300,
        frame_width=300,
        line_width=2,
        show_legend=False,
    )
)
[17]:
However, the bebi103 package offers a convenient way to make contour plots more directly (with wisely chosen levels for the contour lines).
[18]:
p = bebi103.viz.contour(
    phi, sigma, LOG_POST, overlaid=True, x_axis_label="φ (µm)", y_axis_label="σ (µm)"
)
bokeh.io.show(p)
We can again use this function to make a contour plot of the posterior.
[19]:
p = bebi103.viz.contour(
    phi,
    sigma,
    POST,
    overlaid=True,
    x_axis_label="φ (µm)",
    y_axis_label="σ (µm)",
)
bokeh.io.show(p)
We can clearly see the small region of the peak when plotting the posterior, but it is much harder to see when plotting the log posterior. We can use this second plot to figure out where to zoom in to find the maximum, the region of parameter space of primary interest. We will plot the posterior for \(31 \le \phi \le 35\) µm and \(4 \le \sigma \le 5.5\) µm.
[20]:
phi = np.linspace(31, 35, 200)
sigma = np.linspace(4, 5.5, 200)

LOG_POST = np.zeros((len(sigma), len(phi)))

for j, phi_val in tqdm.tqdm(enumerate(phi)):
    for i, sigma_val in enumerate(sigma):
        LOG_POST[i, j] = log_posterior_indep_size(
            np.array([phi_val, sigma_val]), ell
        )

p = bebi103.viz.contour(
    phi,
    sigma,
    np.exp(LOG_POST - LOG_POST.max()),
    overlaid=True,
    x_axis_label="φ (µm)",
    y_axis_label="σ (µm)",
)

bokeh.io.show(p)
200it [00:04, 41.03it/s]
We can now clearly identify the most relevant region of the posterior and make out the shape of the distribution.
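As a final aside (our own, not part of the lesson’s main thread), once we have the posterior on a grid we can also marginalize numerically. For example, integrating over \(\sigma\) with the trapezoidal rule gives an approximate marginal posterior for \(\phi\) that can be compared with the analytical marginalization earlier in the lesson. It is only approximate here because \(\sigma\) is restricted to the zoomed-in range.

# Sketch: numerically marginalize the gridded posterior over σ.
POST_zoom = np.exp(LOG_POST - LOG_POST.max())

# Integrate over σ (rows) for each φ (columns), then normalize over φ
marg_post_phi = np.trapz(POST_zoom, sigma, axis=0)
marg_post_phi /= np.trapz(marg_post_phi, phi)

hv.Curve(
    (phi, marg_post_phi),
    kdims=['φ (µm)'],
    vdims=['g(φ | {lᵢ}) (1/µm)'],
)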
Computing environment
[21]:
%load_ext watermark
%watermark -v -p numpy,pandas,scipy,tqdm,bokeh,holoviews,bebi103,jupyterlab
Python implementation: CPython
Python version : 3.9.7
IPython version : 7.29.0
numpy : 1.20.3
pandas : 1.3.5
scipy : 1.7.3
tqdm : 4.62.3
bokeh : 2.3.3
holoviews : 1.14.6
bebi103 : 0.1.10
jupyterlab: 3.2.1