Auxiliary Tutorial 9: Approximate Bayesian Computation

(c) 2017 Christina Su. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT License.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [7]:
import numpy as np
import scipy.stats as st
import numba

import bebi103

import bokeh.io
import bokeh.layouts
import bokeh.plotting
bokeh.io.output_notebook()

In this tutorial, we will introduce the fundamental principles of approximate Bayesian computation (ABC), implement an example of ABC in practice, and discuss the strengths and weaknesses of this class of computational methods.

What is ABC?

Like MCMC methods, ABC is an approach to sampling posterior distributions. Throughout the course, we have started every problem with the following steps:

  1. Write down Bayes' theorem.
  2. Specify the likelihood and prior.

We can then go on to compute the posterior analytically, find the MAP estimate for the parameters, sample the posterior numerically, perform model selection, etc.

ABC takes a different route to approximate the posterior and can be thought of intuitively as a generative approach. The idea is as follows:

  1. Draw samples from the prior. For instance, if we used a uniform prior to describe the probability of worm reversal, we would draw random values $p$ in the interval [0, 1].
  2. Generate simulated data using those samples. To analyze worm reversal, we would simulate $n$ trials with probability of success $p$ (the value we sampled above) and count the number of successes or reversals $r$.
  3. Compare the simulated results to the observed data. If the trial above yields the same number of reversals as the experiment, the value $p$ is drawn from the posterior.
  4. Keep samples from the prior that can give rise to the observed results, as they represent samples from the posterior.

To be more precise, our goal is to sample the posterior $g \left( \theta \mid D \right)$, where $\theta$ denotes the parameter(s) of interest and $D$ the data. ABC then takes the following approach:

  1. Draw sample $\theta^*$ from the prior $g \left( \theta \right)$.
  2. Simulate data $D^*$ from the generative likelihood $f \left( D \mid \theta^* \right)$.
  3. Check whether we have $D^* = D$.
  4. If the simulated results match the observed data, keep $\theta^*$ as a sample from the posterior $g \left( \theta \mid D \right)$. Repeat until the desired number of samples is drawn. (A minimal code sketch of this rejection loop follows the list.)
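To make the recipe concrete, here is a minimal rejection-sampling sketch for the worm reversal example with a uniform prior. The trial count and observed reversal count below are hypothetical placeholders chosen for illustration, not data from an actual experiment.

import numpy as np

# Hypothetical experiment: n_trials trials with r_observed reversals observed
n_trials = 50
r_observed = 20

# Rejection ABC: keep prior draws whose simulated data match the observation
n_samples = 1000
posterior_samples = []
while len(posterior_samples) < n_samples:
    # 1. Draw p* from the uniform prior on [0, 1]
    p_star = np.random.uniform(0, 1)

    # 2. Simulate data: number of reversals in n_trials Bernoulli trials
    r_simulated = np.random.binomial(n_trials, p_star)

    # 3-4. Accept p* only if the simulation reproduces the observed count
    if r_simulated == r_observed:
        posterior_samples.append(p_star)

posterior_samples = np.array(posterior_samples)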

What are the strengths of ABC?

The underlying principles of ABC can be readily understood in the framework of Bayesian inference. From the discussion above, ABC differs from our standard Bayesian inference primarily in the specification of the likelihood. Its key advantage is that it does not require analytical formulation or evaluation of the likelihood. Instead, ABC only requires a generative procedure for simulating results given sampled values of the parameter(s). In the course, we have always been able to specify a computationally tractable likelihood to model the data. ABC opens up the realm of possible models to include those that cannot be readily computed but can be simulated.

In addition, a simple approach to ABC can be readily implemented, as we will show next.

How does ABC work in practice?

To illustrate the method of ABC, we will apply it to a study of the effect of parental attractiveness on offspring sex ratio. The researchers reported that "Very attractive individuals are 26% less likely to have a son." This finding attracted significant media attention and was covered by Discover, Daily Mail, and Telegraph, among others. Gelman also provided criticism. Let's take a closer look!

Understanding the Data

This study analyzed 2,972 people who had at least one biological child, using the sex of the first child to measure the offspring sex ratio. (Only the first child was considered, to eliminate possible confounding effects from couples' stopping rules.) The researchers were specifically interested in the effect of parental attractiveness, which was rated on a scale from 1 to 5. Their finding was based on the individuals rated "very attractive" (5) by the interviewer, who made up 11.2% of the sample. (Individuals also rated themselves, but on a different scale from 1 to 4; 28.2% rated themselves "very attractive.") Among these interviewer-rated parents, the proportion of boys was 0.44.

As far as I could tell, the researchers did not give the actual numbers of very attractive individuals or male infants, so we will round the values based on the specified proportions.

In [2]:
# Number of very attractive individuals
n = np.round(0.112 * 2972).astype(int)

# Number of male births
m = np.round(0.44 * n).astype(int)

print('Number of very attractive individuals:', n)
print('Number of male births:', m)
Number of very attractive individuals: 333
Number of male births: 147

Specifying the Model

With our data in hand, we can now specify the model. We would like to estimate the offspring sex ratio in this group of very attractive people. We will define our parameter of interest $p$ as the proportion of male births. For ABC, we need only a generative model for the data given this parameter. Therefore, we can simulate the number of male births as the number of successes in $n = 333$ "coin flips," each with probability $p$ of success. This simulation function is very simple. I have included versions with and without numba, which performs just-in-time compilation and can give a substantial speed boost. The constraint is that only certain functions are supported in nopython mode, but the coverage of numpy is good.

In [3]:
def simulate_data_binomial_no_numba(p, n):
    '''
    Return number of successes in n trials,
    each with success probability p.
    '''
    data = np.random.rand(n)
    return np.sum(data < p)


@numba.jit(nopython=True)
def simulate_data_binomial(p, n):
    '''
    Return number of successes in n trials,
    each with success probability p.
    '''
    data = np.random.rand(n)
    return np.sum(data < p)

Note that there is no particular need to use ABC in this case, as the likelihood is simply a binomial distribution. Indeed, we can even compute the posterior analytically with certain choices of the prior. However, this example is just intended to illustrate the approach to ABC.
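For reference, the likelihood that ABC lets us sidestep is easy to write down in this case. Here is a quick sketch of evaluating it with scipy.stats; the value of p below is an arbitrary illustration, not an estimate.

import scipy.stats as st

# The analytical likelihood ABC avoids evaluating: f(m | n, p) is a binomial PMF
p_example = 0.5
likelihood = st.binom.pmf(m, n, p_example)  # m successes in n trials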

Specifying the Prior

We now need to define our prior for $p$. For convenience, I will use a beta distribution, since this distribution is defined on the appropriate interval of [0, 1] and can take a variety of forms depending on its parameters $\alpha$ and $\beta$. (The posterior can also be computed analytically with this choice of prior.) We will write a function to generate samples from the prior.

In [4]:
@numba.jit(nopython=True)
def sample_prior_beta(a, b, n_draws=1):
    '''
    Return n_draws samples from a beta distribution
    defined by shape parameters a and b.
    '''
    return np.random.beta(a, b, size=n_draws)

We now have a function that can sample from any beta distribution, so we simply need to choose appropriate parameters $\alpha$ and $\beta$. One approach to choosing priors is to be as uninformative as possible. Here, that means a uniform prior, defined by $\alpha = \beta = 1$. Let's compare the samples to the true distribution. We will compare histograms of the samples to the true PDF, rather than plotting ECDFs of all the samples, so we don't choke the browser with too many data points.

In [36]:
# Set parameters for uninformative uniform prior
a_u = 1
b_u = 1

# Draw samples from uninformative prior
samples_u = sample_prior_beta(a_u, b_u, n_draws=100000)

# Compute true distribution
x_smooth = np.linspace(0, 1, 200)
pdf = st.beta.pdf(x_smooth, a=a_u, b=b_u)

# Compare samples to true distribution
p = bebi103.viz.histogram(samples_u,
                          bins=100,
                          line_width=2,
                          x_axis_label='Proportion of Male Births',
                          y_axis_label='Density',
                          title='Uninformative Prior')
p.line(x_smooth, pdf, line_color='tomato', line_width=6, alpha=0.8)
bokeh.io.show(p)

However, the prior is intended to capture our actual knowledge about the parameter(s) of interest. We don't actually think that the proportion of male births could be 0.1 or 0.95, for instance. According to the World Bank, there were 1.073 male births per female birth in 2017, meaning that the proportion of male births is about 1.073 / 2.073 ≈ 0.5176. We'll choose an informative prior that is sharply peaked around this value, since the sex ratio has been quite similar for the past few decades. (A $\text{Beta} \left( \alpha, \beta \right)$ distribution has mean $\alpha / \left( \alpha + \beta \right)$, so the choice $\alpha = 500$, $\beta = 466$ below gives a mean of about 0.5176, and the large magnitudes of the parameters make the peak narrow.)

In [38]:
# Set parameters for informative prior
a_i = 500
b_i = 466

# Draw samples from informative prior
samples_i = sample_prior_beta(a_i, b_i, n_draws=100000)

# Compute true distribution
pdf = st.beta.pdf(x_smooth, a=a_i, b=b_i)
print('Mean value:', st.beta.mean(a_i, b_i))

# Compare samples to true distribution
p = bebi103.viz.histogram(samples_i,
                          bins=100,
                          line_width=2,
                          x_axis_label='Proportion of Male Births',
                          y_axis_label='Density',
                          title='Informative Prior')
p.line(x_smooth, pdf, line_color='tomato', line_width=6, alpha=0.65, line_join='bevel')
bokeh.io.show(p)
Mean value: 0.517598343685

Looks good! We can now move on to sampling the posterior.

Sampling the Posterior

We start by writing a generic function to implement ABC.

In [21]:
def sample_posterior(sample_prior, simulate, observed, n_draws=100,
                     prior_args=(), simulate_args=()):
    '''
    Take n_draws samples from posterior distribution using ABC, where
    sample_prior(*prior_args) should generate a single set of
    parameter values param and simulate(param, *simulate_args)
    should simulate a result that can be compared with observed.
    '''
    samples = np.zeros((n_draws, len(sample_prior(*prior_args))))
    for i in range(n_draws):
        while True:
            param = sample_prior(*prior_args)
            if simulate(param, *simulate_args) == observed:
                samples[i, :] = param
                break
    return samples.squeeze()

Excellent! We can now sample any posterior distribution simply by providing a function to sample the prior and a function to simulate the experiment. For our case, let's first compare the times when using a likelihood with and without numba.

In [22]:
%%timeit
# Likelihood without numba
post_i = sample_posterior(sample_prior_beta,
                          simulate_data_binomial_no_numba,
                          observed=m, n_draws=100,
                          prior_args=(a_i, b_i),
                          simulate_args=(n, ))
692 ms ± 63.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [23]:
%%timeit
# Likelihood with numba
post_i = sample_posterior(sample_prior_beta, simulate_data_binomial,
                          observed=m, n_draws=100,
                          prior_args=(a_i, b_i),
                          simulate_args=(n, ))
269 ms ± 30.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using numba gives a modest speedup here (roughly 2.5-fold), but it can provide a much larger boost in other applications! Let's now sample the posterior with the uninformative prior.

In [24]:
%%time
# Sample posterior with uninformative prior
post_u = sample_posterior(sample_prior_beta, simulate_data_binomial,
                          observed=m, n_draws=10000,
                          prior_args=(a_u, b_u),
                          simulate_args=(n, ))
CPU times: user 17.5 s, sys: 39.1 ms, total: 17.5 s
Wall time: 17.5 s

We repeat the process for the informative prior.

In [25]:
%%time
# Sample posterior with informative prior
post_i = sample_posterior(sample_prior_beta, simulate_data_binomial,
                          observed=m, n_draws=10000,
                          prior_args=(a_i, b_i),
                          simulate_args=(n, ))
CPU times: user 25.2 s, sys: 60 ms, total: 25.3 s
Wall time: 25.3 s

As mentioned above and discussed more fully in Lecture 3, the beta distribution is the conjugate prior to the binomial likelihood, and we can compute the posterior analytically. If the prior is given by $\text{Beta} \left( \alpha, \beta \right)$ and the data show $m$ successes in $n$ trials, the posterior is given by $\text{Beta} \left( m + \alpha, n - m + \beta \right)$. We can use this knowledge to compare the samples of the posterior to the true distribution.
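As a quick check of this conjugacy result, drop all factors that do not depend on $p$:

$g \left( p \mid m, n \right) \propto f \left( m \mid p, n \right) \, g \left( p \right) \propto p^m \left( 1 - p \right)^{n - m} \cdot p^{\alpha - 1} \left( 1 - p \right)^{\beta - 1} = p^{m + \alpha - 1} \left( 1 - p \right)^{n - m + \beta - 1},$

which is the kernel of a $\text{Beta} \left( m + \alpha, n - m + \beta \right)$ distribution.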

In [41]:
# Analyze samples from posterior with uninformative prior
print('Posterior (Uninformative Prior)')
print('2.5th percentile:\t', np.percentile(post_u, 2.5))
print('Median:\t\t\t', np.median(post_u))
print('97.5th percentile:\t', np.percentile(post_u, 97.5))

# Analyze samples from posterior with informative prior
print('\nPosterior (Informative Prior)')
print('2.5th percentile:\t', np.percentile(post_i, 2.5))
print('Median:\t\t\t', np.median(post_i))
print('97.5th percentile:\t', np.percentile(post_i, 97.5))

# Compute true distributions
pdf_u = st.beta.pdf(x_smooth, a=m+a_u, b=n-m+b_u)
pdf_i = st.beta.pdf(x_smooth, a=m+a_i, b=n-m+b_i)

# Compare samples to true distribution with uninformative prior
p1 = bebi103.viz.histogram(post_u,
                           bins=100,
                           line_width=2,
                           x_axis_label='Proportion of Male Births',
                           y_axis_label='Density',
                           title='Posterior (Uninformative Prior)')
p1.line(x_smooth, pdf_u, line_color='tomato', line_width=6, alpha=0.65, line_join='bevel')

# Compare samples to true distribution with informative prior
p2 = bebi103.viz.histogram(post_i,
                           bins=100,
                           line_width=2,
                           x_axis_label='Proportion of Male Births',
                           y_axis_label='Density',
                           title='Posterior (Informative Prior)')
p2.line(x_smooth, pdf_i, line_color='tomato', line_width=6, alpha=0.65, line_join='bevel')

# Put x-axes on the same scale
p1.x_range = p2.x_range

p = bokeh.layouts.gridplot([p1, p2], ncols=1)
bokeh.io.show(p)
Posterior (Uninformative Prior)
2.5th percentile:	 0.388587607128
Median:			 0.441662473552
97.5th percentile:	 0.49489713009

Posterior (Informative Prior)
2.5th percentile:	 0.471009549187
Median:			 0.497902664458
97.5th percentile:	 0.525298094577

With an uninformative prior, the median is essentially the observed proportion of male births in the data (147/333 ≈ 0.44), and the 97.5th percentile is still below 0.50. With an informative prior, the observed data do shift the distribution, but the proportion of male births among very attractive people remains close to 0.50, as expected. In both cases, the samples agree well with the analytical posterior distribution, showing that ABC accurately samples the posterior. Do you believe that very attractive people are less likely to have boys?
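The posterior samples also make it easy to put a number on that question. Here is a small sketch; the computed probabilities depend on the particular ABC run and are not reported here.

# Posterior probability that the proportion of male births is below 0.5,
# estimated directly from the ABC samples drawn above
print('P(p < 0.5), uninformative prior:', np.mean(post_u < 0.5))
print('P(p < 0.5), informative prior:  ', np.mean(post_i < 0.5))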

What are the weaknesses of ABC?

As we have seen here, ABC is relatively easy to implement and can sample posterior distributions well. What are the limitations of this approach?

Continuous Distributions

In our example, we simulated a univariate likelihood that gave rise to a discrete variable: $m$, the number of male births in $n$ births. What if we were simulating a continuous variable? In that case, it is essentially impossible that a simulated value will be equal to the observed one, and our acceptance rate would be 0.

To solve this problem, we can simply accept samples where the simulated $D^*$ is "close enough" to $D$, often measured by Euclidean distance. If we make that tolerance stringent enough, the samples should still approximate the posterior well. Of course, the trade-off for accurate sampling is the computational inefficiency of a high rejection rate.
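A minimal way to adapt the sample_posterior function above for a scalar continuous observation is to replace the exact-match test with a tolerance. A sketch follows; the function name, the absolute-distance criterion, and the tolerance eps are illustrative choices that must be tuned for the problem at hand.

import numpy as np

def sample_posterior_tol(sample_prior, simulate, observed, eps, n_draws=100,
                         prior_args=(), simulate_args=()):
    '''
    Rejection ABC with a tolerance: accept a draw from the prior if the
    simulated result lies within distance eps of the observed value.
    '''
    samples = np.zeros((n_draws, len(sample_prior(*prior_args))))
    for i in range(n_draws):
        while True:
            param = sample_prior(*prior_args)
            if np.abs(simulate(param, *simulate_args) - observed) <= eps:
                samples[i, :] = param
                break
    return samples.squeeze()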

Multivariate Likelihoods

What if, instead of computing a single value $m$, our results $D$ consisted of 2, 20, or even 200 values? It would be highly unlikely that every single value of a simulated result would equal the actual values in the observed data, whether discrete or continuous.

An alternative is to compute summary statistics on the actual data $D$ and the simulated results $D^*$, such as the mean, median, and variance. If the summary statistics are close, then we accept that sample.
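For instance, if each dataset were an array of measurements rather than a single count, the comparison might look like the sketch below. The particular summary statistics and the Euclidean tolerance eps are assumptions that would need to be tuned for a real problem.

import numpy as np

def summarize(data):
    '''Reduce a dataset to a small vector of summary statistics.'''
    return np.array([np.mean(data), np.median(data), np.var(data)])

def close_enough(simulated, observed, eps):
    '''Accept if the summary statistics agree to within Euclidean distance eps.'''
    return np.linalg.norm(summarize(simulated) - summarize(observed)) <= eps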

Conclusion

ABC provides an intuitive way to perform Bayesian inference and is particularly useful for sampling the posterior distribution when the likelihood is difficult to formulate explicitly or compute efficiently. Like any method, however, it is not suitable for every problem. In particular, the trade-off between accurate sampling and computational efficiency becomes acute when simulated results are unlikely to match, or come close enough to, the observed data.

Additional Resources

There are many great resources for learning more about ABC! For instance, these blog posts provide a general introduction to the subject and include examples in Python and R. This lecture provides more information on the theoretical underpinnings of ABC.