BE/Bi 103, Fall 2014: Homework 2

Due 1pm, Monday, October 13

This homework was generated from an IPython notebook. You can download the notebook here.

Problem 2.1 (Effects of the prior, 40 pts)

As we have seen and will see in class, it is important to carefully choose the prior distribution. However, for many parameter estimation problems where there are many data, the contribution of the prior is overwhelmed by the likelihood, so the exact functional form (within some constraints) of the prior is less important. We will explore this computationally in this problem.

As in lecture and in Tutorial 2b, we will assume that the data all have errors drawn from an independent Gaussian sampling distribution with variance $\sigma^2$. We do not assume we know what $\sigma$ is. Our likelihood is

\begin{align} P(D~|~\mu, \sigma, I) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n \prod_{x_i\in D} \mathrm{e}^{-(x_i - \mu)^2/2\sigma^2}, \end{align}

where $n = |D|$ is the number of points in the data set. The posterior distribution is then

\begin{align} P(\mu, \sigma~|~D,I) \propto P(D~|~\mu, \sigma, I)\,P(\mu, \sigma~|~I), \end{align}

where the latter probability distribution is our prior.

I will demonstrate how to compute and plot the posterior with example fake data. Note that it is much easier to compute

\begin{align} \ln P(\mu, \sigma~|~D,I) = \text{constant} + \ln P(D~|~\mu, \sigma, I) + \ln P(\mu, \sigma~|~I) \end{align}

because of numerical precision issues (multiplying many small probabilities can underflow) and other practical reasons. We do not really care about normalization, so we can set the constant to whatever we please to make exponentiation of the logarithm easier. In my example, we take a prior in which $\mu$ and $\sigma$ are independent, $\mu$ is uniform on the interval $[\mu_\mathrm{min},\mu_\mathrm{max}]$, and $\sigma$ has a Jeffreys prior (to be discussed in lecture Wednesday). I.e.,

\begin{align} P(\mu,\sigma~|~I) = P(\mu|I)\,P(\sigma|I) \propto \left\{ \begin{array}{cl} \sigma^{-1} & \text{for }\mu_\mathrm{min} < \mu < \mu_\mathrm{max},\;\sigma > 0,\\[1em] 0 & \text{otherwise}. \end{array} \right. \end{align}
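
Combining the likelihood with this prior, the unnormalized log posterior evaluated by the code below is, within the allowed region $\mu_\mathrm{min} < \mu < \mu_\mathrm{max}$, $\sigma > 0$,

\begin{align} \ln P(\mu, \sigma~|~D,I) = \text{constant} - (n+1)\ln \sigma - \frac{1}{2\sigma^2}\sum_{x_i\in D}(x_i - \mu)^2, \end{align}

and $-\infty$ outside it. The $-n\ln\sigma$ term comes from the likelihood and the extra $-\ln\sigma$ from the Jeffreys prior; these are exactly the terms computed by the log_likelihood and log_prior functions below.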

To start the example calculation, I'll first define some handy functions for computing the posterior probability.

In [49]:
import numpy as np

# Define log of the prior 
# mu uniform on the interval mu_range[0] < mu < mu_range[1], sigma Jeffreys
def log_prior(mu, sigma, mu_range):
    """
    Log of the unnormalized prior for a single mu and sigma with
    a Jeffreys prior for sigma and uniform prior on mu_range for mu.
    """
    if sigma <= 0.0 or mu < mu_range[0] or mu > mu_range[1]:
        return -np.inf
    else:
        return -np.log(sigma)

# Define log of the likelihood
def log_likelihood(data, mu, sigma):
    """
    Log of the unnormalized likelihood for a single mu and sigma 
    for data values data.
    """
    # Use len(data) rather than a global n so the function is self-contained
    return -((data - mu)**2).sum() / (2 * sigma**2) - len(data) * np.log(sigma)

# Define log of the posterior
def log_posterior(log_likelihood_fun, log_prior_fun, data, mu, sigma, 
                  mu_range):
    """
    Compute the log of the posterior, given an array of data, plus
    arrays of mu and sigma.  The functions defining the likelihood
    and prior are given by log_likelihood_fun and log_prior_fun, 
    respectively.
    
    Returns a 2D array of the posterior where entry i,j corresponds to
    mu[j], sigma[i].
    
    There are faster ways to calculate this, but we're doing it this way
    because we will use similar definitions when we do MCMC.
    """
    log_post = np.empty((len(sigma), len(mu)))
    for i in range(len(sigma)):
        for j in range(len(mu)):
            log_post[i, j] = log_likelihood_fun(data, mu[j], sigma[i]) \
                                + log_prior_fun(mu[j], sigma[i], mu_range)

    # Add a constant to log_post to get it close to unity so well-behaved
    # when we exponentiate it.  We don't care about normalization.
    log_post -= log_post.max()

    return log_post

Now that we have these functions in hand, we can cook up some fake data and compute the posterior probability.

In [50]:
# Generate 17 fake data, normally distributed with mean = std = 1.
np.random.seed(42)
real_mean = 1.0
real_std = 1.0
data = np.random.normal(real_mean, real_std, 17)

# Number of data points, for convenience
n = len(data)

# Ranges to sample parameters
mu = np.linspace(0.0, 2.0, 200)
sigma = np.linspace(0.01, 2.0, 200)

# Assume mu uniform on -1.5 < mu < 2.0
mu_range = (-1.5, 2.0)

# Compute the posterior
log_post = log_posterior(log_likelihood, log_prior, data, mu, sigma, mu_range)

Now that we can compute the posterior, we can plot it. Since it is a function of two variables, we need to make a 3D-like plot. Contour plots are particularly useful for this. I like to use filled contours, which we can make using the plt.contourf function. I like to use colormaps from ColorBrewer2, which we'll talk about later in the class, but for now, I'll use matplotlib's Purples_r colormap.

In [51]:
import matplotlib.pyplot as plt
%matplotlib inline

# Use contourf with Purples_r colormap.
plt.contourf(mu, sigma, np.exp(log_post), cmap=plt.get_cmap('Purples_r'))
plt.xlabel(r'$\mu$')
plt.ylabel(r'$\sigma$')
plt.title(r'$P(\mu,\sigma|D,I)$');

Another alternative is to plot $P(\mu,\sigma~|~D,I)$ as a 3D wire plot. For this, we need to use Axes3D from matplotlib.

In [56]:
# Make 3D wire plot
from mpl_toolkits.mplot3d.axes3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
mmu, ssigma = np.meshgrid(mu, sigma)
ax.plot_wireframe(mmu, ssigma, np.exp(log_post), rstride=3, cstride=3, 
                  lw=0.5)
ax.view_init(40, -50)
ax.set_xlabel(r'$\mu$')
ax.set_ylabel(r'$\sigma$')
ax.set_zlabel(r'$P(\mu,\sigma|D,I)$');

In lieu of computing the most probable values of $\mu$ and $\sigma$ analytically (which is possible), we will just approximate them by the grid point at which the computed posterior is largest.

In [53]:
# Approximate most probable value of mu and sigma
most_prob_inds = np.unravel_index(log_post.argmax(), log_post.shape)
mu_most_prob = mu[most_prob_inds[1]]
sigma_most_prob = sigma[most_prob_inds[0]]
print('most probable mu    = {0:.3f}'.format(mu_most_prob))
print('most probable sigma = {0:.3f}'.format(sigma_most_prob))
most probable mu    = 0.915
most probable sigma = 0.910
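
As a quick sanity check (this cell is not part of the original notebook), setting the derivatives of the log posterior above to zero gives the analytical most probable values $\mu^* = \bar{x}$ and $\sigma^{*2} = \sum_{x_i\in D}(x_i - \bar{x})^2/(n+1)$ for the Jeffreys prior, which should roughly agree with the grid estimates above.

# Analytical most probable values for the Jeffreys prior (sanity check)
mu_analytic = data.mean()
sigma_analytic = np.sqrt(((data - data.mean())**2).sum() / (n + 1))
print('analytical mu    = {0:.3f}'.format(mu_analytic))
print('analytical sigma = {0:.3f}'.format(sigma_analytic))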

We will use data from the Gandhi, et al. data set that we analyzed in Tutorial 2. We will take as our data set the total minutes of activity of wild type larvae during the third night of the experiment. This is the total number of minutes that each fish was active on that night.

a) Using the CSV files we made in Tutorial 2, generate an array (either a NumPy array or a pandas Series) containing the total number of minutes of activity during the third night of the experiment for each wild type fish. This array should have 17 entries.

b) Compute and plot the posterior probability $P(\mu,\sigma~|~D,I)$. Assume $\mu$ and $\sigma$ have independent priors. Assume the prior for $\mu$ is uniform on an appropriate domain and that the prior for $\sigma$ is a Jeffreys prior ($P(\sigma|I) \propto \sigma^{-1}$). Use the rough calculation method above to compute the most probable values of $\mu$ and $\sigma$.

c) Do the same as for part (b), except with a uniform prior for $\sigma$. Does using a Jeffreys prior make much difference? Whether it does or does not, why do you think this is the case? (A minimal sketch of alternative prior functions for parts (c)-(e) appears after part (e).)

d) Again assume a Jeffreys prior for $\sigma$. Assume a Gaussian prior for $\mu$ with mean of 225 minutes and standard deviation of 150 minutes. What effect does choosing this prior as opposed to a uniform prior have on the posterior probability?

e) Now assume a standard deviation of 20 minutes for the prior for $\mu$. How is the posterior affected?
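
For parts (c) through (e), only the prior function needs to change; the log_likelihood and log_posterior functions above can be reused as is. Below is a minimal sketch of what the alternative priors might look like. The function names and the mu_params argument are illustrative, not part of the original notebook; note that log_posterior simply forwards its last argument to the prior function, so a (mean, std) tuple can be passed in place of mu_range.

# Part (c): uniform priors for both mu and sigma.  Since we ignore
# normalization, a uniform prior contributes a constant, taken to be
# zero on the log scale.
def log_prior_uniform_sigma(mu, sigma, mu_range):
    """
    Log of the unnormalized prior with uniform priors for mu
    (on mu_range) and for sigma (sigma > 0).
    """
    if sigma <= 0.0 or mu < mu_range[0] or mu > mu_range[1]:
        return -np.inf
    else:
        return 0.0

# Parts (d) and (e): Jeffreys prior for sigma, Gaussian prior for mu.
# mu_params = (mean, std) of the Gaussian prior, e.g., (225.0, 150.0)
# for part (d) or (225.0, 20.0) for part (e).
def log_prior_gaussian_mu(mu, sigma, mu_params):
    """
    Log of the unnormalized prior with a Jeffreys prior for sigma and
    a Gaussian prior for mu with mean mu_params[0] and standard
    deviation mu_params[1].
    """
    if sigma <= 0.0:
        return -np.inf
    else:
        return -np.log(sigma) - (mu - mu_params[0])**2 / (2.0 * mu_params[1]**2)

With these in hand, a call like log_posterior(log_likelihood, log_prior_gaussian_mu, data, mu, sigma, (225.0, 20.0)) gives the posterior for part (e), with the mu and sigma grids chosen to cover the relevant range of minutes.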

As you complete this problem, you should note that not all data sets you analyze will have the same sensitivity to prior information. Researchers often test the robustness of their results to the choice of prior, and this is usually a good idea.



Problem 2.2 (Exploring fish sleep data, 60 pts)

In Tutorial 2, we investigated sleeping states of zebrafish larvae. We discussed what the best metrics are for sleepful versus waking states based on the one-minute-interval activity data we have. I think we agreed that it is far from obvious how to define a sleepful state. In this problem, you will work with your group to come up with some good ways to parametrize sleep behavior and estimate the values of these parameters.

Choose three different ways to parametrize sleeping vs. wakeful states. You can use sleep metrics from the Prober, et al. paper, ones we discussed in class, or invent your own. For each of the three ways of parametrizing sleep, provide instructive plots and estimate the values of the parameters. Compare the relative strengths and weaknesses of the sleep metrics you propose and discuss which parametrization(s) you prefer.