Tutorial 4b: Parameter estimation and MLE

(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats as st
import statsmodels.tools.numdiff as smnd

import bebi103

import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting
bokeh.io.output_notebook()

colors = bokeh.palettes.d3['Category10'][10]

In this tutorial, we will expand our practice of parameter estimation to non-Gaussian likelihoods. We will see here that there is really no difference in the principles, but there are some subtleties in how we find the MAP.

The term "MLE" is also in the title of this tutorial. Maximum likelihood estimation is the procedure for finding the parameters that maximize the likelihood. This is a special case in the Bayesian framework: it is the MAP for a uniform prior. The term is widely used, and we show here that it is as simple as what we have already done: it's just MAP finding. However, in special cases where the MLE problem reduces to finding a minimal residual sum of squares, we can use the Levenberg-Marquardt algorithm, which is fast and reliable.

In the process of this tutorial, we will explore more of the powerful functionality of the scipy.stats module, which we visited in Tutorial 3b, as we use it in our posterior definitions and also for sampling out of distributions.

The data set

The data come from the Elowitz lab, published in Singer et al., Dynamic Heterogeneity and DNA Methylation in Embryonic Stem Cells, Molec. Cell, 55, 319-331, 2014, available here.

In this paper, the authors investigated cell populations of embryonic stem cells using RNA single molecule fluorescence in situ hybridization (smFISH), a technique that enables them to count the number of mRNA transcripts in a cell for a given gene. They were able to measure four different genes in the same cells. So, for one experiment, they get the counts of four different genes in a collection of cells.

The authors focused on genes that code for pluripotency-associated regulators to study cell differentiation. Indeed, differing gene expression levels are a hallmark of differentiated cells. The authors do not just look at counts in a given cell at a given time. The temporal nature of gene expression is also important. While the authors do not directly look at temporal data using smFISH (since the technique requires fixing the cells), they did look at time lapse fluorescence movies of other regulators. We will not focus on these experiments here, but will discuss how the distribution of mRNA counts acquired via smFISH can serve to provide some insight about the dynamics of gene expression.

The data set we are analyzing now comes from an experiment where smFISH was performed in 279 cells for the genes rex1, rest, nanog, and prdm14. The data set may be downloaded here.

ECDFs of mRNA counts

Let's go ahead and explore the data. We will load in the data set and generate ECDFs for the mRNA counts for each of the four genes.

In [2]:
# Load DataFrame
df = pd.read_csv('../data/singer_transcript_counts.csv', comment='#')

# Take a look
df.head()
Out[2]:
   Rex1  Rest  Nanog  Prdm14
0    11    34     39       0
1   172    91     33       5
2   261    70     68       0
3   178    54     88       1
4   129    54     41       0

The data are already tidy, since each row is a single measurement (a single cell). We can plot ECDFs for each gene.

In [3]:
# Make a list of plots
plots = [bebi103.viz.ecdf(df[gene], x_axis_label='mRNA count', title=gene)
         for gene in df]

# Show them as a grid
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

Note the difference in the $x$-axis scales. Clearly, prdm14 has far fewer mRNA copies than the other genes. The presence of two inflection points in the Rex1 ECDF implies bimodality.

The distribution of transcript counts and the "story"

The smFISH experiments give the number of mRNA copies of a given gene in a given cell. How would we expect these numbers to be distributed?

If gene expression is a purely Poisson process, we might expect a Poisson distribution. Or, if the copy number is itself somehow tightly regulated, we might expect a Gaussian distribution.

Study of gene expression dynamics, largely through fluorescence imaging, has led to a different story. Expression of many important genes can be bursty, which means that the promoter is on for a period of time in which transcripts are made, and then it is off for a while. The "on" periods are called "bursts" and are themselves well-modeled as a Poisson process. That is to say that the amount of time that a promoter is on is Exponentially distributed. Thus, we can think of a burst as a series of Bernoulli trials. A "failure" is production of an mRNA molecule, and a "success" is a switch to an off state. The number of "successes" we get is equal to the number of bursts we get per decay time of the mRNA. We can define the number of bursts before degradation of the mRNA as $r$. This is the so-called burst frequency. So, we have a series of Bernoulli trials and we wait for $r$ successes. Then, $n$, the total number of failures (which is the number of mRNA transcripts), is Negative Binomially distributed, since this matches the Negative Binomial story. Defining the probability of "success" (ending a burst) as $p$, we have a likelihood for a given cell of

\begin{align} f(n\mid r,p) = \frac{(n + r - 1)!}{n!(r-1)!}\,p^r(1-p)^n. \end{align}
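As a quick sanity check of this story, we can simulate the Bernoulli trials directly and compare the frequency of outcomes to the Negative Binomial PMF, using some illustrative values of $r$ and $p$.

# Simulate the story: count failures before the r-th success in
# repeated Bernoulli trials (illustrative parameter values)
r, p = 3, 0.2

def draw_n_failures(r, p):
    """Number of failures before the r-th success in Bernoulli trials."""
    successes = failures = 0
    while successes < r:
        if np.random.rand() < p:
            successes += 1
        else:
            failures += 1
    return failures

samples = np.array([draw_n_failures(r, p) for _ in range(10000)])

# Compare the simulated frequency of n = 10 to the analytical PMF
print(np.mean(samples == 10), st.nbinom.pmf(10, r, p))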

The quantity $p$ is a little mystical here. We would like to relate it to the typical burst size, i.e., the typical number of transcripts made per burst. The size of a single given burst is geometrically distributed (since it matches that story), so

\begin{align} f(n_\mathrm{burst}\mid p) = (1-p)^{n_\mathrm{burst}}p. \end{align}

The mean number of transcripts in a burst is

\begin{align} b \equiv \left\langle n_\mathrm{burst}\right\rangle &= \sum_{n_\mathrm{burst}=0}^\infty n_\mathrm{burst}(1-p)^{n_\mathrm{burst}}p\\[1em] &= p \sum_{n_\mathrm{burst}=0}^\infty n_\mathrm{burst}(1-p)^{n_\mathrm{burst}} \\[1em] &= p(1-p)\, \frac{\mathrm{d}}{\mathrm{d}(1-p)}\sum_{n_\mathrm{burst}=0}^\infty(1-p)^{n_\mathrm{burst}} \\[1em] &= p(1-p)\, \frac{\mathrm{d}}{\mathrm{d}(1-p)}\,\frac{1}{p}\\[1em] &= -p(1-p)\, \frac{\mathrm{d}}{\mathrm{d}p}\,\frac{1}{p} \\[1em] &= \frac{1-p}{p}. \end{align}

So, if we determine $p$ and $r$, we can compute the typical burst size, $b = (1-p)/p$, and the burst frequency, $r$.
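We can check this result numerically by drawing burst sizes. Note that scipy's st.geom has its support starting at 1, so we subtract 1 to match the convention above, in which a burst of size zero is possible. The value of $p$ here is just illustrative.

# Numerical check that the mean burst size is b = (1 - p)/p
p = 0.05
burst_sizes = st.geom.rvs(p, size=100000) - 1
print(burst_sizes.mean(), (1 - p) / p)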

Bimodal distributions and their story

As we saw when we explored the data, the distribution in mRNA copies for some genes is clearly bimodal. Since the negative binomial distribution is unimodal, what could be the story here? It is quite possible we are seeing two different states of cells, one with one level of bursty expression of the gene of interest, and another with a different level of bursty expression. This could mean the cells are differentiating. So, we would expect the number of mRNA transcripts to be distributed according to a linear combination of negative binomial distributions.

\begin{align} f(n\mid r_1, r_2, p_1, p_2, w_1) &= w_1\,\frac{(n + r_1 - 1)!}{n!\,(r_1-1)!}\,p_1^{r_1}(1-p_1)^{n} \\[1em] &\;\;\;\;+ (1-w_1)\, \frac{(n + r_2 - 1)!}{n!\,(r_2-1)!}\,p_2^{r_2}(1-p_2)^{n}, \end{align}

where $w_1$ is the probability that the burst size and frequency are determined by $p_1$ and $r_1$.

Parameter estimation for negative binomial distributions

The procedure for parameter estimation for a given gene starts at a familiar place. We write Bayes's theorem. We will denote by $\mathbf{n}$ the set of mRNA transcript counts (i.e., $D = \mathbf{n}$). (We often define $n \equiv |D|$, which we are not doing here, so don't get confused.) Then, Bayes's theorem reads

\begin{align} g(r, p \mid \mathbf{n}) = \frac{f(\mathbf{n}\mid r, p)\,g(r, p)}{f(\mathbf{n})}. \end{align}

Assuming each cell is independent of all others,

\begin{align} f(\mathbf{n}\mid r, p) = \prod_{n\in\mathbf{n}} \frac{(n + r - 1)!}{n!(r-1)!}\,p^r(1-p)^n. \end{align}

We now need to specify the prior, $g(r,p)$. For simplicity, we will assume that $r$ and $p$ are independent and assign Uniform priors for both $r$ and $p$ (on $(0,r_\mathrm{max}]$ for the former and on $[0,1]$ for the latter). To estimate $r_\mathrm{max}$, we will refer to the excellent BioNumbers website. The typical mRNA degradation time in mice is 9 hours. A burst every half hour seems pretty fast to me, so we'll estimate $r_\mathrm{max}\approx 20$ and use this in our calculations.

When we have uniform priors on the parameters, the procedure of finding the MAP is then called maximum likelihood estimation for the obvious reason that the posterior is proportional to the expression for the likelihood.

Our goal is to find the MAP, or

\begin{align} \arg \max_{r,p} \sum_{n\in\mathbf{n}} \ln\left(\frac{(n + r - 1)!}{n!(r-1)!}\,p^r(1-p)^n\right). \end{align}

We will use numerical optimization to achieve this, but there is a wrinkle: we cannot use scipy.optimize.leastsq() like we did in the previous tutorial.

Numerical optimization to find the MAP

To compute the MAP, we first need to construct the log posterior.

In [4]:
def log_likelihood(params, n):
    """
    Log likelihood for MLE of Singer data.
    """
    r, p = params

    return np.sum(st.nbinom.logpmf(n, r, p))


def log_prior(params, r_max):
    """
    Log prior for MLE of Singer data.
    """
    r, p = params
    
    # Zero probability of having p < 0 or p > 1
    if p < 0 or p > 1:
        return -np.inf
    
    # Zero probability of r outside of (0, r_max]
    if r <= 0 or r > r_max:
        return -np.inf

    # Uniform prior on (0, r_max] x [0, 1]; the constant does not affect the MAP
    return -np.log(r_max)


def log_posterior(params, n, r_max):
    """
    Log posterior for the Singer data.
    """
    # Compute log prior
    lp = log_prior(params, r_max)

    # If log prior is -inf, return that
    if lp == -np.inf:
        return -np.inf

    # Compute and return posterior
    return lp + log_likelihood(params, n)

Note how I constructed the log posterior. It calls functions for the log prior and log likelihood and then adds them together to get the log posterior. Importantly, the log prior function handles all issues related to the bounds on the parameters, so there is no issue passing the parameters to the log likelihood function. This construction will prove useful when we do optimization because it enforces bounds on the parameters through the definition of the log posterior itself. (There are other ways to enforce bounds, such as with the L-BFGS-B algorithm, but we will not use that here.)

As before, we need to specify a negative log posterior for the minimizer.

In [5]:
def neg_log_posterior(params, n, r_max):
    """
    Negative log posterior for MLE of Singer data.
    """
    return -log_posterior(params, n, r_max)

Note that in our definition of the log posterior, we used scipy.stats to compute the posterior. There are many convenient functions available in this module. In particular, most common distributions are available, and for each, we can compute things like the PMF/PDF, CDF, as well as draw random samples out of the distribution. In this case, we want the logarithm of the PMF, implemented with st.nbinom.logpmf().
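To get a feel for what is available, here are a few of these conveniences for the Negative Binomial distribution with illustrative parameter values.

# A few scipy.stats conveniences for the Negative Binomial distribution
r, p = 5, 0.1
print(st.nbinom.pmf(30, r, p))      # P(n = 30)
print(st.nbinom.logpmf(30, r, p))   # log of the PMF, as used in our log likelihood
print(st.nbinom.cdf(30, r, p))      # P(n <= 30)
print(st.nbinom.rvs(r, p, size=5))  # five random draws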

Before implementing the optimization, I'll point out that we are optimizing over continuous variables, but $r$ is discrete in a negative binomial distribution. Is this ok? In this scenario, $r$ is the typical number of bursts that occur in the characteristic decay time of mRNA. It is given by the ratio of the burst rate to the mRNA degradation rate, which is inherently non-integer. So, allowing $r$ to be continuous is not really a sin, but a feature. The negative binomial distribution generalizes to noninteger $r$ by replacing the factorials with gamma functions, which is what the scipy.stats module does anyhow.
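As a quick check that this is what is happening under the hood, we can compare the gamma function form of the log PMF to scipy's result for a noninteger $r$ (the parameter values here are just illustrative).

import scipy.special

# Gamma-function form of the Negative Binomial log PMF for noninteger r
r, p, n = 2.5, 0.1, 40
log_pmf = (scipy.special.gammaln(n + r) - scipy.special.gammaln(r)
           - scipy.special.gammaln(n + 1) + r * np.log(p) + n * np.log(1 - p))

# Should match scipy's built-in result
print(log_pmf, st.nbinom.logpmf(n, r, p))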

After reading the doc strings, we'll use scipy.optimize.minimize() to find the MAP for each gene. We may get some warnings because the solver encounters np.inf values, but this is ok.

In [6]:
# Specify r_max
r_max = 20

# Give initial guesses
p0 = np.array([5, 0.5])

# Results
df_MAP = pd.DataFrame(columns=['gene', 'r', 'r_err', 'p', 'p_err', 'b'])

# Do MLE for each gene
for i, gene in enumerate(df.columns):
    # Pull out counts
    n = df[gene].values
    
    # Perform minimization
    res = scipy.optimize.minimize(neg_log_posterior, p0, args=(n, r_max), 
                                  method='powell')
    
    # Compute error bars
    hes = smnd.approx_hess(res.x, log_posterior, args=(n, r_max))
    cov = -np.linalg.inv(hes)

    # Unpack results
    df_out = pd.DataFrame([[gene, 
                            res.x[0], 
                            np.sqrt(cov[0,0]), 
                            res.x[1], 
                            np.sqrt(cov[1,1]), 
                            (1 - res.x[1]) / res.x[1]]],
                          columns=df_MAP.columns, index=[i])
    df_MAP = df_MAP.append(df_out)
    
# Look at results
df_MAP
Out[6]:
     gene         r     r_err         p     p_err          b
0    Rex1  1.632727  0.130568  0.011662  0.001070  84.745085
1    Rest  4.529398  0.400147  0.057002  0.004998  16.543272
2   Nanog  1.263433  0.101810  0.014196  0.001355  69.444413
3  Prdm14  0.552853  0.057859  0.108689  0.013079   8.200553

Plotting the posterior

Because there are only two parameters, we can make a plot of the posterior. Let's go ahead and do that to make sure our MAP is reasonable. We'll do it for Rest.

In [7]:
# Set up ranges for plotting
r = np.linspace(3, 6, 200)
p = np.linspace(0.04, 0.07, 200)

# Fetch data
n = df['Rest'].values

# Compute log posterior
log_post = np.empty((len(p), len(r)))
for j, r_val in enumerate(r):
    for i, p_val in enumerate(p):
        log_post[i,j] = log_posterior((r_val, p_val), n, 20)
                    
# Compute values for plot
R, P = np.meshgrid(r, p)
post = np.exp(log_post - log_post.max())

# Show posterior plot
p = bebi103.viz.contour(R,
                        P,
                        post,
                        overlaid=True,
                        x_axis_label='r',
                        y_axis_label='p')
bokeh.io.show(p)

This plot does not show anything pathological, so it is reasonable to report the MAP with an error bar coming from the Gaussian approximation.

The MAP for the bimodal Rex1 distribution

As we noted during exploratory data analysis, the Rex1 distribution appears bimodal. To specify its posterior, we will again assume uniform priors on all parameters. Using the mixture likelihood defined above for each cell and assuming the cells are independent, our posterior is

\begin{align} g(r_1, r_2, p_1, p_2, w_1\mid \mathbf{n}) &\propto \prod_{n\in\mathbf{n}}\left[w_1\,\frac{(n + r_1 - 1)!}{n!\,(r_1-1)!}\,p_1^{r_1}(1-p_1)^{n}\right. \nonumber \\[1em] &\;\;\;\;\;\;\;\;\left.+\, (1-w_1)\, \frac{(n + r_2 - 1)!}{n!\,(r_2-1)!}\,p_2^{r_2}(1-p_2)^{n}\right]. \end{align}

We'll code up the log posterior and again use Powell's method to find the MAP. We will also specify that $p_1 < p_2$ so the subscripts always refer to the same thing. We are also careful to include logic in the definition of the prior to make sure that the posterior is zero in regions where it is not defined (e.g., when $p_2 > 1$).

In [8]:
def log_likelihood_bimodal(params, n):
    """
    Log likelihood for linear combination of neg. binomials.
    """
    r_1, r_2, p_1, p_2, w_1 = params
    
    return np.sum(np.log(w_1 * st.nbinom.pmf(n, r_1, p_1)
                  + (1-w_1) * st.nbinom.pmf(n, r_2, p_2)))


def log_prior_bimodal(params, r_max):
    """
    Log prior for linear combination of neg. binomials.
    """
    r_1, r_2, p_1, p_2, w_1 = params
    
    if ((w_1 <= 0) or (w_1 >= 1) or (r_1 <= 0) or (r_2 <= 0) or (p_1 > p_2) 
            or (p_1 < 0) or (p_2 > 1) or (r_1 > r_max) or (r_2 > r_max)):
        return -np.inf
    
    # Uniform priors; the constant does not affect the MAP
    return -2 * np.log(r_max)


def log_posterior_bimodal(params, n, r_max):
    """
    Log posterior for the Singer data for lin. comb. of neg. binom
    """
    # Compute log prior
    lp = log_prior_bimodal(params, r_max)

    # If log prior is -inf, return that
    if lp == -np.inf:
        return -np.inf

    # Compute and return posterior
    return lp + log_likelihood_bimodal(params, n)


def neg_log_posterior_bimodal(params, n, r_max):
    """
    Negative log posterior for linear combination of neg. binomials.
    """
    return -log_posterior_bimodal(params, n, r_max)

Now that the posterior is built, we can perform the optimization.

In [9]:
# Extract counts
n = df['Rex1'].values

# Give initial guesses
p0 = np.array([5, 5, 0.1, 0.3, 0.5])

# Perform the optimization
res = scipy.optimize.minimize(neg_log_posterior_bimodal,
                              p0,
                              args=(n, r_max),
                              method='powell')

# Compute error bars
hes = smnd.approx_hess(res.x, log_posterior_bimodal, args=(n, r_max))
cov = -np.linalg.inv(hes)

# Store results in Pandas series
inds = ['r1', 'r1_err', 'r2', 'r2_err',
        'p1', 'p1_err', 'p2', 'p2_err',
        'w1', 'w1_err', 'b1', 'b2']
data = [x[j] for x in list(zip(res.x, np.sqrt(np.diag(cov)))) 
             for j in range(len(x))]
data += [(1 - res.x[2]) / res.x[2], (1 - res.x[3]) / res.x[3]]
s_Rex1_bimodal = pd.Series(data, index=inds)

# Have a look
s_Rex1_bimodal
Out[9]:
r1         5.080793
r1_err     0.682914
r2         3.427826
r2_err     1.377454
p1         0.030353
p1_err     0.003795
p2         0.192659
p2_err     0.082079
w1         0.838538
w1_err     0.026179
b1        31.945822
b2         4.190529
dtype: float64

So, we have managed to measure all of the parameters and their error bars for a bimodal distribution.

However, we cannot really plot this distribution. It has five parameters, and we would have to marginalize three of them away to make a plot. We will not attempt that here, but marginalizing and plotting posteriors becomes trivial when we learn how to use Markov chain Monte Carlo to compute posteriors later in the course.

Graphical evaluation of the MLE

To graphically evaluate how well the model, parametrized by the MLE, describes the data, we have a few options.

Plotting the CDF and ECDF

First, we can plot the ECDF with the CDF given by the most probable parameters.

In [10]:
for i, gene in enumerate(df.columns):
    # What data to take from MAP DataFrame
    inds = df_MAP['gene'] == gene

    # Compute theoretical CDF
    n_plot = np.arange(0, df[gene].max()+1)
    r, p = df_MAP.loc[inds, ['r', 'p']].values.flatten()
    cdf_theor = st.nbinom.cdf(n_plot, r, p)
    
    # Overlay CDF
    plots[i].line(n_plot, cdf_theor, color=colors[1], line_width=3)
    
    # If it is Rex1, do bimodal as well
    if gene == 'Rex1':
        # Build theoretical CDF
        inds = ['r1', 'r2', 'p1', 'p2', 'w1']
        r_1, r_2, p_1, p_2, w_1 = s_Rex1_bimodal[inds].values.flatten()
        cdf_theor = ( w_1 * st.nbinom.cdf(n_plot, r_1, p_1)
                     + (1-w_1) * st.nbinom.cdf(n_plot, r_2, p_2))
        plots[i].line(n_plot, cdf_theor, color=colors[9], line_width=3)
        
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

This provides a clear "by eye" evaluation of the quality of the MAP (and by extension the model) for describing the data. Our theoretical CDFs for Rex1 and Prdm14 agree very well with the data. There is a bit of systematic difference in the Nanog and Rest genes.

Q-Q plots

Q-Q plots (the "Q" stands for "quantile) are convenient ways to graphically compare two probability distributions. In our case, we wish to compare the measured distribution to the theoretical one described by the MAP. There are many ways to generate Q-Q plots, and many of the descriptions out there are kind of convoluted. Here is a procedure/description I like for $N$ total empirical samples.

  1. Sort your samples from lowest to highest.
  2. Draw $N$ samples from the theoretical distribution and sort them.
  3. Plot these against each other.

If the plotted points fall on a straight line of slope one, the distributions are the same. Deviations from this highlight differences in the distributions.
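Before adding the confidence region described below, here is a minimal sketch of the three-step procedure for Prdm14, using a single set of theoretical draws parametrized by the MAP.

# Minimal Q-Q plot sketch for Prdm14 with a single theoretical draw
r, p = df_MAP.loc[df_MAP['gene'] == 'Prdm14', ['r', 'p']].values.flatten()

# 1. Sort the measured counts
x = np.sort(df['Prdm14'].values)

# 2. Draw the same number of samples from the theoretical distribution and sort
y = np.sort(st.nbinom.rvs(r, p, size=len(x)))

# 3. Plot them against each other, with a slope-one line for reference
q = bokeh.plotting.figure(plot_width=300, plot_height=300,
                          x_axis_label='measured counts',
                          y_axis_label='theoretical counts')
q.circle(x, y)
q.line([0, x.max()], [0, x.max()], color='black')
bokeh.io.show(q)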

I actually like to generate many, many samples from the theoretical distribution and then plot the 95% confidence interval region. To plot the confidence interval, I use the bebi103.viz.fill_between() function. To have access to it, you may have to update the bebi103 module.

pip install bebi103 --upgrade

This plot gives a feel for how plausible it is that the observed data were drawn out of the theoretical distribution. Let's do this for our measured mRNA transcript counts, using the MAP to parametrize the theoretical negative binomial distribution.

In [11]:
plots = []
for i, gene in enumerate(df.columns):
    # What data to take from MAP DataFrame
    inds = (df_MAP['gene']==gene)

    # Parameters
    r, p = df_MAP.loc[inds, ['r', 'p']].values.flatten()

    # x-values
    x = np.sort(df[gene].values)
    
    # Make draws
    theor_x = np.array(
        [np.sort(st.nbinom.rvs(r, p, size=len(x))) for _ in range(1000)])

    # Upper and lower bounds
    low_theor, up_theor = np.percentile(theor_x, (2.5, 97.5), axis=0)

    # Plot Q-Q plots with 95% conf.
    plots.append(bebi103.viz.fill_between(x,
                                          up_theor,
                                          x,
                                          low_theor,
                                          fill_alpha=0.7,
                                          x_axis_label='transcript counts',
                                          y_axis_label='transcript counts',
                                          title=gene))

    # Plot 45 degree line
    plots[-1].line([0, x.max()],
                   [0, x.max()],
                   line_width=4,
                   color='black',
                   alpha=0.5)
    
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

And now also for the bimodal Rex1 distribution.

In [12]:
# Pull out parameters
inds = ['r1', 'r2', 'p1', 'p2', 'w1']
r1, r2, p1, p2, w_1 = s_Rex1_bimodal[inds].values.flatten()

# x-values
x = np.sort(df['Rex1'].values)

# Make draws
def draw_double_neg_binom(r1, r2, p1, p2, w_1, size=1):
    """
    Draw from double negative binomial
    """
    qr = st.bernoulli.rvs(w_1, size=size)
    return qr * st.nbinom.rvs(r1, p1, size=size) + \
            (1-qr) * st.nbinom.rvs(r2, p2, size=size)
    
    
theor_x = np.array(
    [np.sort(draw_double_neg_binom(r1, r2, p1, p2, w_1, size=len(x)))
                     for _ in range(1000)])

# Upper and lower bounds
low_theor, up_theor = np.percentile(theor_x, (2.5, 97.5), axis=0)

# Plot Q-Q plots with 95% conf.
p = bebi103.viz.fill_between(x,
                             up_theor,
                             x,
                             low_theor,
                             fill_alpha=0.7,
                             x_axis_label='transcript counts',
                             y_axis_label='transcript counts',
                             title='Rex1')

# Plot 45 degree line
p.line([0, x.max()], [0, x.max()], line_width=4, color='black', alpha=0.5)

bokeh.io.show(p)

This indicates that the observed mRNA distribution could certainly come from a double negative binomial distribution. Whether or not this is more likely than another model is part of the topic of model selection, which we will discuss in coming weeks.

Regression by least squares

We will now look at a special type of MLE problem. Recall the last tutorial and the microtubule spindle length analysis. First, let's load in the data set.

In [13]:
df = pd.read_csv('../data/good_invitro_droplet_data.csv', comment='#')

The marginalized posterior distribution we derived there was

\begin{align} g(\gamma, \phi \mid \mathbf{l},\mathbf{d}) &\propto \frac{1}{\phi}\left(\sum_i(l_i - l(d_i;\gamma, \phi))^2\right)^{-\frac{n}{2}}, \end{align}

with

\begin{align} l(d;\gamma, \phi) = \frac{\gamma d}{(1 + (d/\phi)^3)^\frac{1}{3}}. \end{align}

Now, imagine if we had instead chosen a Uniform prior for $\phi$, giving a marginalized posterior of

\begin{align} g(\gamma, \phi \mid \mathbf{l},\mathbf{d}) &\propto \left(\sum_i(l_i - l(d_i;\gamma, \phi))^2\right)^{-\frac{n}{2}}. \end{align}

If we look at the form of the posterior, we see that the posterior is maximal when the sum

\begin{align} \sum_{i} (l_i - l(d_i;\gamma, \phi))^2 \end{align}

is minimal. So, we need to find the values of $\gamma$ and $\phi$ that minimize the sum of the squares of the residuals, where a residual is the difference between the prediction of the mathematical model and the measured value.

This situation is very common in curve fitting applications, though not universal (an obvious exception being the case of a non-Uniform prior). As such, highly optimized algorithms exist for numerically solving this optimization problem. In particular, the Levenberg-Marquardt algorithm effectively solves the problem of minimizing the sum of the squares of a function. In our case, the function is the residual. This is implemented in the scipy.optimize.leastsq() function, which we will use to find the MAP. As usual, you should read the docs.

We start by defining a residual function, as the function requires. The first argument must be a Numpy array of the parameter values, and the remaining arguments can be passed in after that. We can borrow from the previous tutorial.

In [14]:
def spindle_length(params, d):
    """Spindle length as a function of droplet diameter."""
    # Unpack parameters
    gamma, phi = params
    
    # Compute spindle length
    return gamma * d / np.cbrt(1 + (d/phi)**3)


def resid(params, d, ell):
    """Residuals for spindle length model."""
    return ell - spindle_length(params, d)

We are now ready to use scipy.optimize.leastsq() to compute the MAP. We use the args kwarg to pass in the other arguments to the resid() function. In our case, these arguments are the data points. The leastsq() function returns multiple values, but the first, the optimal parameter values (the MAP), is all we are interested in. We therefore use a dummy variable, _, for the other values that are returned.

In [15]:
# Extra arguments as a tuple
args = (df['Droplet Diameter (um)'].values, df['Spindle Length (um)'].values)

# Initial guess
params_0 = np.array([0.6, 67])

# Compute the MAP
popt, _ = scipy.optimize.leastsq(resid, params_0, args=args)

# Extract the values
gamma_MAP, phi_MAP = popt

# Print results
print("""
Most probable parameters
------------------------
γ = {0:.2f}
φ = {1:.2f} µm
""".format(gamma_MAP, phi_MAP))
Most probable parameters
------------------------
γ = 0.86
φ = 44.47 µm

This was useful for finding the MAP, though we should use the whole posterior to compute the Hessian and then get error bars on the parameter values. Note that in this case, the log posterior has the same functional form as the log likelihood because we have assumed Uniform priors.

In [16]:
def log_post(params, d, ell):
    """Unnormalized log posterior for spindle length problem."""
    return -len(d) / 2 * np.log(np.sum(resid(params, d, ell)**2))

# Compute covariance matrix
hes = smnd.approx_hess(popt, log_post, args=args)
cov = -np.linalg.inv(hes)

# Report results
print("""
Results for conserved tubulin model (≈ 68% of total probability)
----------------------------------------------------------------
γ = {0:.2f} ± {1:.2f}
φ = {2:.1f} ± {3:.1f} µm
""".format(gamma_MAP, np.sqrt(cov[0,0]), phi_MAP, np.sqrt(cov[1,1])))
Results for conserved tubulin model (≈ 68% of total probability)
----------------------------------------------------------------
γ = 0.86 ± 0.02
φ = 44.5 ± 1.3 µm

Conclusions

Maximum likelihood estimation (MLE) is the name given to the process of computing the MAP for a statistical model with uniform priors. If the priors are not uniform, the procedure for finding the MAP is exactly the same. In the special case where finding the MAP reduces to minimizing a residual sum of squares, the Levenberg-Marquardt algorithm can and usually should be used.

It is important to note that the MAP is but one point of the posterior. We might be tempted to say that because it is where the posterior is maximal, it is the most important point. At least by also reporting an error bar, you are giving a bit more information about the posterior. This whole practice often works out well to give reliable summaries of posteriors, but it can fail spectacularly, as you will see in your homework. You should take care that the posterior is well-behaved.