3. Choosing priors


Part 1: Review of Lecture

In Part 1 we reviewed the Lecture 2 section on choosing priors. A quick recap of what we learned: 1. Uniform prior: Not the best. Improper prior – it gives us no information and is not normalizable! 2. Jeffreys prior: Also not great. Fixes the problem of the uniform prior where the priors vary with change of variables, but it is very hard to compute for some distributions. 3. Weakly informative prior: Good! We can choose a prior probability distribution to encode what we know about the parameter before we measured data. We will discuss how to come up with weakly informative priors below. 4. Conjugate prior: Can be helpful! Makes the posterior possible to analyze analytically, but only a few distributions have conjugates, so not always possible.

Part 2: Tips and Practice for Choosing Weakly Informative Priors

Recommended steps for choosing priors: 1. Write down Bayes’ Theorem first!

\[\begin{aligned} g(\mu \mid y) = \frac{f(y\mid \mu )\,g(\mu )}{f(y)}. \end{aligned}\]
In words:
$$
\begin{aligned}
\text{posterior} = \frac{\text{likelihood}\,x\ \text{prior}}{\text{evidence}}
\end{aligned}
$$
  1. Choose your likelihood (please see Recitation 2/Lecture 2 for help).

  2. You will need to choose a prior for each parameter in your likelihood. For example, if you choose \(f(y\mid \mu )\sim\text{Normal}(\mu, \sigma)\), you will need to choose priors \(g(\mu)\) and \(g(\sigma)\).

  3. Consider what you know about the parameter. Do you know what it means in the system? Are there any bounds on its value? Is it related to other parameters? Will its value be very close to zero/very far from zero?

  4. Come up with a sketch of what you think the probability distribution for your prior will look like. Your sketch should cover all possible values of the parameter.

  5. Use the Distribution Explorer to help you choose a probability distribution that matches your sketch. Consider the story and shape of the distribution. Play around with the distribution explorer, how does changing the parameters change the shape of the distribution?

  6. Go for more broad instead of more narrow!

  7. Don’t stress too much! Most often there will be several reasonable choices for priors. And they will be overwhelmed by the likelihood. As long as your prior is reasonable and not overly narrow, you will be fine!

We will now do a couple examples of choosing priors with data used in previous terms/years of the course. In these examples we will not worry too much about the likelihood and focus more on practicing choosing priors.

[1]:
import numpy as np
import pandas as pd

import bokeh_catplot
import bebi103

import bokeh.io
bokeh.io.output_notebook()

import holoviews as hv
hv.extension('bokeh')

import scipy.special
import scipy.optimize
import scipy.stats as st

bebi103.hv.set_defaults()
//anaconda3/lib/python3.7/site-packages/bebi103/stan.py:36: UserWarning: Both pystan and cmdstanpy are importable in this environment. As per the cmdstanpy documentation, this is not advised.
  "Both pystan and cmdstanpy are importable in this environment. As per the cmdstanpy documentation, this is not advised."
Loading BokehJS ...

Example 1: Spindle Size

From BE/Bi 103 2018, Tutorial 6a:

“The data set we will use for this analysis comes from Good, et al., Cytoplasmic volume modulates spindle size during embryogenesis Science, 342, 856-860, 2013. You can download the data set here. In this work, Matt Good and coworkers developed a microfluidic device where they could create droplets cytoplasm extracted from Xenopus eggs and embryos (see figure below; scale bar 20 µm). A remarkable property about Xenopus extract is that mitotic spindles spontaneously form; the extracted cytoplasm has all the ingredients to form them. This makes it an excellent model system for studying spindles. With their device, Good and his colleagues were able to study how the size of the cell affects the dimensions of the mitotic spindle; a simple, yet beautiful, question. The experiment is conceptually simple; they made the droplets and then measured their dimensions and the dimensions of the spindles using microscope images.”

Load in the data:

[2]:
df_spindle = pd.read_csv('../data/good_invitro_droplet_data.csv', comment='#')

df_spindle.head()
[2]:
Droplet Diameter (um) Droplet Volume (uL) Spindle Length (um) Spindle Width (um) Spindle Area (um2)
0 27.1 0.000010 28.9 10.8 155.8
1 28.2 0.000012 22.7 7.2 81.5
2 29.4 0.000013 26.2 10.5 138.3
3 31.0 0.000016 19.2 9.4 90.5
4 31.0 0.000016 28.4 12.1 172.4

Plot of Spindle Length vs. Droplet Diameter

[3]:
hv.Scatter(
    data=df_spindle,
    kdims=['Droplet Diameter (um)'],
    vdims=['Spindle Length (um)']
)
[3]:

ECDF of Spindle Length

[4]:
bokeh.io.show(bokeh_catplot.ecdf(df_spindle, val='Spindle Length (um)'))
Say the spindle length is independent of droplet size. We would then expect the spindle length to be given by:

\[\begin{aligned} l_i = \phi + e_i, \end{aligned}\]

where \(e_i\) is the noise component of the \(i\)the datum. So, we have a theoretical model for spindle length, \(l = \phi\), and to get a fully generative model, we need to model the errors \(e_i\).

Let us use the following likelihood (for explanation on how this likelihood was chosen, please see BE/Bi 103a Lesson 8:

\[\begin{aligned} l_i \sim \text{Norm}(\phi, \sigma) \;\;\forall i. \end{aligned}\]

So, we will need to choose the priors \(g(\phi)\) and \(g(\sigma)\).

Possibility #1:

Let’s think about what we know about mitotic spindles. I don’t know their exact range in size, but I would guess a reasonable choice for \(\phi\) might be around ~20 µm. Why? I know a eukaryotic cell is around 100 µm, so it’s going to be smaller than that. But I also know the mitotic spindle has to separate chromosomes during metaphase/anaphase, so they’re going to have some length probably greater than 1 µm. 20 um sounds reasonable, but I’m not super sure about it, so let’s be sure to keep it broad. I think the spindles could probably take on a large range of values. I also don’t think the mean length is going to be affected by any other parameters in my model. Based on this, I’ll make a rough sketch of my idea for \(g(\phi)\):

Great! I can use the Distribution Explorer to choose a reasonable prior based on my sketch. I will choose:

\[\begin{aligned} g(\phi) \sim \text{Norm}(20, 20) \end{aligned}\]

.

Which looks like:

[11]:
phi = np.linspace(-50, 100, 200)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='phi (µm)',
                          y_axis_label='g(phi)')
p.line(phi, st.norm.pdf(mu, 20, 20), line_width=2)
bokeh.io.show(p)

For \(g(\sigma)\), I might choose a normal prior again. I think spindle size may vary a fair amount. They are created spontaneously in a droplet, so I would guess there could be many sizes. Therefore, I might choose

\[\begin{aligned} g(\sigma) \sim \text{Norm}(5, 5) \end{aligned}\]
[6]:
sigma = np.linspace(-10, 20, 100)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='sigma (µm)',
                          y_axis_label='g(sigma)')
p.line(sigma, st.norm.pdf(sigma, 5, 5), line_width=2)
bokeh.io.show(p)

Visualizing the Prior

The prior can be a bit confusing because it’s hard to think about what it means. When choosing priors, we think about our knowledge of the parameter before collecting data. From what we know, we can estimate the probability of the parameter taking on certain values, and therefore can choose a reasonable distribution for our prior probability. One question that came up in recitation a lot is: what does each part of the prior mean? For example, what does the µ in \(g(\phi) \sim \text{Normal}(\mu, \sigma)\) represent? What about the \(\sigma\)?

Consider our prior \(g(\phi) \sim \text{Normal}(\mu, \sigma)\). We want to choose a prior for \(\phi\) that represents the probability of \(\phi\) taking on possible values. We chose a normal prior, so we are concerned with prior parameters \(\mu\) and \(\sigma\). It is important to note that the \(\mu\) in \(g(\phi) \sim \text{Normal}(\mu, \sigma)\) is not the same parameter as the \(\phi\) in \(l_i \sim \text{Norm}(\phi, \sigma) \;\;\forall i\). In the likelihood, \(l_i \sim \text{Norm}(\phi, \sigma) \;\;\forall i\), \(\phi\) is the location parameter of the PDF of the likelihood, and therefore represents the center of the probability distribution of the values of \(l_i\). In the prior for \(\phi\), \(g(\phi) \sim \text{Normal}(\mu, \sigma)\), \(\mu\) is the location parameter of the probability distribution of the possible values of \(\phi\), the center we were just talking about. Essentially, \(g(\phi)\) is giving the estimated probability distribution for where \(\phi\) lies.

What can be even more confusing is the \(\sigma\) in \(g(\phi) \sim \text{Normal}(\mu, \sigma)\). This is not related to the \(\sigma\) in \(l_i \sim \text{Norm}(\phi, \sigma) \;\;\forall i\)!! In \(g(\phi) \sim \text{Normal}(\mu, \sigma)\), \(\sigma\) is the scale parameter for our PDF for possible \(\phi\)s. If we want to pick a broad normal prior probability distribution for \(\phi\), then \(\sigma\) will be large, and vice versa. I hope I’m not just confusing you more, let’s do some more examples to clarify.

Possibility #2:

You may be thinking: hold on, \(\phi\) and \(\sigma\) could not physically have negative values, so why are we using normal priors that could potentially take on negative values? Totally fair point! We can resolve this in a few ways:

  1. If the location parameter for our Normal prior is far away from 0, negative values have a very low probability anyway due to the light tails, so we won’t have to worry about them too much. We can just include a conditional statement in our model to avoid negative values for our prior.

  2. We could choose a Log-Normal or Half-Normal prior to avoid this issue. Be sure to review those distributions on the distribution explorer to make sure they are reasonable choices.

Rethinking my prior \(g(\phi)\), I might consider a Log-Normal prior. 1) The Log-Normal prior will not take on negative values, which is physically true for \(\phi\), and 2) I think it is reasonable that \(\text{ln}(\phi)\) is normally distributed. If you look at my sketch again, it actually looks fairly Log-Normal, because I would expect the probability on the left side of \(\mu\) will vanish more quickly than on the right. So, I will choose:

\[\begin{aligned} g(\phi) \sim \text{LogNorm}(\text{ln} \ 20, 0.75) \end{aligned}\]

Which looks like:

[13]:
phi = np.linspace(0, 100, 200)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='phi (µm)',
                          y_axis_label='g(phi)')
p.line(phi, st.lognorm.pdf(phi, 0.75, loc=0, scale=20), line_width=2)
bokeh.io.show(p)

Looks reasonable!

Revisiting \(g(\sigma)\), we know it doesn’t make sense for \(\sigma\) to be negative. I think Half-Normal would be an appropriate choice here, especially considering our original plot of \(g(\sigma) \sim \text{Norm}(5, 5)\). It also doesn’t make sense that \(\text{ln}(\sigma)\) would be normally distributed, so Half-Normal is a good choice. I might choose Half-Normal centered at 0 with a large scale parameter, such as

\[\begin{aligned} g(\sigma) \sim \text{HalfNorm}(0, 10) \end{aligned}\]

This would look like:

[15]:
sigma = np.linspace(0, 40, 200)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='sigma (µm)',
                          y_axis_label='g(sigma)')
p.line(phi, st.halfnorm.pdf(sigma, 0, 10), line_width=2)
bokeh.io.show(p)

So, our final model would be:

\[\begin{split}\begin{aligned} \phi \sim \text{LogNorm}(\text{ln} \ 20, 0.75) \\ \sigma \sim \text{HalfNorm}(0, 10) \\ l_i \sim \text{Norm}(\phi, \sigma) \;\;\forall i \\ \end{aligned}\end{split}\]

Possibility #3: A New Model

Let’s use a new model for spindle length where droplet diameter is related to spindle length. To understand how we came up with this model, and its caveats, please refer to BE/Bi 103a Lesson 8. Be sure to read through this first, otherwise the following will not make sense.

From BE/Bi 103a Lesson 8:
“We have a theoretical model relating the droplet diameter to the spindle length. Let us now build a generative model. For spindle, droplet pair i, we assume
\[ \begin{align}\begin{aligned} \begin{aligned} l_i = \frac{\gamma d_i}{\left(1+(\gamma d/\phi)^3\right)^{\frac{1}{3}}} + e_i. \end{aligned}\\We will assume that :math:`e_i` is Normally distributed with variance :math:`\sigma^2`. This leads us to our statistical model.\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split} \begin{aligned} &\mu_i = \frac{\gamma d_i}{\left(1+(\gamma d_i/\phi)^3\right)^{\frac{1}{3}}}, \\[1em] &l_i \sim \text{Norm}(\mu_i, \sigma) \;\forall i, \end{aligned}\end{split}\\where\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split} \begin{aligned} \gamma &= \left(\frac{T_0-T_\mathrm{min}}{k^2T_\mathrm{s}}\right)^\frac{1}{3} \\ \phi &= \gamma\left(\frac{6\alpha\beta}{\pi}\right)^{\frac{1}{3}}. \end{aligned}\end{split}\\which is equivalently stated as\end{aligned}\end{align} \]
\[\begin{aligned} l_i \sim \text{Norm}\left(\frac{\gamma d_i}{\left(1+(\gamma d_i/\phi)^3\right)^{\frac{1}{3}}}, \sigma\right) \;\forall i. \end{aligned}\]

Yikes, we have a lot more parameters in this model! Not to worry, we will choose priors for one parameter at a time. Starting with \(\phi\), this still represents the location parameter for the distribution of spindle size. So, we will use the same prior:

\[\begin{aligned} \phi \sim \text{LogNorm}(\text{ln} \ 20, 0.75) \end{aligned}\]

Now let’s consider \(\gamma\). \(\gamma\) determines the relationship between \(d\), the droplet diameter, and \(\phi\), the spindle length. We can already think of some bounds on \(\gamma\). \(\gamma\) must be greater than 0. Also, \(\gamma\) must be less than or equal to 1, or else the spindle length exceeds the droplet diameter. Further, we can see that for large droplets, with \(d \gg \phi/\gamma\), the spindle size becomes independent of \(d\), with

\[ \begin{align}\begin{aligned} \begin{aligned} l \approx \phi. \end{aligned}\\Conversely, for $ d :nbsphinx-math:`\ll `:nbsphinx-math:`\phi`/:nbsphinx-math:`\gamma`$, the spindle length varies approximately linearly with diameter.\end{aligned}\end{align} \]
\[\begin{aligned} l(d) \approx \gamma\,d. \end{aligned}\]

Of course, these cases are physically possible. Please see the notes I referenced from BE/Bi 103a to review the limitations of this model. So, we know our model will only work in cases of droplets with intermediate diameter. Thus, we will choose:

\[\begin{aligned} \gamma \sim \text{Beta}(2, 2) \end{aligned}\]
[20]:
gamma = np.linspace(0, 2, 100)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='gamma (µm)',
                          y_axis_label='g(gamma)')
p.line(gamma, st.beta.pdf(gamma, 2, 2), line_width=2)
bokeh.io.show(p)
Note: notice how useful the beta distribution can be!
Next, we will consider \(\sigma\). We could use our previous choice, \(\sigma \sim \text{HalfNorm}(0, 10)\). We could also consider the case where we assume \(\sigma\) scales with \(\phi\). This is not an unreasonable assumption. As \(\phi\) increases, we may expect the standard deviation \(\sigma\) to increase as well. We can write the expression:

\[\begin{aligned} \sigma = \sigma_0 \phi \end{aligned}\]

We have now introduced a new prior parameter, \(\sigma_0\). \(\sigma_0\) is our constant of proportionality for the relationship between \(\sigma\) and \(\phi\). We choose a gamma prior for \(\sigma_0\):

\[ \begin{align}\begin{aligned} \begin{aligned} \sigma_0 = \text{Gamma}(2, 10) \end{aligned}\\Gamma makes sense in this case because, as stated in the Distribution Explorer: “it imparts a heavier tail than the Half-Normal distribution (but not too heavy; it keeps parameters from growing too large), and allows the parameter value to come close to zero.” We want to leave room for the case where :math:`\sigma_0` is small, so a Gamma distribution is appropriate.\end{aligned}\end{align} \]

This will look like:

[32]:
sigma_0 = np.linspace(0, 2, 100)
p = bokeh.plotting.figure(width=300, height=200,
                          x_axis_label='sigma_0',
                          y_axis_label='g(sigma_0)')
p.line(sigma_0, st.gamma.pdf(sigma_0, 2, loc=0, scale=0.1), line_width=2)
bokeh.io.show(p)

We have now finished building the model:

\[\begin{split}\begin{aligned} &\phi \sim \text{LogNorm}(\ln 20, 0.75),\\[1em] &\gamma \sim \text{Beta}(2, 2), \\[1em] &\sigma_0 \sim \text{Gamma}(2, 10),\\[1em] &\sigma = \sigma_0\,\phi,\\[1em] &\mu_i = \frac{\gamma d_i}{\left(1+(\gamma d_i/\phi)^3\right)^{\frac{1}{3}}}, \\[1em] &l_i \sim \text{Norm}(\mu_i, \sigma) \;\forall i. \end{aligned}\end{split}\]

Example 2: Microtubule Catastrophe

Gardner, Zanic, and coworkers investigated the dynamics of microtubule catastrophe, the switching of a microtubule from a growing to a shrinking state. In particular, they were interested in the time between the start of growth of a microtubule and the catastrophe event. They monitored microtubules by using tubulin (the monomer that comprises a microtubule) that was labeled with a fluorescent marker. As a control to make sure that fluorescent labels and exposure to laser light did not affect the microtubule dynamics, they performed a similar experiment using differential interference contrast (DIC) microscopy. They measured the time until catastrophe with labeled and unlabeled tubulin.
[7]:
df_mt = pd.read_csv('../data/gardner_time_to_catastrophe_dic_tidy.csv')
df_labeled = df_mt.loc[df_mt['labeled'], 'time to catastrophe (s)'].values

df_mt.head()
[7]:
Unnamed: 0 time to catastrophe (s) labeled
0 0 470.0 True
1 1 1415.0 True
2 2 130.0 True
3 3 280.0 True
4 4 550.0 True

ECDF of Catastrophe Times for Labeled Tubulin

[8]:
p = bokeh_catplot.ecdf(pd.DataFrame({'t (s)': df_labeled}), val='t (s)', conf_int=True)

bokeh.io.show(p)

Scatter of Time to Catastrophe for Labeled Tubulin

[9]:
hv.Scatter(df_labeled, kdims=None, vdims='time to catastrophe (s)')
[9]:

We ran out of time in recitation to cover this model, but I will leave it as an exercise. Consider the Gamma model we explored in BE/Bi 103a HW 9. Try to pick reasonable priors for \(\alpha\) and \(\beta\). If you need help completing this exercise, please feel free to ask Cece in class, office hours, or via Ed.