BE/Bi 103, Fall 2018: Homework 11

Due 1pm, December 14

(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This homework was generated from a Jupyter notebook. You can download the notebook here.


You may do this homework individually, or with your group. Only group members who work on the respective problems will be awarded extra credit points.

Problem 11.1: Hierarchical models are hiding in plain sight, 20 pts extra credit

Say we have a set of measurements, $\mathbf{x} = \{x_1, x_2,\ldots\}$. Each measurement has associated with it some measurement error such that $x_i \sim \text{Norm}(\mu_i, s)$. Furthermore, there is a natural variability from measurement to measurement such that $\mu_i \sim \text{Norm}(\mu, \sigma)$.
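For concreteness, the generative model can be simulated directly. Below is a minimal NumPy sketch; the parameter values are arbitrarily chosen for illustration and are not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Arbitrarily chosen parameter values, for illustration only
mu = 10.0     # center of the natural variability
sigma = 2.0   # natural measurement-to-measurement variability
s = 0.5       # measurement error
n = 150       # number of measurements

# Draw the latent parameters, one per measurement, then the measurements
mu_i = rng.normal(mu, sigma, size=n)
x = rng.normal(mu_i, s)
```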

a) Write a mathematical expression for the joint generative probability density, $\pi(\mathbf{x}, \boldsymbol{\mu}, \mu, s, \sigma)$, where $\mathbf{x} = \{x_1, x_2,\ldots\}$ and $\boldsymbol{\mu} = \{\mu_1, \mu_2, \ldots\}$. You may assume that the prior may be separated such that $g(\mu, \sigma, s) = g(\mu) g(\sigma) g(s)$.

b) Calculate $\pi(\mathbf{x}, \mu, s, \sigma)$; that is, marginalize the joint density over $\boldsymbol{\mu}$. Then write this expression in the limit where the natural variability is much greater than the measurement error ($s \ll \sigma$).

c) Finally, in this limit, write the expression for $\pi(\mathbf{x}, \mu, \sigma)$ and show that for this hierarchical model, the limit of $s\ll \sigma$ is equivalent to having a likelihood of $x_i \sim \text{Norm}(\mu, \sigma)\;\forall i$.
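As a quick numerical check of this claim (not a substitute for the derivation), you can simulate the hierarchical model with $s \ll \sigma$ and confirm that the marginal distribution of the $x_i$ is close to $\text{Norm}(\mu, \sigma)$. The parameter values below are again arbitrary.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(seed=3252)

# Arbitrarily chosen values with s << sigma
mu, sigma, s = 10.0, 2.0, 0.01
n = 100_000

# Simulate the hierarchical model
mu_i = rng.normal(mu, sigma, size=n)
x = rng.normal(mu_i, s)

# Kolmogorov-Smirnov comparison against Norm(mu, sigma);
# the statistic should be small when s << sigma
print(scipy.stats.kstest(x, scipy.stats.norm(mu, sigma).cdf))
```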

This problem shows how a hierarchical model can reduce to a non-hierarchical one. In this case, it helps you see what is being neglected when you choose not to use a hierarchical model (since so many models actually are hierarchical if you don't make approximations like the one derived here).


Problem 11.2: Hierarchical modeling of FRAP data, 30 pts extra credit

Redo Problem 6.3 using a hierarchical model, sampling out of the full posterior (instead of finding the MAP).


Problem 11.3: Another model for microtubule catastrophe, 50 pts extra credit

In Homework 8.2, we investigated the process by which microtubules undergo catastrophe using data from the Gardner, Zanic, et al. paper. In the file gardner_mt_catastrophe_only_tubulin.csv, we have observed catastrophe times of microtubules with different concentrations of tubulin. We used Exponential, Gamma, and Weibull distributions to model the catastrophe times. In this problem, we will consider a different model. We assume that each of $m$ Poisson processes must arrive sequentially in order for catastrophe to occur, where $m$ is an integer. Poisson process $i$ arrives at rate $\beta_i$, meaning that it has characteristic arrival time $\tau_i = 1/\beta_i$. While it is possible that $\beta_i = \beta_j$, we allow for all of the rates to be unequal. Your goal is to ascertain which values of $m$ are most plausible, given the data. This model more closely resembles how molecular events can trigger catastrophe, with $m$ being an integer and the rates of the underlying chemical processes being different.
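To build intuition for this model before doing any inference, catastrophe times can be simulated as the sum of $m$ exponential waiting times. A minimal sketch follows; the characteristic arrival times are made up for illustration and are not values inferred from the data.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

# Made-up characteristic arrival times, in units of time, for m = 3
tau = np.array([3.0, 7.0, 15.0])

# Each simulated catastrophe time is the sum of m sequential,
# exponentially distributed waiting times
n_samples = 10_000
t_sim = rng.exponential(tau, size=(n_samples, len(tau))).sum(axis=1)
```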

a) Explain why the probability distribution for catastrophe times for a three-step process ($m=3$) is

\begin{align} f(t\mid \tau_1, \tau_2, \tau_3, 3) = \frac{1}{\tau_1\tau_2\tau_3}\int_0^t\mathrm{d}t_1 \int_{t_1}^t\mathrm{d}t_2\, \mathrm{e}^{-t_1/\tau_1}\,\mathrm{e}^{-(t_2-t_1)/\tau_2}\,\mathrm{e}^{-(t-t_2)/\tau_3}. \end{align}

b) The analogous expression for general $m$ can be integrated, giving

\begin{align} f(t\mid \boldsymbol{\tau}, m) = \sum_{j=1}^m \frac{\tau_j^{m-2}\,\mathrm{e}^{-t/\tau_j}}{\prod_{k=1,k\ne j}^m (\tau_j - \tau_k)}. \end{align}

For clarity, the likelihoods for the first few $m$ are

\begin{align} f(t\mid \tau_1, 1) &= \frac{\mathrm{e}^{-t/\tau_1}}{\tau_1},\\[1em] f(t\mid \tau_1, \tau_2, 2) &= \frac{\mathrm{e}^{-t/\tau_1}}{\tau_1 - \tau_2} + \frac{\mathrm{e}^{-t/\tau_2}}{\tau_2 - \tau_1} = \frac{\mathrm{e}^{-t/\tau_2} - \mathrm{e}^{-t/\tau_1}}{\tau_2 - \tau_1} \\[1em] f(t\mid \tau_1, \tau_2, \tau_3, 3) &= \frac{\tau_1\,\mathrm{e}^{-t/\tau_1}}{(\tau_1 - \tau_2)(\tau_1-\tau_3)} +\frac{\tau_2\,\mathrm{e}^{-t/\tau_2}}{(\tau_2 - \tau_1)(\tau_2-\tau_3)} +\frac{\tau_3\,\mathrm{e}^{-t/\tau_3}}{(\tau_3 - \tau_1)(\tau_3-\tau_2)}. \end{align}
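As a sanity check on the algebra (not required for the problem), the closed form for $m=3$ can be compared against direct numerical evaluation of the double integral from part (a). The sketch below uses made-up, distinct values of the $\tau_j$.

```python
import numpy as np
import scipy.integrate

# Made-up, distinct characteristic arrival times
tau1, tau2, tau3 = 3.0, 7.0, 15.0

def f_closed_form(t):
    """Closed-form PDF for m = 3 with distinct taus."""
    return (
        tau1 * np.exp(-t / tau1) / ((tau1 - tau2) * (tau1 - tau3))
        + tau2 * np.exp(-t / tau2) / ((tau2 - tau1) * (tau2 - tau3))
        + tau3 * np.exp(-t / tau3) / ((tau3 - tau1) * (tau3 - tau2))
    )

def f_numerical(t):
    """Direct numerical evaluation of the double integral from part (a)."""
    integrand = lambda t2, t1: (
        np.exp(-t1 / tau1)
        * np.exp(-(t2 - t1) / tau2)
        * np.exp(-(t - t2) / tau3)
        / (tau1 * tau2 * tau3)
    )
    result, _ = scipy.integrate.dblquad(integrand, 0, t, lambda t1: t1, lambda t1: t)
    return result

# The two should agree to within numerical integration error
for t in (1.0, 10.0, 30.0):
    print(t, f_closed_form(t), f_numerical(t))
```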

Note that these probability distributions assume that no two of the $\tau_j$'s are equal, and you should explicitly ensure this in your calculations. If any two $\tau_j$'s are equal, you need to take a limit, e.g.,

\begin{align} \lim_{\tau_2\to\tau_1} f(t\mid \tau_1, \tau_2, 2) &= \frac{t}{\tau_1^2}\,\mathrm{e}^{-t/\tau_1}, \end{align}

in this case, a Gamma distribution (with shape parameter 2 and scale parameter $\tau_1$).
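The limit can also be checked numerically. The sketch below, with a made-up value of $\tau_1$, compares the two-step PDF for nearly equal $\tau$'s against the corresponding Gamma distribution.

```python
import numpy as np
import scipy.stats

# Made-up value of tau_1 to check the limiting behavior numerically
tau1 = 5.0
tau2 = tau1 * (1 + 1e-4)  # nearly, but not exactly, equal to tau1
t = np.linspace(0.1, 50.0, 200)

# Two-step closed form vs. the Gamma limit (shape 2, scale tau1)
f_two_step = (np.exp(-t / tau2) - np.exp(-t / tau1)) / (tau2 - tau1)
f_gamma = scipy.stats.gamma.pdf(t, a=2, scale=tau1)

print(np.max(np.abs(f_two_step - f_gamma)))  # should be small
```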

Hint: You will need to refer to the Stan manual to see how to implement custom distributions. You will also want to use the log-sum-exp trick to prevent underflow or overflow in the calculation of your posterior.
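For concreteness, here is one possible NumPy prototype of the log PDF using a signed log-sum-exp (via `scipy.special.logsumexp`). This is a sketch under the assumptions above, not the Stan implementation you will write, but the same structure carries over to Stan's `log_sum_exp`, where you must keep track of the signs of the $\prod_k(\tau_j - \tau_k)$ products yourself.

```python
import numpy as np
from scipy.special import logsumexp

def log_pdf_successive(t, tau):
    """
    Log PDF of the time for m successive Poisson arrivals with distinct
    characteristic times tau, evaluated with a signed log-sum-exp.
    """
    t = np.atleast_1d(t).astype(float)
    tau = np.asarray(tau, dtype=float)
    m = len(tau)

    # Pairwise differences tau_j - tau_k; set the diagonal to 1 so the
    # k = j entries contribute nothing to the products below
    diff = tau[:, None] - tau[None, :]
    np.fill_diagonal(diff, 1.0)

    # log |prod_k (tau_j - tau_k)| and its sign, for each j
    log_denom = np.sum(np.log(np.abs(diff)), axis=1)
    sign = np.prod(np.sign(diff), axis=1)

    # Log magnitude of each term in the sum, for each time point
    log_terms = (
        (m - 2) * np.log(tau)[None, :]
        - t[:, None] / tau[None, :]
        - log_denom[None, :]
    )

    # Signed log-sum-exp over the m terms; the total is positive (it is a PDF)
    log_f, _ = logsumexp(log_terms, b=sign[None, :], axis=1, return_sign=True)

    return log_f
```

For example, `np.exp(log_pdf_successive(10.0, [3.0, 7.0, 15.0]))` should match the $m=3$ expression above evaluated at $t=10$ with those (made-up) $\tau_j$'s.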