Homework 7.2: Models for microtubule catastrophe (70 pts)

Data set download

We have thoroughly investigated the process by which microtubules undergo catastrophe using data from the Gardner, Zanic, et al. paper. We used an Exponential, Gamma, Weibull, and our custom two-step distribution to model the catastrophe times. We have consistently found that the Gamma model works best. It does have the shortcoming,though, that it does not directly match a story that might arise from chemical kinetics, which we strongly suspect would regulate microtubule castastrophe. We expect an integer number \(m\) of Poisson processes to arrive sequentially in order for catastrophe to occur. The Exponential and two-step model are special cases we have already considered where \(m = 1\) and \(m = 2\), respectively. In this problem, we will consider a model for arbitrary \(m\) and assess which \(m\) gives the most plausible generative model. We will again use the experiment run with a tubulin concentration of 12 µM as our data set.

The probability density function for the time \(t\) to catastrophe triggered by the arrival of \(m\) Poisson processes may be shown to be

\begin{align} f(t\mid \boldsymbol{\beta}, m) = \sum_{j=1}^m \frac{\mathrm{e}^{-\beta_j t}}{\beta_j^{m-2}\,\prod_{k=1,k\ne j}^m \frac{\beta_k - \beta_j}{\beta_j\beta_k}}. \end{align}

The expression is a bit cleaner when written in terms of \(\tau_i = 1/\beta_i\).

\begin{align} f(t\mid \boldsymbol{\tau}, m) = \sum_{j=1}^m \frac{\tau_j^{m-2}\,\mathrm{e}^{-t/\tau_j}}{\prod_{k=1,k\ne j}^m (\tau_j - \tau_k)}. \end{align}

For clarity, the likelihoods for the first few \(m\) are

\begin{align} f(t\mid \tau_1, 1) &= \frac{\mathrm{e}^{-t/\tau_1}}{\tau_1},\\[1em] f(t\mid \tau_1, \tau_2, 2) &= \frac{\mathrm{e}^{-t/\tau_1}}{\tau_1 - \tau_2} + \frac{\mathrm{e}^{-t/\tau_2}}{\tau_2 - \tau_1} = \frac{\mathrm{e}^{-t/\tau_2} - \mathrm{e}^{-t/\tau_1}}{\tau_2 - \tau_1}, \\[1em] f(t\mid \tau_1, \tau_2, \tau_3, 3) &= \frac{\tau_1^\,\mathrm{e}^{-t/\tau_1}}{(\tau_1 - \tau_2)(\tau_1-\tau_3)} +\frac{\tau_2\,\mathrm{e}^{-t/\tau_2}}{(\tau_2 - \tau_1)(\tau_2-\tau_3)} +\frac{\tau_3\,\mathrm{e}^{-t/\tau_3}}{(\tau_3 - \tau_1)(\tau_3-\tau_2)},\\[1em] f(t\mid \tau_1, \tau_2, \tau_3, \tau_4, 4) &= \frac{\tau_1^2\,\mathrm{e}^{-t/\tau_1}}{(\tau_1 - \tau_2)(\tau_1-\tau_3)(\tau_1 - \tau_4)} +\frac{\tau_2^2\,\mathrm{e}^{-t/\tau_2}}{(\tau_2 - \tau_1)(\tau_2-\tau_3)(\tau_2 - \tau_4)} \\[1em] &\;\;\;\;\;+\frac{\tau_3^2\,\mathrm{e}^{-t/\tau_3}}{(\tau_3 - \tau_1)(\tau_3-\tau_2)(\tau_3 - \tau_4)} +\frac{\tau_4^2\,\mathrm{e}^{-t/\tau_4}}{(\tau_4 - \tau_1)(\tau_4-\tau_2)(\tau_4 - \tau_3)}. \end{align}

Note that these probability distributions assume that none of the \(\tau_j\)’s are equal, and you should explicitly ensure this in your calculations (Hint: You may want to read the solutions to Homework 5.1, in particular the Stan model for the two-step model, to see how to implement this.)

a) Build a model for arbitrary \(m\) and code it up in Stan. Draw samples out of this model for values of \(m\) ranging from 1 to whatever you think is reasonable.

b) Compare the models for various \(m\). Which value of \(m\) gives the best predictive model based on a model comparison problem?

c) This is another example where I think the model comparison is unnecessary and should actually be avoided. Without directly doing a model comparison by computing a LOO or WAIC like you did in part (b), interpret the results of your sampling to advocate for a physical picture of how catastrophe proceeds.