(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.
This homework was generated from a Jupyter notebook. You can download the notebook here.
Recall Homework 4.2, in which you computed the growth and division events of Caulobacter crescentus over time using data from this paper. In this problem, you will use your results from the image processing of those data sets to perform parameter estimation of the growth rates of the individual mother cells and also determine whether bacterial growth at the single-cell level is linear or exponential. You should use your own results from Homework 4 for this problem, but in the event that you had trouble getting those results, you can use these results from the HW4 solutions.
We know that under ideal conditions, bacterial cells experience exponential growth in bulk. That is, the number of cells grows exponentially. This is possible regardless of how individual cells grow; the repeated divisions lead to exponential growth. In their paper, the authors argue that the growth of each individual cell is also exponential. I.e.,
\begin{align} a(t) = a_0 \mathrm{e}^{k t}, \end{align}
where $a(t)$ is the area of the cell in the image as a function of time and $a_0$ is the area of the cell right after a division has been completed, which we mark as $t = 0$.
As an alternative model, the authors consider a linear growth model, in which
\begin{align} a(t) = a_0 + b t. \end{align}
An exponential curve is approximately linear (with $b = a_0k$) on short time scales, since $\mathrm{e}^{kt} \approx 1 + kt$ when $kt \ll 1$. It is therefore often difficult to distinguish between linear and exponential growth. Your goal is to perform parameter estimates and do an effective comparison between these two models for growth. You should use hierarchical models, and be sure to take a principled approach in your model construction and evaluation.
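To get a feel for how similar the two models are on short time scales, here is a minimal Python sketch. The function names and the parameter values (`a0`, `k`, and the time window) are hypothetical, chosen only for illustration; they are not taken from the data or from the paper.

```python
import numpy as np


def area_exponential(t, a0, k):
    """Exponential growth model: a(t) = a0 * exp(k * t)."""
    return a0 * np.exp(k * t)


def area_linear(t, a0, b):
    """Linear growth model: a(t) = a0 + b * t."""
    return a0 + b * t


# Hypothetical parameter values, for illustration only
a0 = 1.5   # area right after division (arbitrary units)
k = 0.01   # exponential growth rate (1 / time unit)

# Short time window, so that k * t <= 0.1
t = np.linspace(0, 10, 101)

a_exp = area_exponential(t, a0, k)
a_lin = area_linear(t, a0, a0 * k)  # linear model with b = a0 * k

# Largest relative difference between the two curves over the window
max_rel_diff = np.max(np.abs(a_exp - a_lin) / a_exp)
print(max_rel_diff)
```

Over this window the two curves differ by less than one percent, which is why noisy single-cell area measurements may not easily discriminate between the models.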
Since you are using a hierarchical model, here are a few tips for building and implementing the models. You do not need to take this advice if you do not want to, but I have found that these strategies help.
We often want to ascertain how tightly two proteins are bound by measuring their dissociation constant, $K_d$. This is usually done by doing a titration experiment and then performing a regression. For example, imagine two proteins, a and b, that may bind to each other in the reaction
\begin{align} \text{ab} \rightleftharpoons \text{a} + \text{b} \end{align}
with dissociation constant $K_d$. At equilibrium
\begin{align} K_d = \frac{c_a\,c_b}{c_{ab}}, \end{align}
where $c_i$ is the concentration of species $i$. If we add known amounts of a and b to a solution such that the total concentration of a is $c_a^0$ and the total concentration of b is $c_b^0$, we can compute the equilibrium concentrations of all species. Specifically, in addition to the equation above, we have conservation of mass equations,
\begin{align} c_a^0 &= c_a + c_{ab}\\[1em] c_b^0 &= c_b + c_{ab}, \end{align}
fully specifying the problem. We can solve the three equations for $c_{ab}$ in terms of the known quantities $c_a^0$ and $c_b^0$, along with the parameter we are trying to measure, $K_d$. We get
\begin{align} c_{ab} = \frac{2c_a^0\,c_b^0}{K_d+c_a^0+c_b^0 + \sqrt{\left(K_d+c_a^0+c_b^0\right)^2 - 4c_a^0\,c_b^0}}. \end{align}
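As a sanity check on the expression above, the following Python sketch evaluates $c_{ab}$ and confirms that the resulting concentrations satisfy the equilibrium and conservation of mass equations. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np


def c_ab_equilibrium(ca0, cb0, Kd):
    """Equilibrium concentration of the bound complex ab.

    ca0, cb0 : total concentrations of a and b
    Kd       : dissociation constant (same units as ca0 and cb0)
    """
    s = Kd + ca0 + cb0
    return 2 * ca0 * cb0 / (s + np.sqrt(s**2 - 4 * ca0 * cb0))


# Hypothetical values, all in the same concentration units
ca0, cb0, Kd = 50.0, 100.0, 20.0

cab = c_ab_equilibrium(ca0, cb0, Kd)

# Free concentrations from conservation of mass
ca = ca0 - cab
cb = cb0 - cab

# Recover Kd from its definition; it should match the input value
Kd_check = ca * cb / cab
print(cab, Kd_check)
```

Writing the solution with the square root in the denominator, as in the equation above, avoids subtracting two nearly equal numbers, which can lose precision when $K_d$ is small compared to the total concentrations.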
The technique, then, is to hold $c_a^0$ fixed in the experiment and measure $c_{ab}$ for various $c_b^0$. We can then perform a regression to get $K_d$.
In order to do this, though, we need some readout of $c_{ab}$. For this problem, we will use FRET (fluorescence resonance energy transfer) to monitor how much of a is bound to b. Specifically, we consider a to have a fluorophore and b to be its receptor. When the two are unbound, we get a fluorescence signal per molecule of $f_0$. When they are bound, the receptor absorbs the light coming out of the fluorophore, so we get less fluorescence per molecule, which we will call $f_q$ (for "quenched"). Let $f$ be the total per-fluorophore fluorescence signal. Then, the measured fluorescence signal, $F$, is
\begin{align} F = c_a^0\,V f = \left(c_a \,f_0 + c_{ab}\, f_q\right)V, \end{align}
where $V$ is the reaction volume. We can absorb $V$ into the other parameters such that $\hat{f}_0 = f_0 V$ and $\hat{f}_q = f_q V$, giving
\begin{align} F = \hat{f}_0(c_a^0 - c_{ab}) + \hat{f}_q\, c_{ab} = \hat{f}_0\,c_a^0 - \frac{2(\hat{f}_0 - \hat{f}_q)c_a^0\,c_b^0}{K_d+c_a^0+c_b^0 + \sqrt{\left(K_d+c_a^0+c_b^0\right)^2 - 4c_a^0\,c_b^0}}. \end{align}
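The theoretical fluorescence curve can be coded up directly from this expression. The sketch below is one possible implementation; the function names and all parameter values are hypothetical and are not estimates from the data set.

```python
import numpy as np


def c_ab_equilibrium(ca0, cb0, Kd):
    """Equilibrium concentration of the bound complex ab."""
    s = Kd + ca0 + cb0
    return 2 * ca0 * cb0 / (s + np.sqrt(s**2 - 4 * ca0 * cb0))


def fluorescence(cb0, ca0, Kd, f0_hat, fq_hat):
    """Theoretical fluorescence F as a function of total b concentration.

    f0_hat, fq_hat : per-concentration fluorescence of unbound and bound a,
                     with the reaction volume V absorbed into them
    """
    cab = c_ab_equilibrium(ca0, cb0, Kd)
    return f0_hat * (ca0 - cab) + fq_hat * cab


# Hypothetical parameter values, for illustration only
ca0 = 50.0
Kd = 20.0
f0_hat = 1000.0
fq_hat = 200.0

# Titration: ca0 held fixed, cb0 varied
cb0 = np.array([0.0, 10.0, 50.0, 100.0, 500.0, 1000.0])
F = fluorescence(cb0, ca0, Kd, f0_hat, fq_hat)
print(F)
```

Two limits are useful checks: with no b present, $F = \hat{f}_0\,c_a^0$, and for very large $c_b^0$ essentially all of a is bound, so $F$ approaches $\hat{f}_q\,c_a^0$.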
Compute parameter estimates for $K_d$ with and without an outlier detection scheme for this data set. How do the results differ depending on whether or not you were trying to detect outliers?
Note: These are real data, but they are from an unpublished experiment here on campus. I therefore have not exposed the identities of the proteins a and b.