This homework was generated from an IPython notebook. You can download the notebook here.
Write down your goals for the class. Is there something that has been confusing for you that you would like cleared up? Are there specific techniques you would like to learn?
Each member of your group should write his or her own response, but the responses should be turned in together.
We will soon be doing regression analysis. We will have a set of $(x,y)$ data and a model that we think describes the observed trends in the data. For example, we may think that $y$ depends linearly on $x$, so we would propose
\begin{align} y(x) = a x + b, \end{align}
where $a$ and $b$ are parameters.
In order to do the regression, we will need to write a Python function of the form f(p, x), where p is a NumPy array containing the fit parameters. For example, if we wanted to make a linear function, we might define the following.
def lin_func(p, x):
    """
    Returns p[0] * x + p[1].
    """
    a, b = p
    return a * x + b
One of the tricks is that your function should work if x is a single number or a NumPy array. In the above example, it does, as we can see by plotting.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Make a set of evenly spaced points in x
x = np.linspace(-1.0, 2.0, 50)
# Compute y
y = lin_func(np.array([7.0, -3.0]), x)
# Plot as dots to verify it was calculated for each value of x
plt.plot(x, y, 'k.')
plt.grid(True)
plt.margins(x=0.02, y=0.02)
plt.xlabel(r'$x$')
plt.ylabel(r'$y$');
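As a quick check of the claim that lin_func accepts either a single number or a NumPy array, the following sketch calls it both ways. The parameter values are the ones used in the plot; the test points are arbitrary choices made for illustration.

```python
import numpy as np


def lin_func(p, x):
    """Return p[0] * x + p[1]."""
    a, b = p
    return a * x + b


p = np.array([7.0, -3.0])

# Scalar input gives a single number back
y_scalar = lin_func(p, 2.0)

# Array input gives an array of the same shape, computed elementwise
x = np.array([0.0, 1.0, 2.0])
y_array = lin_func(p, x)

print(y_scalar)  # 11.0
print(y_array)   # array with values -3, 4, 11
```

This works because the function body uses only arithmetic operators, which NumPy applies elementwise to arrays.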
Write Python functions of this form (f(p, x)) for the following functions and make smooth plots of them for a few sets of parameter values over appropriate ranges of $x$ values. If you think it is appropriate, plot the functions on a logarithmic or semilogarithmic scale. (Check out functions like plt.loglog and plt.semilogy for this sort of thing.) Whatever you choose, give an explanation as to why you chose to plot the function the way you did.
a) Exponential decay + background signal:
\begin{align} y = a + b\,\mathrm{e}^{-x/\lambda} \end{align}
b) The Cauchy distribution:
\begin{align} y = \frac{\beta}{\pi\left(\beta^2 + (x - \alpha)^2\right)} \end{align}
c) The Hill function:
\begin{align} y = \frac{x^\alpha}{k^\alpha + x^\alpha}. \end{align}
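To illustrate the f(p, x) pattern together with a choice of axis scaling, here is a minimal sketch for a function that is not one of the three above: a power law $y = a x^n$, chosen purely as a hypothetical example. The name power_law and the parameter values are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def power_law(p, x):
    """Return p[0] * x**p[1] (hypothetical example function)."""
    a, n = p
    return a * x**n


# Logarithmically spaced points give a smooth curve on log axes
x = np.logspace(-2, 2, 200)

fig, ax = plt.subplots()
for p in [np.array([1.0, 0.5]), np.array([1.0, 1.0]), np.array([1.0, 2.0])]:
    ax.loglog(x, power_law(p, x), label=f"n = {p[1]}")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$y$")
ax.legend()
```

A power law appears as a straight line with slope $n$ on log-log axes, which is the kind of reasoning that can justify a particular choice of scale.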
Throughout the class, we will analyze data from several sources. We will look at some data sets over and over again because there is plenty of interesting data analysis to be done. One of these data sets comes from this paper by Gardner, Zanic, and coworkers. The full reference is: Gardner, Zanic, et al., Depolymerizing kinesins Kip3 and MCAK shape cellular microtubule architecture by differential control of catastrophe, Cell, 147, 1092-1103, 2011, doi:10.1016/j.cell.2011.10.037.
We will discuss the paper more throughout the class, and I encourage you to read it. Briefly, the authors investigated the dynamics of microtubule catastrophe, the switching of a microtubule from a growing to a shrinking state. In particular, they were interested in the time between the start of growth of a microtubule and the catastrophe event. They monitored microtubules by using tubulin (the monomer that comprises a microtubule) that was labeled with a fluorescent marker. As a control to make sure that fluorescent labels and exposure to laser light did not affect the microtubule dynamics, they performed a similar experiment using differential interference contrast (DIC) microscopy. They measured the time until catastrophe with labeled and unlabeled tubulin.
In this problem, we will look at the data used to generate Fig. 2a of their paper. In the end, we will generate a plot similar to Fig. 2a.
a) If you haven't already, download the data file here. Read the data from the data file into a convenient form, either a pandas DataFrame or NumPy arrays.
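One possible way to do the reading step is sketched below, assuming the file is comma-delimited text. The layout of the real data file is not described here, so the sketch reads from an in-memory string instead; the column names time_s and labeled and all of the numbers are placeholders, not values from the paper.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real data file. The column names and numbers are
# placeholders for illustration only, NOT values from the paper.
csv_text = """# comment lines like this one are skipped
time_s,labeled
12.5,True
30.0,True
7.5,False
"""

# For a file on disk, pass its path instead of the StringIO object
df = pd.read_csv(io.StringIO(csv_text), comment="#")

# Pull out one NumPy array per experiment using a Boolean mask
t_labeled = df.loc[df["labeled"], "time_s"].to_numpy()
t_unlabeled = df.loc[~df["labeled"], "time_s"].to_numpy()
```

Inspecting the file in a text editor first shows whether it has comment lines or a header row, which determines the keyword arguments needed.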
b) Plot histograms of the catastrophe times for the experiments with labeled and unlabeled tubulin. Try different settings of the plotting parameters to see what works best. In particular, you might want to experiment with the bins, density (named normed in older versions of Matplotlib), and histtype keyword arguments. You can show a few candidates for how you would display the data. For your "official" histogram(s), discuss the design decisions you made to plot it the way you did.
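The sketch below shows how these keyword arguments enter a call to the histogram function, using synthetic exponentially distributed numbers as a stand-in for the measured times. The seed, scale, sample size, and bin count are arbitrary assumptions. The normalization keyword is spelled density, which is its name in current Matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (NOT the catastrophe times from the paper)
rng = np.random.default_rng(seed=0)
t = rng.exponential(scale=400.0, size=200)

fig, ax = plt.subplots()

# bins: number (or edges) of bins; density: normalize to unit area;
# histtype="step" draws outlines, convenient for overlaying data sets
heights, edges, _ = ax.hist(t, bins=20, density=True, histtype="step")
ax.set_xlabel("time (s)")
ax.set_ylabel("probability density (1/s)")

# With density=True the bar areas sum to one
area = np.sum(heights * np.diff(edges))
```

Normalizing is helpful when the two experiments have different numbers of measurements, since raw counts would then not be directly comparable.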
c) Plot cumulative histograms for the labeled and unlabeled experiments using the same binning you used in part (b). Hint: You might find the cumulative keyword argument of plt.hist useful.
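A minimal sketch of the cumulative keyword argument, again on synthetic stand-in numbers rather than the real data (the seed, scale, and bin count are arbitrary assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data, for illustration only
rng = np.random.default_rng(seed=0)
t = rng.exponential(scale=400.0, size=200)

fig, ax = plt.subplots()

# Each bin holds the running total of all bins up to and including it;
# combined with density=True, the last bin equals one
cum, edges, _ = ax.hist(
    t, bins=20, density=True, cumulative=True, histtype="step"
)
ax.set_xlabel("time (s)")
ax.set_ylabel("cumulative fraction")
```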
d) Plot cumulative histograms as in Fig. 2a of the Gardner, Zanic, et al. paper. You do not need to plot the inset of that figure. Hint: Think about how to compute a cumulative histogram with no binning. The np.arange function might be useful.
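One way to read the hint is as an empirical cumulative distribution: sort the measurements, then assign the k-th smallest value a cumulative fraction of k/n. The sketch below does this for a handful of made-up numbers. It is an interpretation of the hint, not necessarily the procedure used in the paper.

```python
import numpy as np

# Made-up numbers for illustration (NOT data from the paper)
t = np.array([3.0, 1.0, 4.0, 1.5, 9.0])

# Sorted measurements go on the horizontal axis
x = np.sort(t)

# The k-th smallest point has cumulative fraction k / n
y = np.arange(1, len(x) + 1) / len(x)

print(x)
print(y)
```

Plotting y against x as dots, for example with plt.plot(x, y, '.'), then shows every measurement individually with no choice of bins required.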