This tutorial was generated from an IPython notebook. You can download the notebook here.
It is often the case that we wish to quantify properties of data taken over a time course, such as time derivatives (rates of change, velocity, etc.) or period (for periodic data). In some cases, we want to calculate these things from data varying over space as well. The problem is that the data often have noise, so computing derivatives numerically is difficult. Defining peaks (to be able to compute periods) is also difficult due to the data points going up and down with the noise.
One way of dealing with the noise, as we saw in Tutorial 4, is to perform a regression to get a smooth curve. But what if we do not have a good function to fit for the regression, yet we still want to extract parameters like derivatives and periods? This is where smoothing is used. Smoothing is a specific example of the more general technique of nonparametric estimation.
In practice, many smoothing methods are heuristic, meaning that they are chosen in a bit of an ad hoc way based on what may work for a given data set. So long as the smoothing method is consistent across data sets that are being compared (and, of course, the raw data are also plotted and reported), this tends to work just fine. We will use a few smoothing methods in this tutorial, taking a more heuristic approach and not going into much detail about the theory. To learn more about smoothing techniques, you can read chapter 6 of Sivia, or the other references. Nonparametric estimation is a vast field, venturing into machine learning, among other areas.
In this tutorial, we will be focusing on two main methods of smoothing, kernel-based averaging and basis-function smoothing. Other methods exist, but we only have time to consider these here.
In this tutorial, we will use data from Weitz, et al., Diversity in the dynamical behaviour of a compartmentalized programmable biochemical oscillator, Nature Chemistry, 6, 295-302, 2014. In this paper, the authors engineered a synthetic two-switch negative-feedback oscillator circuit. When the components of this circuit are mixed together, the concentrations of the respective species oscillate in time. The concentration is reported by a fluorescently labeled DNA strand whose base-pairing state affects the fluorescent signal.
The authors did this experiment in bulk solution and observed oscillations. Interestingly, they also encapsulated the reaction mixture in small droplets ranging from 33 to 16000 µm$^3$. They observed differences in the oscillation period and amplitude in the small droplets when compared to bulk. This highlights the effect that statistical variation in the concentration of components in the droplets can have on observed dynamics. Understanding these effects is important as we seek to engineer cell-scale systems.
We will work with a subset of the data in the paper, studying droplets and bulk systems with sustained oscillations. We will first look at the bulk reaction data and compute the period of oscillation.
The data are included in the Tutorial 5 ZIP file.
As usual, we start by importing all of our favorite modules.
# As usual, import modules
from __future__ import division, absolute_import, \
print_function, unicode_literals
import numpy as np
import scipy.optimize
import scipy.interpolate
import matplotlib.pyplot as plt
import pandas as pd
import scikits.datasmooth as ds
from brewer2mpl import qualitative
import jb_utils as jb
# Necessary to display plots in this IPython notebook
%matplotlib inline
The bulk fluorescence data are in the file bulk_trace_sustained_oscillation.csv. Let's load it in.
# Load in bulk trace
fname = '../data/weitz_et_al/t5_data/bulk_trace_sustained_oscillation.csv'
df_bulk = pd.read_csv(fname, comment='#')
# Take a look
df_bulk.head()
We do a measurement every minute. Let's rename the columns, as usual, for convenience. And let's make our time unit minutes, instead of seconds.
# Rename columns
df_bulk.columns = ['time', 'fl']
# Convert time to minutes
df_bulk.time /= 60.0
Now let's plot the data to see what we're dealing with.
# Plot the trace for the bulk data
plt.plot(df_bulk.time, df_bulk.fl, 'k-')
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
We see strong oscillations. Let's zoom in on one period.
# Plot the trace for the bulk data
plt.plot(df_bulk.time, df_bulk.fl, 'k.')
plt.xlim((240, 400))
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
As time traces go, these data are really smooth. Even so, let's try to compute a derivative using naive finite differencing.
# Compute derivatives of part of bulk trace using finite difference
bulk_deriv = np.diff(df_bulk.fl[240:400]) / np.diff(df_bulk.time[240:400])
# Plot numerical derivative
plt.plot(df_bulk.time[240:399] + np.diff(df_bulk.time[240:400]) / 2.0,
bulk_deriv, 'k-')
plt.xlabel('time (min)')
plt.ylabel('$\mathrm{d}f/\mathrm{d}t$ (min$^{-1}$)')
We see that the derivative is much noisier! If we wanted to learn about the derivative, we should smooth the curve first. This serves as our motivation for our first foray into smoothing.
We will introduce our first smoothing technique. We will use a Nadaraya-Watson kernel estimator, which is essentially a weighted average of the data points.
When we have a high sampling rate, the extra fine resolution often just resolves the noise. We could do a sliding average over the data. We already performed this operation in Tutorial 2 when we were looking at the fish sleep data. I will write this process more concretely. For ease of notation, we arrange our data points in time, i.e., index $i+1$ follows $i$ in time, with a total of $n$ data points. For a point $x_0$ (typically one of our data points $x_i$), we get a smoothed value $\hat{y}(x_0)$ by
\begin{align} \hat{y}(x_0) = \frac{\sum_{i=1}^n K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^n K_\lambda(x_0,x_i)} \end{align}
where $K_\lambda(x_0, x_i)$ is the kernel, or weight. If we took each data point to be the mean of itself and its 10 nearest-neighbors, we would have (neglecting end effects)
\begin{align} K_\lambda(x_j, x_i) = \left\{\begin{array}{cl} \frac{1}{11} & \left|i - j\right| \le 5 \\[1mm] 0 & \text{otherwise}. \end{array} \right. \end{align}
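As a quick aside, this boxcar (moving average) kernel is easy to compute directly with NumPy's convolution; the sketch below uses a toy noisy signal just for illustration and is not part of the analysis that follows.
# Toy noisy signal (for illustration only)
t_toy = np.linspace(0.0, 10.0, 200)
y_toy = np.sin(t_toy) + 0.2 * np.random.randn(len(t_toy))
# 11-point boxcar kernel: weight 1/11 for the point and its 10 neighbors
boxcar = np.ones(11) / 11.0
# Sliding average; mode='same' keeps the length, but the ends are
# distorted by zero padding (the end effects we are neglecting)
y_toy_smooth = np.convolve(y_toy, boxcar, mode='same')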
Commonly used kernels are the Epanechnikov quadratic kernel, the tri-cube kernel, and the Gaussian kernel. We can write them generically as
\begin{align} K_\lambda(x_0, x_i) &= D\left(\frac{\left|x_i-x_0\right|}{\lambda}\right) \equiv D(t), \\[1mm] \text{where } t &= \frac{\left|x_i-x_0\right|}{\lambda}. \end{align}
These kernels are
\begin{align} \text{Epanechnikov:}\;\;\;\;\; D &= \left\{\begin{array}{cl} \frac{3}{4}\left(1 - t^2\right) & |t| \le 1 \\[1mm] 0 & \text{otherwise} \end{array}\right. \\[5mm] \text{tri-cube:}\;\;\;\;\;D&=\left\{\begin{array}{cl} \left(1 - \left|t\right|^3\right)^3 & |t| \le 1 \\[1mm] 0 & \text{otherwise} \end{array}\right. \\[5mm] \text{Gaussian:}\;\;\;\;\;D &= \mathrm{e}^{-t^2/2}. \end{align}
Let's code up functions for these kernels.
# Define our kernels
def epan_kernel(t):
    """
    Epanechnikov kernel.
    """
    return np.logical_and(t > -1.0, t < 1.0) * 3.0 * (1.0 - t**2) / 4.0


def tri_cube_kernel(t):
    """
    Tri-cube kernel.
    """
    return np.logical_and(t > -1.0, t < 1.0) * (1.0 - abs(t)**3)**3


def gauss_kernel(t):
    """
    Gaussian kernel.
    """
    return np.exp(-t**2 / 2.0)
Now we can compare the respective kernels to get a feel for how the averaging goes.
# Set up time points
t = np.linspace(-4.0, 4.0, 200)
dt = t[1] - t[0]
# Compute unnormalized kernels
K_epan = epan_kernel(t)
K_tri_cube = tri_cube_kernel(t)
K_gauss = gauss_kernel(t)
# Adjust to approximately integrate to unity for ease of comparison
K_epan /= K_epan.sum() * dt
K_tri_cube /= K_tri_cube.sum() * dt
K_gauss /= K_gauss.sum() * dt
plt.plot(t, K_epan, '-', label='Epanechnikov')
plt.plot(t, K_tri_cube, '-', label='Tri-cube')
plt.plot(t, K_gauss, '-', label='Gaussian')
plt.legend(loc='upper right')
plt.xlabel('$t = |x-x_0|/\lambda$')
plt.ylabel('$K_\lambda(x_0, x)$');
From the scale of the $x$-axis in this plot, we see that the parameter $\lambda$ sets the width of the smoothing kernel. The bigger $\lambda$ is, the stronger the smoothing and the more fine detail will be averaged over. For this reason, $\lambda$ is sometimes called the bandwidth of the kernel. The Gaussian kernel is broad and has no cutoff, unlike the Epanechnikov and tri-cube kernels. It therefore uses all of the data points.
We will now write a function to give the smoothed value of a function for a given kernel using Nadaraya-Watson.
def nw_kernel_smooth(x_0, x, y, kernel_fun, lam):
    """
    Gives smoothed data at points x_0 using a Nadaraya-Watson kernel
    estimator.  The data points are given by NumPy arrays x, y.

    kernel_fun must be of the form
        kernel_fun(t),
    where t = |x - x_0| / lam.

    This is not a fast way to do it, but it is simple to implement!
    """
    # Function to give estimate of smoothed curve at a single point.
    def single_point_estimate(x_0_single):
        """
        Estimate at a single point x_0_single.
        """
        t = np.abs(x_0_single - x) / lam
        return np.dot(kernel_fun(t), y) / kernel_fun(t).sum()

    # If we only want an estimate at a single data point
    if np.isscalar(x_0):
        return single_point_estimate(x_0)
    else:  # Get estimate at all points
        y_smooth = np.empty_like(x_0)
        for i in range(len(x_0)):
            y_smooth[i] = single_point_estimate(x_0[i])
        return y_smooth
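As an aside, the double loop above can be avoided with NumPy broadcasting when we want estimates at many points. Here is a sketch (the name nw_kernel_smooth_vec is my own; it is not used in the rest of the tutorial):
def nw_kernel_smooth_vec(x_0, x, y, kernel_fun, lam):
    """
    Vectorized Nadaraya-Watson estimate at points x_0.  Same interface
    as nw_kernel_smooth, but builds the full matrix of kernel weights
    at once, trading memory for speed.
    """
    x_0 = np.atleast_1d(np.asarray(x_0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # K[i, j] = kernel weight of data point j for estimate point i
    K = kernel_fun(np.abs(x_0[:, np.newaxis] - x[np.newaxis, :]) / lam)

    # Weighted average along each row
    y_smooth = np.dot(K, y) / K.sum(axis=1)

    return y_smooth[0] if y_smooth.size == 1 else y_smooth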
All right, we're all set. Let's try applying some kernels to our data. We'll just use the bit of data we differentiated before.
# Define x and y values based on the section of the data we're using
t, fl = df_bulk.time[240:400].values, df_bulk.fl[240:400].values
# Let's get data at the same points as the actual data
t_0 = t
# Try three kernels with lambda = 5 minutes
lam = 5.0
fl_epan = nw_kernel_smooth(t_0, t, fl, epan_kernel, lam)
fl_tri_cube = nw_kernel_smooth(t_0, t, fl, tri_cube_kernel, lam)
fl_gauss = nw_kernel_smooth(t_0, t, fl, gauss_kernel, lam)
# Plot the results
plt.plot(t, fl, 'k.')
plt.plot(t_0, fl_epan, '-', label='Epanechnikov')
plt.plot(t_0, fl_tri_cube, '-', label='Tri-cube')
plt.plot(t_0, fl_gauss, '-', label='Gaussian')
plt.legend(loc='lower right')
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
We see that all three kernels give smooth curves. They all suffer from boundary effects, with the Gaussian kernel having the biggest problems. This is because the Gaussian kernel is broader. We can compute derivatives of the smoothed curves.
# Numerically compute derivatives
deriv_epan = np.diff(fl_epan) / np.diff(t_0)
deriv_tri_cube = np.diff(fl_tri_cube) / np.diff(t_0)
deriv_gauss = np.diff(fl_gauss) / np.diff(t_0)
# Plot numerical derivative
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_epan, '-',
label='Epanechnikov')
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_tri_cube, '-',
label='Tri-cube')
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_gauss, '-', label='Gaussian')
plt.xlabel('time (min)')
plt.ylabel('$\mathrm{d}f/\mathrm{d}t$ (min$^{-1}$)')
The Gaussian kernel leads to smoother derivatives, which is generally true since it considers more data points. Let's try smoothing more aggressively, with $\lambda = 15$ minutes.
# Try three kernels with lambda = 15 minutes
lam = 15.0
fl_epan = nw_kernel_smooth(t_0, t, fl, epan_kernel, lam)
fl_tri_cube = nw_kernel_smooth(t_0, t, fl, tri_cube_kernel, lam)
fl_gauss = nw_kernel_smooth(t_0, t, fl, gauss_kernel, lam)
# Plot the results
plt.plot(t, fl, 'k.')
plt.plot(t_0, fl_epan, '-', label='Epanechnikov')
plt.plot(t_0, fl_tri_cube, '-', label='Tri-cube')
plt.plot(t_0, fl_gauss, '-', label='Gaussian')
plt.legend(loc='lower right')
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
We see that the Gaussian kernel results in the smoothed curve pulling away from the data. Again, this is because it considers a wider window of points. Let's check the derivatives again.
# Numerically compute derivatives
deriv_epan = np.diff(fl_epan) / np.diff(t_0)
deriv_tri_cube = np.diff(fl_tri_cube) / np.diff(t_0)
deriv_gauss = np.diff(fl_gauss) / np.diff(t_0)
# Plot numerical derivative
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_epan, '-',
label='Epanechnikov')
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_tri_cube, '-',
label='Tri-cube')
plt.plot(t_0[:-1] + np.diff(t_0) / 2.0, deriv_gauss, '-', label='Gaussian')
plt.xlabel('time (min)')
plt.ylabel('$\mathrm{d}f/\mathrm{d}t$ (min$^{-1}$)')
The tri-cube and Epanechnikov derivatives are smoother now and more accurate (though not at the tails because of edge effects).
Now let's look at some droplet data. The authors applied a number of filters to pick out the good data. That is, they wanted to make sure there were no artifacts from the imaging, such as droplets going out of focus, moving on top of each other, etc. We will simply filter out droplets that didn't last the whole imaging session and those whose area fluctuates too much. This is a nice exercise to remind you of some of the magic of DataFrames.
There are two data sets. One set has the areas of all the droplets and the other has the integrated intensity.
# Load in droplet data
df = pd.read_csv('../data/weitz_et_al/t5_data/intensity.csv')
df_area = pd.read_csv('../data/weitz_et_al/t5_data/area.csv')
# Only consider droplets that lasted the entire 1000 minutes
df = df.ix[:, df_area.iloc[-1] != 0.0]
df_area = df_area.ix[:, df_area.iloc[-1] != 0.0]
# Compute stdev to mean ratio
std_mean_ratio = df_area.ix[:,1:].std(axis=0) / df_area.ix[:,1:].mean(axis=0)
# Only keep droplets with std/mean < 10%
df = df.ix[:, np.concatenate(((True,), std_mean_ratio < 0.1))]
df_area = df_area.ix[:, np.concatenate(((True,), std_mean_ratio < 0.1))]
# Rename columns for convenient indexing
df.columns = np.arange(-1,len(df.columns)-1)
df.rename(columns={-1: 'time'}, inplace=True)
df_area.columns = np.arange(-1,len(df_area.columns)-1)
df_area.rename(columns={-1: 'time'}, inplace=True)
# Change time units to minutes (5 minutes per slice)
df.time *= 5.0
df_area.time *= 5.0
Let's look at the traces. We'll plot the first 10.
# Plot all the traces with thin lines. Normalize the intensities
for i in range(10):
plt.plot(df.time, df[i] / df[i].max(), '-', lw=0.5)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
For a couple of them, we see their signal just die away. One of them has a precipitous drop in the signal. We could come up with ways to eliminate these from our data sets (and you should think about how to do that), but we'll just take a few example ones for now. Let's choose traces 2 and 8.
# Traces we want to analyze
tr = (2, 8)
# Plot traces
bmap = qualitative.Set1[3]
plt.plot(df.time, df[tr[0]], '-', color=bmap.mpl_colors[0])
plt.plot(df.time, df[tr[1]], '-', color=bmap.mpl_colors[1])
plt.xlabel('time (min)')
plt.ylabel('fluor. (a.u.)')
For ease of analysis later, and generally because it is a good idea, the authors performed background subtraction on the data. This involves strongly smoothing the data, e.g., with a Gaussian kernel, and subtracting the result from the data. Let's do that for our two traces.
# Strongly smooth trace with Gaussian kernel (use bandwidth of 250 minutes)
t_0 = df.time
bg_1 = nw_kernel_smooth(t_0, df.time, df[tr[0]], gauss_kernel, 250.0)
bg_2 = nw_kernel_smooth(t_0, df.time, df[tr[1]], gauss_kernel, 250.0)
# Plot the result to see backgound
plt.plot(df.time, df[tr[0]], '-', color=bmap.mpl_colors[0])
plt.plot(t_0, bg_1, '-', color=bmap.mpl_colors[0])
plt.plot(df.time, df[tr[1]], '-', color=bmap.mpl_colors[1])
plt.plot(t_0, bg_2, '-', color=bmap.mpl_colors[1])
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
We see that computing the background can correct for small decreases in signal over time. We now plot the background-subtracted signals.
# Define bg-subtracted signals
f_1 = df[tr[0]].values - bg_1
f_2 = df[tr[1]].values - bg_2
# Plot them
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0])
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1])
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
plt.grid(True)
Let's try smoothing the background-subtracted signal with an Epanechnikov kernel with a 30 minute bandwidth.
# Make the smooth curves
t_0 = df.time
epan_1 = nw_kernel_smooth(t_0, df.time, f_1, epan_kernel, 30.0)
epan_2 = nw_kernel_smooth(t_0, df.time, f_2, epan_kernel, 30.0)
# Plot them!
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0], lw=0.5)
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1], lw=0.5)
plt.plot(t_0, epan_1, '-', color=bmap.mpl_colors[0], lw=2)
plt.plot(t_0, epan_2, '-', color=bmap.mpl_colors[1], lw=2)
plt.grid(True)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
We'll now look at another smoothing strategy. A function $f(x)$ (or data) can generally be written as an expansion of $M$ orthogonal basis functions $h_m$, such that
\begin{align} f(x) = \sum_{m=1}^M \beta_m\,h_m(x), \end{align}
where $\beta_m$ are coefficients. Examples of basis functions you may be familiar with are sines/cosines (used to construct a Fourier series), Legendre polynomials, spherical harmonics, etc. The idea behind smoothing with basis functions is to control the complexity of $f(x)$ by restricting the number of basis functions or selecting certain basis functions to use. This restricted $f(x)$, being less complex than the data, is smoother.
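To make this concrete, here is a small sketch of least-squares smoothing with a truncated cosine basis on toy data; the basis choice and $M = 8$ are arbitrary and just for illustration (the tutorial itself uses splines and wavelets below).
# Toy noisy data (for illustration only)
x_toy = np.linspace(0.0, 1.0, 200)
y_toy = np.exp(-x_toy) * np.sin(6.0 * x_toy) + 0.1 * np.random.randn(len(x_toy))
# Design matrix of M cosine basis functions, h_m(x) = cos(m * pi * x)
M = 8
H = np.cos(np.pi * np.outer(x_toy, np.arange(M)))
# Least-squares estimate of the coefficients beta_m
beta = np.linalg.lstsq(H, y_toy)[0]
# Smoothed curve: f(x) = sum_m beta_m * h_m(x)
y_toy_smooth = np.dot(H, beta)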
We will look at smoothing with two different types of basis functions. First, we will approximate the data with piecewise cubic functions. An approximation from this procedure is called a spline. If we have $k$ joining points of cubic functions (called knots), the data are represented with $k$ basis functions. The second type of basis function we will consider is wavelets. Wavelet methods are particularly useful if the data have both bumpy and smooth regions with dynamics we wish to capture.
The scipy.interpolate module has spline tools. Importantly, it selects the number of knots to use until a smoothing condition is met. This condition is defined by
\begin{align} \sum_{i\in D} \left[w_i(y_i - \hat{y}(x_i))\right]^2 \le s, \end{align}
where $w_i$ is the weight we wish to assign to data point $i$, and $\hat{y}(x_i)$ is the smoothed function at $x_i$. We will take all $w_i$'s to be equal in our analysis. A good rule of thumb is to choose
\begin{align} s = f\sum_{i\in D} y_i^2, \end{align} where $f$ is roughly the fractional difference between the rough and smooth curves.
Let's use splines to smooth our first droplet trace!
# Determine smoothing factor from rule of thumb (use f = 0.05)
smooth_factor_1 = 0.05 * (f_1**2).sum()
smooth_factor_2 = 0.05 * (f_2**2).sum()
# Set up an scipy.interpolate.UnivariateSpline instance
s_1 = scipy.interpolate.UnivariateSpline(df.time, f_1, s=smooth_factor_1)
s_2 = scipy.interpolate.UnivariateSpline(df.time, f_2, s=smooth_factor_2)
# s_1 is now a callable function
f_1_spline = s_1(df.time)
f_2_spline = s_2(df.time)
# Plot results
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0], lw=0.5)
plt.plot(df.time, f_1_spline, '-', color=bmap.mpl_colors[0])
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1], lw=0.5)
plt.plot(df.time, f_2_spline, '-', color=bmap.mpl_colors[1])
plt.grid(True)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
Conveniently, the UnivariateSpline object has built-in derivatives (and integrals and other goodies; check out the documentation).
# Get derivative splines (argument 1 means first deriv)
s_1_deriv = s_1.derivative(1)
s_2_deriv = s_2.derivative(1)
# Compute derivs
f_1_deriv = s_1_deriv(df.time)
f_2_deriv = s_2_deriv(df.time)
# Plot derivs
plt.plot(df.time, f_1_deriv, '-', color=bmap.mpl_colors[0])
plt.plot(df.time, f_2_deriv, '-', color=bmap.mpl_colors[1])
plt.grid(True)
plt.xlabel('time (min)')
plt.ylabel(r'$\mathrm{d}f/\mathrm{d}t$ (min$^{-1}$)')
We will not go into the details of the theory behind wavelets here, but will say that these are convenient basis functions because they capture both time (space) and frequency localization. As a result, wavelets tend to capture strong jumps in the data that would be smoothed out by our other smoothing methods.
There are many ways of doing wavelet-based smoothing, but we will use the VisuShrink method, developed by Donoho and Johnstone. The idea is to shrink the set of basis functions to achieve smoothness. Under the hood, we do a wavelet transform to represent the data with a complete basis set of wavelets. We then prune the coefficients to only include those representing major trends in the data. We take the inverse wavelet transform to get our smoothed data.
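For a sense of what happens under the hood, here is a bare-bones sketch of this shrink-and-reconstruct procedure using PyWavelets directly, assuming the universal threshold $\sigma\sqrt{2\ln n}$ of Donoho and Johnstone with $\sigma$ estimated from the finest-scale detail coefficients; the jb_utils function used below may differ in its details (e.g., how threshold_factor scales the threshold).
import pywt

def visushrink_sketch(y, wavelet='sym15'):
    """
    Smooth a 1D signal by hard-threshold wavelet shrinkage
    (a VisuShrink-style sketch).
    """
    # Wavelet transform: represent the data in a wavelet basis
    coeffs = pywt.wavedec(y, wavelet)

    # Estimate the noise level from the finest-scale detail
    # coefficients using the median absolute deviation
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745

    # Universal (VisuShrink) threshold
    thresh = sigma * np.sqrt(2.0 * np.log(len(y)))

    # Prune: hard-threshold the detail coefficients, keep the approximation
    coeffs[1:] = [np.where(np.abs(c) > thresh, c, 0.0) for c in coeffs[1:]]

    # Inverse transform gives the smoothed signal (trim possible padding)
    return pywt.waverec(coeffs, wavelet)[:len(y)]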
I wrote a function to do wavelet smoothing that uses the PyWavelets module. Importantly, the threshold_factor keyword argument can be used to control smoothness. The larger this factor is, the more fine detail will be lost in the smoothing. You can also try different types of wavelet basis functions. I find sym15 works well. (You can view the shapes of wavelets at this website.)
# Use VisuShink to smooth
f_1_wavelet = jb.visushrink(f_1, threshold_factor=1, thresh_method='hard',
wavelet='sym15')
f_2_wavelet = jb.visushrink(f_2, threshold_factor=1, thresh_method='hard',
wavelet='sym15')
# Plot them!
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0], lw=0.5)
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1], lw=0.5)
plt.plot(df.time, f_1_wavelet, '-', color=bmap.mpl_colors[0], lw=2)
plt.plot(df.time, f_2_wavelet, '-', color=bmap.mpl_colors[1], lw=2)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
The final smoothing method we will use is smoothing by regularization. The concept here is simple. We define an objective function
\begin{align} Q(\hat{y}) = \int_{x_1}^{x_n} \mathrm{d}x \,\left|y_i - \hat{y}(x)\right|^2 + \lambda \int_{x_1}^{x_n} \mathrm{d}x \,\left|\hat{y}^{(d)}(x)\right|^2, \end{align}
where $\hat{y}^{(d)}(x)$ is the order $d$ derivative of $\hat{y}(x)$. In practice, these integrals and derivatives are approximated numerically on the discrete points. The goal is to find the values of $\hat{y}(x)$ that minimize $Q(\hat{y})$. The first integral keeps $\hat{y}$ true to the real data $\{y_i\}$. The second integral penalizes large fluctuations in $\hat{y}(x)$, which would be marked by large higher-order derivatives.
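For intuition, here is a bare-bones sketch of this minimization for evenly spaced points, using a simple difference-matrix penalty; the way scikits.datasmooth (used below) handles quadrature weights, uneven spacing, and automatic selection of $\lambda$ differs from this toy version.
def regularized_smooth_sketch(y, lam, d=2):
    """
    Minimize ||y - y_hat||^2 + lam * ||D_d y_hat||^2 for evenly spaced
    data, where D_d is the order-d finite difference matrix.  Setting
    the gradient to zero gives (I + lam * D_d^T D_d) y_hat = y.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Order-d difference matrix, built by differencing the identity
    D = np.diff(np.eye(n), n=d, axis=0)

    # Solve the linear system for the smoothed values
    return np.linalg.solve(np.eye(n) + lam * np.dot(D.T, D), y)
Larger values of lam penalize wiggles more strongly and give a smoother curve, just as dialing up $\lambda$ does with the package below.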
The package scikits.datasmooth performs this type of analysis. Importantly, the points where you want smoothed data (passed as the xhat keyword argument) must be evenly spaced. They must also be a NumPy array (it doesn't like DataFrames). This is because of the way derivatives and quadrature are implemented.
The smoothing factor $\lambda$ is automatically chosen using some heuristics unless passed as the lmbd keyword argument.
# Smooth data using regularization with fourth derivatives
d = 4
x_hat = df.time.values
f_1_reg, lam_1 = ds.smooth_data(df.time, f_1, d, xhat=x_hat)
f_2_reg, lam_2 = ds.smooth_data(df.time, f_2, d, xhat=x_hat)
# Plot results
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0], lw=0.5)
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1], lw=0.5)
plt.plot(df.time, f_1_reg, '-', color=bmap.mpl_colors[0], lw=2)
plt.plot(df.time, f_2_reg, '-', color=bmap.mpl_colors[1], lw=2)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')
# Print value of lambda to the screen
print("""
lam_1 = {0:g}
lam_2 = {1:g}
""".format(lam_1, lam_2))
We can dial $\lambda$ up a bit to get more smoothing.
# Smooth data using regularization with fourth derivatives
d = 4
x_hat = df.time.values
f_1_reg = ds.smooth_data(df.time, f_1, d, xhat=x_hat, lmbd=1e-6)
f_2_reg = ds.smooth_data(df.time, f_2, d, xhat=x_hat, lmbd=1e-6)
# Plot results
plt.plot(df.time, f_1, '-', color=bmap.mpl_colors[0], lw=0.5)
plt.plot(df.time, f_2, '-', color=bmap.mpl_colors[1], lw=0.5)
plt.plot(df.time, f_1_reg, '-', color=bmap.mpl_colors[0], lw=2)
plt.plot(df.time, f_2_reg, '-', color=bmap.mpl_colors[1], lw=2)
plt.xlabel('time (min)')
plt.ylabel('norm. fluor.')