Tutorial 4a: Basic image filtering and thresholding

(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np

# Image processing tools
import skimage
import skimage.io
import skimage.exposure
import skimage.filters
import skimage.morphology
import skimage.restoration

import bebi103

import bokeh.io
import bokeh.layouts
import bokeh.plotting
bokeh.io.output_notebook()

The data set

We will look at a snapshot of some Bacillus subtilis cells taken by Jin Park in the Elowitz lab. These data can be downloaded here and were used in this publication.

There are three images.

  1. A phase contrast image for reference.
  2. An RFP image used for segmentation.
  3. A YFP image used for quantification.

Note that the phase and YFP channels have 2$\times$2 binning. This means that each pixel in the image comes from combining the values of a 2$\times$2 array of pixels. This is often done when the signal is low to boost the signal-to-noise ratio.
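
To make the binning operation concrete, here is a minimal NumPy sketch on a hypothetical 4$\times$4 array (this just illustrates the idea; the actual binning is done by the camera during acquisition and may sum or average the pixels).

# Each binned pixel comes from a 2x2 block of raw pixels (summed here)
im_toy = np.arange(16).reshape(4, 4)
im_toy_binned = im_toy.reshape(2, 2, 2, 2).sum(axis=(1, 3))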

The interpixel distance is 0.065 µm for the RFP image, which means that it is 0.13 µm for the phase and YFP images.

Let's have a look

Let's take a look at the images. We will use skimage.io.imread to load the images.

In [2]:
im_names = ['phase', 'rfp', 'yfp']
interpixel_distances = [0.13, 0.065, 0.13]
im_list = [f'../data/bacillus_images/{name}.tif' for name in im_names]

# Load images and store in list [phase, RFP, YFP]
ims = [skimage.io.imread(im_name) for im_name in im_list]

# Display images
plots = [bebi103.viz.imshow(im,
                            plot_height=300,
                            title=name,
                            interpixel_distance=ip,
                            length_units='µm')
             for name, ip, im in zip(im_names, interpixel_distances, ims)]

# Link axes for zooming
for i in [1, 2]:
    plots[i].x_range = plots[0].x_range
    plots[i].y_range = plots[0].y_range
    
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

We notice that the binning reduces the dimensions of the phase and YFP channels. We have just "plotted" the images, but let's get some more information. First, skimage.io automatically detected the data type of each image. Let's see what it detected.

In [3]:
# Print the data types of the images
for name, im in zip(im_names, ims):
    print(name, ':', im.dtype)
phase : uint16
rfp : uint16
yfp : uint16

They are all 16-bit images. We can learn more about the pixel values by looking at their histograms. When we talk about the histogram of an image, we are referring to the counts of the pixel values in the image. To compute the histograms, we use the skimage.exposure.histogram() function.

In [4]:
# Compute histograms
hist_bins = [skimage.exposure.histogram(im) for im in ims]

def plot_hist(hist_bin, title, y_axis_type='linear'):
    """Make plot of image histogram."""
    p = bokeh.plotting.figure(plot_height=300,
                              plot_width=400,
                              y_axis_type=y_axis_type,
                              x_axis_label='intensity',
                              y_axis_label='count',
                              title=title)
    hist, bins = hist_bin
    p.line(bins, hist, line_width=2)

    return p

# Display histograms
plots = [plot_hist(hist_bin, name) for name, hist_bin in zip(im_names, hist_bins)]
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

We see that the fluorescence channels have a sharp peak at low intensity values. This is the black background. Conversely, the phase image has a big peak at a higher intensity, which is the white background.

It is often easier to look at the histograms on a log scale.

In [5]:
plots = [plot_hist(hist_bin, name, 'log') for name, hist_bin in zip(im_names, hist_bins)]
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

It is now easier to see the features (bacteria) in the histograms. For the fluorescence channels, these are the higher intensities. For the phase image, they are the lower intensities. The histograms are just another way of "plotting" an image, and can provide insights. In this case, we see that the images use nowhere near the full range of pixel values for 16-bit images (0 to 65535); these are probably 12-bit images.
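
If we want to check this guess, we can look at the maximal pixel value in each channel (assuming the ims list we loaded above); 12-bit data would top out at $2^{12} - 1 = 4095$.

# Maximal pixel value in each channel; 12-bit data cannot exceed 4095
for name, im in zip(im_names, ims):
    print(name, ':', im.max())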

For the next portion of this tutorial, we will focus on the phase image to learn some basic image processing techniques.

Cleaning images

The fluorescence images are very "clean." They have a great signal-to-noise ratio, and the boundaries between bacteria and background are clear. The phase image is a bit dirtier. We will use the phase image to illustrate some basic techniques in cleaning up images and to introduce some of the basic concepts in morphological image processing.

Let's start by taking a closer look at it.

In [6]:
# Rename ims[0] to im for convenience
im = np.copy(ims[0])

# Display phase image
bokeh.io.show(bebi103.viz.imshow(im, interpixel_distance=0.13, length_units='µm'))

Background subtraction

We see some unevenness in illumination, going from the upper right to the lower left. We might correct this by computing the background and performing a background subtraction. We compute the background analogously to how we computed it when we were doing time series smoothing: we apply a strong Gaussian smoothing and subtract the result. The principle is the same. The differences are only in the details.

  1. When you apply a Gaussian filter in image processing, it is often called a Gaussian blur.
  2. The Gaussian filter is implemented in skimage.filters.gaussian(). Its first argument is the image to filter and the second is sigma, which is the standard deviation of the Gaussian filter. As is always the case in using built-in filters and image processing tools, everything is in units of pixels.
  3. When applying a Gaussian filter, or any other continuous transform, to an image, the integer nature of the pixel values is typically lost. skimage will automatically convert the images to floats upon filtering. This can be dangerous, though. It assumes that because the image is 16-bit, the maximal pixel value allowed is 65535. It then makes a linear mapping of pixel values in which 0 maps to 0 and 65535 maps to unity. I prefer to have more control over data types and convert my images ahead of time. Because I will be comparing many different snapshots, I will allow the maximal pixel value to be 65535, which means that all of our floating point pixel values will be small and close to each other, sacrificing about two digits' worth of numerical precision. This is not a big deal, and it's worth it not to risk improperly scaling images. (A quick check of this scaling convention follows this list.) A word of warning, though: some image processing routines are designed to work only on integer pixel values, so you should always be careful about data types when doing image processing operations! Read the docs!
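
Here is the quick check of the scaling convention mentioned in item 3, using a toy uint16 array (hypothetical values, not our data): 0 maps to 0.0 and 65535 maps to 1.0.

# uint16 values are divided by 65535 upon conversion to float
print(skimage.img_as_float(np.array([0, 16383, 65535], dtype=np.uint16)))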

We will try $\sigma = 50$ pixels to give a strong blur.

In [7]:
# Convert image to float
im_float = skimage.img_as_float(im)

# Filter the image with a strong Gaussian blur
im_bg = skimage.filters.gaussian(im_float, 50.0)

def show_two_ims(im_1, im_2, titles=[None, None], interpixel_distances=[0.13, 0.13],
                 color_mapper=None):
    """Convenient function for showing two images side by side."""
    p_1 = bebi103.viz.imshow(im_1,
                             plot_height=300,
                             title=titles[0],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[0],
                             length_units='µm')
    p_2 = bebi103.viz.imshow(im_2,
                             plot_height=300,
                             title=titles[1],
                             color_mapper=color_mapper,
                             interpixel_distance=interpixel_distances[1],
                             length_units='µm')
    p_2.x_range = p_1.x_range
    p_2.y_range = p_1.y_range
    
    return bokeh.layouts.gridplot([p_1, p_2], ncols=2)

bokeh.io.show(show_two_ims(im_float, im_bg, titles=['original', 'background']))

We see that the blurred image roughly gives the background illumination. We can subtract this from the original image and view it.

In [8]:
# Subtract background
im_no_bg = im_float - im_bg

# Show images
bokeh.io.show(show_two_ims(im_float, im_no_bg, titles=['original', 'bg subtracted']))

Now the background is more uniform. This is a simple way of doing background subtraction. You can read about another technique using morphological reconstruction here.
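
As an aside, a tophat filter offers another quick, built-in way to flatten the background. (This is not the reconstruction-based approach linked above, just a related alternative sketched here.) A black tophat pulls out features that are darker than their surroundings, using a structuring element larger than a bacterium.

# Rough sketch: a black tophat flattens the bright phase background
selem_bg = skimage.morphology.disk(15)
im_tophat = skimage.morphology.black_tophat(im_float, selem_bg)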

Denoising images

If we zoom in on the image, we can see that there is some noise, particularly in the background. We will use the same zoomed portion of the image over and over, so we will make a NumPy slice object to make it easier to select.

In [9]:
# Make slice object
zoom = np.s_[75:175, 175:275]

# Display zoom phase image
bokeh.io.show(
    bebi103.viz.imshow(im[zoom], interpixel_distance=0.13, length_units='µm'))

A simple way to reduce the noise is to apply a Gaussian filter, this time, of course, with a smaller $\sigma$. We will try $\sigma = 1$ pixel.

In [10]:
# Filter image
im_filt_gauss = skimage.filters.gaussian(im_float, 1)

# Show filtered image
bokeh.io.show(show_two_ims(im_float[zoom],
                           im_filt_gauss[zoom],
                           titles=['original', 'filtered']))

Next, we'll try a total variation filter. The idea is that areas with large gradients are smoothed out, as this implies rapid fluctuation of pixel intensity values (noise). We will use a Chambolle total variation filter. This is implemented in skimage.restoration.denoise_tv_chambolle(). The first argument is the image to filter. The second argument describes the degree of smoothing.

In [11]:
# Filter image
im_filt_tv = skimage.restoration.denoise_tv_chambolle(im_float, 0.001)

# Show filtered image
bokeh.io.show(show_two_ims(im_float[zoom],
                           im_filt_tv[zoom],
                           titles=['original', 'tv filtered']))

Median filtering is another effective technique to eliminate salt-and-pepper noise. It is worth demonstrating this filter because it introduces the concept of a structuring element. The concept is best illustrated through example. In the case of median filtering, the value of the center pixel of some neighborhood of pixels of a given shape is replaced by the median of the pixel values around it. The neighborhood is defined by the structuring element. It could be a disk, a square, a cross, or anything else you might design. For example, let's look at a disk structuring element of radius 10.

In [12]:
selem = skimage.morphology.disk(10)
gray_color_mapper = bebi103.viz.mpl_cmap_to_color_mapper('gray')
bokeh.io.show(bebi103.viz.imshow(selem, color_mapper=gray_color_mapper, plot_height=200))

So, a median filter using this disk as a structuring element works by passing the disk over the image. The pixel value at the center of the disk is replaced by the median of all values within the disk.

To illustrate median filtering, we will damage our image by introducing salt and pepper noise. Note: skimage's median filter only works on uint8 or uint16 images.

In [13]:
# Introduce salt and pepper noise
np.random.seed(42)
noise = np.random.random(im.shape)
im_snp = np.copy(im)
im_snp[noise > 0.96] = im.max()
im_snp[noise < 0.04] = im.min()

# Use a small square structuring element for the median filter
selem = skimage.morphology.square(3)
im_snp_filt = skimage.filters.rank.median(im_snp, selem)

# Show results
bokeh.io.show(show_two_ims(im_snp,
                           im_snp_filt,
                           titles=["Salt-n-Pepa's here", 'Ah, push it']))

Countless other filters exist, but we will not go into them here. The available filters are described in the skimage documentation.

Now, we will move on to thresholding.

Thresholding

Thresholding is the process by which we convert an image into a binary image containing ones and zeros. This is often an important part of segmentation. The idea is that we want to label as True regions that are bacteria and False regions that are background (or vice versa). Looking at the histogram again may help us understand this process.

In [14]:
# Display histogram with log scale
bokeh.io.show(plot_hist(hist_bins[0], 'phase', 'log'))

We see that the background has fallen off at pixel values of about 900. So, we could naively choose all pixels greater than 900 to be background and those below to be bacteria. Let's try that.

In [15]:
# Threshold image
thresh = 900
im_bw = im < thresh

# Take a look
bokeh.io.show(show_two_ims(im_float,
                           im_bw,
                           titles=['original', 'thresholded']))

It looks like we didn't do that badly. Let's zoom in to check.

In [16]:
bokeh.io.show(show_two_ims(im_float[zoom],
                           im_bw[zoom],
                           titles=['original', 'thresholded']))

We see that some pixels that are part of bacteria are missing from the thresholded image. We would like to do better. There are countless ways to do thresholding, and any method that gives a binary image at the end could work. For example, we could only keep pixels that are below a local mean pixel value. To do this, we use a morphological filter to convert each pixel to the mean of a local neighborhood (with the neighborhood again being defined by a structuring element), and then compare pixel values to that mean. This has the effect of locally thresholding the image, resulting in insensitivity to background lighting. We will take the neighborhood to be bigger than a typical bacterium.

In [17]:
# Make the structuring element 25 pixel radius disk
selem = skimage.morphology.disk(25)

# Do the mean filter
im_mean = skimage.filters.rank.mean(im, selem)

# Threshhold based on mean filter
im_bw = im < 0.85 * im_mean

# Show the result
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))

This looks much better. We thresholded based on pixel values being below 85% of the local mean. We would like to have an automated way to choose this value. We could adopt a strategy where we look at how many pixels are added to the thresholded image as a function of this parameter (which we'll call k). We choose the cutoff to be where we start adding the most pixels, which would mean we're starting to add background. We can do this by computing the second derivative of the number of pixels versus k curve. First, let's look at that curve.

In [18]:
# Compute number of pixels in binary image as a function of k
k = np.linspace(0.5, 1.0, 100)
n_pix = np.empty_like(k)
for i in range(len(k)):
    n_pix[i] = (im < k[i] * im_mean).sum()

# Plot the result
p = bokeh.plotting.figure(plot_height=300,
                          plot_width=500,
                          x_axis_label='k',
                          y_axis_label='number of pixels')
p.line(k, n_pix, line_width=2)
bokeh.io.show(p)

We'll now compute the second derivative and select a value of $k$ for which it is maximal.

In [19]:
# Compute rough second derivative
dn_pix_dk2 = np.diff(np.diff(n_pix))

# Find index of maximal second derivative
max_ind = np.argmax(dn_pix_dk2)

# Use this index to set k
k_opt = k[max_ind-2]

# Re-threshold with this k
im_bw = im < k_opt * im_mean

# Report the result
print('Optimal k = ', k_opt)

# Show images
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))
Optimal k =  0.9545454545454546

We have lots of little regions lying around from the background, but these are easily removed using skimage.morphology.remove_small_objects. Since a bacterium is at least 20 $\times$ 5 pixels (2.6$\times$0.65 µm at the 0.13 µm interpixel distance of the phase image) in size, we delete any objects smaller than 100 total pixels.

In [20]:
# Remove all the small objects
im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=100)

# Show images
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))

This is a pretty nice result. We could code up our own thresholding function to do this. Note that this function is not in the bebi103 module; it is something we came up with while exploring our images, and it might be useful to put into an image processing pipeline for this kind of image.

In [21]:
def bebi103_thresh(im, selem, white_true=True, k_range=(0.5, 1.5), 
                   min_size=100):
    """
    Threshold image as described above.  Morphological mean filter is 
    applied using selem.
    """    
    # Determine comparison operator
    if white_true:
        compare = np.greater
        sign = -1
    else:
        compare = np.less
        sign = 1
    
    # Do the mean filter
    im_mean = skimage.filters.rank.mean(im, selem)

    # Compute number of pixels in binary image as a function of k
    k = np.linspace(k_range[0], k_range[1], 100)
    n_pix = np.empty_like(k)
    for i in range(len(k)):
        n_pix[i] = compare(im, k[i] * im_mean).sum() 

    # Compute rough second derivative
    dn_pix_dk2 = np.diff(np.diff(n_pix))

    # Find index of maximal second derivative
    max_ind = np.argmax(sign * dn_pix_dk2)

    # Use this index to set k
    k_opt = k[max_ind - sign * 2]

    # Threshold with this k
    im_bw = compare(im, k_opt * im_mean)

    # Remove all the small objects
    im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)

    return im_bw, k_opt

Let's take our function for a spin!

In [22]:
im_thresh, k = bebi103_thresh(im, skimage.morphology.disk(25), white_true=False,
                              min_size=100)

bokeh.io.show(bebi103.viz.imshow(im_thresh, color_mapper=gray_color_mapper))

The purpose of this little exercise in thresholding is not to push the thresholding method we developed as the way to threshold. It most certainly is not. I hoped to show that there are straightforward ways we can go about thresholding an image, which is often the first step toward segmentation.
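
As a point of comparison, scikit-image also has automated global threshold selection built in. For example, Otsu's method picks a single threshold from the image histogram (a standard alternative, not the method we developed above). Since bacteria are dark in phase contrast, we would keep pixels below the threshold.

# Otsu's method: automated global threshold from the image histogram
thresh_otsu = skimage.filters.threshold_otsu(im)
im_bw_otsu = im < thresh_otsu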

A first step in morphological processing: opening

Let's look at some of our bacteria in our favorite zoomed region again.

In [23]:
bokeh.io.show(show_two_ims(im_float[zoom], im_bw[zoom]))

We see that bacteria that we would like to be separate are in fact joined together. We would like to separate them. Two common morphological operations that might help us are erosion and dilation. The former whittles away the edges of white regions, while the latter expands them. For erosion, as the structuring element is passed over the image, the central pixel stays white only if every pixel covered by the structuring element is white; otherwise, it becomes black. For dilation, the central pixel becomes white if any pixel covered by the structuring element is white.
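
To see the mechanics on something tiny, here is a toy example on a hypothetical 5$\times$5 binary array (not from our data), using a 3$\times$3 square structuring element.

# Erosion: a pixel stays white only if the structuring element centered on it
# covers only white pixels. Dilation: a pixel becomes white if the structuring
# element centered on it covers any white pixel.
toy_bw = np.zeros((5, 5), dtype=bool)
toy_bw[1:4, 1:4] = True
selem_toy = skimage.morphology.square(3)
print(skimage.morphology.binary_erosion(toy_bw, selem_toy).astype(int))
print(skimage.morphology.binary_dilation(toy_bw, selem_toy).astype(int))

Now let's try eroding our image.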

In [24]:
# Structuring element is radius 2 disk
selem = skimage.morphology.disk(2)

# Erode binary image
im_bw_eroded = skimage.morphology.erosion(im_bw, selem)

# Look at the result
bokeh.io.show(show_two_ims(im_bw[zoom],
                           im_bw_eroded[zoom],
                           titles=['original bw', 'eroded']))

We managed to disconnect one of the bacteria. We should put the pixels we lost back in, though, using dilation.

In [25]:
# Dilate eroded image
im_bw_redilated = skimage.morphology.dilation(im_bw_eroded, selem)

# Look at the result
bokeh.io.show(show_two_ims(im_bw[zoom],
                           im_bw_redilated[zoom],
                           titles=['original bw', 'redilated']))

This operation, of eroding and then dilating, is called opening. The opposite is closing. Opening tends to disconnect blobs and closing connects them. Here, we only managed to disconnect one of the bacteria. We can directly open a binary image using skimage.morphology.binary_opening.

In [26]:
# Structuring element is radius 2 disk
selem = skimage.morphology.disk(2)

# Erode binary image
im_bw_opened = skimage.morphology.binary_opening(im_bw, selem)

# Show result
bokeh.io.show(show_two_ims(im_bw[zoom],
                           im_bw_opened[zoom],
                           titles=['original bw', 'opened']))
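
For completeness, closing (dilation followed by erosion) is also available as a single call. A quick look with the same structuring element shows why it is not what we want here: closing tends to connect nearby blobs and fill small holes rather than separate them.

# Close the binary image for comparison with opening
im_bw_closed = skimage.morphology.binary_closing(im_bw, selem)

# Show result
bokeh.io.show(show_two_ims(im_bw[zoom],
                           im_bw_closed[zoom],
                           titles=['original bw', 'closed']))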

Separating the other bacteria is difficult and requires more sophisticated methods. It is better to have cleaner images to use for segmentation, which is why the RFP channel is so useful. We will use that in the next tutorial as we continue with segmentation.

Computing environment

In [27]:
%load_ext watermark
In [28]:
%watermark -v -p numpy,skimage,bokeh,bebi103,jupyterlab
CPython 3.7.0
IPython 7.0.1

numpy 1.15.2
skimage 0.14.0
bokeh 0.13.0
bebi103 0.0.30
jupyterlab 0.35.0