Basic image filtering and thresholding¶
[1]:
import numpy as np
import skimage
import skimage.io
import skimage.filters
import skimage.morphology
import bebi103
import colorcet
import bokeh
bokeh.io.output_notebook()
import holoviews as hv
hv.extension('bokeh')
bebi103.hv.set_defaults()
The data set¶
We will look a a snapshot of some Bacillus subtilis cells taken by Jin Park in the Elowitz lab. These data can be downloaded here and were used in this publication.
There are three images. 1. A phase contrast image for reference. 2. An RFP image for used segmentation. 3. A YFP image that is also used for quantification.
Note that the phase and YFP channels have 2$:nbsphinx-math:times\(2 **binning**. This means that each pixel in the image comes from combining the values of a 2\):nbsphinx-math:`times`$2 array of pixels. This is often done when signal is low to boost the signal to noise ratio.
The interpixel distance is 0.065 µm for the RFP image, which means that it is 0.13 µm for the pnase and YFP images.
Let’s have a look¶
Let’s take a look at the images. We will use skimage.io.imread
to load the images.
[2]:
im_names = ["phase", "rfp", "yfp"]
interpixel_distances = [0.13, 0.065, 0.13]
im_list = [f"../data/bacillus_images/{name}.tif" for name in im_names]
# Load images and store in list [phase, RFP, CFP, YFP]
ims = [skimage.io.imread(im_name) for im_name in im_list]
# Display images
plots = [
bebi103.image.imshow(
im, frame_height=200, title=name, interpixel_distance=ip, length_units="µm"
)
for name, ip, im in zip(im_names, interpixel_distances, ims)
]
# Link axes for zooming
for i in [1, 2]:
plots[i].x_range = plots[0].x_range
plots[i].y_range = plots[0].y_range
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))
We notice that the binning reduces the dimensions of the phase, CFP, and YFP channels. We have just “plotted” the images, but let’s get some more information. First, skimage.io
automatically detected data types. Let’s see what it detected.
[3]:
# Print the data types of the images
for name, im in zip(im_names, ims):
print(name, ':', im.dtype)
phase : uint16
rfp : uint16
yfp : uint16
They are all 16-bit images. We can learn more about the pixel values by looking at their histograms. We we talk about the histogram of an image, we are referring to the counts of the pixel values in the image. To compute the histograms, we use the skimage.exposure.histogram()
function.
[4]:
def plot_hist(im, title, logy=False):
"""Make plot of image histogram."""
counts, vals = skimage.exposure.histogram(im)
if logy:
inds = counts > 0
log_counts = np.log(counts[inds])
return hv.Spikes(
data=(vals[inds], log_counts),
kdims=['pixel values'],
vdims=['log₁₀ count'],
label=title,
).opts(
frame_height=100,
)
return hv.Spikes(
data=(vals, counts),
kdims=['pixel values'],
vdims=['count'],
label=title,
).opts(
frame_height=100,
)
# Display histograms
plots = [plot_hist(im, name) for name, im in zip(im_names, ims)]
hv.Layout(
plots
).opts(
shared_axes=False,
).cols(
1
)
[4]:
We see that the fluorescence channels have a sharp peak at low intensity values. This is the black background. Conversely, the phase image has a big peak at a higher intensity, which is the white background.
It is often easier to look at the histograms on a log scale.
[5]:
plots = [plot_hist(im, name, logy=True) for name, im in zip(im_names, ims)]
hv.Layout(
plots
).opts(
shared_axes=False,
).cols(
1
)
[5]:
It is now easier to see the features (bacteria) in the histograms. For the fluorescence channels, these are the higher intensities. For the phase image, they are the lower intensities. The histograms are just another way of “plotting” an image, and can provide insights. In this case, we see that the images use nowhere near the full range of pixel values for 16-bit images (0 to 65535); these are probably 12 bit images.
For the next portion of this tutorial, we will focus on the phase image to learn some basic image processing techniques.
Cleaning images¶
The fluorescence images are very “clean.” They have great signal to noise, and the boundaries between bacteria and background are clear. The phase image is a bit dirtier. We will use the phase image to illustrate some basic techniques in cleaning up images and to introduce some of the basic concepts in morphological image processing.
Let’s start by taking a closer look at it.
[6]:
# Rename ims[0] to im for convenience
im = np.copy(ims[0])
# Display phase image
bokeh.io.show(bebi103.image.imshow(im, interpixel_distance=0.13, length_units='µm'))
Background subtraction¶
We see some unevenness in illumination, going from darker in the lower right to brigheter in the upper left. We might correct this by computing the background and doing a background subtraction. We use a strong Gaussian smoothing and subtract it.
When you apply a Gaussian filter in image processing, it is often called a Gaussian blur.
The Gaussian filter is implemented in
skimage.filters.gaussian()
. Its first argument is the image to filter and the second issigma
, which is the standard deviation of the Gaussian filter. As is always the case in using built-in filters and image processing tools, everything is in units of pixels.When applying a Gaussian filter, or any continuous transform to an image, the integer nature of the pixel values is typically lost.
skimage
will automatically convert the images to floats upon filtering. This can be dangerous, though. It assumes that because the image is 16-bit that the maximal pixel value allowed is 65535. It then makes a linear mapping of pixel values where 0 maps to 0 and 65535 maps to unity. I prefer to have more control over data types and convert my images ahead of time. To be safe, because I will be comparing many different snapshots, I will allow the maximal pixel value to be 65535, which means that all of our floating point pixel values will be close to each other, sacrificing about two digits worth of numerical precision. This is not a big deal, and it’s worth it not to risk improperly scaling images. A word of warning, though: some of the image processing routines are designed to only work on integer pixel values, so you should always be careful about data types when doing image processing operations! Read the docs!
We will try \(\sigma = 50\) pixels to give a strong blur.
[7]:
# Convert image to float
im_float = skimage.img_as_float(im)
# Filter the image with a strong Gaussian blur
im_bg = skimage.filters.gaussian(im_float, 50.0)
def show_two_ims(
im_1,
im_2,
titles=[None, None],
interpixel_distances=[0.13, 0.13],
cmap=None,
):
"""Convenient function for showing two images side by side."""
p_1 = bebi103.image.imshow(
im_1,
frame_height=200,
title=titles[0],
cmap=cmap,
interpixel_distance=interpixel_distances[0],
length_units="µm",
)
p_2 = bebi103.image.imshow(
im_2,
frame_height=200,
title=titles[1],
cmap=cmap,
interpixel_distance=interpixel_distances[1],
length_units="µm",
)
p_2.x_range = p_1.x_range
p_2.y_range = p_1.y_range
return bokeh.layouts.gridplot([p_1, p_2], ncols=2)
bokeh.io.show(show_two_ims(im_float, im_bg, titles=["original", "background"]))
We see that the blurred image roughly gives the background luminescence. We can subtract this from the original image and view it.
[8]:
# Subtract background
im_no_bg = im_float - im_bg
# Show images
bokeh.io.show(show_two_ims(im_float, im_no_bg, titles=['original', 'bg subtracted']))
Now the background is more uniform. This is a simple way of doing background subtraction. You can read about another technique using morphological reconstruction here.
Denoising images¶
If we zoom in on the image, we can see that there is some noise, particularly in the background. We will use the same zoomed portion of the image over and over, so we will make a NumPy slice
option to make it easier to select.
[9]:
# Make slice object
zoom = np.s_[75:175, 175:275]
# Display zoom phase image
bokeh.io.show(
bebi103.image.imshow(im[zoom], interpixel_distance=0.13, length_units="µm")
)
A simple way to reduce the noise is to apply a Gaussian filter, this time, of course, with a smaller \(\sigma\). We will try \(\sigma = 1\) pixel.
[10]:
# Filter image
im_filt_gauss = skimage.filters.gaussian(im_float, 1)
# Show filtered image
bokeh.io.show(
show_two_ims(im_float[zoom], im_filt_gauss[zoom], titles=["original", "filtered"])
)
Next, we’ll try a total variation filter. The idea is that areas with large gradients are smoothed out, as this implies rapid fluctuation of pixel intensity values (noise). We will use a Chambolle total variation filter. This is implemented in skimage.restoration.denoise_tv_chambolle()
. The first argument is the image to filter. The second argument describes the degree of smoothing.
[11]:
# Filter image
im_filt_tv = skimage.restoration.denoise_tv_chambolle(im_float, 0.001)
# Show filtered image
bokeh.io.show(
show_two_ims(im_float[zoom], im_filt_tv[zoom], titles=["original", "tv filtered"])
)
Median filtering is another effective technique to eliminate salt-and-pepper noise. It is worth demonstrating this filter because it introduces the concept of a structuring element. The concept is best illustrated through example. In the case of median filtering, the value of the center pixel of some neighborhood of pixels of a given shape is replaced by the median of the pixel values around it. The neighborhood is defined by the structuring element. It could be a disk, a square, a cross, or anything else you might design. For example, let’s look at a disc structuring element or radius 10.
[12]:
selem = skimage.morphology.disk(10)
bokeh.io.show(bebi103.image.imshow(selem, cmap=colorcet.gray, frame_height=200))
So, in a median filter using this disk as a structuring element works by passing this disk over the image. The pixel value at the center of the disk is replaced by the median of all values within the disk.
To illustrate median filtering, we will damage our image by introducing salt and pepper noise. Note: ``skimage``’s median filter only works on ``uint8`` or ``uint16`` images.
[13]:
# Introduce salt and pepper noise
np.random.seed(42)
noise = np.random.random(im.shape)
im_snp = np.copy(im)
im_snp[noise > 0.96] = im.max()
im_snp[noise < 0.04] = im.min()
# Use a median filter in a small square structuring element for median filter
selem = skimage.morphology.square(3)
im_snp_filt = skimage.filters.rank.median(im_snp, selem)
# Show results
bokeh.io.show(
show_two_ims(im_snp, im_snp_filt, titles=["Salt-n-Pepa's here", "Ah, push it"])
)
/Users/Justin/anaconda3/lib/python3.7/site-packages/skimage/filters/rank/generic.py:104: UserWarning: Bad rank filter performance is expected due to a large number of bins (1961), equivalent to an approximate bitdepth of 10.9.
"bitdepth of {:.1f}.".format(n_bins, np.log2(n_bins)))
Countless other filters exist, but we will not go into them here. The available filters are described in the `skimage
documentation <https://scikit-image.org/docs/dev/api/skimage.filters.html>`__.
Now, we will move on to thresholding.
Thresholding¶
Thresholding is the process by which we convert an image into a binary image containing ones and zeros. This is often an important part of segmentation. The idea is that we want to label as True
regions that are bacteria and False
regions that are background (or vice versa). Looking at the histogram again may help us understand this process.
[14]:
# Display histogram with log scale
plot_hist(ims[0], 'phase', logy=True)
[14]:
We see that the background has fallen off at pixel values of about 900. So, we could naively choose all pixels greater than 900 to be background and those below to be bacteria. Let’s try that.
[15]:
# Threshold image
thesh = 900
im_bw = im < 900
# Take a look
bokeh.io.show(show_two_ims(im_float, im_bw, titles=["original", "thresholded"]))
It looks like we didn’t do that badly. Let’s zoom in to check.
[16]:
bokeh.io.show(
show_two_ims(im_float[zoom], im_bw[zoom], titles=["original", "thresholded"])
)
We see that some pixels that are bacteria are deleted. We would like to do better. There are countless ways to do thresholding, and any method that gives a binary image at the end could work. For example, we could only keep pixels that are below a local mean pixel value. To do this, we use a morphological filter to convert each pixel to the mean of a local neighborhood (with the neighborhood again being defined by a structuring element), and then compare pixel values to that mean. This has the effect of locally thresholding the image, resulting in insensitivity to background lighting. We will take the neighborhood to be bigger than a typical bacterium.
[17]:
# Make the structuring element 25 pixel radius disk
selem = skimage.morphology.disk(25)
# Do the mean filter
im_mean = skimage.filters.rank.mean(im, selem)
# Threshhold based on mean filter
im_bw = im < 0.85 * im_mean
# Show the result
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))
/Users/Justin/anaconda3/lib/python3.7/site-packages/skimage/filters/rank/generic.py:104: UserWarning: Bad rank filter performance is expected due to a large number of bins (1961), equivalent to an approximate bitdepth of 10.9.
"bitdepth of {:.1f}.".format(n_bins, np.log2(n_bins)))
This looks much better. We thresholded based on pixel values being below 85% of the local mean. We would like to have an automated way to choose this value. We could adopt a strategy where we look at how many pixels are added to the thresholded image as a function of this parameter (which we’ll call k). We choose the cutoff to be where we start adding the most pixels, which would mean we’re starting to add background. We can do this by computing the second derivative of the number of pixels versus k curve. First, let’s look at that curve.
[18]:
# Compute number of pixels in binary image as a function of k
k = np.linspace(0.5, 1.0, 100)
n_pix = np.empty_like(k)
for i in range(len(k)):
n_pix[i] = (im < k[i] * im_mean).sum()
# Plot the result
p = bokeh.plotting.figure(
plot_height=300, plot_width=500, x_axis_label="k", y_axis_label="number of pixels"
)
p.line(k, n_pix, line_width=2)
bokeh.io.show(p)
We’ll now compute the second derivative and select a value of \(k\) for which it is maximal.
[19]:
# Compute rough second derivative
dn_pix_dk2 = np.diff(np.diff(n_pix))
# Find index of maximal second derivative
max_ind = np.argmax(dn_pix_dk2)
# Use this index to set k
k_opt = k[max_ind-2]
# Re-threshold with this k
im_bw = im < k_opt * im_mean
# Report the result
print('Optimal k = ', k_opt)
# Show images
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))
Optimal k = 0.9545454545454546
We have lots of little regions laying around from the background, but these are easily removed using skimage.morphology.remove_small_objects
. Since a bacterium is at least 20 \(\times\) 5 pixels (1.3$:nbsphinx-math:`times`$0.3 µm) in size, we delete any objects smaller than 100 total pixels.
[20]:
# Remove all the small objects
im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=100)
# Show images
full = show_two_ims(im_float, im_bw, titles=['original', 'thresholded'])
zoomed = show_two_ims(im_float[zoom], im_bw[zoom])
bokeh.io.show(bokeh.layouts.column([full, zoomed]))
This is a pretty nice result. We could code up our own thresholding function to do this. Note that this function is not in the bebi103
module, nor is it a canonical method of thresholding. This is something we came up with while exploring our images; it is often the case that bespoke image processing methods are useful for specific images.
[21]:
def bebi103_thresh(im, selem, white_true=True, k_range=(0.5, 1.5), min_size=100):
"""
Threshold image as described above. Morphological mean filter is
applied using selem.
"""
# Determine comparison operator
if white_true:
compare = np.greater
sign = -1
else:
compare = np.less
sign = 1
# Do the mean filter
im_mean = skimage.filters.rank.mean(im, selem)
# Compute number of pixels in binary image as a function of k
k = np.linspace(k_range[0], k_range[1], 100)
n_pix = np.empty_like(k)
for i in range(len(k)):
n_pix[i] = compare(im, k[i] * im_mean).sum()
# Compute rough second derivative
dn_pix_dk2 = np.diff(np.diff(n_pix))
# Find index of maximal second derivative
max_ind = np.argmax(sign * dn_pix_dk2)
# Use this index to set k
k_opt = k[max_ind - sign * 2]
# Threshold with this k
im_bw = compare(im, k_opt * im_mean)
# Remove all the small objects
im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)
return im_bw, k_opt
Let’s take our function for a spin!
[22]:
im_thresh, k = bebi103_thresh(
im, skimage.morphology.disk(25), white_true=False, min_size=100
)
bokeh.io.show(bebi103.image.imshow(im_thresh, cmap=colorcet.gray))
The purpose of this little exercise in thresholding is not to push the thresholding method we developed as the way to threshold. It most certainly is not. I hoped to show that there are straightforward ways we can go about thresholding an image, which is often the first step toward segmentation.
Note that there are several automated thresholding techniques available in Scikit-image. I find that Otsu’s method is a good multipurpose global thresholding method.
A first step in morphological processing: opening¶
Let’s look at some of our bacteria in our favorite zoomed region again.
[23]:
bokeh.io.show(show_two_ims(im_float[zoom], im_bw[zoom]))
We see that several differennt are in fact joined together. We would like to separate them. Two common morphological operations that might help us are erosion and dilation. The former whittles away the edges of white regions, while the latter expands them For erosion, as the structuring element is passed over the image, if all white pixels lie within the structuring element, the central pixel stays white. Otherwise, it becomes black. For dilation, if any of the white pixels lie within the structuring element, the central pixel becomes white. Let’s try eroding our image.
[24]:
# Structuring element is radius 2 disk
selem = skimage.morphology.disk(2)
# Erode binary image
im_bw_eroded = skimage.morphology.erosion(im_bw, selem)
# Look at the result
bokeh.io.show(
show_two_ims(im_bw[zoom], im_bw_eroded[zoom], titles=["original bw", "eroded"])
)
We managed to disconnect one of the bacteria. We should put the pixels we lost back in, though, using dilation.
[25]:
# Dilate eroded image
im_bw_redilated = skimage.morphology.dilation(im_bw_eroded, selem)
# Look at the result
bokeh.io.show(
show_two_ims(
im_bw[zoom], im_bw_redilated[zoom], titles=["original bw", "redilated"]
)
)
This operation, of eroding and then dilating, is called opening. The opposite is closing. Opening tends to disconnect blobs and closing connects them. Here, we only managed to disconnect one of the bacteria. We can directly open a binary image using skimage.morphology.binary_opening
.
[26]:
# Structuring element is radius 2 disk
selem = skimage.morphology.disk(2)
# Erode binary image
im_bw_opened = skimage.morphology.binary_opening(im_bw, selem)
# Show result
bokeh.io.show(
show_two_ims(im_bw[zoom], im_bw_opened[zoom], titles=["original bw", "opened"])
)
Separating the other bacteria is difficult and requires more sophisticated methods. It is better to have better images to use for segmentation, which is why the RFP channel is so useful. We will use that in the next tutorial as we continue with segmentation.
Computing environment¶
[27]:
%load_ext watermark
[28]:
%watermark -v -p numpy,skimage,bokeh,holoviews,colorcet,bebi103,jupyterlab
CPython 3.7.5
IPython 7.1.1
numpy 1.17.3
skimage 0.15.0
bokeh 1.4.0
holoviews 1.12.6
colorcet 1.0.0
bebi103 0.0.44
jupyterlab 1.1.4