Segmentation

Dataset download


[1]:
import numpy as np
import pandas as pd

import skimage.feature
import skimage.filters
import skimage.filters.rank
import skimage.io
import skimage.morphology
import skimage.segmentation

import scipy.ndimage

import colorcet

import bebi103

import bokeh_catplot

import bokeh
bokeh.io.output_notebook()

In this part of the lesson, we will continue to use the Bacillus images, but use the more convenient RFP channel to segment our images.

The function we wrote in the last tutorial to display images next to each other is useful.

[2]:
def show_two_ims(
    im_1,
    im_2,
    titles=[None, None],
    interpixel_distances=[0.13, 0.13],
    cmap=None,
):
    """Convenient function for showing two images side by side."""
    p_1 = bebi103.image.imshow(
        im_1,
        frame_height=200,
        title=titles[0],
        cmap=cmap,
        interpixel_distance=interpixel_distances[0],
        length_units="µm",
    )
    p_2 = bebi103.image.imshow(
        im_2,
        frame_height=200,
        title=titles[1],
        cmap=cmap,
        interpixel_distance=interpixel_distances[1],
        length_units="µm",
    )
    p_2.x_range = p_1.x_range
    p_2.y_range = p_1.y_range

    return bokeh.layouts.gridplot([p_1, p_2], ncols=2)

Let’s look at the RFP channel more closely, including a convenient region of the image to consider.

[3]:
# Path to the RFP image file
im_file = "../data/bacillus_images/rfp.tif"

# So we have it, the interpixel distance
ip = 0.065  # microns

# Get image file
im = skimage.io.imread(im_file)

# Get convenient slice
zoom = np.s_[100:300, 850:1050]

# Display the image, including another nice zoom region
p_1 = bebi103.image.imshow(
    im, frame_height=200, interpixel_distance=ip, length_units="µm"
)
p_2 = bebi103.image.imshow(
    im[zoom], frame_height=200, interpixel_distance=ip, length_units="µm"
)
bokeh.io.show(bokeh.layouts.gridplot([p_1, p_2], ncols=2))

We see cleaner separation between cells in fluorescence than in phase. There is also substantially less noise than in the phase image. Let’s try thresholding using the method we devised last time.

Thresholding the RFP image

We will reuse our thresholding function from the previous part of the lesson.

[4]:
def bebi103_thresh(im, selem, white_true=True, k_range=(0.5, 1.5), min_size=100):
    """
    Threshold an image based on changes in the number of pixels
    included in the binary image. A morphological mean filter is
    applied using selem.
    """
    # Determine comparison operator
    if white_true:
        compare = np.greater
        sign = -1
    else:
        compare = np.less
        sign = 1

    # Do the mean filter
    im_mean = skimage.filters.rank.mean(im, selem)

    # Compute number of pixels in binary image as a function of k
    k = np.linspace(k_range[0], k_range[1], 100)
    n_pix = np.empty_like(k)
    for i in range(len(k)):
        n_pix[i] = compare(im, k[i] * im_mean).sum()

    # Compute rough second derivative
    dn_pix_dk2 = np.diff(np.diff(n_pix))

    # Find index of maximal second derivative
    max_ind = np.argmax(sign * dn_pix_dk2)

    # Use this index to set k
    k_opt = k[max_ind - sign * 2]

    # Threshold with this k
    im_bw = compare(im, k_opt * im_mean)

    # Remove all the small objects
    im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=min_size)

    return im_bw, k_opt


# Make the structuring element a 50 pixel radius disk
selem = skimage.morphology.disk(50)

# Threshold based on mean filter
im_bw, k = bebi103_thresh(im, selem, white_true=True, min_size=400)

# Display the image, including another nice zoom region
p_1 = bebi103.image.imshow(
    im_bw, frame_height=200, interpixel_distance=ip, length_units="µm"
)
p_2 = bebi103.image.imshow(
    im_bw[zoom], frame_height=200, interpixel_distance=ip, length_units="µm"
)
bokeh.io.show(bokeh.layouts.gridplot([p_1, p_2], ncols=2))
/Users/bois/anaconda3/lib/python3.7/site-packages/skimage/filters/rank/generic.py:104: UserWarning: Bad rank filter performance is expected due to a large number of bins (1896), equivalent to an approximate bitdepth of 10.9.
  "bitdepth of {:.1f}.".format(n_bins, np.log2(n_bins)))

That worked all right, but we do see some issues at the boundaries. We can clean this up using skimage.segmentation.clear_border().

[5]:
# Clear the border
im_bw = skimage.segmentation.clear_border(im_bw)

# Show image
bokeh.io.show(
    bebi103.image.imshow(
        im_bw, frame_height=300, interpixel_distance=ip, length_units="µm"
    )
)

This is all well and good, but we still have the problem that bacteria lying next to each other are connected in the binary image. We can sometimes separate them with opening, but we would like a better method.
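
For concreteness, here is a sketch of what separating by opening might look like. The disk radius is a guess and would need tuning against the cell width; this is exploratory, not part of our pipeline.

# A sketch of trying to separate touching cells by opening.
# The disk radius is a guess and would need tuning.
selem_open = skimage.morphology.disk(3)
im_bw_opened = skimage.morphology.binary_opening(im_bw, selem_open)

# Compare to the thresholded image
bokeh.io.show(
    show_two_ims(im_bw[zoom], im_bw_opened[zoom], titles=["thresholded", "opened"])
)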

Segmentation by edge detection

While we were effective at separating bacteria from background, we were not as good at separating bacteria from each other. Let’s look closely at the bacterial image again and see if we can come up with a better method for segmentation. This time, we will look at the images with a diverging colormap that goes from blue, through white, to red. The white highlights intermediate pixel values.

[6]:
# Display the image, including another nice zoom region
p_1 = bebi103.image.imshow(
    im,
    cmap=colorcet.coolwarm,
    frame_height=300,
    interpixel_distance=ip,
    length_units="µm",
)
p_2 = bebi103.image.imshow(
    im[zoom],
    cmap=colorcet.coolwarm,
    frame_height=300,
    interpixel_distance=ip,
    length_units="µm",
)
bokeh.io.show(bokeh.layouts.gridplot([p_1, p_2], ncols=2))

When we look at the image, we see that the edges of the bacteria are highlighted in white. The high fluorescence intensity inside them appears in red, and the low intensity background in blue. If we could somehow detect the edges of the bacteria, shown in white with this colormap, we may be able to do effective segmentation.

Sobel filtering

The pixel values go from red toward blue around the borders of the bacteria. Therefore, if we can find where the magnitude of the gradient of the pixel values is high, we can get the outline of the bacteria. This procedure is called edge detection. A simple way to compute the gradient is to apply a Sobel filter, which approximates the gradient in pixel intensities at each point.
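
To make “approximates the gradient” concrete, here is a minimal sketch of the classic 3×3 Sobel kernels applied by direct convolution. Sign and normalization conventions vary between implementations, so expect a constant-factor difference from skimage.filters.sobel(), which also combines the two components into a gradient magnitude for us.

# Work on a float image, stretched to [0, 1] as we will do below
im_f = (im.astype(float) - im.min()) / (im.max() - im.min())

# The classic 3x3 Sobel kernels
sobel_h = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]])  # responds to changes down the rows
sobel_v = sobel_h.T                 # responds to changes across the columns

# Convolve to get the two gradient components, then the gradient magnitude
grad_rows = scipy.ndimage.convolve(im_f, sobel_h)
grad_cols = scipy.ndimage.convolve(im_f, sobel_v)
grad_mag = np.sqrt(grad_rows ** 2 + grad_cols ** 2)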

Going forward, we will use float images, since the edge detection algorithms do operations that result in floats. When we use the floats, we will maximally stretch the pixel values from 0 to 1 so that we have concrete numbers for comparing gradients. This means we make the float image by hand, not by using skimage.img_as_float().
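
To see the difference, skimage.img_as_float() divides by the maximum of the integer dtype (65535 for a uint16 image), so a dim image will not span the full range, whereas stretching by hand guarantees pixel values running from exactly 0 to exactly 1. A quick check, assuming the image is stored as an unsigned integer type:

# img_as_float scales by the dtype maximum, so the result need not span [0, 1]
im_dtype_scaled = skimage.img_as_float(im)
print("img_as_float range:  ", im_dtype_scaled.min(), im_dtype_scaled.max())

# Stretching by hand guarantees the full [0, 1] range
im_stretched = (im.astype(float) - im.min()) / (im.max() - im.min())
print("hand-stretched range:", im_stretched.min(), im_stretched.max())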

Prior to attempting edge detection, we could denoise the images with a total variation filter or a Gaussian blur, but we will go forward without that step.
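
For reference, such a denoising step might look like the sketch below. The σ and weight values are guesses that would need tuning, and skimage.restoration is not imported above, so we import it here.

import skimage.restoration

# Operate on the [0, 1] float image, made as in the next cell
im_f = (im.astype(float) - im.min()) / (im.max() - im.min())

# Option 1: Gaussian blur (sigma is a guess)
im_gauss = skimage.filters.gaussian(im_f, sigma=1)

# Option 2: total variation denoising (weight is a guess)
im_tv = skimage.restoration.denoise_tv_chambolle(im_f, weight=0.005)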

[7]:
# Convert image to float
im_float = (im.astype(float) - im.min()) / (im.max() - im.min())

# Apply the Sobel filter to find the gradient
im_grad = skimage.filters.sobel(im_float)

# Look at gradient image
bokeh.io.show(
    show_two_ims(im_float[zoom], im_grad[zoom], titles=["original", "sobel filtered"])
)

Canny edge detection

The Sobel filter did give us gradient information, but we have to figure out how to process it. Fortunately, there is an automatic way of doing that, called Canny edge detection. The Canny edge detector does a series of operations, including computing the Sobel filter, and then returns a binary image that is True where there is an edge. (The steps of the Canny edge detector are well documented in the scikit-image documentation.) Prior to computing gradients, the Canny edge detector does a Gaussian blur because the subsequent steps are susceptible to noise. We therefore need to specify the σ value for the Gaussian blur when calling skimage.feature.canny().
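
The choice of σ matters: too small and noise generates spurious edges; too large and nearby edges blur together. A quick way to get a feel for it is to try a few values and see how many edge pixels come out. The values below are just examples; we settle on σ = 1.4 in the next cell.

# Rough exploration of the sigma parameter for Canny edge detection
for sigma in [0.5, 1.4, 3.0]:
    im_edge_test = skimage.feature.canny(im_float, sigma=sigma)
    print(f"sigma = {sigma}: {im_edge_test.sum()} edge pixels")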

[8]:
# Do Canny edge detection on image, use sigma = 1.4
im_edge = skimage.feature.canny(im_float, 1.4)

# Show the result
bokeh.io.show(show_two_ims(im_float[zoom], im_edge[zoom], titles=["original", "edges"]))

Much better! The bacteria are well separated. In order to finish the segmentation, we just need to fill them in. There is a convenient function in scipy.ndimage for this.

[9]:
# Fill the holes
im_bw = scipy.ndimage.morphology.binary_fill_holes(im_edge)

# Look at result
bokeh.io.show(
    show_two_ims(im_float[zoom], im_bw[zoom], titles=["original", "segmented"])
)

Oh no! What happened? If there is even a one-pixel break in the contour of an edge, the interior is not closed off from the background, so that bacterium is missed when filling holes. We can remedy this by doing a closing of the edge image.

[10]:
# Close the edge image
selem = skimage.morphology.disk(2)
im_edge_closed = skimage.morphology.binary_closing(im_edge, selem)

# Fill these holes
im_bw = scipy.ndimage.morphology.binary_fill_holes(im_edge_closed)

# Check out results
bokeh.io.show(
    show_two_ims(im_float[zoom], im_bw[zoom], titles=["original", "segmented"])
)

This is good, but we have the problem that we just joined cells back together again. So, we would like another way to do edge detection that guarantees closed curves.

Marr-Hildreth edge detection

Another common method of edge detection is the Marr-Hildreth edge detector, which uses the Laplacian of Gaussian (LoG) with zero crossing detection. This is described in section 10.2 of Gonzalez and Woods. The idea here is that an edge features a sudden peak or valley in the gradient (first derivative) of the pixel values of the image. A peak or a valley in the first derivative means there is a zero-crossing in the second derivative. Taking higher order derivatives of real data results in highly amplified noise. Therefore, the Marr-Hildreth method takes the Laplacian of a Gaussian blurred image, hence the name Laplacian of Gaussian.
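
To connect the name to the computation: applying the LoG filter should give (approximately) the same result as blurring with a Gaussian and then taking the Laplacian. Here is a quick sanity check of that statement; the two differ slightly because the composite filter uses derivatives of the Gaussian, while scipy.ndimage’s laplace() uses a finite-difference Laplacian.

# LoG directly, versus Gaussian blur followed by a finite-difference Laplacian
sigma = 2.0
im_log_direct = scipy.ndimage.filters.gaussian_laplace(im_float, sigma)
im_log_two_step = scipy.ndimage.filters.laplace(
    scipy.ndimage.filters.gaussian_filter(im_float, sigma)
)

# The two should agree up to small discretization differences
print(np.abs(im_log_direct - im_log_two_step).max())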

skimage does not yet have a LoG filter, but scipy.ndimage does. We have to specify σ for the Gaussian blur. We will choose 2 pixels, and it is better to err on the large side. First, we will compute the LoG of our image and then view it with a colorbar to see what the values of the resulting pixels are.

[11]:
# Compute LoG
im_LoG = scipy.ndimage.filters.gaussian_laplace(im_float, 2.0)

# Check out results
bokeh.io.show(
    bebi103.image.imshow(
        im_LoG[zoom],
        cmap=colorcet.coolwarm,
        interpixel_distance=ip,
        length_units="µm",
        colorbar=True,
    )
)

Referring to the colorbar, we can see zero crossings at the edges of cells. To detect them automatically, we consider each pixel’s 3×3 neighborhood. If the center pixel has a different sign from at least one of its neighbors, there is a zero crossing. We can implement this cleverly by finding the maximum and minimum of each nine-pixel neighborhood and then comparing the LoG to those minima and maxima.

[12]:
# 3x3 square structuring element
selem = skimage.morphology.square(3)

# Do max filter and min filter
im_LoG_max = scipy.ndimage.filters.maximum_filter(im_LoG, footprint=selem)
im_LoG_min = scipy.ndimage.filters.minimum_filter(im_LoG, footprint=selem)

# Image of zero-crossings
im_edge = ((im_LoG >= 0) & (im_LoG_min < 0)) | ((im_LoG <= 0) & (im_LoG_max > 0))

# Show result
bokeh.io.show(show_two_ims(im_float[zoom], im_edge[zoom], titles=["original", "edges"]))

Yeesh! There are lots of zero crossings in the background as well. We can ensure that we do not pick these up by requiring a minimum gradient magnitude at the zero crossings. We can write a function to implement this.

[13]:
def zero_crossing_filter(im, thresh):
    """
    Returns image with 1 if there is a zero crossing and 0 otherwise.

    thresh is the minimal value of the gradient, as computed by the Sobel
    filter, required at a zero crossing for it to count.
    """
    # Square structuring element
    selem = skimage.morphology.square(3)

    # Do max filter and min filter
    im_max = scipy.ndimage.filters.maximum_filter(im, footprint=selem)
    im_min = scipy.ndimage.filters.minimum_filter(im, footprint=selem)

    # Compute gradients using Sobel filter
    im_grad = skimage.filters.sobel(im)

    # Return edges
    return ( (  ((im >= 0) & (im_min < 0))
              | ((im <= 0) & (im_max > 0)))
            & (im_grad >= thresh) )

We’ll now use this function to get our edges.

[14]:
# Find zero-crossings
im_edge = zero_crossing_filter(im_LoG, 0.001)

# Show result
bokeh.io.show(show_two_ims(im_float[zoom], im_edge[zoom], titles=["original", "edges"]))

Much better! We would like single-pixel edges, though. We can accomplish this by skeletonization, which whittles thick lines down to a single pixel in width without breaking them.

[15]:
# Skeletonize edges
im_edge = skimage.morphology.skeletonize(im_edge)

# See result
bokeh.io.show(show_two_ims(im_float[zoom], im_edge[zoom], titles=["original", "edges"]))

Finally, we can fill the holes and remove small objects to get our segmented image.

[16]:
# Fill holes
im_bw = scipy.ndimage.morphology.binary_fill_holes(im_edge)

# Remove small objects that are not bacteria
im_bw = skimage.morphology.remove_small_objects(im_bw, min_size=100)

# Show result
bokeh.io.show(
    show_two_ims(im_float[zoom], im_bw[zoom], titles=["original", "segmented"])
)

We should actually take one more step, though. If we are going to get quantitative information from the images, we need to make sure we only consider complete bacteria, and not those cropped by the border. We therefore clear the border.

[17]:
# Clear border with large buffer size b/c LoG procedure came off border
im_bw = skimage.segmentation.clear_border(im_bw, buffer_size=5)

# Show result
bokeh.io.show(
    bebi103.image.imshow(
        im_bw,
        cmap=colorcet.gray,
        interpixel_distance=ip,
        length_units="µm",
    )
)

Labeling binary images

We have now separated out our bacteria. We would like to label all individual bacteria so we can keep track of who’s who. The skimage.measure.label() function conveniently does this. It returns an “image” in which each pixel’s value is the label of the connected region it belongs to. The background is labeled with 0, and all other regions are labeled with integers starting at 1.

[18]:
# Label binary image; background kwarg says which value in im_bw to consider background
im_labeled, n_labels = skimage.measure.label(im_bw, background=0, return_num=True)

# Show number of bacteria
print("Number of individual bacteria = ", n_labels - 1)

# See result (one of the few times it's ok to use rainbow colormap!)
bokeh.io.show(
    bebi103.image.imshow(
        im_labeled, cmap=colorcet.b_glasbey_hv, interpixel_distance=ip, length_units="µm"
    )
)
Number of individual bacteria =  91

Computing region properties

This labeled image can now be used to identify bacteria in the YFP channel. The only problem is that the RFP image we just used for segmentation is not binned, while the YFP image is. We can losslessly upsample the YFP image, making each 2×2 block of pixels take the value of the corresponding original pixel. This does not add or lose any information. We can do this using the scipy.ndimage.zoom() function. Let’s do that for the YFP channel.

[19]:
# Load the YFP image
im_y = skimage.io.imread('../data/bacillus_images/yfp.tif')

# Upsample (2 means 2x as big, order=0 means no interpolation)
im_y = scipy.ndimage.zoom(im_y, 2, order=0)
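
Since order=0 zooming just repeats pixel values, slicing out every other row and column of the upsampled image should recover the original exactly. A quick check (assuming nearest-neighbor resampling behaves this way for a factor of two):

# Sanity check: downsampling by slicing should recover the original image
im_y_orig = skimage.io.imread('../data/bacillus_images/yfp.tif')
print(np.array_equal(im_y[::2, ::2], im_y_orig))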

Now that the image is resampled, we can compute properties of the YFP image using the labeled image. This is done very conveniently using the skimage.measure.regionprops() function. It computes properties, such as area, perimeter, etc., for each labeled region. If we pass the kwarg intensity_image, it uses the labels in the labeled image and computes properties of the intensity image, in our case the YFP channel. The result is returned as a list of regionprops objects.

[20]:
# Get properties about the YFP channel
im_y_props = skimage.measure.regionprops(im_labeled, intensity_image=im_y)

Having a list of regionprops objects can be annoying, since they are not terribly accessible. Let’s store the result in a tidy data frame.

[21]:
data = [[prop.label, prop.area, prop.mean_intensity] for prop in im_y_props]
df = pd.DataFrame(
    data=data, columns=["label", "area (sq ip distance)", "mean intensity (a.u.)"]
)
df["area (sq µm)"] = df["area (sq ip distance)"] * ip ** 2

# Take a look
df.head()
[21]:
   label  area (sq ip distance)  mean intensity (a.u.)  area (sq µm)
0      1                    386            1161.800518      1.630850
1      2                    595            1118.060504      2.513875
2      3                    355             630.143662      1.499875
3      4                    573            1838.113438      2.420925
4      5                    551            1432.422868      2.327975
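
As an aside, more recent versions of scikit-image (0.16 and later, newer than the version pinned in the computing environment below) offer skimage.measure.regionprops_table(), which builds a dictionary of columns directly. A hedged sketch of the equivalent construction:

# Requires scikit-image >= 0.16 (newer than the version used in this lesson)
props = skimage.measure.regionprops_table(
    im_labeled, intensity_image=im_y, properties=["label", "area", "mean_intensity"]
)
df_alt = pd.DataFrame(props)
df_alt["area (sq µm)"] = df_alt["area"] * ip ** 2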

Finally, let’s plot an ECDF of the intensities, the product of our work!

[22]:
bokeh.io.show(bokeh_catplot.ecdf(data=df, val='mean intensity (a.u.)'))

Displaying segmentation

We complete this tutorial by showing how we can display a segmentation overlaid on an image. We demonstrate this with the phase image and our segmented RFP image. The strategy is similar to what we did with the ROIs. We convert the grayscale image to RGB and then color one channel with the segmented image.

[23]:
# Load phase image
im_p = skimage.io.imread("../data/bacillus_images/phase.tif")

# Convert phase image to float, scaled to [0, 1]
im_p_float = (im_p - im_p.min()) / (im_p.max() - im_p.min())

# Up sample it
im_p_float = scipy.ndimage.zoom(im_p_float, 2, order=0)

# Make the green channel 1 wherever we have a bacterium
im_p_g = np.copy(im_p_float)
im_p_g[im_bw] = 1

# Build RGB image
im_p_rgb = np.dstack((im_p_float, im_p_g, im_p_float))

# Show result
bokeh.io.show(
    bebi103.image.imshow(
        im_p_rgb, interpixel_distance=ip, length_units="µm", cmap="rgb"
    )
)

Computing environment

[24]:
%load_ext watermark

%watermark -v -p numpy,pandas,scipy,skimage,bokeh,colorcet,bokeh_catplot,bebi103,jupyterlab
CPython 3.7.4
IPython 7.8.0

numpy 1.17.2
pandas 0.24.2
scipy 1.3.1
skimage 0.15.0
bokeh 1.3.4
colorcet 1.0.0
bokeh_catplot 0.1.6
bebi103 0.0.43
jupyterlab 1.1.4