Building Dashboards

This recitation was written by Cecelia Andrews with help from Suzy Beeler and Justin Bois.

[1]:

import numpy as np
import pandas as pd

import scipy.stats as st

import bebi103

import holoviews as hv

import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting
import bokeh.transform

import panel as pn

bokeh.io.output_notebook()
pn.extension('mathjax')

hv.extension('bokeh')
bebi103.hv.set_defaults()

For this recitation you will first need to install Panel and its JupyterLab extension:

conda install -c pyviz panel
conda install -c conda-forge jupyterlab
jupyter labextension install @pyviz/jupyterlab_pyviz

Furthermore, because the interactive plots in this recitation require a running Python engine, you really should download this notebook and run it on your own machine.

We will be using the Parker lab’s ant tracking data from Homework 3.1. You can download the dataset here.

Dashboarding with Panel

Data dashboards allow you to display your data with interactive controls. The viewer can adjust the controls to change what data you plot, change scales, zoom in on the data, etc. Panel is an open-source library that helps you create interactive dashboards, and has lots of cool widgets! Please take a look at the Panel User Guide for more explanation on how to use Panel. Panel is currently in prerelease status, which means that it is available for public use but has an API that is expected to change with each new release without detailed notice. You can find some cool examples of apps demonstrating Panel features here and a list of possible widgets and panes here. Check them out to explore the possibilities of Panel!

We’ll start with a simple example of a dashboard, then discuss how we could construct a dashboard for the Parker lab ant tracking data.

A simple example

Let’s start by plotting the PDF of the normal distribution.

[2]:

def plot_gaussian_pdf(mu=0, sigma=1):
    x = np.linspace(-10, 10, 200)
    y = st.norm.pdf(x, loc=mu, scale=sigma)
    return hv.Curve(data=(x, y), kdims=["x"], vdims=["f(x ; µ, σ)"])


plot_gaussian_pdf()

[2]:

Looks good, but what if we want to examine how the PDF changes with µ and σ? We can use Panel to create interactive sliders using the FloatSlider widget.

[3]:

mu_slider = pn.widgets.FloatSlider(name="µ", start=-5, end=5, step=0.1, value=0)
sigma_slider = pn.widgets.FloatSlider(name="σ", start=0.1, end=5, step=0.1, value=1)


@pn.depends(mu_slider.param.value, sigma_slider.param.value)
def plot_gaussian_pdf(mu, sigma):
    x = np.linspace(-10, 10, 200)
    y = st.norm.pdf(x, loc=mu, scale=sigma)
    return hv.Curve(data=(x, y), kdims=["x"], vdims=["f(x ; µ, σ)"])


widgets = pn.Column(
    pn.Spacer(height=30), mu_slider, pn.Spacer(height=15), sigma_slider, width=300
)
pn.Row(plot_gaussian_pdf, pn.Spacer(width=15), widgets)

[3]:

The sliders for µ and σ help us visualize how the Gaussian PDF depends on each parameter.
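To make that dependence concrete without the sliders, here is a small numerical sketch. It writes the normal PDF out explicitly with NumPy (the notebook itself uses `st.norm.pdf`), and the helper name `gaussian_pdf` is hypothetical, used only for this illustration. It checks that µ sets where the peak sits and that the peak height is 1/(σ√(2π)), so doubling σ halves the peak.

```python
import numpy as np


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF written out explicitly (hypothetical helper)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


x = np.linspace(-10, 10, 2001)

# Changing mu shifts the location of the peak
peak_location = x[np.argmax(gaussian_pdf(x, mu=2.0, sigma=1.0))]

# Changing sigma rescales the peak height as 1 / (sigma * sqrt(2 pi))
peak_sigma_1 = gaussian_pdf(x, mu=0.0, sigma=1.0).max()
peak_sigma_2 = gaussian_pdf(x, mu=0.0, sigma=2.0).max()

print(peak_location, peak_sigma_1 / peak_sigma_2)
```

The printed peak location should be very close to 2, and the ratio of peak heights very close to 2.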

Let’s go through each component. First, we define our widgets, mu_slider and sigma_slider. When building more complicated dashboards, we can look at the Panel documentation to choose which widgets we want to use.

Next, we define our plotting function, plot_gaussian_pdf. Here we will use HoloViews for simplicity. Notice the @pn.depends function decorator. This links the values of the widgets to the arguments of the function, so every time we change an interactive widget, the function is called again and its output updates.

Finally, we set the layout of our dashboard. We can define rows and columns through pn.Row and pn.Column respectively. We can set their heights and widths and add spaces through pn.Spacer. You may have to play around a bit to get it in the format that looks best to you.
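The idea behind the @pn.depends decorator can be sketched in plain Python. This is a conceptual toy only, not how Panel is implemented: `ToyWidget` and `toy_depends` are hypothetical names invented for this illustration. The point is that the decorator registers the function with each widget, and a change to a widget's value triggers a fresh call with the current values.

```python
class ToyWidget:
    """Hypothetical stand-in for a widget; not part of Panel."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Notify every function that registered interest in this widget
        for callback in self._watchers:
            callback()

    def watch(self, callback):
        self._watchers.append(callback)


def toy_depends(*widgets):
    """Hypothetical decorator: rerun `func` whenever a widget changes."""

    def decorator(func):
        outputs = []

        def rerun():
            outputs.append(func(*(w.value for w in widgets)))

        for w in widgets:
            w.watch(rerun)

        rerun()  # Compute the initial output
        func.outputs = outputs
        return func

    return decorator


mu = ToyWidget(0.0)
sigma = ToyWidget(1.0)


@toy_depends(mu, sigma)
def describe(mu, sigma):
    return f"mu={mu}, sigma={sigma}"


mu.value = 2.5  # Triggers a rerun with the new value
print(describe.outputs)
```

In the real dashboard, the "output" is a plot that Panel re-renders, rather than a string appended to a list.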

Creating a dashboard for the ant tracking data

Our goal in this recitation is to practice building a data dashboard to help us visualize the ant tracking data from Julian Wagner in the Parker lab. To remind you of the experiment, here are the comments from the data file:

# This data set was kindly donated by Julian Wagner from Joe Parker's lab at
# Caltech. In the experiment, an ant and a beetle were placed in a circular
# arena and recorded with video at a frame rate of 28 frames per second.
# The positions of the body parts the ant are tracked throughout the video
# recording.
#
# The experiment aims to distinguish the ant behavior in the presence of
# a beetle from the genus Sceptobius, which secretes a chemical that modifies
# the behavior of the ant, versus in the presence of a beetle from the species
# Dalotia, which does not.
#
# The data set has the following columns.
#  frame : frame number from the video acquisition
#  beetle_treatment : Either dalotia or sceptobius
#  ID : The unique integer identifier of the ant in the experiment
#  bodypart : The body part being tracked in the experiment. Possible values
#             are head, thorax, abdomen, antenna_left, antenna_right.
#  x_coord : x-coordinate of the body part in units of pixels
#  y_coord : y-coordinate of the body part in units of pixels
#  likelihood : A rating, ranging from zero to one, given by the deep learning
#               algorithm that approximately quantifies confidence that the
#               body part was correctly identified.
#
# The interpixel distance for this experiment was 0.8 millimeters.

First, we need to load in the data and create columns for the x and y positions in cm and the time in seconds:

[4]:

# Load data without comments
df = pd.read_csv("../data/ant_joint_locations.zip", comment="#")

interpixel_distance = 0.08  # cm

# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance

# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28

df.head(10)

[4]:

	frame	beetle_treatment	bodypart	x_coord	y_coord	likelihood	x (cm)	y (cm)	time (sec)
0	0	dalotia	head	73.086	193.835	1.0	5.84688	15.50680	0.000000
1	1	dalotia	head	73.730	194.385	1.0	5.89840	15.55080	0.035714
2	2	dalotia	head	75.673	195.182	1.0	6.05384	15.61456	0.071429
3	3	dalotia	head	77.319	196.582	1.0	6.18552	15.72656	0.107143
4	4	dalotia	head	78.128	197.891	1.0	6.25024	15.83128	0.142857
5	5	dalotia	head	79.208	198.697	1.0	6.33664	15.89576	0.178571
6	6	dalotia	head	79.663	198.069	1.0	6.37304	15.84552	0.214286
7	7	dalotia	head	81.485	198.142	1.0	6.51880	15.85136	0.250000
8	8	dalotia	head	81.835	198.350	1.0	6.54680	15.86800	0.285714
9	9	dalotia	head	83.263	197.934	1.0	6.66104	15.83472	0.321429

This data frame should look familiar.

This data set is too large for smoothly responsive interactions. Since this recitation is really for demonstration purposes, we will work with a smaller data set (even though we just learned how to use Datashader). We decrease the sampling rate by a factor of seven, from 28 frames per second to 4, by keeping only every seventh frame.

[5]:

df = df.loc[df['frame'] % 7 == 0, :].copy()
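As a quick sanity check of that arithmetic, the sketch below applies the same modulo filter to a made-up data frame holding two seconds of frame numbers (the name `toy` and its contents are assumptions for illustration, not the real data). Keeping frames divisible by seven should leave four frames per second, spaced 0.25 seconds apart.

```python
import pandas as pd

# Toy data: two seconds of video at 28 frames per second (made-up frames only)
toy = pd.DataFrame({"frame": range(56)})
toy["time (sec)"] = toy["frame"] / 28

# Keep every seventh frame, as in the cell above
toy_small = toy.loc[toy["frame"] % 7 == 0, :].copy()

frames_per_second = len(toy_small) / 2
time_step = toy_small["time (sec)"].diff().dropna().unique()

print(frames_per_second, time_step)
```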

Discussion Question:

What kind of plot(s) should we make to visualize this data? What parameters might we want to change through interactive widgets? What kind of Panel widgets might we use?

Take a minute to think/discuss, then we will sketch out our dashboard on the whiteboard. When creating dashboards, it is always helpful to sketch your dashboard on paper before coding! This will help you think about what kind of plots you want, what the spacing should look like, etc.

Visualizing ant position over time

To visualize the ant’s position over time, we will make a plot using a Path element and use color to indicate time. Since we will do this over and over again, we’ll write a function to do this. Such a function, which would be common in a workflow, would be in the package you write to analyze these kinds of data.

[6]:

def extract_sub_df(df, ant_ID, bodypart, time_range):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def plot_traj(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the trajectory of a single ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Path(
        data=sub_df,
        kdims=["x (cm)", "y (cm)"],
        vdims=["time (sec)"]
    ).opts(
        color="time (sec)",
        colorbar=True,
        colorbar_opts={"title": "time (sec)"},
        frame_height=200,
        frame_width=200,
        xlim=(0, 20),
        ylim=(0, 20)
    )
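Before plotting, it can help to confirm what the Boolean mask in extract_sub_df selects. The sketch below repeats that function so it runs on its own and applies it to a few made-up rows (the `toy` data frame is an assumption for illustration). It shows that the default infinite time range keeps every time point and that both end points of a finite range are inclusive.

```python
import numpy as np
import pandas as pd


def extract_sub_df(df, ant_ID, bodypart, time_range):
    """Same logic as in the cell above, repeated so this sketch runs alone."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )
    return df.loc[inds, :]


# Made-up rows, not taken from the real data set
toy = pd.DataFrame(
    {
        "ID": [0, 0, 0, 0, 1, 1],
        "bodypart": ["thorax", "thorax", "thorax", "head", "thorax", "thorax"],
        "time (sec)": [0.0, 0.25, 0.5, 0.0, 0.0, 0.25],
    }
)

# Default used by plot_traj: an infinite range keeps all time points
all_times = extract_sub_df(toy, 0, "thorax", (-np.inf, np.inf))

# Both end points of the range are inclusive
windowed = extract_sub_df(toy, 0, "thorax", (0.25, 0.5))

print(len(all_times), list(windowed["time (sec)"]))
```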

Let’s use this function to plot the trajectory of the thorax of ant 0, which was treated with a Dalotia beetle.

Building an interaction

Since these trajectories can be long, we may want to be able to select only a portion of the trajectory. To do this, we can use Panel to make a range slider widget to select the interval. Let’s first make the widget.

[7]:

# Create time interval range slider
time_interval_slider = pn.widgets.RangeSlider(
    start=df["time (sec)"].min(),
    end=df["time (sec)"].max(),
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    name="time (sec)",
)

# Take a look
time_interval_slider

[7]:

We now want to link this slider to the plot. To do that, we wrap our plotting function in a function that can be under control of the slider. Again, we will do this for ant 0’s thorax.

[8]:

ant_ID = 0
bodypart = 'thorax'

@pn.depends(time_range=time_interval_slider.param.value)
def plot_traj_interactive(time_range):
    return plot_traj(df, ant_ID, bodypart, time_range)

We added the pn.depends() decorator to specify that the time range is linked to the value of the time interval slider. We can now lay out our panel using the pn.Row() and pn.Column() classes.

[9]:

# Set dashboard layout
widgets = pn.Column(pn.Spacer(height=30), time_interval_slider, width=300)

pn.Row(plot_traj_interactive, widgets)

[9]:

(Notice that the slider moves in both renderings in the notebook. Once created, the widgets are all linked.)

Speed improvements

Whenever a widget is updated, a new plot is generated. This can be time consuming, especially with the Boolean indexing we need to do to extract the relevant data. If we want this to happen faster, we can instead update only the underlying data of the plot (instead of re-rendering it) when the time interval slider is adjusted. This is much trickier to implement. Here, we do it by generating a Bokeh plot and digging into its column data source. Note that this speed boost would also allow us to work with the entire data set.

First, we write a function to generate our trajectory plot using base Bokeh. We use dots instead of a line to make coloring easier (there are some tricks to coloring lines with a quantitative parameter that HoloViews takes care of for us), and we also omit the colorbar to keep the code brief (even though it is already not brief; high level plotting really helps us out!).

[10]:

def plot_traj_bokeh(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Make a plot of an ant trajectory."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    p = bokeh.plotting.figure(
        frame_height=200,
        frame_width=200,
        x_range=[0, 20],
        y_range=[0, 20],
        x_axis_label="x (cm)",
        y_axis_label="y (cm)",
    )

    # Set up data source; this is what gets changed in the callback
    source = bokeh.models.ColumnDataSource(
        dict(
            x=sub_df["x (cm)"].values,
            y=sub_df["y (cm)"].values,
            t=sub_df["time (sec)"].values,
        )
    )

    # Mapping of color for glyphs
    mapper = bokeh.transform.linear_cmap(
        field_name="t",
        palette=bokeh.palettes.Viridis256,
        low=min(source.data["t"]),
        high=max(source.data["t"]),
    )

    p.circle(source=source, x="x", y="y", color=mapper, size=3, line_alpha=0)

    p.toolbar_location = 'above'

    return p


# Take a look
p = plot_traj_bokeh(df, 0, "thorax")
bokeh.io.show(p)

So that the slider we use in this aside on speed does not interact with other plots in this notebook, we will make a fresh time interval slider.

[11]:

# Create time interval range slider
time_interval_slider_speed_demo = pn.widgets.RangeSlider(
    start=df["time (sec)"].min(),
    end=df["time (sec)"].max(),
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    name="time (sec)",
)

Next, we need to make sure the Bokeh plot is in a pane, so that we can link it to the slider, which we need to do explicitly since we’re updating the data source and not just replotting.

[12]:

p_pane = pn.pane.Bokeh(plot_traj_bokeh(df, ant_ID, bodypart))

Next, we need a callback. This is a function that gets executed every time the slider changes. In our callback, we update the data source, and also the color mapping depending on the value of the range slider. We have to get into the guts of the Bokeh figure, pulling out the glyph renderer and adjusting its properties, including its data source.

[13]:

def time_interval_callback(target, event):
    """Update the plot's data source when the range slider changes."""
    t_range = event.new
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= t_range[0])
        & (df["time (sec)"] <= t_range[1])
    )

    sub_df = df.loc[inds, ["x (cm)", "y (cm)", "time (sec)"]]

    # Pull the glyph renderer and its data source out of the figure
    gr = target.object.renderers[0]
    source = gr.data_source

    # Replace all columns at once so their lengths always match
    source.data = dict(
        x=sub_df["x (cm)"].values,
        y=sub_df["y (cm)"].values,
        t=sub_df["time (sec)"].values,
    )

    mapper = bokeh.transform.linear_cmap(
        field_name="t",
        palette=bokeh.palettes.Viridis256,
        low=min(source.data["t"]),
        high=max(source.data["t"]),
    )
    gr.glyph.fill_color = mapper

Finally, we need to link the slider to the pane with the plot.

[14]:

time_interval_slider_speed_demo.link(p_pane, callbacks={'value': time_interval_callback})

[14]:

Watcher(inst=RangeSlider(end=357.0, name='time (sec)', start=0.0, step=1, value=(0.0, 357.0)), cls=<class 'panel.widgets.slider.RangeSlider'>, fn=<function Reactive.link.<locals>.link at 0x1d23618e60>, mode='args', onlychanged=True, parameter_names=('value',), what='value', queued=False)

The result that was printed to the screen simply says that we have set up a watcher so that the plot will get updated whenever the slider is changed. Now, we can look at our result!

[15]:

pn.Row(p_pane, pn.Spacer(width=15), pn.Column(pn.Spacer(height=50), time_interval_slider_speed_demo))

[15]:

When using this slider, you will note that the plot responds much more quickly because it is not being re-rendered.

Going forward, though, for simplicity and ease of constructing our dashboard, we will not dive into the base Bokeh and will sacrifice the speed of response.

Adding more interaction to our plot

We know our data includes multiple ants, multiple body parts, and multiple beetle treatments. Rather than making a new plot for each possible combination, we can add multiple interactive elements to our dashboard. Here, we will add drop-down lists to choose the beetle treatment, ant ID, and body part to track.

Notice that the possible ant IDs change between beetle treatments. For the Dalotia beetle, we have ant IDs 0 through 5. For Sceptobius, we have 6 through 11. So, our Ant ID drop-down list must change when we change the beetle treatment. To do this, we add the helper function update_ant_ID_selector, which updates the options in the ant_ID_selector when the beetle treatment is changed. Notice that this function also has the decorator @pn.depends. This specifies that the function depends on the value of the beetle_selector drop-down list. The function does not return anything; it sets the options of ant_ID_selector directly. For that reason, @pn.depends is given the additional kwarg watch=True, which tells Panel to call the function every time the beetle_selector value changes.

[16]:

# Create bodypart selector drop-down list
bodypart_selector = pn.widgets.Select(
    name="body part", options=sorted(list(df["bodypart"].unique())), value="thorax"
)


# Create beetle treatment selector drop-down list
beetle_selector = pn.widgets.Select(
    name="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
)


# Create ant ID selector drop-down list, starting with the ants
# for the initially selected beetle treatment
ant_ID_selector = pn.widgets.Select(
    name="Ant ID",
    options=sorted(
        list(df.loc[df["beetle_treatment"] == beetle_selector.value, "ID"].unique())
    ),
)


# Create helper function to update ant_ID_selector options
# depending on selected beetle treatment
@pn.depends(beetle_selector.param.value, watch=True)
def update_ant_ID_selector(beetle):
    inds = df["beetle_treatment"] == beetle
    options = sorted(list(df.loc[inds, "ID"].unique()))
    ant_ID_selector.options = options


# Create plotting function
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value,
)
def plot_traj_interactive(ant_ID, bodypart, time_range):
    return plot_traj(df, ant_ID, bodypart, time_range)


# Set dashboard layout
widgets = pn.Column(
    pn.Spacer(height=30),
    time_interval_slider,
    pn.Spacer(height=15),
    beetle_selector,
    pn.Spacer(height=15),
    pn.Row(ant_ID_selector, bodypart_selector, width=300),
    width=300,
)

pn.Row(plot_traj_interactive, pn.Spacer(width=20), widgets)

[16]:
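The lookup inside update_ant_ID_selector boils down to "which ant IDs belong to this treatment?" The sketch below computes that mapping once with pandas on made-up rows (the `toy` data frame and the name `ids_by_treatment` are assumptions for illustration). Precomputing the mapping is an optional variation, not what the cell above does; with it, the helper could simply assign `ids_by_treatment[beetle]` to `ant_ID_selector.options`.

```python
import pandas as pd

# Made-up rows that mimic the ID layout described above
toy = pd.DataFrame(
    {
        "beetle_treatment": ["dalotia"] * 3 + ["sceptobius"] * 3,
        "ID": [1, 0, 1, 7, 6, 6],
    }
)

# Valid ant IDs for each treatment, computed once
ids_by_treatment = {
    treatment: sorted(int(ant_id) for ant_id in group["ID"].unique())
    for treatment, group in toy.groupby("beetle_treatment")
}

print(ids_by_treatment)
```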

Adding more plots to the dashboard

Let’s try adding another plot to our dashboard. We want to add a plot of the x and y positions versus time, with x and y drawn as separate curves. First, we will build and test the plotting function to make sure it works.

[17]:

def plot_xy(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot x and y positions versus time as overlaid curves."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    x_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["x (cm)"], label="x")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[0],
        )
        .opts(ylabel="position (cm)")
    )

    y_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["y (cm)"], label="y")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[1],
        )
        .opts(ylabel="position (cm)")
    )

    return (x_plot * y_plot).opts(legend_offset=(10, 20))


plot_xy(df, 0, "thorax")

[17]:

Then, we will add our plotting function to our dashboard and use our @pn.depends decorator to link our new plot to our interactive elements. Finally, we have to adjust the dashboard layout to our liking.

[18]:

# Create plotting function for x and y vs time
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value,
)
def plot_xy_interactive(ant_ID, bodypart, time_range):
    return plot_xy(df, ant_ID, bodypart, time_range)

# Build the layout of the dashboard
row1 = pn.Row(plot_traj_interactive, widgets)
row2 = pn.Row(plot_xy_interactive)
pn.Column(row1, pn.Spacer(height=15), row2)

[18]:

Looks nice! Let’s add one more plot, one for the cumulative distance traveled by an ant over time. First, we can compute the distance traveled for each body part of each ant.

[19]:

def distance_traveled(df):
    x_diff = df['x (cm)'].diff()
    y_diff = df['y (cm)'].diff()
    return np.cumsum(np.sqrt(x_diff**2 + y_diff**2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)
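A tiny worked example makes the cumulative distance calculation easy to verify by hand. The sketch below uses made-up positions built from 3-4-5 triangles. It is a variation on distance_traveled, not the notebook's function: the hypothetical `distance_traveled_from_zero` fills the first step (which has no predecessor, so diff() gives NaN) with zero so that the distance starts at 0.

```python
import numpy as np
import pandas as pd


def distance_traveled_from_zero(df):
    """Cumulative path length, with the first point assigned zero."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    step = np.sqrt(x_diff**2 + y_diff**2)
    # The first step has no predecessor, so diff() gives NaN there
    return step.fillna(0.0).cumsum()


# Made-up positions: a 3-4-5 step, a pause, then another 3-4-5 step
toy = pd.DataFrame({"x (cm)": [0.0, 3.0, 3.0, 6.0], "y (cm)": [0.0, 4.0, 4.0, 8.0]})

print(distance_traveled_from_zero(toy).tolist())  # [0.0, 5.0, 5.0, 10.0]
```

The pause (two identical positions in a row) adds nothing to the total, as expected.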

Now we can write a function to make the plot we want and take a look at it. (Again, such a function would be in the package you develop for your analysis pipeline.)

[20]:

def plot_distance_traveled(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Make a plot of distance traveled."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Curve(
        data=sub_df,
        kdims=['time (sec)'],
        vdims=['distance traveled (cm)', 'ID', 'bodypart']
    ).opts(
        frame_height=200,
        frame_width=200
    )

plot_distance_traveled(df, 0, 'thorax')

[20]:

Looks good! Now let’s put on a wrapper and a decorator and add it to our dashboard!

[21]:

@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value,
)
def plot_distance_traveled_interactive(ant_ID, bodypart, time_range):
    return plot_distance_traveled(df, ant_ID, bodypart, time_range)


row1 = pn.Row(plot_traj_interactive, pn.Spacer(width=20), plot_distance_traveled_interactive)
row2 = pn.Row(plot_xy_interactive)
col1 = pn.Column(row1, pn.Spacer(height=15), row2)
pn.Row(col1, pn.Spacer(width=20), widgets)

[21]:

Looks great! The rendering is a bit slow, though. If we wanted to speed it up, we could use some of the strategies discussed above.
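Another inexpensive option, different from the data source update shown earlier, is to split the data frame by ant and body part once, so that each widget update only has to filter one small sub data frame by time. The sketch below is a minimal illustration on made-up data; `sub_dfs` and `extract_sub_df_cached` are hypothetical names, and whether this helps noticeably on the real data set is untested here.

```python
import numpy as np
import pandas as pd

# Made-up data: 2 ants x 2 body parts x 5 time points
rows = [
    {"ID": ant, "bodypart": part, "time (sec)": 0.25 * i, "x (cm)": float(i)}
    for ant in (0, 1)
    for part in ("head", "thorax")
    for i in range(5)
]
toy = pd.DataFrame(rows)

# Split once, up front; each value is the sub data frame for one trajectory
sub_dfs = {key: group for key, group in toy.groupby(["ID", "bodypart"])}


def extract_sub_df_cached(ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Hypothetical faster lookup: only the time filter runs per update."""
    sub_df = sub_dfs[(ant_ID, bodypart)]
    t = sub_df["time (sec)"]
    return sub_df.loc[(t >= time_range[0]) & (t <= time_range[1]), :]


result = extract_sub_df_cached(1, "thorax", (0.25, 0.75))
print(result["time (sec)"].tolist())
```

A function like this could stand in for extract_sub_df inside the plotting functions, at the cost of holding the split copies in memory.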

Computing environment

[22]:

%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,panel,bebi103,jupyterlab

CPython 3.7.5
IPython 7.1.1

numpy 1.17.2
scipy 1.3.1
pandas 0.24.2
bokeh 1.3.4
holoviews 1.12.6
panel 0.6.3
bebi103 0.0.43
jupyterlab 1.1.4