Selecting data and serving a dashboard

Data set download


[1]:
import os

# Specific to this notebook: always load the data set from the internet
data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"

import numpy as np
import pandas as pd

import scipy.stats as st

import bebi103
import iqplot

import holoviews as hv

import bokeh.io

import panel as pn

bokeh.io.output_notebook()
pn.extension()

hv.extension('bokeh')
bebi103.hv.set_defaults()

Because this lesson is all about interactive plotting, which requires a running Python engine, you really should download this notebook and run it on your machine. Note that Panel will not work on Google Colab as of October 2020.

In the dashboard we built in the previous part of this lesson, we selected which data to display based on the beetle treatment and the ant ID, as well as with the time interval slider. While this is useful, we often want to select data based on selections made in other plots. This idea might not be so clear right now, so let's proceed to an example. You will see that this is a very powerful idea.

A plot of summary data

We will use the same beetle data set. Let’s load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.

[2]:
# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")

interpixel_distance = 0.08  # cm

# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance

# Create time column in units of seconds (frames were acquired at 28 frames per second)
df["time (sec)"] = df["frame"] / 28


def distance_traveled(df):
    """Compute cumulative distance traveled for a given ant and body part."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    return np.cumsum(np.sqrt(x_diff ** 2 + y_diff ** 2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)

# Take a look to remind ourselves
df.head()
[2]:
frame beetle_treatment ID bodypart x_coord y_coord likelihood x (cm) y (cm) time (sec) distance traveled (cm)
0 0 dalotia 0 head 73.086 193.835 1.0 5.84688 15.50680 0.000000 NaN
1 1 dalotia 0 head 73.730 194.385 1.0 5.89840 15.55080 0.035714 0.067752
2 2 dalotia 0 head 75.673 195.182 1.0 6.05384 15.61456 0.071429 0.235761
3 3 dalotia 0 head 77.319 196.582 1.0 6.18552 15.72656 0.107143 0.408629
4 4 dalotia 0 head 78.128 197.891 1.0 6.25024 15.83128 0.142857 0.531735
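The cumulative distance above is computed with groupby().apply() followed by reset_index(). As an alternative sketch (not the method used in this notebook), groupby diff() and cumsum() compute the same quantity without any reindexing. The toy data frame below is made up so the arithmetic can be checked by hand (ant 0 moves along a 3-4-5 triangle), and taking the per-group maximum then gives each group's total distance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two ants, one body part, three frames each
toy = pd.DataFrame(
    {
        "ID": [0, 0, 0, 1, 1, 1],
        "bodypart": ["thorax"] * 6,
        "x (cm)": [0.0, 3.0, 3.0, 1.0, 1.0, 1.0],
        "y (cm)": [0.0, 4.0, 4.0, 1.0, 2.0, 4.0],
    }
)

grouped = toy.groupby(["ID", "bodypart"])

# Differences are taken within each group, so the first frame of each
# ant/body part gets NaN instead of a jump from the previous ant's position
dx = grouped["x (cm)"].diff()
dy = grouped["y (cm)"].diff()
toy["step (cm)"] = np.sqrt(dx ** 2 + dy ** 2)

# Cumulative sum within each group; NaN entries stay NaN and are skipped
toy["distance traveled (cm)"] = toy.groupby(["ID", "bodypart"])["step (cm)"].cumsum()

# Total distance per group is the maximum of the cumulative distance
totals = toy.groupby(["ID", "bodypart"])["distance traveled (cm)"].max()
print(totals)
```

Grouping before calling diff() is the important design choice: a plain df["x (cm)"].diff() on the full data frame would compute a spurious step between the last frame of one ant and the first frame of the next.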

We may want to summarize the motion of the ants by the total distance traveled. Let’s compute that and store the result in a new data frame.

[3]:
df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)

# Take a look
df_dist
[3]:
beetle_treatment ID bodypart distance traveled (cm)
0 dalotia 0 abdomen 1256.637437
1 dalotia 0 antenna_left 2688.416512
2 dalotia 0 antenna_right 2800.528436
3 dalotia 0 head 1647.465193
4 dalotia 0 thorax 1266.693540
5 dalotia 1 abdomen 1143.534573
6 dalotia 1 antenna_left 2892.714768
7 dalotia 1 antenna_right 2856.248616
8 dalotia 1 head 1575.459175
9 dalotia 1 thorax 1135.457988
10 dalotia 2 abdomen 1068.647668
11 dalotia 2 antenna_left 3427.051189
12 dalotia 2 antenna_right 3810.080622
13 dalotia 2 head 1870.329342
14 dalotia 2 thorax 1289.644406
15 dalotia 3 abdomen 2169.129372
16 dalotia 3 antenna_left 4687.207298
17 dalotia 3 antenna_right 5569.482037
18 dalotia 3 head 3383.078720
19 dalotia 3 thorax 2435.800519
20 dalotia 4 abdomen 1879.510454
21 dalotia 4 antenna_left 3449.479980
22 dalotia 4 antenna_right 3330.416362
23 dalotia 4 head 2059.613435
24 dalotia 4 thorax 1642.574170
25 dalotia 5 abdomen 1383.706414
26 dalotia 5 antenna_left 2677.861333
27 dalotia 5 antenna_right 2561.370168
28 dalotia 5 head 1735.869976
29 dalotia 5 thorax 1326.177297
30 sceptobius 6 abdomen 912.727949
31 sceptobius 6 antenna_left 2657.582883
32 sceptobius 6 antenna_right 2287.451179
33 sceptobius 6 head 1205.616500
34 sceptobius 6 thorax 588.067617
35 sceptobius 7 abdomen 339.701993
36 sceptobius 7 antenna_left 1531.338615
37 sceptobius 7 antenna_right 2389.643450
38 sceptobius 7 head 420.652691
39 sceptobius 7 thorax 238.159884
40 sceptobius 8 abdomen 500.156206
41 sceptobius 8 antenna_left 2853.945585
42 sceptobius 8 antenna_right 2777.918093
43 sceptobius 8 head 1085.719023
44 sceptobius 8 thorax 703.824390
45 sceptobius 9 abdomen 357.735190
46 sceptobius 9 antenna_left 2382.851423
47 sceptobius 9 antenna_right 2488.580833
48 sceptobius 9 head 887.401463
49 sceptobius 9 thorax 546.723268
50 sceptobius 10 abdomen 661.166480
51 sceptobius 10 antenna_left 2693.747130
52 sceptobius 10 antenna_right 2614.627036
53 sceptobius 10 head 1181.083980
54 sceptobius 10 thorax 826.188143
55 sceptobius 11 abdomen 504.217197
56 sceptobius 11 antenna_left 2003.843440
57 sceptobius 11 antenna_right 2048.341855
58 sceptobius 11 head 726.746776
59 sceptobius 11 thorax 514.148107

To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.

[4]:
strip = iqplot.strip(
    df_dist.loc[df_dist["bodypart"]=="thorax", :],
    q="distance traveled (cm)",
    cats="beetle_treatment",
    q_axis="y",
    palette=["#7570b3", "#1b9e77"],
    y_axis_label="distance traveled (cm)",
    frame_height=300,
    frame_width=150,
    tools="pan,box_zoom,wheel_zoom,reset,tap,save",
    tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
)

# Always start at zero
strip.y_range.start = 0

bokeh.io.show(strip)

This summary plot reveals, for example, that ant 3 is highly active (you can confirm that it is ant 3 by hovering over the top point), while ant 7 is comparatively lethargic. In our dashboard, we would like to include this summary plot and enable clicking on a glyph to automatically update the displayed plots for the selected ant and beetle treatment.
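Hovering is handy for spotting extremes, but they can also be found directly. As a sketch, the data frame below re-enters the thorax rows of df_dist by hand from the output above (so the block runs on its own), and idxmax() and idxmin() locate the rows with the largest and smallest total distance.

```python
import pandas as pd

# Thorax rows of df_dist, copied from the table above
df_thorax = pd.DataFrame(
    {
        "beetle_treatment": ["dalotia"] * 6 + ["sceptobius"] * 6,
        "ID": list(range(12)),
        "distance traveled (cm)": [
            1266.693540, 1135.457988, 1289.644406, 2435.800519,
            1642.574170, 1326.177297, 588.067617, 238.159884,
            703.824390, 546.723268, 826.188143, 514.148107,
        ],
    }
)

col = "distance traveled (cm)"

# Rows with the largest and smallest total distance
most_active = df_thorax.loc[df_thorax[col].idxmax()]
least_active = df_thorax.loc[df_thorax[col].idxmin()]
print(most_active["ID"], least_active["ID"])

# Median total distance for each beetle treatment
medians = df_thorax.groupby("beetle_treatment")[col].median()
print(medians)
```

With the full df_dist in memory, the same calls work on df_dist.loc[df_dist["bodypart"] == "thorax", :].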

To achieve this goal, let’s first rebuild the app from the previous section.

Building the dashboard

We will use exactly the same code as in the previous part of this lesson, only with slight changes in the spacing of the layout to allow for the addition of the above summary plot. Get ready for a large code cell!

[5]:
def extract_sub_df(df, ant_ID, bodypart, time_range):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def plot_traj(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the trajectory of a single ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Path(
        data=sub_df,
        kdims=["x (cm)", "y (cm)"],
        vdims=["time (sec)"]
    ).opts(
        color="time (sec)",
        colorbar=True,
        colorbar_opts={"title": "time (sec)"},
        frame_height=200,
        frame_width=200,
        xlim=(0, 20),
        ylim=(0, 20)
    )


def plot_xy(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the x and y positions of an ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    x_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["x (cm)"], label="x")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[0],
        )
        .opts(ylabel="position (cm)")
    )

    y_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["y (cm)"], label="y")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[1],
        )
        .opts(ylabel="position (cm)")
    )

    return (x_plot * y_plot).opts(legend_offset=(10, 20))


def plot_distance_traveled(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Make a plot of distance traveled."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Curve(
        data=sub_df,
        kdims=['time (sec)'],
        vdims=['distance traveled (cm)', 'ID', 'bodypart']
    ).opts(
        frame_height=200,
        frame_width=200
    )


# Create bodypart selector drop-down list
bodypart_selector = pn.widgets.Select(
    name="body part", options=sorted(list(df["bodypart"].unique())), value="thorax"
)


# Create beetle treatment selector drop-down list
beetle_selector = pn.widgets.Select(
    name="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
)


# Create ant ID selector drop-down list, initialized with the IDs
# of the ants in the initially selected beetle treatment
ant_ID_selector = pn.widgets.Select(
    name="Ant ID",
    options=sorted(
        list(df.loc[df["beetle_treatment"] == beetle_selector.value, "ID"].unique())
    ),
)

# Ranges of times for convenience
start = df["time (sec)"].min()
end = df["time (sec)"].max()

# Create throttled time interval range slider
time_interval_slider = pn.widgets.RangeSlider(
    start=start,
    end=end,
    step=1,
    value=(start, end),
    name="time (sec)",
    value_throttled=(start, end),
)


# Create helper function to update ant_ID_selector options
# depending on selected beetle treatment
@pn.depends(beetle_selector.param.value, watch=True)
def update_ant_ID_selector(beetle):
    inds = df["beetle_treatment"] == beetle
    options = sorted(list(df.loc[inds, "ID"].unique()))
    ant_ID_selector.options = options


# Create plotting function
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_traj_interactive(ant_ID, bodypart, time_range):
    return plot_traj(df, ant_ID, bodypart, time_range)


# Create plotting function for x and y vs time
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_xy_interactive(ant_ID, bodypart, time_range):
    return plot_xy(df, ant_ID, bodypart, time_range)


@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_distance_traveled_interactive(ant_ID, bodypart, time_range):
    return plot_distance_traveled(df, ant_ID, bodypart, time_range)


widgets = pn.Column(
    time_interval_slider,
    pn.Spacer(height=10),
    beetle_selector,
    pn.Spacer(height=10),
    pn.Row(ant_ID_selector, bodypart_selector, width=300),
    width=300,
)

We have made and connected all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.

Our task now is to add the summary plot. It should respond to the body part widget so that it displays the total distances for the selected body part. So, let's write a function to do that. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0. We only replace the distance column, which relies on the rows for every body part in df_dist coming in the same ant order (df_dist is sorted by beetle treatment, then ant ID), so that they line up with the rows already in the data source.

The update function takes as an argument a Panel Event object (described in the docs) that has the attribute new, which is the new value of the widget. We then set up a watcher so that the update function gets triggered whenever the body part selector widget is changed.

[6]:
def update_strip(event):
    # Update data source
    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == event.new, "distance traveled (cm)"
    ].values


watcher = bodypart_selector.param.watch(update_strip, 'value', onlychanged=True)
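Assigning a bare array of values, as update_strip does, depends on row order. As a more defensive sketch (not what this notebook does), the hypothetical helper aligned_distances below looks up each row's distance by its (beetle_treatment, ID) pair. A plain dict stands in for the data source's data attribute, which maps column names to columns, so the block runs without Bokeh; the toy numbers are made up.

```python
import numpy as np
import pandas as pd


def aligned_distances(df_dist, source_data, bodypart):
    """Hypothetical helper: return distances for `bodypart`, ordered to
    match the rows of `source_data` by (beetle_treatment, ID) pairs
    rather than by position. Raises KeyError if a pair is missing."""
    sub = df_dist.loc[df_dist["bodypart"] == bodypart]
    lookup = sub.set_index(["beetle_treatment", "ID"])["distance traveled (cm)"]
    keys = list(zip(source_data["beetle_treatment"], source_data["ID"]))
    return lookup.loc[keys].to_numpy()


# Hypothetical toy version of df_dist with two ants and two body parts
df_dist_toy = pd.DataFrame(
    {
        "beetle_treatment": ["dalotia", "dalotia", "sceptobius", "sceptobius"],
        "ID": [0, 0, 6, 6],
        "bodypart": ["head", "thorax", "head", "thorax"],
        "distance traveled (cm)": [10.0, 20.0, 30.0, 40.0],
    }
)

# Stand-in for strip.renderers[0].data_source.data; rows are deliberately
# listed in a different order than in df_dist_toy
source_data = {
    "beetle_treatment": ["sceptobius", "dalotia"],
    "ID": [6, 0],
    "distance traveled (cm)": np.array([40.0, 20.0]),
}

# Switch the displayed body part to "head"
source_data["distance traveled (cm)"] = aligned_distances(
    df_dist_toy, source_data, "head"
)
print(source_data["distance traveled (cm)"])  # [30. 10.]
```

The select_ant callback in this notebook reads the beetle_treatment and ID columns of the strip plot's data source, so the same lookup keys would be available there.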

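Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new. Here attr is the name of the attribute that changed ("indices"), and old and new are the lists of indices of the glyphs that were selected before and after the change. When a single glyph is tapped, new[0] is the index of that glyph in the data source.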

[7]:
def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source

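    # Use try block in case no data are selected; then new is an
    # empty list and new[0] raises an IndexError, which we ignore
    try:
        # Get index of selected glyph
        ind = new[0]

        # Set widget values; setting the beetle treatment first updates
        # the ant ID options before the ant ID itself is set
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = source.data["ID"][ind]
    except IndexError:
        pass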

Now that the callback is defined, we need to register it so that it is called whenever the selection changes. We do this with the on_change() method of the data source's selected attribute, which holds a Bokeh Selection object, watching its "indices" property.

[8]:
strip.renderers[0].data_source.selected.on_change("indices", select_ant)
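The selection contract can be exercised without a Bokeh server. In the sketch below, FakeSelection and FakeWidget are hypothetical stand-ins (not Bokeh or Panel classes) that mimic the behavior relied on above: a callback registered for "indices" receives the attribute name, the old list of selected indices, and the new list. Bokeh's own on_change() accepts one or more callbacks; this stand-in takes exactly one. The callback handles an empty selection with an explicit length check rather than a try block.

```python
class FakeSelection:
    """Hypothetical stand-in for a Bokeh Selection object: it calls
    registered callbacks with (attr, old, new) when `indices` changes."""

    def __init__(self):
        self._indices = []
        self._callbacks = []

    def on_change(self, attr, callback):
        # Only the "indices" attribute is modeled in this sketch
        if attr != "indices":
            raise ValueError("only 'indices' is supported in this sketch")
        self._callbacks.append(callback)

    @property
    def indices(self):
        return self._indices

    @indices.setter
    def indices(self, new):
        old, self._indices = self._indices, list(new)
        for callback in self._callbacks:
            callback("indices", old, self._indices)


class FakeWidget:
    """Hypothetical stand-in for a Panel Select widget."""

    def __init__(self, value):
        self.value = value


# Stand-in for the columns of the strip plot's data source
data = {"beetle_treatment": ["dalotia", "sceptobius"], "ID": [3, 7]}
beetle_widget = FakeWidget("dalotia")
ant_widget = FakeWidget(0)


def select_ant(attr, old, new):
    """Same idea as the callback above, with an explicit empty check."""
    if len(new) == 0:
        # Nothing selected (e.g., a tap on empty space): leave widgets alone
        return
    ind = new[0]
    beetle_widget.value = data["beetle_treatment"][ind]
    ant_widget.value = data["ID"][ind]


selection = FakeSelection()
selection.on_change("indices", select_ant)

selection.indices = [1]  # as if the second glyph were tapped
print(beetle_widget.value, ant_widget.value)  # sceptobius 7

selection.indices = []  # selection cleared
print(beetle_widget.value, ant_widget.value)  # sceptobius 7
```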

All the pieces are now in place! Let’s lay it out!

[9]:
row1 = pn.Row(plot_traj_interactive, pn.Spacer(width=20), plot_distance_traveled_interactive)
row2 = pn.Row(plot_xy_interactive)
col1 = pn.Column(pn.Spacer(height=25), row1, pn.Spacer(height=35), row2)
col2 = pn.Column(widgets, pn.Spacer(height=15), strip)
dashboard = pn.Row(col1, pn.Spacer(width=30), col2)

We now have our dashboard, and we could take a look at it by evaluating it in a code cell:

dashboard

But before we do that…

Deploying a dashboard on a stand-alone browser tab

Panel ingeniously lets you move your dashboard from a prototype in a notebook to a stand-alone app in its own browser tab (you can read the docs about that). All you need to do is call .servable() on a Panel object in your notebook. You can then serve the dashboard by entering panel serve --show name_of_notebook.ipynb on the command line.

I invite you to download this notebook, which is named selecting_data_and_deploying.ipynb, and serve it up using

panel serve --show selecting_data_and_deploying.ipynb

When you do, you will see the dashboard in its own tab, because this notebook includes the code cell below, which calls .servable(). (You do not need to worry about the data set, since I set this notebook up to always download it from the internet.)

[10]:
dashboard.servable()

This new layout lets us explore the data much more rapidly. Using a clickable plot of summary statistics that in turn updates more detailed plots is a very powerful exploratory method. I use it in most dashboards I build.

And you can share this dashboard with your colleagues by sending them the Jupyter notebook. If they are not interested in how you built the dashboard (which you naturally explain expertly in your Markdown cells) or in the guts of the code, they can simply serve it to themselves from the command line and explore the whole data set with the dashboard. This is truly excellent.

Computing environment

[11]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,panel,iqplot,bebi103,jupyterlab
Python implementation: CPython
Python version       : 3.8.12
IPython version      : 7.27.0

numpy     : 1.21.2
scipy     : 1.7.1
pandas    : 1.3.3
bokeh     : 2.3.3
holoviews : 1.14.6
panel     : 0.12.1
iqplot    : 0.2.3
bebi103   : 0.1.8
jupyterlab: 3.1.7