Selecting data and serving a dashboard

Data set download

[1]:

import os

# For this notebook only, we always load the data set from the internet
data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"

import numpy as np
import pandas as pd

import scipy.stats as st

import bebi103
import iqplot

import holoviews as hv

import bokeh.io

import panel as pn

bokeh.io.output_notebook()
pn.extension()

hv.extension('bokeh')
bebi103.hv.set_defaults()


Because this lesson is all about interactive plotting, which requires a running Python engine, you really should download this notebook and run it on your machine. Note that Panel will not work on Google Colab as of October 2020.

In the dashboard we built in the previous part of this lesson, we selected which data we wanted displayed based on the beetle treatment and the ant ID, as well as the time interval slider. While this is useful, we often want to select data based on selected data in other plots. This idea might not be so clear right now, so let’s proceed to an example. You will see this is a very powerful idea.

A plot of summary data

We will use the same beetle data set. Let’s load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.

[2]:

# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")

interpixel_distance = 0.08  # cm

# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance

# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28


def distance_traveled(df):
    """Compute distance traveled for a given body part of a given ant."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()
    return np.cumsum(np.sqrt(x_diff ** 2 + y_diff ** 2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)

# Take a look to remind ourselves
df.head()

[2]:

	frame	beetle_treatment	bodypart	x_coord	y_coord	likelihood	x (cm)	y (cm)	time (sec)	distance traveled (cm)
0	0	dalotia	head	73.086	193.835	1.0	5.84688	15.50680	0.000000	NaN
1	1	dalotia	head	73.730	194.385	1.0	5.89840	15.55080	0.035714	0.067752
2	2	dalotia	head	75.673	195.182	1.0	6.05384	15.61456	0.071429	0.235761
3	3	dalotia	head	77.319	196.582	1.0	6.18552	15.72656	0.107143	0.408629
4	4	dalotia	head	78.128	197.891	1.0	6.25024	15.83128	0.142857	0.531735
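
To make the distance calculation concrete, here is a minimal plain-Python sketch of the same idea: take the length of each frame-to-frame step and accumulate the lengths. The positions are made-up numbers, not values from the data set. There is one fewer step than there are positions, which is why the first row of the data frame above has no distance (NaN): it has no previous frame to difference against.

```python
import math
from itertools import accumulate

# Hypothetical (x, y) positions in cm for one body part of one ant
positions = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (0.0, 14.0)]

# Length of each step between consecutive frames
step_lengths = [
    math.hypot(x1 - x0, y1 - y0)
    for (x0, y0), (x1, y1) in zip(positions, positions[1:])
]

# Running total of the step lengths: the distance traveled so far
cumulative = list(accumulate(step_lengths))

# The running total never decreases, so its maximum is its last entry
total = cumulative[-1]
```

Because the cumulative distance never decreases, taking its maximum for each ant and body part gives the total distance traveled.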

We may want to summarize the motion of the ants by the total distance traveled. Let’s compute that and store the result in a new data frame.

[3]:

df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)

# Take a look
df_dist

[3]:

	beetle_treatment	ID	bodypart	distance traveled (cm)
0	dalotia	0	abdomen	1256.637437
1	dalotia	0	antenna_left	2688.416512
2	dalotia	0	antenna_right	2800.528436
3	dalotia	0	head	1647.465193
4	dalotia	0	thorax	1266.693540
5	dalotia	1	abdomen	1143.534573
6	dalotia	1	antenna_left	2892.714768
7	dalotia	1	antenna_right	2856.248616
8	dalotia	1	head	1575.459175
9	dalotia	1	thorax	1135.457988
10	dalotia	2	abdomen	1068.647668
11	dalotia	2	antenna_left	3427.051189
12	dalotia	2	antenna_right	3810.080622
13	dalotia	2	head	1870.329342
14	dalotia	2	thorax	1289.644406
15	dalotia	3	abdomen	2169.129372
16	dalotia	3	antenna_left	4687.207298
17	dalotia	3	antenna_right	5569.482037
18	dalotia	3	head	3383.078720
19	dalotia	3	thorax	2435.800519
20	dalotia	4	abdomen	1879.510454
21	dalotia	4	antenna_left	3449.479980
22	dalotia	4	antenna_right	3330.416362
23	dalotia	4	head	2059.613435
24	dalotia	4	thorax	1642.574170
25	dalotia	5	abdomen	1383.706414
26	dalotia	5	antenna_left	2677.861333
27	dalotia	5	antenna_right	2561.370168
28	dalotia	5	head	1735.869976
29	dalotia	5	thorax	1326.177297
30	sceptobius	6	abdomen	912.727949
31	sceptobius	6	antenna_left	2657.582883
32	sceptobius	6	antenna_right	2287.451179
33	sceptobius	6	head	1205.616500
34	sceptobius	6	thorax	588.067617
35	sceptobius	7	abdomen	339.701993
36	sceptobius	7	antenna_left	1531.338615
37	sceptobius	7	antenna_right	2389.643450
38	sceptobius	7	head	420.652691
39	sceptobius	7	thorax	238.159884
40	sceptobius	8	abdomen	500.156206
41	sceptobius	8	antenna_left	2853.945585
42	sceptobius	8	antenna_right	2777.918093
43	sceptobius	8	head	1085.719023
44	sceptobius	8	thorax	703.824390
45	sceptobius	9	abdomen	357.735190
46	sceptobius	9	antenna_left	2382.851423
47	sceptobius	9	antenna_right	2488.580833
48	sceptobius	9	head	887.401463
49	sceptobius	9	thorax	546.723268
50	sceptobius	10	abdomen	661.166480
51	sceptobius	10	antenna_left	2693.747130
52	sceptobius	10	antenna_right	2614.627036
53	sceptobius	10	head	1181.083980
54	sceptobius	10	thorax	826.188143
55	sceptobius	11	abdomen	504.217197
56	sceptobius	11	antenna_left	2003.843440
57	sceptobius	11	antenna_right	2048.341855
58	sceptobius	11	head	726.746776
59	sceptobius	11	thorax	514.148107

To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.

[4]:

strip = iqplot.strip(
    df_dist.loc[df_dist["bodypart"]=="thorax", :],
    q="distance traveled (cm)",
    cats="beetle_treatment",
    q_axis="y",
    palette=["#7570b3", "#1b9e77"],
    y_axis_label="distance traveled (cm)",
    frame_height=300,
    frame_width=150,
    tools="pan,box_zoom,wheel_zoom,reset,tap,save",
    tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
)

# Always start at zero
strip.y_range.start = 0

bokeh.io.show(strip)

This summary plot exposes, for example, that ant 3 is highly active (you can see it’s ant 3 by hovering over the top point), and ant 11 is lethargic. In our dashboard, we would like to include this summary plot and be able to click on a glyph to automatically update the other plots so that they show the selected ant and its beetle treatment.

To achieve this goal, let’s first rebuild the app from the previous section.

Building the dashboard

We will use exactly the same code as in the previous part of this lesson, only with slight changes in the spacing of the layout to allow for the addition of the above summary plot. Get ready for a large code cell!

[5]:

def extract_sub_df(df, ant_ID, bodypart, time_range):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def plot_traj(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the trajectory of a single ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Path(
        data=sub_df,
        kdims=["x (cm)", "y (cm)"],
        vdims=["time (sec)"]
    ).opts(
        color="time (sec)",
        colorbar=True,
        colorbar_opts={"title": "time (sec)"},
        frame_height=200,
        frame_width=200,
        xlim=(0, 20),
        ylim=(0, 20)
    )


def plot_xy(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the x and y positions of an ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    x_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["x (cm)"], label="x")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[0],
        )
        .opts(ylabel="position (cm)")
    )

    y_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["y (cm)"], label="y")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[1],
        )
        .opts(ylabel="position (cm)")
    )

    return (x_plot * y_plot).opts(legend_offset=(10, 20))


def plot_distance_traveled(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Make a plot of distance traveled."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Curve(
        data=sub_df,
        kdims=['time (sec)'],
        vdims=['distance traveled (cm)', 'ID', 'bodypart']
    ).opts(
        frame_height=200,
        frame_width=200
    )


# Create bodypart selector drop-down list
bodypart_selector = pn.widgets.Select(
    name="body part", options=sorted(list(df["bodypart"].unique())), value="thorax"
)


# Create beetle treatment selector drop-down list
beetle_selector = pn.widgets.Select(
    name="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
)


# Create ant ID selector drop-down list
ant_ID_selector = pn.widgets.Select(
    name="Ant ID",
    options=sorted(
        list(df.loc[df["beetle_treatment"] == df['beetle_treatment'].unique()[0], "ID"].unique())
    ),
)

# Ranges of times for convenience
start = df["time (sec)"].min()
end = df["time (sec)"].max()

# Create throttled time interval range slider
time_interval_slider = pn.widgets.RangeSlider(
    start=start,
    end=end,
    step=1,
    value=(df["time (sec)"].min(), df["time (sec)"].max()),
    name="time (sec)",
    value_throttled=(start, end),
)


# Create helper function to update ant_ID_selector options
# depending on selected beetle treatment
@pn.depends(beetle_selector.param.value, watch=True)
def update_ant_ID_selector(beetle):
    inds = df["beetle_treatment"] == beetle
    options = sorted(list(df.loc[inds, "ID"].unique()))
    ant_ID_selector.options = options


# Create plotting function
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_traj_interactive(ant_ID, bodypart, time_range):
    return plot_traj(df, ant_ID, bodypart, time_range)


# Create plotting function for x and y vs time
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_xy_interactive(ant_ID, bodypart, time_range):
    return plot_xy(df, ant_ID, bodypart, time_range)


@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_distance_traveled_interactive(ant_ID, bodypart, time_range):
    return plot_distance_traveled(df, ant_ID, bodypart, time_range)


widgets = pn.Column(
    time_interval_slider,
    pn.Spacer(height=10),
    beetle_selector,
    pn.Spacer(height=10),
    pn.Row(ant_ID_selector, bodypart_selector, width=300),
    width=300,
)

We have made and connected all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.

Our task now is to add the summary plot. It should respond to the body part widget so that it shows the total distance traveled for whichever body part is selected. So, let’s write a function to do that and attach it to the widget. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0.

The update function takes as an argument a Panel Event object (described in the docs) that has the attribute new, which is the new value of the widget. We then set up a watcher so that the update function gets triggered whenever the body part selector widget is changed.
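
If the watcher mechanism feels abstract, the following toy sketch may help. It uses only plain Python: ToyEvent and ToySelect are hypothetical stand-ins written for this illustration, not Panel classes, and the dictionary source_data merely plays the role of the plot’s data source. The numbers are illustrative. The point is the flow: changing the widget’s value creates an event whose new attribute is handed to every registered callback.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an event; only old and new are modeled."""
    old: Any
    new: Any


class ToySelect:
    """Hypothetical stand-in for a select widget that notifies watchers."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        # Register a function to be called when the value changes
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        # Only notify on an actual change (mimics onlychanged=True)
        if new != old:
            for callback in self._watchers:
                callback(ToyEvent(old=old, new=new))


# Illustrative totals per body part, and a dict acting as the data source
totals = {"thorax": [1266.7, 1135.5], "head": [1647.5, 1575.5]}
source_data = {"distance traveled (cm)": list(totals["thorax"])}


def update_source(event):
    # Swap in the column for the newly selected body part
    source_data["distance traveled (cm)"] = list(totals[event.new])


selector = ToySelect("thorax")
selector.watch(update_source)
selector.value = "head"  # triggers update_source
```

After the last line, source_data holds the head values. Only the column is swapped; nothing is rebuilt, which is the same reason we update the data source of the strip plot rather than regenerating it.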

[6]:

def update_strip(event):
    # Update data source
    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == event.new, "distance traveled (cm)"
    ].values


watcher = bodypart_selector.param.watch(update_strip, 'value', onlychanged=True)

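Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new. Here, attr is the name of the attribute that changed (for us, "indices"), and old and new are the lists of indices of the selected data points before and after the change.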
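
To see how a callback with this signature behaves, here is a small plain-Python sketch that we call by hand. The dictionary data is a toy stand-in for the data of a ColumnDataSource (column names mapped to equal-length sequences), and the names on_selection and log are hypothetical, made up for this illustration. In Bokeh, attr is the name of the property that changed, and old and new are its previous and new values; for a selection, those are lists of indices.

```python
# Toy stand-in for source.data: column names mapped to equal-length lists
data = {
    "ID": [0, 1, 6],
    "beetle_treatment": ["dalotia", "dalotia", "sceptobius"],
}

# Records what the callback decided, so we can inspect it afterwards
log = []


def on_selection(attr, old, new):
    """Look up the first selected row; record None if nothing is selected."""
    if not new:
        log.append(None)
        return

    # Index of the first selected glyph
    ind = new[0]
    log.append((data["beetle_treatment"][ind], data["ID"][ind]))


on_selection("indices", [], [2])     # one glyph selected
on_selection("indices", [2], [])     # selection cleared
on_selection("indices", [], [1, 0])  # several selected; the first one is used
```

The empty list is the case to guard against: clicking on empty space clears the selection, so new has no first element.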

[7]:

def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source

    # Use try block in case no data are selected (then new is empty)
    try:
        # Get index of selected glyph
        ind = new[0]

        # Set widget values
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = source.data["ID"][ind]
    except IndexError:
        pass

Now that the callback is defined, we need to register it so that it is called whenever the selection changes. We do this with the selected.on_change() method of a ColumnDataSource.

[8]:

strip.renderers[0].data_source.selected.on_change("indices", select_ant)

All the pieces are now in place! Let’s lay it out!

[9]:

row1 = pn.Row(plot_traj_interactive, pn.Spacer(width=20), plot_distance_traveled_interactive)
row2 = pn.Row(plot_xy_interactive)
col1 = pn.Column(pn.Spacer(height=25), row1, pn.Spacer(height=35), row2)
col2 = pn.Column(widgets, pn.Spacer(height=15), strip)
dashboard = pn.Row(col1, pn.Spacer(width=30), col2)

We now have our dashboard, and we could take a look at it with a code cell containing just:

dashboard

But before we do….

Deploying a dashboard on a stand-alone browser tab

Panel ingeniously lets you move your dashboard from a prototype in a notebook to a stand-alone app in its own browser tab (you can read the docs about that). All you need to do is append .servable() to a Panel object in your notebook. You can then serve the dashboard by entering panel serve --show name_of_notebook.ipynb on the command line.

I invite you to download this notebook, which is named selecting_data_and_deploying.ipynb and serve it up using

panel serve --show selecting_data_and_deploying.ipynb

You will see the dashboard on its own tab as you see it below, because I include the code cell below. (You do not need to worry about the data set, since I set this notebook up to always download it from the internet.)

[10]:

dashboard.servable()

This new layout affords us much more rapid exploration of the data. Using a clickable plot of summary statistics to update more detailed plots is a very powerful exploratory method. I use it in most dashboards I build.

…And you can share this dashboard with your colleagues by sending them the Jupyter notebook. If they are not interested in the logic of how you built the dashboard (which you naturally explain expertly in your markdown cells) or in the guts of the code, they can just serve it to themselves from the command line and explore the whole data set with the dashboard. This is truly excellent.

Computing environment

[11]:

%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,panel,iqplot,bebi103,jupyterlab

Python implementation: CPython
Python version       : 3.8.12
IPython version      : 7.27.0

numpy     : 1.21.2
scipy     : 1.7.1
pandas    : 1.3.3
bokeh     : 2.3.3
holoviews : 1.14.6
panel     : 0.12.1
iqplot    : 0.2.3
bebi103   : 0.1.8
jupyterlab: 3.1.7