Selecting data and serving a dashboard
[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot colorcet bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
# ------------------------------
# Special for this notebook, we will always take the data set from the internet
data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
import numpy as np
import pandas as pd
import scipy.stats as st
import bebi103
import iqplot
import holoviews as hv
import bokeh.io
import panel as pn
bokeh.io.output_notebook()
pn.extension()
hv.extension('bokeh')
bebi103.hv.set_defaults()
Because this notebook is all about interactive plotting, which requires a running Python engine, you really should download it and run it on your own machine. Note that Panel will not work on Google Colab as of October 2020.
In the dashboard we built in the previous part of this lesson, we chose which data to display using the beetle treatment and ant ID selectors and the time interval slider. While this is useful, we often want to select data based on a selection made in another plot. This idea might not be so clear right now, so let's proceed to an example. You will see this is a very powerful idea.
A plot of summary data
We will use the same beetle data set. Let's load it in and prep it, including computing the distance traveled, as we did in the last part of the lesson.
[2]:
# Load data without comments
df = pd.read_csv(os.path.join(data_path, "ant_joint_locations.zip"), comment="#")
interpixel_distance = 0.08 # cm
# Create position columns in units of cm
df["x (cm)"] = df["x_coord"] * interpixel_distance
df["y (cm)"] = df["y_coord"] * interpixel_distance
# Create time column in units of seconds
df["time (sec)"] = df["frame"] / 28
def distance_traveled(df):
    """Compute cumulative distance traveled for a single body part of a single ant."""
    x_diff = df["x (cm)"].diff()
    y_diff = df["y (cm)"].diff()

    return np.cumsum(np.sqrt(x_diff ** 2 + y_diff ** 2))


df["distance traveled (cm)"] = (
    df.groupby(["ID", "bodypart"])
    .apply(distance_traveled)
    .reset_index(level=["ID", "bodypart"], drop=True)
)
# Take a look to remind ourselves
df.head()
[2]:
 | frame | beetle_treatment | ID | bodypart | x_coord | y_coord | likelihood | x (cm) | y (cm) | time (sec) | distance traveled (cm) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | dalotia | 0 | head | 73.086 | 193.835 | 1.0 | 5.84688 | 15.50680 | 0.000000 | NaN |
1 | 1 | dalotia | 0 | head | 73.730 | 194.385 | 1.0 | 5.89840 | 15.55080 | 0.035714 | 0.067752 |
2 | 2 | dalotia | 0 | head | 75.673 | 195.182 | 1.0 | 6.05384 | 15.61456 | 0.071429 | 0.235761 |
3 | 3 | dalotia | 0 | head | 77.319 | 196.582 | 1.0 | 6.18552 | 15.72656 | 0.107143 | 0.408629 |
4 | 4 | dalotia | 0 | head | 78.128 | 197.891 | 1.0 | 6.25024 | 15.83128 | 0.142857 | 0.531735 |
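To make the distance calculation concrete, here is a minimal plain-Python sketch that redoes it by hand for the first three head positions of ant 0, copied from the table above. It is not how the notebook computes the column (that uses pandas and NumPy as shown); the variable names are only for illustration.

```python
import math
from itertools import accumulate

interpixel_distance = 0.08  # cm per pixel, as in the cell above

# Head positions (in pixels) of ant 0 for frames 0-2, copied from the table
pixels = [(73.086, 193.835), (73.730, 194.385), (75.673, 195.182)]

# Convert pixel coordinates to cm
cm = [(x * interpixel_distance, y * interpixel_distance) for x, y in pixels]

# Step length between consecutive frames
steps = [
    math.hypot(x1 - x0, y1 - y0)
    for (x0, y0), (x1, y1) in zip(cm, cm[1:])
]

# Running total of the step lengths
cumulative = list(accumulate(steps))

print([round(d, 6) for d in cumulative])
```

The two running totals agree with the 0.067752 and 0.235761 shown for frames 1 and 2 in the table. The pandas version has an extra leading NaN because the first frame has no previous position to difference against.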
We may want to summarize the motion of the ants by the total distance traveled. Let's compute that and store the result in a new data frame.
[3]:
df_dist = (
    df.groupby(["beetle_treatment", "ID", "bodypart"])["distance traveled (cm)"]
    .max()
    .reset_index()
)
# Take a look
df_dist
[3]:
 | beetle_treatment | ID | bodypart | distance traveled (cm) |
---|---|---|---|---|
0 | dalotia | 0 | abdomen | 1256.637437 |
1 | dalotia | 0 | antenna_left | 2688.416512 |
2 | dalotia | 0 | antenna_right | 2800.528436 |
3 | dalotia | 0 | head | 1647.465193 |
4 | dalotia | 0 | thorax | 1266.693540 |
5 | dalotia | 1 | abdomen | 1143.534573 |
6 | dalotia | 1 | antenna_left | 2892.714768 |
7 | dalotia | 1 | antenna_right | 2856.248616 |
8 | dalotia | 1 | head | 1575.459175 |
9 | dalotia | 1 | thorax | 1135.457988 |
10 | dalotia | 2 | abdomen | 1068.647668 |
11 | dalotia | 2 | antenna_left | 3427.051189 |
12 | dalotia | 2 | antenna_right | 3810.080622 |
13 | dalotia | 2 | head | 1870.329342 |
14 | dalotia | 2 | thorax | 1289.644406 |
15 | dalotia | 3 | abdomen | 2169.129372 |
16 | dalotia | 3 | antenna_left | 4687.207298 |
17 | dalotia | 3 | antenna_right | 5569.482037 |
18 | dalotia | 3 | head | 3383.078720 |
19 | dalotia | 3 | thorax | 2435.800519 |
20 | dalotia | 4 | abdomen | 1879.510454 |
21 | dalotia | 4 | antenna_left | 3449.479980 |
22 | dalotia | 4 | antenna_right | 3330.416362 |
23 | dalotia | 4 | head | 2059.613435 |
24 | dalotia | 4 | thorax | 1642.574170 |
25 | dalotia | 5 | abdomen | 1383.706414 |
26 | dalotia | 5 | antenna_left | 2677.861333 |
27 | dalotia | 5 | antenna_right | 2561.370168 |
28 | dalotia | 5 | head | 1735.869976 |
29 | dalotia | 5 | thorax | 1326.177297 |
30 | sceptobius | 6 | abdomen | 912.727949 |
31 | sceptobius | 6 | antenna_left | 2657.582883 |
32 | sceptobius | 6 | antenna_right | 2287.451179 |
33 | sceptobius | 6 | head | 1205.616500 |
34 | sceptobius | 6 | thorax | 588.067617 |
35 | sceptobius | 7 | abdomen | 339.701993 |
36 | sceptobius | 7 | antenna_left | 1531.338615 |
37 | sceptobius | 7 | antenna_right | 2389.643450 |
38 | sceptobius | 7 | head | 420.652691 |
39 | sceptobius | 7 | thorax | 238.159884 |
40 | sceptobius | 8 | abdomen | 500.156206 |
41 | sceptobius | 8 | antenna_left | 2853.945585 |
42 | sceptobius | 8 | antenna_right | 2777.918093 |
43 | sceptobius | 8 | head | 1085.719023 |
44 | sceptobius | 8 | thorax | 703.824390 |
45 | sceptobius | 9 | abdomen | 357.735190 |
46 | sceptobius | 9 | antenna_left | 2382.851423 |
47 | sceptobius | 9 | antenna_right | 2488.580833 |
48 | sceptobius | 9 | head | 887.401463 |
49 | sceptobius | 9 | thorax | 546.723268 |
50 | sceptobius | 10 | abdomen | 661.166480 |
51 | sceptobius | 10 | antenna_left | 2693.747130 |
52 | sceptobius | 10 | antenna_right | 2614.627036 |
53 | sceptobius | 10 | head | 1181.083980 |
54 | sceptobius | 10 | thorax | 826.188143 |
55 | sceptobius | 11 | abdomen | 504.217197 |
56 | sceptobius | 11 | antenna_left | 2003.843440 |
57 | sceptobius | 11 | antenna_right | 2048.341855 |
58 | sceptobius | 11 | head | 726.746776 |
59 | sceptobius | 11 | thorax | 514.148107 |
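Taking the maximum works as a total here because the cumulative distance never decreases, so the largest value in each group is also the final one. As a rough sketch of the same group-and-reduce step in plain Python (the records below are hypothetical toy values, not rows from the data set):

```python
from collections import defaultdict

# Hypothetical records: (beetle_treatment, ID, bodypart, cumulative distance in cm)
records = [
    ("dalotia", 0, "thorax", 0.5),
    ("dalotia", 0, "thorax", 1.2),
    ("dalotia", 0, "head", 0.7),
    ("dalotia", 0, "head", 2.0),
    ("sceptobius", 6, "thorax", 0.3),
]

# Keep the largest cumulative distance seen for each group
totals = defaultdict(float)
for treatment, ant_id, bodypart, dist in records:
    key = (treatment, ant_id, bodypart)
    totals[key] = max(totals[key], dist)

for key in sorted(totals):
    print(key, totals[key])
```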
To visualize this summary, we can make a strip plot, using the thorax as the body part. Thinking ahead, when we make it, we will include a tap tool, which enables selection of a glyph by clicking on it. We will also include a hover tool so we can see which ant/beetle treatment each glyph represents.
[4]:
strip = iqplot.strip(
    df_dist.loc[df_dist["bodypart"]=="thorax", :],
    q="distance traveled (cm)",
    cats="beetle_treatment",
    q_axis="y",
    palette=["#7570b3", "#1b9e77"],
    y_axis_label="distance traveled (cm)",
    frame_height=300,
    frame_width=150,
    tools="pan,box_zoom,wheel_zoom,reset,tap,save",
    tooltips=[("ant ID", "@ID"), ("beetle", "@beetle_treatment")],
)
# Always start at zero
strip.y_range.start = 0
bokeh.io.show(strip)
This summary plot exposes, for example, that ant 3 is highly active (you can see it's ant 3 by hovering over the top point), and ant 11 is lethargic. In our dashboard, we would like to include this summary plot so that clicking on a glyph automatically updates the other plots to show the selected ant and its beetle treatment.
To achieve this goal, let's first rebuild the app from the previous section.
Building the dashboard
We will use exactly the same code as in the previous part of this lesson, only with slight changes in the spacing of the layout to allow for the addition of the above summary plot. Get ready for a large code cell!
[5]:
def extract_sub_df(df, ant_ID, bodypart, time_range):
    """Extract sub data frame for body part of
    one ant over a time range."""
    inds = (
        (df["ID"] == ant_ID)
        & (df["bodypart"] == bodypart)
        & (df["time (sec)"] >= time_range[0])
        & (df["time (sec)"] <= time_range[1])
    )

    return df.loc[inds, :]


def plot_traj(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the trajectory of a single ant over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Path(
        data=sub_df,
        kdims=["x (cm)", "y (cm)"],
        vdims=["time (sec)"]
    ).opts(
        color="time (sec)",
        colorbar=True,
        colorbar_opts={"title": "time (sec)"},
        frame_height=200,
        frame_width=200,
        xlim=(0, 20),
        ylim=(0, 20)
    )

def plot_xy(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Plot the x and y positions of an ant body part over time."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    x_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["x (cm)"], label="x")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[0],
        )
        .opts(ylabel="position (cm)")
    )

    y_plot = (
        hv.Curve(data=sub_df, kdims=["time (sec)"], vdims=["y (cm)"], label="y")
        .opts(
            frame_height=100,
            frame_width=500,
            color=bebi103.hv.default_categorical_cmap[1],
        )
        .opts(ylabel="position (cm)")
    )

    return (x_plot * y_plot).opts(legend_offset=(10, 20))


def plot_distance_traveled(df, ant_ID, bodypart, time_range=(-np.inf, np.inf)):
    """Make a plot of distance traveled."""
    sub_df = extract_sub_df(df, ant_ID, bodypart, time_range)

    return hv.Curve(
        data=sub_df,
        kdims=['time (sec)'],
        vdims=['distance traveled (cm)', 'ID', 'bodypart']
    ).opts(
        frame_height=200,
        frame_width=200
    )

# Create bodypart selector drop-down list
bodypart_selector = pn.widgets.Select(
    name="body part", options=sorted(list(df["bodypart"].unique())), value="thorax"
)

# Create beetle treatment selector drop-down list
beetle_selector = pn.widgets.Select(
    name="beetle treatment",
    options=sorted(list(df["beetle_treatment"].unique())),
    value="dalotia",
)

# Create ant ID selector drop-down list
ant_ID_selector = pn.widgets.Select(
    name="Ant ID",
    options=sorted(
        list(df.loc[df["beetle_treatment"] == df['beetle_treatment'].unique()[0], "ID"].unique())
    ),
)

# Ranges of times for convenience
start = df["time (sec)"].min()
end = df["time (sec)"].max()

# Create throttled time interval range slider
time_interval_slider = pn.widgets.RangeSlider(
    start=start,
    end=end,
    step=1,
    value=(start, end),
    name="time (sec)",
    value_throttled=(start, end),
)

# Create helper function to update ant_ID_selector options
# depending on selected beetle treatment
@pn.depends(beetle_selector.param.value, watch=True)
def update_ant_ID_selector(beetle):
    inds = df["beetle_treatment"] == beetle
    options = sorted(list(df.loc[inds, "ID"].unique()))
    ant_ID_selector.options = options


# Create plotting function
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_traj_interactive(ant_ID, bodypart, time_range):
    return plot_traj(df, ant_ID, bodypart, time_range)


# Create plotting function for x and y vs time
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_xy_interactive(ant_ID, bodypart, time_range):
    return plot_xy(df, ant_ID, bodypart, time_range)


# Create plotting function for distance traveled vs time
@pn.depends(
    ant_ID_selector.param.value,
    bodypart_selector.param.value,
    time_interval_slider.param.value_throttled,
)
def plot_distance_traveled_interactive(ant_ID, bodypart, time_range):
    return plot_distance_traveled(df, ant_ID, bodypart, time_range)

widgets = pn.Column(
    time_interval_slider,
    pn.Spacer(height=10),
    beetle_selector,
    pn.Spacer(height=10),
    pn.Row(ant_ID_selector, bodypart_selector, width=300),
    width=300,
)
We have made and connected all of the plots and widgets (but have not rendered them). Whenever the ant ID, body part, or time interval selection changes, the plots will update.
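One detail in the cell above that is easy to miss is update_ant_ID_selector, which narrows the ant ID options to the ants that were exposed to the selected beetle. The filtering logic on its own can be sketched in plain Python. The records list is hypothetical toy data standing in for the data frame, and ant_options is an illustrative helper name, not part of the dashboard.

```python
# Hypothetical (beetle_treatment, ID) pairs standing in for rows of the data frame
records = [
    ("dalotia", 1),
    ("dalotia", 0),
    ("sceptobius", 7),
    ("sceptobius", 6),
    ("dalotia", 0),
]


def ant_options(beetle):
    """Return the sorted, de-duplicated ant IDs for one beetle treatment."""
    return sorted({ant_id for treatment, ant_id in records if treatment == beetle})


print(ant_options("dalotia"))
print(ant_options("sceptobius"))
```

In the dashboard, the equivalent list is assigned to ant_ID_selector.options each time the beetle treatment changes, so the second drop-down only ever offers valid ants.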
Our task now is to add the summary plot. It should respond to the body part widget so that it shows the total distance traveled for the selected body part. So, let's write a function to do that. We will not regenerate the whole plot, but rather update its data source. To extract the data source from a Bokeh plot, we need to dig into its glyph renderers. If the plot is called p, its ColumnDataSource is p.renderers[i].data_source, where i is the index of the set of glyphs we are considering. For strip plots generated by iqplot, there is only a single data source, so i is always 0.
The update function takes as an argument a Panel Event object (described in the docs) that has the attribute new, which is the new value of the widget. We then set up a watcher so that the update function gets triggered whenever the body part selector widget is changed.
[6]:
def update_strip(event):
    # Update data source
    strip.renderers[0].data_source.data["distance traveled (cm)"] = df_dist.loc[
        df_dist["bodypart"] == event.new, "distance traveled (cm)"
    ].values


watcher = bodypart_selector.param.watch(update_strip, 'value', onlychanged=True)
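The logic of update_strip can be checked without a running dashboard. In the sketch below, a plain dict stands in for the ColumnDataSource data and a SimpleNamespace stands in for the event object. Both are stand-ins chosen for illustration, and the rows are rounded toy values rather than the real summary data frame.

```python
from types import SimpleNamespace

# Toy summary rows: (bodypart, ID, total distance in cm)
rows = [
    ("thorax", 0, 1266.7),
    ("thorax", 1, 1135.5),
    ("head", 0, 1647.5),
    ("head", 1, 1575.5),
]

# Stand-in for the data source: a dict of equal-length columns
source_data = {"ID": [0, 1], "distance traveled (cm)": [1266.7, 1135.5]}


def update_strip(event):
    """Swap in the distances for the newly selected body part."""
    source_data["distance traveled (cm)"] = [
        dist for bodypart, _, dist in rows if bodypart == event.new
    ]


# Simulate the body part selector changing to "head"
update_strip(SimpleNamespace(new="head"))
print(source_data)
```

Only the distance column is replaced, so the ID column must already line up with it. That holds here because each body part lists the ants in the same order, which is also the case for df_dist since groupby sorts its keys by default.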
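Now that we have the plot set up, we can write a callback for when data are selected. The callback must take three arguments, attr, old, and new. Here, attr is the name of the property that changed (for us, indices), and old and new are its values before and after the change, that is, lists of the indices of the selected data points.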
[7]:
def select_ant(attr, old, new):
    """Update widgets for selection on strip plot."""
    # Extract data source
    source = strip.renderers[0].data_source

    # Use try block in case no data are selected (then pass)
    try:
        # Get index of selected glyph
        ind = new[0]

        # Set widget values
        beetle_selector.value = source.data["beetle_treatment"][ind]
        ant_ID_selector.value = source.data["ID"][ind]
    except IndexError:
        # An empty selection has no first index; leave the widgets unchanged
        pass
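To see how the callback behaves for a selection and for a deselection, here is a small stand-alone sketch. The SimpleNamespace objects are hypothetical stand-ins for the two Panel widgets, the dict stands in for the data source, and an explicit emptiness check is used in place of the try block. Treat it as an illustration of the logic, not as the dashboard code.

```python
from types import SimpleNamespace

# Stand-ins for the data source and the two selector widgets
source_data = {"beetle_treatment": ["dalotia", "sceptobius"], "ID": [3, 11]}
beetle_selector = SimpleNamespace(value="dalotia")
ant_ID_selector = SimpleNamespace(value=0)


def select_ant(attr, old, new):
    """Point the widgets at the first selected glyph, if any."""
    if not new:
        # Nothing selected (e.g., a click on empty space); keep current values
        return

    ind = new[0]
    beetle_selector.value = source_data["beetle_treatment"][ind]
    ant_ID_selector.value = source_data["ID"][ind]


# Simulate tapping the second glyph, then clearing the selection
select_ant("indices", [], [1])
selected = (beetle_selector.value, ant_ID_selector.value)

select_ant("indices", [1], [])
after_deselect = (beetle_selector.value, ant_ID_selector.value)

print(selected, after_deselect)
```

The beetle treatment is set before the ant ID on purpose: in the dashboard, changing the beetle treatment refreshes the ant ID options, so the new ant ID is a valid choice by the time it is assigned.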
Now that the callback is defined, we need to register it so that it is called whenever the selection changes. We do this with the on_change() method of the selected attribute of a ColumnDataSource.
[8]:
strip.renderers[0].data_source.selected.on_change("indices", select_ant)
All the pieces are now in place! Let's lay it out!
[9]:
row1 = pn.Row(plot_traj_interactive, pn.Spacer(width=20), plot_distance_traveled_interactive)
row2 = pn.Row(plot_xy_interactive)
col1 = pn.Column(pn.Spacer(height=25), row1, pn.Spacer(height=35), row2)
col2 = pn.Column(widgets, pn.Spacer(height=15), strip)
dashboard = pn.Row(col1, pn.Spacer(width=30), col2)
We now have our dashboard, and we can take a look at it by running a code cell containing just dashboard. But before we do…
Deploying a dashboard on a stand-alone browser tab
Panel ingeniously lets you move your dashboard from a prototype in a notebook to its own stand-alone app in a separate tab in your browser (you can read the docs about that). All you need to do is append .servable() to a Panel object in your notebook. You can then serve the dashboard by entering panel serve --show name_of_notebook.ipynb on the command line.
I invite you to download this notebook, which is named selecting_data_and_deploying.ipynb, and serve it up using
panel serve --show selecting_data_and_deploying.ipynb
The dashboard will open in its own tab, looking just as it does below, because the code cell below marks it as servable. (You do not need to worry about the data set, since I set this notebook up to always download it from the internet.)
[10]:
dashboard.servable()
This new layout affords us much more rapid exploration of the data. Using a clickable plot of summary statistics that then updates more detailed plots is a very powerful exploratory method. I use it in most dashboards I build.
…And you can share this dashboard with your colleagues by sending them the Jupyter notebook. If they are not interested in the logic of how you built the dashboard, which you naturally explain expertly in your markdown cells, or in the guts of the code, they can just serve it themselves from the command line and explore the whole data set with the dashboard. This is truly excellent.
Computing environment
[11]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,panel,iqplot,bebi103,jupyterlab
CPython 3.8.5
IPython 7.18.1
numpy 1.19.1
scipy 1.5.0
pandas 1.1.3
bokeh 2.2.3
holoviews 1.13.4
panel 0.9.7
iqplot 0.1.6
bebi103 0.1.1
jupyterlab 2.2.6