This tutorial was generated from a Jupyter notebook. You can download the notebook here.
# Our numerical workhorses
import numpy as np
import pandas as pd
# Import Bokeh modules for interactive plotting
import bokeh.charts
import bokeh.charts.utils
import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting
# Display graphics in this notebook
bokeh.io.output_notebook()
In this tutorial, we will explore browser-based interactive plotting using Bokeh. It is important that you are using the latest version of Bokeh, v. 0.10.0. After importing, verify that this is the case.
bokeh.__version__
If you do not have the most recent version, you can update it:
conda update bokeh
Why is it so important to use the most recent version? Bokeh is currently in very active development. It is certainly not feature-complete yet, and many features are slated to be added.
For browser-based interactive data visualization, D3.js is the most widely used and feature-rich tool. However, it is a lower-level package and requires writing JavaScript. Bokeh, like Shiny (http://shiny.rstudio.com) for R and other packages, is an attempt to bring the kind of functionality D3 offers to high-level languages like Python. In other words, the goal is that you can achieve browser-based interactive data visualizations with a few lines of code. Bokeh has the additional goal of being able to handle big data sets, including streaming data.
I think the interactive part is easy to answer. The more you can interact with your data, particularly during the exploratory phase of data analysis, the more you can learn. When doing exploratory data analysis, we typically make lots and lots of plots to see patterns. If we can expedite this process, we can be more efficient and effective in our analysis.
Why browser-based? There are two simple answers to this. First, everyone has one, and they are relatively standardized. This makes your graphics very portable. Second, there are lots of tools for efficiently rendering graphics in browsers. Bokeh uses HTML5 canvas elements to accomplish this. These tools are mature and stable, thereby making backend rendering of the graphics easy.
We will use the tidy DataFrames from the first couple weeks of class as we explore Bokeh's features and do some interactive visualizations. So, let's load in the DataFrames now.
# The frog data from tutorial 1a
df_frog = pd.read_csv('../data/frog_tongue_adhesion.csv', comment='#')
# The MT catastrophe data
df_mt = pd.read_csv(
    '../data/gardner_et_al/gardner_et_al_2011_time_to_catastrophe_dic.csv',
    comment='#')
# These were generated in tutorial 2a
df_fish = pd.read_csv('../data/130315_10_minute_intervals.csv')
Before moving on, we'll go ahead and tidy the MT catastrophe DataFrame.
# Tidy MT_catastrophe DataFrame
df_mt.columns = ['labeled', 'unlabeled']
df_mt = pd.melt(df_mt, var_name='fluor', value_name='tau').dropna()
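To see what this tidying step does, here is a minimal sketch on a toy DataFrame. The numbers are made up for illustration and are not taken from the real data set; the names df_wide and df_tidy are likewise just placeholders.

```python
import numpy as np
import pandas as pd

# Toy wide-format data (made-up numbers): one column per condition,
# padded with NaN because the conditions have different sample counts
df_wide = pd.DataFrame({'labeled': [470.0, 1415.0, 130.0],
                        'unlabeled': [355.0, 425.0, np.nan]})

# Melt into tidy format: one row per measurement, with the former
# column name stored in 'fluor' and the measurement stored in 'tau'
df_tidy = pd.melt(df_wide, var_name='fluor', value_name='tau').dropna()

print(df_tidy)
```

The dropna() call discards the padding rows, so every remaining row is a real measurement.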
Perhaps the easiest way to get started with Bokeh is to use its high-level charts. These allow for rapid plotting of data coming from Pandas DataFrames, much like the plotting utilities in Pandas itself.
We'll start with a simple line plot of zebrafish sleep data.
# Pull out fish record
df_fish2 = df_fish[df_fish['fish']=='FISH2']
# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', color='firebrick')
# Display it
bokeh.io.show(p)
There are many things to note here. First, and most obviously, you can play with the various tools. You can select the tools in the upper right corner of the plot. Hovering over an icon will reveal what the tool does.
When we instantiate the bokeh.charts.Line object, a plot is returned, which we assigned to the variable p. We can further modify or add attributes to this object. Importantly, the bokeh.io.show() function displays the object. Because we called bokeh.io.output_notebook() at the top of this tutorial, the graphics are shown in the current notebook. We can also export the plot as its own standalone HTML document. We won't do it here, but simply put
bokeh.plotting.output_file('filename.html')
before the bokeh.io.show(p) function call.
Note also that we chose a color of "firebrick." We can choose any of the named CSS colors, or specify a hexadecimal color. Notice also that the axes were automatically labeled with the column headings of the DataFrame. We can specify the axis labels with keyword arguments as well.
# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', color='firebrick',
                      xlabel='time (h)', ylabel='sec of activity / 10 min')
# Display it
bokeh.io.show(p)
We can also put multiple lines on the same plot.
# Select three fish to plot
df_fish_multi = df_fish[df_fish['fish'].isin(['FISH11', 'FISH12', 'FISH23'])]
# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish_multi, x='zeit', y='activity', color='fish',
                      legend='top_left')
# Display it
bokeh.io.show(p)
Bokeh's high-level charts interface also allows for easy construction of box plots. As an example, we'll make box plots of the striking force of the frog tongues.
# Use Bokeh chart to make plot
p = bokeh.charts.BoxPlot(df_frog, values='impact force (mN)', label='ID',
                         color='ID', xlabel='frog', ylabel='impact force (mN)')
# Display it
bokeh.io.show(p)
The problem with bokeh.charts's way of doing box plots is that it uses the convention that the whiskers always extend $\pm 1.5\,\text{IQR}$ beyond the box, even when there are no outliers. That is, the whiskers can extend past the actual measurements. I prefer to have the whiskers show the extent of the data. So, I wrote my own box plot function (below) to do the task more to my specification. This highlights a disadvantage of using the higher level tools: you have less control. Of course, sacrificing control to have a one-liner is often worth it.
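The difference between the two whisker conventions is easy to see numerically. The following is a minimal sketch on made-up measurements with no outliers; the variable names are only illustrative, and the second convention is computed here as the most extreme data points that lie inside the cutoffs, which is one common way to do it.

```python
import numpy as np

# Made-up measurements with no extreme values
data = np.array([2.0, 3.0, 4.0, 5.0, 6.0])

# Quartiles and interquartile range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Convention 1: whiskers always sit at the 1.5 IQR cutoffs
fixed_lower = q1 - 1.5 * iqr
fixed_upper = q3 + 1.5 * iqr

# Convention 2: whiskers stop at the most extreme data inside the cutoffs
inside = data[(data >= fixed_lower) & (data <= fixed_upper)]
data_lower, data_upper = inside.min(), inside.max()

print('fixed whiskers:', fixed_lower, fixed_upper)
print('data whiskers: ', data_lower, data_upper)
```

For this data set, the fixed whiskers land at 0 and 8, even though no measurement is smaller than 2 or larger than 6.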
We can also make scatter plots. As a useful feature, we can color the points in the scatter plot according to values in the DataFrame.
# Use Bokeh chart to make plot
p = bokeh.charts.Scatter(df_frog, x='impact force (mN)', y='adhesive force (mN)',
                         color='ID', ylabel='adhesive force (mN)',
                         xlabel='impact force (mN)', legend='top_right')
# Display it
bokeh.io.show(p)
And, of course, we can do histograms. We'll use the microtubule catastrophe data to do that.
# Use Bokeh chart to make plot
p = bokeh.charts.Histogram(df_mt, values='tau', color='fluor',
                           bins=20, legend='top_right')
# Display it
bokeh.io.show(p)
plotting interface
Bokeh's charts interface is useful for quickly making plots from DataFrames, but the lower level bokeh.plotting interface allows more control over the plots. For example, we'll use my favorite background fill with white grid for our plot.
# Set up the figure (this is like a canvas you will paint on)
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'Impact force (mN)'
p.yaxis.axis_label = 'Adhesive force (mN)'
# Specify the glyphs
p.circle(df_frog['impact force (mN)'], df_frog['adhesive force (mN)'], size=7,
         alpha=0.5)
bokeh.io.show(p)
We can also add multiple glyphs to the same plot.
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'τ (s)'
p.yaxis.axis_label = 'ECDF'
# Build ECDFs
ecdf_lab_x = np.sort(df_mt[df_mt.fluor=='labeled']['tau'].values)
ecdf_lab_y = np.arange(1, len(ecdf_lab_x)+1) / len(ecdf_lab_x)
ecdf_un_x = np.sort(df_mt[df_mt.fluor=='unlabeled']['tau'].values)
ecdf_un_y = np.arange(1, len(ecdf_un_x)+1) / len(ecdf_un_x)
# Specify the glyphs
p.circle(ecdf_lab_x, ecdf_lab_y, size=7, alpha=0.5, legend='labeled',
         color='dodgerblue')
p.circle(ecdf_un_x, ecdf_un_y, size=7, alpha=0.5, legend='unlabeled',
         color='tomato')
# Position the legend (set after the glyphs so the legend exists)
p.legend.orientation = 'bottom_right'
bokeh.io.show(p)
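The ECDF construction used above is repeated for each data set, so it can be convenient to wrap it in a function. The helper below is a minimal sketch; the name ecdf is hypothetical and not part of NumPy or Bokeh, and the input values are made up.

```python
import numpy as np


def ecdf(data):
    """Return x and y values for plotting the ECDF of a 1D data set."""
    # Sorted measurements go on the x-axis
    x = np.sort(data)

    # The i-th sorted point (counting from 1) has ECDF value i / n
    y = np.arange(1, len(x) + 1) / len(x)

    return x, y


# Made-up example data
x, y = ecdf(np.array([3.0, 1.0, 2.0, 4.0]))
print(x)
print(y)
```

Each y value is the fraction of measurements less than or equal to the corresponding x value, so the last point always has a value of one.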
We can also exercise this increased control with the fish activity data. First, we'll write a small function to get the starting and ending points of nights.
def nights(df):
    """
    Takes light series from a single fish and gives the start and end of nights.
    """
    # Work with NumPy arrays so indexing is by position, not by DataFrame label
    zeit = df['zeit'].values
    dlight = np.diff(df['light'].values.astype(int))
    lefts = zeit[np.where(dlight == -1)[0]]
    rights = zeit[np.where(dlight == 1)[0]]
    return lefts, rights
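To see how differencing the light record locates the nights, here is a minimal sketch on a made-up record of six time points. The arrays are invented for illustration only.

```python
import numpy as np

# Made-up record: time points and whether the lights are on at each one
zeit = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
light = np.array([True, True, False, False, True, True])

# Differences of the 0/1 series: -1 where the lights go off,
# +1 where they come back on, and 0 everywhere else
dlight = np.diff(light.astype(int))

# Last lit time point before each night
lefts = zeit[np.where(dlight == -1)[0]]

# Last dark time point of each night
rights = zeit[np.where(dlight == 1)[0]]

print(dlight)
print(lefts, rights)
```

Because np.diff returns an array one element shorter than its input, the index of each transition refers to the time point just before the change.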
Now that we have this function, we can proceed to make our nicely shaded plot.
# Create figure
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'time (hours)'
p.yaxis.axis_label = 'sec. of activity / 10 min.'
# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']
# Populate glyphs
for i, fish in enumerate(['FISH11', 'FISH12', 'FISH23']):
    source = bokeh.models.ColumnDataSource(df_fish[df_fish['fish']==fish])
    p.line(x='zeit', y='activity', line_width=0.5, alpha=0.75, source=source,
           color=colors[i])
# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish=='FISH1'])
# Make shaded boxes for nights
night_boxes = []
for i, left in enumerate(lefts):
    night_boxes.append(
        bokeh.models.BoxAnnotation(plot=p, left=left, right=rights[i],
                                   fill_alpha=0.3, fill_color='gray'))
p.renderers.extend(night_boxes)
bokeh.io.show(p)
As I mentioned before, I would prefer to do box plots differently than the Bokeh default. With the added control of the bokeh.plotting module, I can do that.
def box_plot(df, vals, label, ylabel=None):
    """
    Make a Bokeh box plot from a tidy DataFrame.
    Parameters
    ----------
    df : tidy Pandas DataFrame
        DataFrame to be used for plotting
    vals : hashable object
        Column of DataFrame containing data to be used.
    label : hashable object
        Column of DataFrame used to categorize.
    ylabel : str, default None
        Text for y-axis label
    Returns
    -------
    output : Bokeh plotting object
        Bokeh plotting object that can be rendered with
        bokeh.io.show()
    Notes
    -----
    .. Based largely on example code found here:
     https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/boxplot.py
    """
    # Get the categories
    cats = list(df[label].unique())
    # Group DataFrame
    df_gb = df.groupby(label)
    # Compute quartiles for each group
    q1 = df_gb[vals].quantile(q=0.25)
    q2 = df_gb[vals].quantile(q=0.5)
    q3 = df_gb[vals].quantile(q=0.75)
    # Compute interquartile range and upper and lower bounds for outliers
    iqr = q3 - q1
    upper_cutoff = q3 + 1.5*iqr
    lower_cutoff = q1 - 1.5*iqr
    # Find the outliers for each category
    def outliers(group):
        cat = group.name
        outlier_inds = (group[vals] > upper_cutoff[cat]) \
                            | (group[vals] < lower_cutoff[cat])
        return group[vals][outlier_inds]
    # Apply outlier finder
    out = df_gb.apply(outliers).dropna()
    # Points of outliers for plotting
    outx = []
    outy = []
    for cat in cats:
        # only add outliers if they exist
        if not out[cat].empty:
            for value in out[cat]:
                outx.append(cat)
                outy.append(value)
    # Whiskers reach the data extremes, but never past the outlier cutoffs
    qmin = df_gb[vals].min()
    qmax = df_gb[vals].max()
    upper = [min([x,y]) for (x,y) in zip(qmax, upper_cutoff)]
    lower = [max([x,y]) for (x,y) in zip(qmin, lower_cutoff)]
    # Build figure
    p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                              plot_height=450, x_range=cats)
    p.ygrid.grid_line_color = 'white'
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_width = 2
    p.yaxis.axis_label = ylabel
    # stems
    p.segment(cats, upper, cats, q3, line_width=2, line_color="black")
    p.segment(cats, lower, cats, q1, line_width=2, line_color="black")
    # boxes
    p.rect(cats, (q3 + q1)/2, 0.5, q3 - q1, fill_color="mediumpurple",
           alpha=0.7, line_width=2, line_color="black")
    # median (almost-0 height rects simpler than segments)
    p.rect(cats, q2, 0.5, 0.01, line_color="black", line_width=2)
    # whiskers (almost-0 height rects simpler than segments)
    p.rect(cats, lower, 0.2, 0.01, line_color="black")
    p.rect(cats, upper, 0.2, 0.01, line_color="black")
    # outliers
    p.circle(outx, outy, size=6, color="black")
    return p
p = box_plot(df_frog, 'impact force (mN)', 'ID', ylabel='Impact force (mN)')
bokeh.io.show(p)
Using the bokeh.plotting interface, we can also specify which tools we want available. For example, we can add a HoverTool that will give information about each data point if we hover the mouse over it.
# Eliminate spaces from column headings to allow tooltip to work
df_frog = df_frog.rename(columns={'impact force (mN)': 'impf',
                                  'adhesive force (mN)': 'adhf'})
# Specify data source
source = bokeh.models.ColumnDataSource(df_frog)
# What pops up on hover?
tooltips = [('frog', '@ID'),
            ('imp', '@impf'),
            ('adh', '@adhf')]
# Make the hover tool
hover = bokeh.models.HoverTool(tooltips=tooltips)
# Create figure
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
# Add the hover tool
p.add_tools(hover)
# Populate glyphs
p.circle(x='adhf', y='impf', size=7, alpha=0.5, source=source)
bokeh.io.show(p)
Bokeh also has the wonderful capability of linking subplots. The key here is to specify that the plots share the same ranges for the $x$ and $y$ variables. To enable linked selections, they also need to have their data come from the same source. We can construct a ColumnDataSource from a Pandas DataFrame. We need to untidy our data first, since each line we plot from a ColumnDataSource must come from its own column.
# Unmelt the DataFrame
df_fish_unmelt = df_fish.pivot_table(index=['zeit', 'light', 'day', 'CLOCK'],
                                     columns='fish', values='activity').reset_index()
# Create data source
source = bokeh.plotting.ColumnDataSource(df_fish_unmelt)
# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish=='FISH1'])
# Create figures
ps = [bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650,
                            plot_height=250) for i in range(3)]
# Link ranges (enable linked panning/zooming)
for i in (1, 2):
    ps[i].x_range = ps[0].x_range
    ps[i].y_range = ps[0].y_range
# Label the axes
for i in range(3):
    ps[i].yaxis.axis_label = 'sec of activity / 10 min'
    ps[i].xaxis.axis_label = 'time (h)'
# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']
# Stylize
for i, _ in enumerate(ps):
    ps[i].xgrid.grid_line_color = 'white'
    ps[i].ygrid.grid_line_color = 'white'
# Populate glyphs
for i, fish in enumerate(['FISH11', 'FISH12', 'FISH23']):
    # Put in line
    ps[i].line(x='zeit', y=fish, line_width=1, source=source,
               color=colors[i])
    # Label with title
    ps[i].title = fish
    # Make shaded boxes for nights
    night_boxes = []
    for j, left in enumerate(lefts):
        night_boxes.append(
            bokeh.models.BoxAnnotation(plot=ps[i], left=left, right=rights[j],
                                       fill_alpha=0.3, fill_color='gray'))
    ps[i].renderers.extend(night_boxes)
my_plot = bokeh.plotting.vplot(*tuple(ps))
bokeh.io.show(my_plot)
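The untidying step used for the linked plots can be tried out on a small scale. The following is a minimal sketch on a toy tidy DataFrame; the numbers are made up, and only the column names mirror those of the fish data.

```python
import pandas as pd

# Toy tidy data (made-up numbers): one row per fish per time point
df_tidy = pd.DataFrame({'zeit': [0.0, 0.0, 1.0, 1.0],
                        'fish': ['FISH1', 'FISH2', 'FISH1', 'FISH2'],
                        'activity': [3.0, 5.0, 4.0, 6.0]})

# Pivot so that each fish gets its own column of activity values,
# then turn the time index back into an ordinary column
df_wide = df_tidy.pivot_table(index='zeit', columns='fish',
                              values='activity').reset_index()

print(df_wide)
```

After the pivot, each fish's activity is a separate column that can be referenced by name, which is how the line glyphs above pick out their data with y=fish.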