Introduction to high-level plotting with HoloViews

Data set download


[1]:
import numpy as np
import scipy.special
import pandas as pd

import bokeh.io
import holoviews as hv

import bebi103

bokeh.io.output_notebook()
hv.extension('bokeh')

Note: You need to be sure to update the bebi103 package to work with this module. To do so, do the following on the command line.

pip install --upgrade bebi103

You will also likely get warnings when importing the bebi103 package, e.g., related to Stan not being installed. You can ignore these warnings unless you get an exception.


Introduction to HoloViews

HoloViews is a high-level plotting library that is part of the HoloViz ecosystem. It allows specification of plots, and is agnostic about what is used to render them. We will use Bokeh as our renderer.

To set this up, we import HoloViews (as hv) and then set the HoloViews extension to be Bokeh using hv.extension('bokeh') at the top of the notebook.

Main ideas behind HoloViews

Imagine you have a tidy data set (and HoloViews really only works with tidy data sets). It is already logically organized; each row is an observation and each column a variable. Let us think for a moment conceptually (that is, not in terms of steps of coding) about how we might make a scatter plot from a tidy data frame. We need to (obviously) first decide that we want to make a scatter plot, i.e., we specify what kind of graphic element we want to convert our data set into. Then, we need to annotate the columns of the data frame. That is, we need to annotate which column will determine the x-coordinate of the glyphs in the scatter plot and which will determine the y-coordinate of the glyphs. After we have made these decisions, that is, what kind of graphic element we want to produce and what columns give the x-coordinates and what gives the y-coordinates, the fundamental plot is complete. Everything else is visual styling.

The philosophy of HoloViews, right on the front of the webpage, is “Stop plotting your data—annotate your data and let it visualize itself.” With HoloViews, you add minimal annotations to your (tidy; must be tidy!) data to enable visualization. You can then later stylize the visualization, but the annotation is sufficient to specify the plot. Specifically, the annotations you need are:

  1. What kind of plotting element are you making (e.g., scatter, box-and-whisker, heat map, etc.).

  2. What columns specify the dimensions of the data, needed to set up axes.

Once you make those annotations, HoloViews can take care of the rendering, using either Matplotlib, Bokeh, or Plotly. The main idea is that HoloViews objects are conceptual, agnostic to the particulars of rendering. You can stylize the rendering if you like, but the fundamentals of the plotting object are already set by the annotation.
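Because HoloViews expects tidy data, a table stored in "wide" form usually needs to be reshaped before it can be annotated. The following is a minimal sketch using pandas; the numbers and the column names are made up for illustration and are not taken from the facial recognition data set.

```python
import pandas as pd

# Hypothetical wide table: one column of percent-correct scores per sleeper type
wide = pd.DataFrame({
    'insomniac': [72.5, 90.0, 92.5],
    'normal': [85.0, 77.5, 95.0],
})

# Reshape so that each row is one observation and each column is one variable
tidy = wide.melt(var_name='sleeper', value_name='percent correct')

print(tidy)
```

In the tidy version, 'sleeper' and 'percent correct' are ordinary columns, so either one can be annotated as a dimension of a plot.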

Importing HoloViews and choosing a renderer

HoloViews is imported as hv, which we have done in the cell at the top of this notebook. Because HoloViews is agnostic to the ultimate renderer, we need to specify an extension, which we did above by executing hv.extension('bokeh') in the first code cell of this notebook. Our plots will now be rendered using Bokeh.

Note that you must install the appropriate JupyterLab extension to view HoloViews plots. You can do this with

jupyter labextension install @pyviz/jupyterlab_pyviz

and you already have if you completed Lesson 0.

An example: A scatter plot

As an example of the use of HoloViews, we will again use the facial recognition data set. We will load it and make the same adjustments as before, converting the 'gender' column to fully spelled-out genders and adding a 'sleeper' column.

[2]:
df = pd.read_csv('../data/gfmt_sleep.csv', na_values='*')
df['insomnia'] = df['sci'] <= 16
df['sleeper'] = df['insomnia'].apply(lambda x: 'insomniac' if x else 'normal')
df['gender'] = df['gender'].apply(lambda x: 'female' if x == 'f' else 'male')

df.head()
[2]:
participant number gender age correct hit percentage correct reject percentage percent correct confidence when correct hit confidence incorrect hit confidence correct reject confidence incorrect reject confidence when correct confidence when incorrect sci psqi ess insomnia sleeper
0 8 female 39 65 80 72.5 91.0 90.0 93.0 83.5 93.0 90.0 9 13 2 True insomniac
1 16 male 42 90 90 90.0 75.5 55.5 70.5 50.0 75.0 50.0 4 11 7 True insomniac
2 18 female 31 90 95 92.5 89.5 90.0 86.0 81.0 89.0 88.0 10 9 3 True insomniac
3 22 female 35 100 75 87.5 89.5 NaN 71.0 80.0 88.0 80.0 13 8 20 True insomniac
4 27 female 74 60 65 62.5 68.5 49.0 61.0 49.0 65.0 49.0 13 9 12 True insomniac

We will now make a plot and explain how the syntax relates to the ideas behind annotating data sets. We will make a simple scatter plot of the participants' confidence when incorrect versus their confidence when correct.

[3]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper'],
)
[3]:

Specification of the element type

We used hv.Points to invoke an element of visualization. An element is just a way of converting the tabular nature of the data to a graphical representation, in this case a scatter plot of points. That is, we want to make a plot where each glyph lies in a two-dimensional plot and the values of both the x- and y-axes are independent. (This is contrasted with hv.Scatter in which the x-coordinate is the independent variable and the y-coordinate is dependent on x; hv.Points is more appropriate here.)

The available element types may be found in the HoloViews reference gallery.

Specification of dimensions

There are two types of dimensions, key dimensions and value dimensions, specified with the kdims and vdims arguments, respectively. You can think of these like key-value pairs in dictionaries (where you can have multidimensional keys). Key dimensions are indexing dimensions, which say where on the graphic the data in a row will reside. The value dimensions give information about each data point. In the simple plot above, the key dimensions are the confidences when correct and when incorrect. Those columns determined where the glyphs were placed.

We additionally had a value dimension, specified by vdims, which has additional information associated with each data point. This information was not used in the above plot, but we will put it to use momentarily.

Stylizing plots

After a plotting Element is specified, we can stylize it using the hv.opts functionality. To investigate what styling options are available for each kind of plotting Element, you can enter, for example

hv.help(hv.Points)

and you will get detailed information on what options are available for stylizing hv.Points elements.

I find the HoloViews defaults not very pleasing. If you agree and want to define defaults for an entire document, you may do so using hv.opts.defaults(). I have made some defaults that I find more pleasing that are available in the bebi103.hv.set_defaults() function. Let’s set those defaults (which will be active for the rest of the notebook), and see how our plot looks. Note that the defaults must be set after the HoloViews extension has been set, e.g., with hv.extension('bokeh'). This is because the available options vary depending on which extension you are using (the most common ones being Bokeh and Matplotlib).

[4]:
bebi103.hv.set_defaults()

hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper'],
)
[4]:

Grouping by value dimensions

Recall that we have an unused value dimension in the element we created. We would naturally like to demarcate glyphs corresponding to normal sleepers or insomniacs. To do this, we can do a groupby operation on the Element. That’s right, we can do groupby operations on graphical elements! After all, they are conceptually just annotated tidy data frames.

[5]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper'],
).groupby(
    'sleeper'
)
[5]:

We now have a pull-down menu to the right of the plot where we can select the sleeper type we want and the glyphs on the plot will adjust accordingly. By default, after applying the groupby operation, HoloViews gives us a HoloMap object. The column we used to group by is now selectable through a graphical interface (a pull-down menu).

We may instead wish to group by sleeper type and lay the plots out next to each other, creating a layout. We can use the layout() method to do this. In the plot below, I am using the opts() method to set the height and width of the plots so they fit nicely; more on that soon.

[6]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper'],
).groupby(
    'sleeper'
).opts(
    height=250,
    width=300
).layout(
)
[6]:

Finally, we may wish to overlay the plots.

[7]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper'],
).groupby(
    'sleeper'
).overlay(
)
[7]:

HoloViews was kind enough to automatically provide us with a legend! (Try clicking on the legend symbols.)

Further stylizing

As we briefly saw, we can style plots using the .opts() method of a plotting element. Different plotting elements have different properties that can be set with .opts(), and you can learn what they are by doing, e.g.,

hv.opts.Points?

for a Points plotting element. The options are many!

As an example of how to use the .opts() method to stylize a plot, we can use .opts() to add tooltips where we can hover to get additional information from the vdims.

[8]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['gender', 'sleeper', 'age'],
).groupby(
    'sleeper'
).opts(
    tools=['hover']
).overlay(
)
[8]:

Note that any information we want included in the hover must be specified in the kdims or vdims.

As a final example of constructing this plot, let’s set up a plot where confidence when incorrect is plotted against confidence when correct for each gender separately, with the points colored by the sleeper type. (We have to select show_legend=False because of a bug in laying out HoloMaps with legends.)

[9]:
hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['gender', 'sleeper', 'age'],
).groupby(
    ['gender', 'sleeper']
).opts(
    tools=['hover'],
    show_legend=False,
).overlay(
    'sleeper'
)
[9]:

Extracting the Bokeh plotting object

After making and displaying a HoloViews plot, we might want to get the Bokeh figure. We can extract that using hv.render().

[10]:
hv_fig = hv.Points(
    data=df,
    kdims=['confidence when correct', 'confidence when incorrect'],
    vdims=['sleeper', 'gender', 'age'],
).groupby(
    'sleeper'
).opts(
    tools=['hover'],
).overlay(
    'sleeper'
).opts(
    width=500
)

# Take out the Bokeh object
p = hv.render(hv_fig)

# Display using Bokeh
bokeh.io.show(p)

One advantage of doing this is that we can now drop into the lower-level plotting package (Bokeh) to update the plot as we see fit. For example, we may wish to put a title in the legend.

[11]:
p.legend.title = 'type of sleeper'

bokeh.io.show(p)

Computing environment

[12]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,datashader,jupyterlab
CPython 3.7.4
IPython 7.8.0

numpy 1.17.2
scipy 1.3.1
pandas 0.24.2
bokeh 1.3.4
holoviews 1.12.5
datashader 0.7.0
jupyterlab 1.1.4