Styling data frames

Data set download


[2]:
import os

import pandas as pd

It is sometimes useful to highlight features of a data frame when viewing it. (Note that this is generally far less useful than making informative plots, which we will come to shortly.) Pandas offers some convenient ways to style the display of a data frame.

To demonstrate, we will again use a data set from Beattie, et al. containing results from a study of the effects of sleep quality on performance in the Glasgow Facial Matching Test (GFMT).

[3]:
# data_path must already hold the path to the directory containing gfmt_sleep.csv
df = pd.read_csv(os.path.join(data_path, 'gfmt_sleep.csv'), na_values='*')
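
The na_values='*' kwarg tells pd.read_csv() to treat the string '*' as a missing value. As a minimal sketch of that behavior, the snippet below reads made-up CSV text; the two rows are hypothetical and are not taken from gfmt_sleep.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file that marks missing entries with '*'
csv_text = "participant number,confidence incorrect hit\n1,90.0\n2,*\n"

toy = pd.read_csv(io.StringIO(csv_text), na_values="*")

# The '*' entry is read as NaN, so the column stays numeric instead of becoming text
print(toy["confidence incorrect hit"].isna().tolist())
print(toy["confidence incorrect hit"].dtype)
```

Without the kwarg, the '*' would be kept as a string and the whole column would be read as text.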

As our first example of styling, let’s say we want to highlight rows corresponding to women who scored at or above 75% correct. We can write a function that takes a row of the data frame as its argument, checks the values in the 'gender' and 'percent correct' columns, and returns a background color of green for rows that meet both criteria and light gray for all other rows. We then use df.style.apply() with the axis=1 kwarg to apply that function to each row.

[4]:
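def highlight_high_scoring_females(s):
    """Return a background-color CSS string for each cell in row `s`."""
    # Green for women scoring at or above 75%; light gray for every other row
    if s["gender"] == "f" and s["percent correct"] >= 75:
        return ["background-color: #7fc97f"] * len(s)
    else:
        return ["background-color: lightgray"] * len(s)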

df.head(10).style.apply(highlight_high_scoring_females, axis=1)
[4]:
| | participant number | gender | age | correct hit percentage | correct reject percentage | percent correct | confidence when correct hit | confidence incorrect hit | confidence correct reject | confidence incorrect reject | confidence when correct | confidence when incorrect | sci | psqi | ess |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | f | 39 | 65 | 80 | 72.500000 | 91.000000 | 90.000000 | 93.000000 | 83.500000 | 93.000000 | 90.000000 | 9 | 13 | 2 |
| 1 | 16 | m | 42 | 90 | 90 | 90.000000 | 75.500000 | 55.500000 | 70.500000 | 50.000000 | 75.000000 | 50.000000 | 4 | 11 | 7 |
| 2 | 18 | f | 31 | 90 | 95 | 92.500000 | 89.500000 | 90.000000 | 86.000000 | 81.000000 | 89.000000 | 88.000000 | 10 | 9 | 3 |
| 3 | 22 | f | 35 | 100 | 75 | 87.500000 | 89.500000 | nan | 71.000000 | 80.000000 | 88.000000 | 80.000000 | 13 | 8 | 20 |
| 4 | 27 | f | 74 | 60 | 65 | 62.500000 | 68.500000 | 49.000000 | 61.000000 | 49.000000 | 65.000000 | 49.000000 | 13 | 9 | 12 |
| 5 | 28 | f | 61 | 80 | 20 | 50.000000 | 71.000000 | 63.000000 | 31.000000 | 72.500000 | 64.500000 | 70.500000 | 15 | 14 | 2 |
| 6 | 30 | m | 32 | 90 | 75 | 82.500000 | 67.000000 | 56.500000 | 66.000000 | 65.000000 | 66.000000 | 64.000000 | 16 | 9 | 3 |
| 7 | 33 | m | 62 | 45 | 90 | 67.500000 | 54.000000 | 37.000000 | 65.000000 | 81.500000 | 62.000000 | 61.000000 | 14 | 9 | 9 |
| 8 | 34 | f | 33 | 80 | 100 | 90.000000 | 70.500000 | 76.500000 | 64.500000 | nan | 68.000000 | 76.500000 | 14 | 12 | 10 |
| 9 | 35 | f | 53 | 100 | 50 | 75.000000 | 74.500000 | nan | 60.500000 | 65.000000 | 71.000000 | 65.000000 | 14 | 8 | 7 |

We can get fancier. Let’s say we want to shade the 'percent correct' column with a bar whose length corresponds to the value in each cell. We use the df.style.bar() method to do so. The subset kwarg specifies which columns are to have bars.

[5]:
df.head(10).style.bar(subset=["percent correct"], vmin=0, vmax=100)
[5]:
[Styled table: the same ten rows as in the output of cell 4, with a bar in each 'percent correct' cell whose length reflects the value.]

Note that I used the vmin=0 and vmax=100 kwargs so that the bars start at zero and a full-width bar corresponds to 100.

Alternatively, I could shade the background of each cell in the 'percent correct' column according to its value.

[6]:
df.head(10).style.background_gradient(subset=["percent correct"], cmap="Reds")
[6]:
[Styled table: the same ten rows as in the output of cell 4, with each 'percent correct' cell shaded in red according to its value.]

We can also combine multiple effects by chaining the styling methods.

[7]:
df.head(10).style.bar(
    subset=["percent correct"], vmin=0, vmax=100
).apply(
    highlight_high_scoring_females, axis=1
)
[7]:
[Styled table: the same ten rows as in the output of cell 4, with both the row highlighting and the bars in the 'percent correct' column.]

In practice, I almost never use these features because it is almost always better to display results as a plot rather than in tabular form. Still, highlighting certain aspects can be useful when exploring data sets in tabular form.

Computing environment

[8]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab
Python implementation: CPython
Python version       : 3.8.11
IPython version      : 7.26.0

pandas    : 1.3.2
jupyterlab: 3.1.7