Tutorial 1b: Loading and displaying data

This tutorial was generated from an IPython notebook. You can download the notebook here.

In this tutorial, we will learn how to load data stored on disk into a Python data structure. We will use pandas to read in CSV (comma-separated value) files.

The data set we will use comes from a fun paper about the adhesive properties of frog tongues. The reference is Kleinteich and Gorb, Tongue adhesion in the horned frog Ceratophrys sp., Sci. Rep., 4, 5225, 2014. You can download the paper here. You might also want to check out a New York Times feature on the paper here.

In this paper, the authors investigated the adhesive properties of the tongues of horned frogs when they strike prey. The authors had a striking pad connected to a cantilever to measure forces. They also used high-speed cameras to capture the strike and record relevant data.

Importing modules

As I mentioned in the last tutorial, we need to import the modules we will use for data analysis. I generally like to import everything at the beginning. We will use __future__, numpy, and matplotlib as in the last tutorial, and also pandas.

In [1]:
from __future__ import division, absolute_import, \
                                    print_function, unicode_literals

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Necessary to display plots in this IPython notebook
%matplotlib inline

The data file

The data from the paper are contained in the file frog_tongue_adhesion.csv, which you can download here. We can look at its contents.

In [2]:
# Use ! to invoke a shell command. Use head to look at top 20 lines of file.
!head -n 20 ../data/kleinteich_and_gorb/frog_tongue_adhesion.csv

The first lines all begin with # signs, signifying that they are comments and not data. They do give important information, though, such as the meaning of the ID data. The ID refers to which specific frog was tested.

Immediately after the comments, we have a row of comma-separated headers. This row sets the number of columns in this data set and labels the meaning of the columns. So, we see that the first column is the date of the experiment, the second column is the ID of the frog, the third is the trial number, and so on.

After this row, each row represents a single experiment in which the frog struck the target.

CSV files are generally a good way to store data. Commas make better delimiters than white space (such as tabs) because white space is fragile: editors may silently convert tabs to spaces, and a tab is hard to tell apart from a run of spaces. Delimiter collision (a comma appearing inside a data field) is avoided by enclosing such fields in double quotes. There are other good ways to store data, such as JSON, but we will almost exclusively use CSV files in this class.
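To see the double-quote convention in action, here is a minimal sketch using Python's standard csv module and an in-memory buffer (io.StringIO) in place of a real file. The column names and the note text are made up for illustration. The writer quotes only the field that contains a comma, and pd.read_csv reads it back as a single field.

```python
import csv
import io

import pandas as pd

# Write a header and one data row to an in-memory "file". The note field
# contains a comma, so csv.writer wraps that field in double quotes.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['ID', 'note'])
writer.writerow(['I', 'strong strike, good contact'])  # hypothetical note
text = buffer.getvalue()
print(text)

# pd.read_csv honors the quotes, so the embedded comma does not split the field
df_quoted = pd.read_csv(io.StringIO(text))
print(df_quoted)
```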

Loading a data set

We will use pd.read_csv to load the data set. The data are stored in a DataFrame, which is one of the data types that makes pandas so convenient for data analysis. DataFrames can hold columns of different data types, handle missing values, and allow convenient slicing, among many, many other useful features. We will use the DataFrame to look at the data, at the same time demonstrating some of the power of DataFrames. They are like spreadsheets, only a lot better.

In [3]:
# Use pd.read_csv to read in the data and store in a DataFrame
# I am using my relative path of the data file; adjust as needed.
fname = '../data/kleinteich_and_gorb/frog_tongue_adhesion.csv'
df = pd.read_csv(fname, comment='#')

Notice that we used the keyword argument comment to specify that lines beginning with # are comments and are to be ignored. If you check out the docstring for pd.read_csv, you will see there are lots of options for reading in the data.
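Here is a small self-contained sketch of what comment='#' does, using an in-memory string instead of the real file. The comment lines are invented; the two data rows are the first two rows of the data set, trimmed to four columns. One caveat: comment='#' also cuts off any line at a # that appears partway through it, so it is only safe when # never occurs inside the data themselves.

```python
import io

import pandas as pd

# A tiny stand-in for the real file: comment lines, a header row, then data
text = """# ID: which frog was tested (invented comment)
# Lines like these are skipped when comment='#'
date,ID,trial number,impact force (mN)
2013_02_26,I,3,1205
2013_02_26,I,4,2527
"""

df_small = pd.read_csv(io.StringIO(text), comment='#')
print(df_small)
print(df_small.columns.tolist())
```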

Exploring the DataFrame

Let's jump right in and look at the contents of the DataFrame.

In [4]:
# Look at the contents (first 10 rows)
df[:10]
Out[4]:
   | date | ID | trial number | impact force (mN) | impact time (ms) | impact force / body weight | adhesive force (mN) | time frog pulls on target (ms) | adhesive force / body weight | adhesive impulse (N-s) | total contact area (mm2) | contact area without mucus (mm2) | contact area with mucus / contact area without mucus | contact pressure (Pa) | adhesive strength (Pa)
 0 | 2013_02_26 | I | 3 | 1205 | 46 | 1.95 | -785 | 884 | 1.27 | -0.290 | 387 | 70 | 0.82 | 3117 | -2030
 1 | 2013_02_26 | I | 4 | 2527 | 44 | 4.08 | -983 | 248 | 1.59 | -0.181 | 101 | 94 | 0.07 | 24923 | -9695
 2 | 2013_03_01 | I | 1 | 1745 | 34 | 2.82 | -850 | 211 | 1.37 | -0.157 | 83 | 79 | 0.05 | 21020 | -10239
 3 | 2013_03_01 | I | 2 | 1556 | 41 | 2.51 | -455 | 1025 | 0.74 | -0.170 | 330 | 158 | 0.52 | 4718 | -1381
 4 | 2013_03_01 | I | 3 | 493 | 36 | 0.80 | -974 | 499 | 1.57 | -0.423 | 245 | 216 | 0.12 | 2012 | -3975
 5 | 2013_03_01 | I | 4 | 2276 | 31 | 3.68 | -592 | 969 | 0.96 | -0.176 | 341 | 106 | 0.69 | 6676 | -1737
 6 | 2013_03_05 | I | 1 | 556 | 43 | 0.90 | -512 | 835 | 0.83 | -0.285 | 359 | 110 | 0.69 | 1550 | -1427
 7 | 2013_03_05 | I | 2 | 1928 | 46 | 3.11 | -804 | 508 | 1.30 | -0.285 | 246 | 178 | 0.28 | 7832 | -3266
 8 | 2013_03_05 | I | 3 | 2641 | 50 | 4.27 | -690 | 491 | 1.12 | -0.239 | 269 | 224 | 0.17 | 9824 | -2568
 9 | 2013_03_05 | I | 4 | 1897 | 41 | 3.06 | -462 | 839 | 0.75 | -0.328 | 266 | 176 | 0.34 | 7122 | -1733

We see that the column headings were automatically read from the header row. pandas also automatically defined the indices (names of the rows) as integers going up from zero. We could instead have used any of the columns of data as the indices, for example by passing the index_col keyword argument to pd.read_csv.
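A minimal sketch of choosing the row labels, on a three-row excerpt of the data (four columns only). Using 'trial number' as the index here is purely for illustration; in the full data set trial numbers repeat across days and frogs, and pandas allows repeated labels.

```python
import io

import pandas as pd

text = """date,ID,trial number,impact force (mN)
2013_02_26,I,3,1205
2013_02_26,I,4,2527
2013_03_01,I,1,1745
"""

# Default: integer row labels 0, 1, 2, ...
df_default = pd.read_csv(io.StringIO(text))
print(df_default.index.tolist())

# Use the 'trial number' column as the row labels instead
df_by_trial = pd.read_csv(io.StringIO(text), index_col='trial number')
print(df_by_trial.index.tolist())

# set_index does the same on a DataFrame that is already loaded
print(df_default.set_index('trial number').index.tolist())
```

The column passed to index_col is moved out of the data columns and into the index.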

To access a column of data, we use the following syntax.

In [5]:
# Slicing a column out of a DataFrame is achieved by using the column name
df['impact force (mN)']
Out[5]:
0     1205
1     2527
2     1745
3     1556
4      493
5     2276
6      556
7     1928
8     2641
9     1897
10    1891
11    1545
12    1307
13    1692
14    1543
...
65     22
66    502
67    273
68    720
69    582
70    198
71    198
72    597
73    516
74    815
75    402
76    605
77    711
78    614
79    468
Name: impact force (mN), Length: 80, dtype: int64

The row indices are preserved, and we can easily extract all of the impact forces. Note, though, that pd.read_csv interpreted the data as integers (dtype = int64), so we may want to convert them to floats.

In [6]:
# Use .astype method to convert it to a NumPy float64 data type.
df['impact force (mN)'] = df['impact force (mN)'].astype(np.float64)

# Check that it worked
print('dtype = ', df['impact force (mN)'].dtype)
dtype =  float64

Now let's say we only want the strikes with an impact force above one newton (1000 mN).

In [7]:
# We can use a Boolean in the indexing of the DataFrame
df_big_force = df[df['impact force (mN)'] > 1000.0]

# Let's look at it; only a subset of the strikes exceed 1000 mN
df_big_force
Out[7]:
   | date | ID | trial number | impact force (mN) | impact time (ms) | impact force / body weight | adhesive force (mN) | time frog pulls on target (ms) | adhesive force / body weight | adhesive impulse (N-s) | total contact area (mm2) | contact area without mucus (mm2) | contact area with mucus / contact area without mucus | contact pressure (Pa) | adhesive strength (Pa)
 0 | 2013_02_26 | I | 3 | 1205 | 46 | 1.95 | -785 | 884 | 1.27 | -0.290 | 387 | 70 | 0.82 | 3117 | -2030
 1 | 2013_02_26 | I | 4 | 2527 | 44 | 4.08 | -983 | 248 | 1.59 | -0.181 | 101 | 94 | 0.07 | 24923 | -9695
 2 | 2013_03_01 | I | 1 | 1745 | 34 | 2.82 | -850 | 211 | 1.37 | -0.157 | 83 | 79 | 0.05 | 21020 | -10239
 3 | 2013_03_01 | I | 2 | 1556 | 41 | 2.51 | -455 | 1025 | 0.74 | -0.170 | 330 | 158 | 0.52 | 4718 | -1381
 5 | 2013_03_01 | I | 4 | 2276 | 31 | 3.68 | -592 | 969 | 0.96 | -0.176 | 341 | 106 | 0.69 | 6676 | -1737
 7 | 2013_03_05 | I | 2 | 1928 | 46 | 3.11 | -804 | 508 | 1.30 | -0.285 | 246 | 178 | 0.28 | 7832 | -3266
 8 | 2013_03_05 | I | 3 | 2641 | 50 | 4.27 | -690 | 491 | 1.12 | -0.239 | 269 | 224 | 0.17 | 9824 | -2568
 9 | 2013_03_05 | I | 4 | 1897 | 41 | 3.06 | -462 | 839 | 0.75 | -0.328 | 266 | 176 | 0.34 | 7122 | -1733
10 | 2013_03_12 | I | 1 | 1891 | 40 | 3.06 | -766 | 1069 | 1.24 | -0.380 | 408 | 33 | 0.92 | 4638 | -1879
11 | 2013_03_12 | I | 2 | 1545 | 48 | 2.50 | -715 | 649 | 1.15 | -0.298 | 141 | 112 | 0.21 | 10947 | -5064
12 | 2013_03_12 | I | 3 | 1307 | 29 | 2.11 | -613 | 1845 | 0.99 | -0.768 | 455 | 92 | 0.80 | 2874 | -1348
13 | 2013_03_12 | I | 4 | 1692 | 31 | 2.73 | -677 | 917 | 1.09 | -0.457 | 186 | 129 | 0.31 | 9089 | -3636
14 | 2013_03_12 | I | 5 | 1543 | 38 | 2.49 | -528 | 750 | 0.85 | -0.353 | 153 | 148 | 0.03 | 10095 | -3453
15 | 2013_03_15 | I | 1 | 1282 | 31 | 2.07 | -452 | 785 | 0.73 | -0.253 | 290 | 105 | 0.64 | 4419 | -1557
17 | 2013_03_15 | I | 3 | 2032 | 60 | 3.28 | -652 | 486 | 1.05 | -0.257 | 147 | 134 | 0.09 | 13784 | -4425
18 | 2013_03_15 | I | 4 | 1240 | 34 | 2.00 | -692 | 906 | 1.12 | -0.317 | 364 | 260 | 0.28 | 3406 | -1901
20 | 2013_03_19 | II | 1 | 1612 | 18 | 3.79 | -655 | 3087 | 1.54 | -0.385 | 348 | 15 | 0.96 | 4633 | -1881
25 | 2013_03_21 | II | 2 | 1539 | 43 | 3.62 | -664 | 741 | 1.56 | -0.046 | 85 | 24 | 0.72 | 18073 | -7802
28 | 2013_03_25 | II | 1 | 1453 | 72 | 3.42 | -92 | 1652 | 0.22 | -0.008 | 136 | 0 | 1.00 | 10645 | -678
34 | 2013_04_03 | II | 1 | 1182 | 28 | 2.78 | -522 | 1197 | 1.23 | -0.118 | 281 | 0 | 1.00 | 4213 | -1860

So now we only have the strikes of high force. Note, though, that the original indexing of rows was retained! In our new DataFrame with only the big force strikes, there is no index 4, for example.
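The same behavior can be seen in miniature. A minimal sketch on the first six impact forces of the data set: the Boolean mask shares the Series' index, filtering keeps the surviving labels as they were, and reset_index(drop=True) is one way to renumber from zero when the original labels are not needed.

```python
import pandas as pd

# Impact forces (mN) from the first six rows of the data set
s = pd.Series([1205.0, 2527.0, 1745.0, 1556.0, 493.0, 2276.0])

mask = s > 1000.0            # Boolean Series with the same index as s
big = s[mask]                # keeps only the entries where mask is True

print(mask.tolist())         # [True, True, True, True, False, True]
print(big.index.tolist())    # [0, 1, 2, 3, 5]: label 4 is dropped, not renumbered

# Renumber from zero if the original labels are not needed
renumbered = big.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```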

In [9]:
# Executing the below will result in an exception
df_big_force['impact force (mN)'][4]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-f8c89e836e51> in <module>()
      1 # Executing the below will result in an exception
----> 2 df_big_force['impact force (mN)'][4]

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             result = self.index.get_value(self, key)
    485 
    486             if not np.isscalar(result):

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1194 
   1195         try:
-> 1196             return self._engine.get_value(s, k)
   1197         except KeyError as e1:
   1198             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2680)()

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2495)()

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3234)()

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6540)()

/Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6484)()

KeyError: 4

This might seem counterintuitive, but it is actually a good idea: each row keeps its original label, so it can always be traced back to the full data set. Remember, indices do not have to be integers!

There is a way around this, though. We can use the iloc attribute, available on both DataFrames and Series, which indexes by integer position (0, 1, 2, ...) rather than by label.

In [10]:
# Using iloc enables indexing by the corresponding sequence of integers
df_big_force['impact force (mN)'].iloc[4]
Out[10]:
2276.0
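To make the label-versus-position distinction concrete, here is a sketch on a Series built by hand with the same first five labels as the strong-strike subset above. loc looks entries up by label, iloc by position, and asking loc for a label that is absent raises the same KeyError as before.

```python
import pandas as pd

# The first five strong strikes, with the labels they keep after filtering:
# label 4 is absent because that strike was below 1000 mN
big = pd.Series([1205.0, 2527.0, 1745.0, 1556.0, 2276.0], index=[0, 1, 2, 3, 5])

print(big.loc[5])    # by label: the entry labeled 5
print(big.iloc[4])   # by position: the fifth entry (the same one)

try:
    big.loc[4]       # no entry is labeled 4
except KeyError as err:
    print('KeyError:', err)
```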

One column represents many experiments of the same kind of measurement (e.g., impact force). One row represents a single experiment and many kinds of measurements. We have extracted a column out of the DataFrame, but how do we extract a row? This is also very easy with DataFrames, using their loc attribute, which selects rows by label.

In [11]:
# Let's slice out experiment with index 42
exp_42 = df.loc[42]

# And now let's look at it
exp_42
Out[11]:
date                                                    2013_05_27
ID                                                             III
trial number                                                     3
impact force (mN)                                              324
impact time (ms)                                               105
impact force / body weight                                    2.61
adhesive force (mN)                                           -172
time frog pulls on target (ms)                                 619
adhesive force / body weight                                  1.38
adhesive impulse (N-s)                                      -0.079
total contact area (mm2)                                        55
contact area without mucus (mm2)                                23
contact area with mucus / contact area without mucus          0.37
contact pressure (Pa)                                         5946
adhesive strength (Pa)                                       -3149
Name: 42, dtype: object

Notice how clever the DataFrame is. We sliced a row out, and now the indices describing its elements are the column headings.
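A note for readers on newer versions of pandas: the ix indexer was deprecated in pandas 0.20 and removed in 1.0, while the label-based loc indexer selects the same row when the row labels are integers, as here. The sketch below uses a toy DataFrame holding four columns of rows 0 and 42 from the data set, with their original labels; it also shows that loc accepts a row label and a column name together.

```python
import pandas as pd

# Four columns from rows 0 and 42 of the data set, keeping their original labels
df_toy = pd.DataFrame({'date': ['2013_02_26', '2013_05_27'],
                       'ID': ['I', 'III'],
                       'trial number': [3, 3],
                       'impact force (mN)': [1205, 324]},
                      index=[0, 42])

row = df_toy.loc[42]          # one row, as a Series indexed by column name
print(row)
print(row['ID'])                             # III
print(df_toy.loc[42, 'impact force (mN)'])   # row label and column name together
print(df_toy.iloc[1]['date'])                # same row, selected by position
```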

You may take issue with the rather lengthy syntax of accessing columns by name. For example, if you were trying to access the ratio of the contact area with mucus to the contact area without mucus for frog III, trial number 3, on May 27, 2013, you would do the following.

In [12]:
# Set up criteria for our search of the DataFrame
date = (df['date'] == '2013_05_27')
trial = (df['trial number'] == 3)
ID = (df['ID'] == 'III')

# When indexing DataFrames, use & for Boolean and (and | for or; ~ for not)
df['contact area with mucus / contact area without mucus'][date & trial & ID]
Out[12]:
42    0.37
Name: contact area with mucus / contact area without mucus, dtype: float64
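One pitfall with combining criteria is operator precedence. A sketch on three rows of the data set (labels 0, 1, and 42, three columns each): every comparison needs its own parentheses, because & and | bind more tightly than == in Python. Selecting with .loc[mask, column] does the row and column selection in one step.

```python
import pandas as pd

# Three rows (labels 0, 1, 42) from the data set, three columns each
df_toy = pd.DataFrame({'ID': ['I', 'I', 'III'],
                       'trial number': [3, 4, 3],
                       'impact force (mN)': [1205, 2527, 324]},
                      index=[0, 1, 42])

# Parentheses are required: without them, Python would try to evaluate
# 'III' & df_toy['trial number'] first
mask = (df_toy['ID'] == 'III') & (df_toy['trial number'] == 3)
print(df_toy.loc[mask, 'impact force (mN)'])

# | means "or" and ~ means "not"
either = (df_toy['ID'] == 'III') | (df_toy['trial number'] == 4)
print(df_toy.loc[either].index.tolist())                  # [1, 42]
print(df_toy.loc[~(df_toy['ID'] == 'I')].index.tolist())  # [42]
```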

Yeesh. That syntax is clunky. But many would argue that it is preferred because there is no ambiguity about what you are asking for. However, you may want to use shorter names. Conveniently, when your column names do not have spaces, you can use attribute access. For example:

In [13]:
# Attribute access
df.date[:10]
Out[13]:
0    2013_02_26
1    2013_02_26
2    2013_03_01
3    2013_03_01
4    2013_03_01
5    2013_03_01
6    2013_03_05
7    2013_03_05
8    2013_03_05
9    2013_03_05
Name: date, dtype: object
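Attribute access has limits beyond spaces in names. A short sketch on a made-up two-row DataFrame; the 'count' column is hypothetical, chosen because its name collides with the DataFrame.count method, so df_toy.count returns the method rather than the column.

```python
import pandas as pd

df_toy = pd.DataFrame({'date': ['2013_02_26', '2013_05_27'],
                       'trial number': [3, 3]})

# Works: 'date' is a valid Python identifier
print(df_toy.date.tolist())

# 'trial number' contains a space, so only bracket access works
print(df_toy['trial number'].tolist())

# A hypothetical column named like a DataFrame method is shadowed by it
df_toy['count'] = [1, 2]
print(callable(df_toy.count))     # True: this is the method, not the column
print(df_toy['count'].tolist())   # brackets always reach the column
```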

Attribute access is really for convenience and is not the default, but it can make writing the code much less cumbersome. So, let's change the names of the columns as follows:

\begin{align} \texttt{trial number} &\to \texttt{trial} \\ \texttt{contact area with mucus / contact area without mucus} &\to \texttt{ca\_ratio}. \end{align}

DataFrames have a nice rename method to do this.

In [14]:
# Make a dictionary to rename columns
rename_dict = {'trial number' : 'trial',
        'contact area with mucus / contact area without mucus' : 'ca_ratio'}

# Rename in-place; otherwise returns a new DataFrame
df.rename(columns=rename_dict, inplace=True)

# Try out our new column name
df.ca_ratio[42]
Out[14]:
0.37

Notice that we have introduced a new Python data structure, the dictionary. A dictionary is a collection of objects, each one with a key to use for indexing. For example, in rename_dict, we can look up what we wanted to rename 'trial number' to.

In [15]:
# Indexing of dictionaries looks syntactically similar to columns in DataFrames
rename_dict['trial number']
Out[15]:
u'trial'
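A few more dictionary operations, sketched on a copy of rename_dict. The added 'impact force (mN)' entry is only an illustration (it happens to match a renaming done later in this tutorial).

```python
# A copy of the dictionary used above
rename_dict = {'trial number': 'trial',
               'contact area with mucus / contact area without mucus': 'ca_ratio'}

print(rename_dict['trial number'])          # look up a value by its key
print('trial number' in rename_dict)        # True: "in" checks the keys
print(rename_dict.get('date', 'date'))      # .get returns a default for a missing key
rename_dict['impact force (mN)'] = 'impf'   # add a new key-value pair
print(len(rename_dict))                     # now 3 entries
print(list(rename_dict.keys()))
```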

We can go on and on about indexing, but we will stop here. For much more on all of the clever ways you can access data and subsets thereof in DataFrames, see the pandas docs on indexing.

"Playing" with data: an important part of data analysis

"Exploratory data analysis" is the time during data analysis where you explore your data. You look at numbers, plot various quantities, and think about what analysis techniques you would like to use. pandas DataFrames help you do this.

As we go through the interactive analysis, we will learn about syntax for various matplotlib plot styles.

The first thing we'll do is look at strike forces. We'll rename the column for impact force for convenient attribute access because we'll access it many times.

In [16]:
df.rename(columns={'impact force (mN)': 'impf'}, inplace=True)

Now, let's start plotting the impact forces to see what we're dealing with. We'll start with the most naive plot.

In [17]:
# Just make a scatter plot of forces
plt.plot(df.impf, 'k.')
plt.margins(x=0.02, y=0.02)
plt.xlabel('order in DataFrame')
plt.ylabel('impact force (mN)')

The $x$-axis is pretty meaningless. So, instead let's plot a histogram of impact forces.

In [18]:
# Make a histogram plot; bins kwarg gives the number of bars
n, bin_edges, patches = plt.hist(df.impf, bins=20)
plt.grid(True, axis='y')
plt.xlabel('impact force (mN)')
plt.ylabel('number of observations')

This is a better way to look at the impact force measurements. We see that there are a few high-force impacts, but that most of the impacts are about 500 mN or so.
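If you want the bin counts without drawing anything, NumPy computes them directly with np.histogram. A minimal sketch on the ten impact forces from the first rows of the data set; with the same bins argument, plt.hist should return the same counts and edges as its first two outputs (n and bin_edges above). The choice of 4 bins is arbitrary for this small sample.

```python
import numpy as np

# Impact forces (mN) from the first ten rows of the data set
impf = np.array([1205, 2527, 1745, 1556, 493, 2276, 556, 1928, 2641, 1897],
                dtype=np.float64)

# Four equal-width bins spanning the minimum to the maximum value
counts, bin_edges = np.histogram(impf, bins=4)
print(counts)      # number of observations in each bin
print(bin_edges)   # bins + 1 edges; the last bin includes its right edge
```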

This is still only part of the story. We would like to know how the impacts vary from frog to frog. First, let's see how many trials we have for each frog.

In [19]:
# This is a fancy way to do string formatting; unnecessary, but I thought
# I'd show how to do it.
print("""
Frog ID      Number of samples
=======      =================
   I               {0:d}
  II               {1:d}
 III               {2:d}
  IV               {3:d}
""".format(df.ID[df.ID=='I'].count(), df.ID[df.ID=='II'].count(),
           df.ID[df.ID=='III'].count(), df.ID[df.ID=='IV'].count()))

Frog ID      Number of samples
=======      =================
   I               20
  II               20
 III               20
  IV               20
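The same per-frog counts can be obtained without writing out each frog by hand. A minimal sketch on a made-up list of frog IDs: value_counts counts each distinct value (sort_index puts the labels in order), and groupby(...).size() gives the same numbers. On the tutorial's DataFrame, the equivalent calls would be df.ID.value_counts() or df.groupby('ID').size().

```python
import pandas as pd

# Hypothetical frog IDs, one per strike
ids = pd.Series(['I', 'I', 'II', 'III', 'III', 'III', 'IV'], name='ID')

counts = ids.value_counts().sort_index()
print(counts)

# Same counts via groupby
sizes = ids.groupby(ids).size()
print(sizes)
```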


So, we only have 20 samples for each frog. That's too few to construct a meaningful histogram for each frog. Instead, we can make a bar graph showing the mean and standard deviation of the impact force for each frog.
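Before plotting, one detail about the standard deviation: pandas normalizes by N-1 by default (ddof=1), while NumPy's np.std divides by N (ddof=0). A small sketch on five impact forces from the data set (rows 0 to 2 are frog I, rows 20 and 25 are frog II) shows how to get either convention from a groupby.

```python
import numpy as np
import pandas as pd

# Impact forces (mN): rows 0-2 are frog I, rows 20 and 25 are frog II
df_toy = pd.DataFrame({'ID': ['I', 'I', 'I', 'II', 'II'],
                       'impf': [1205.0, 2527.0, 1745.0, 1612.0, 1539.0]})

grouped = df_toy.groupby('ID')['impf']
print(grouped.mean())
print(grouped.std())         # pandas default ddof=1: divide by N - 1
print(grouped.std(ddof=0))   # divide by N instead

# np.std defaults to ddof=0, the opposite of pandas
frog_I = df_toy.loc[df_toy['ID'] == 'I', 'impf'].to_numpy()
print(np.std(frog_I), np.std(frog_I, ddof=1))
```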

In [20]:
# Compute the mean impacts
mean_impacts = df.groupby('ID').impf.mean()

# We will use standard deviation as error bar
# Default DataFrame std is normalized by N-1; more on that in lecture
std_impacts = df.groupby('ID').impf.std()

# Bar locations
x = np.arange(4)

# Bar widths
bar_width = 0.5

# How to label each bar; use the group labels so they are guaranteed to be
# in the same order as the means
bar_labels = mean_impacts.index

# We use matplotlib's bar function to make plot
plt.bar(x, mean_impacts, yerr=std_impacts, width=bar_width, align='center',
        color='lightgray', error_kw={'ecolor' : 'black'})
plt.grid(True, axis='y')
plt.xticks(x, bar_labels)
plt.ylabel('impact force (mN)')

I had you make the plot that way because I want you to get comfortable with matplotlib, which is very versatile. However, you can make essentially the same figure in one or two lines using some of the awesomeness of DataFrames.

In [21]:
# Compute standard deviations grouped by ID
yerr = df.groupby('ID').impf.std()

# Plot the bar graphs
df.groupby('ID').impf.mean().plot(kind='bar', yerr=yerr, color='lightgray')
plt.ylabel('impact force (mN)')

We only have 20 measurements for each frog. Wouldn't it be better just to plot all 20 instead of trying to distill it down to a mean and a standard deviation in the bar graph? After all, the impact force measurements might not be Gaussian distributed; they may be bimodal or something else. So, we would like to generate a column scatter plot. We will plot each data point for each frog in a column, and "jitter" its position along the $x$-axis. We make the points somewhat transparent to allow visualization of overlap.

In [22]:
# Column positions
x_pos = np.arange(4)

# Column labels
x_labels = ['I', 'II', 'III', 'IV']

# Jitter by adding normally distributed random number and plot with 
# alpha < 1, which gives transparency
for i in range(len(x_pos)):
    y = df[df.ID==x_labels[i]].impf
    x = np.random.normal(x_pos[i], 0.04, size=len(y))
    plt.plot(x, y, 'ko', alpha=0.3)
plt.xticks(x_pos, x_labels)
plt.ylabel('impact force (mN)')

Very nice! Now we can see that frog I, an adult, strikes with a wide range of impact forces, and can strike really hard. Frog II, also an adult, tends to strike at around 500 mN, but occasionally will strike harder. Juvenile frog III is a pretty consistent striker, while frog IV can have some pretty weak strikes.

The column scatter plot is not difficult to look at. The informational content of the data does not need to be distilled into a bar with a standard deviation.

Plots similar to these, but in which the dots do not overlap, are called beeswarm plots. For large numbers of data points, the column scatter plot is preferable, but for small numbers of data points, beeswarm plots tend to be easier to read, since each data point can be clearly seen. Unfortunately, matplotlib does not natively have a clean way of doing this. It is definitely possible, but cumbersome.

Fortunately, there is a package to do it, pybeeswarm, written by Melissa Gymrek. It is not available through Canopy, so we need to install it by hand, which we will occasionally need to do. To install Python packages, pip is the most commonly used tool. To do it, we simply invoke the shell from IPython and ask pip to install pybeeswarm.

In [23]:
# This should work with Mac OS X and Linux.  For Windows, you'll have to use
# PowerShell or cygwin.
!pip install pybeeswarm
Downloading/unpacking pybeeswarm
  Downloading pybeeswarm-1.0.0.tar.gz
  Running setup.py (path:/Users/Justin/Library/Enthought/Canopy_64bit/User/build/pybeeswarm/setup.py) egg_info for package pybeeswarm
    
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): numpy in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): pandas in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): six>=1.3 in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from matplotlib->pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from matplotlib->pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): pyparsing>=1.5.6 in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from matplotlib->pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): nose>=0.11.1 in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from matplotlib->pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): mock in /Applications/Canopy.app/appdata/canopy-1.4.1.1975.macosx-x86_64/Canopy.app/Contents/lib/python2.7/site-packages (from matplotlib->pybeeswarm)
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /Users/Justin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages (from pandas->pybeeswarm)
Installing collected packages: pybeeswarm
  Running setup.py install for pybeeswarm
    
Successfully installed pybeeswarm
Cleaning up...

Now that we have pybeeswarm, we can put it to use!

In [24]:
# Import it first (will do this at the beginning of future tutorials)
import beeswarm as bs

# Separate impact forces by frog
list_of_impfs = [df.impf[df.ID=='I'], df.impf[df.ID=='II'],
                 df.impf[df.ID=='III'], df.impf[df.ID=='IV']]

# Generate a beeswarm plot
bs_plot, ax = bs.beeswarm(list_of_impfs, labels=['I', 'II', 'III', 'IV'])
plt.ylabel('impact force (mN)')

We can get even more information. We might be interested to see if the variability in impact force is day-to-day or time-independent. So, we would like to make the beeswarm plot with different colors on different days. This requires a bit more code, but it is not too bad. (As is the case in many of the examples in the tutorials, there is likely a more concise way to do this.)

In [25]:
# We need to import the colormap module from matplotlib
from matplotlib import cm

# Get list of dates; the unique method of a pandas Series works well
dates = df.date.unique()

# Number of unique dates
n_dates = len(dates)

# Assign colors to date names, calling colormap with value between 0 and 1
# We will use the qualitative colormap "Set1."
colors = []
for i in range(n_dates):
    colors.append(cm.Set1(float(i) / float(n_dates)))
    
# Make a dictionary of dates and colors
color_dict = dict(zip(dates, colors))

# Make a list of colors, one per point, in the same order as the points in
# list_of_impfs (all of frog I, then II, III, and IV), so that each point
# gets the color of its own date
colors = []
for frog_id in ['I', 'II', 'III', 'IV']:
    for d in df.date[df.ID==frog_id]:
        colors.append(color_dict[d])

# Make beeswarm with date coloring
bs_plot, ax = bs.beeswarm(list_of_impfs, labels=['I', 'II', 'III', 'IV'],
                          col=colors)
plt.ylabel('impact force (mN)')

We do see some clustering of colors, so there may be some correlation between the day of the measurement and the impact force of the frog's tongue. I say "may be some correlation" deliberately, since we have not quantified it.
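The color bookkeeping in the cell above can be condensed. As a hedged sketch (the tiny DataFrame and the named-color palette are stand-ins for df and the Set1 colors), Series.map looks up each row's date in the color dictionary, and a list comprehension keeps the colors in the same frog order as the groups passed to beeswarm.

```python
import pandas as pd

# Stand-in for df: three strikes, deliberately not sorted by frog
df_toy = pd.DataFrame({'ID': ['III', 'I', 'I'],
                       'date': ['2013_05_27', '2013_02_26', '2013_03_01']})

# Hypothetical palette of named colors, one per unique date
palette = ['tab:red', 'tab:blue', 'tab:green']
dates = df_toy['date'].unique()
color_dict = dict(zip(dates, palette))

# One color per point, grouped frog by frog like the beeswarm input
frog_order = ['I', 'III']
point_colors = [c for frog in frog_order
                for c in df_toy.loc[df_toy['ID'] == frog, 'date'].map(color_dict)]
print(point_colors)
```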

Now let's look at correlations. We may be curious: does a hard strike result in better adhesion? We can plot the adhesive force versus the impact force.

In [26]:
# Rename the column for convenience in future use
df.rename(columns={'adhesive force (mN)' : 'adhf'}, inplace=True)

# Plot adhesive force vs. impact force
plt.plot(df.impf, df.adhf, 'k.', alpha=0.5)
plt.xlabel('impact force (mN)')
plt.ylabel('adhesive force (mN)')

Later in the course, we will learn how to do regressions to test the relationship between two variables.
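Until then, a single summary number can complement the scatter plot. A minimal sketch, assuming the Pearson correlation coefficient is what we want; it only measures linear association and is no substitute for the regression analysis to come. The values are the impact and adhesive forces from the first five rows of the data set. Because adhesive forces are negative here, a negative coefficient would mean that harder strikes go with more negative (stronger) adhesion.

```python
import numpy as np
import pandas as pd

# Impact and adhesive forces (mN) from the first five rows of the data set
impf = pd.Series([1205.0, 2527.0, 1745.0, 1556.0, 493.0])
adhf = pd.Series([-785.0, -983.0, -850.0, -455.0, -974.0])

r_pandas = impf.corr(adhf)                # Pearson correlation by default
r_numpy = np.corrcoef(impf, adhf)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(r_pandas, r_numpy)
```

With only five points, the value is illustrative at best; on the full DataFrame the call would be df.impf.corr(df.adhf).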

For the time remaining in this tutorial, explore the frog tongue adhesion data set, generating plots of various quantities.

Conclusions

In this tutorial, we have learned how to load data from CSV files into pandas DataFrames. DataFrames are useful objects for looking at data from many angles. Together with matplotlib, you can start to look at qualitative features of your data and think about how you will go about doing your detailed analysis.