Lesson 8 exercises

[1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------

import pandas as pd

Exercise 8.1

In this exercise, we will again work with a subset of the Palmer penguin data set. I will load it and view the first few rows now.

[2]:
df = pd.read_csv(os.path.join(data_path, "penguins_subset.csv"), header=[0, 1])

df.head()
[2]:
          Gentoo                                                         Adelie                                                      Chinstrap
   bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g  bill_depth_mm  bill_length_mm  flipper_length_mm  body_mass_g
0           16.3            48.4              220.0       5400.0           18.5            36.8              193.0       3500.0           18.3            47.6              195.0       3850.0
1           15.8            46.3              215.0       5050.0           16.9            37.0              185.0       3000.0           16.7            42.5              187.0       3350.0
2           14.2            47.5              209.0       4600.0           19.5            42.0              200.0       4050.0           16.6            40.9              187.0       3200.0
3           15.7            48.7              208.0       5350.0           18.3            42.7              196.0       4075.0           20.0            52.8              205.0       4550.0
4           14.1            48.7              210.0       4450.0           18.0            35.7              202.0       3550.0           18.7            45.4              188.0       3525.0

Explain in words what each of the following code cells does as we work toward tidying this data frame.

[3]:
df.columns.names = ['species', None]
[4]:
df = df.stack(level='species')
[5]:
df = df.reset_index(level='species')
[6]:
df = df.reset_index(drop=True)
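
If it helps to experiment before writing your explanation, the same four steps can be run on a much smaller data frame and the shape recorded after each one. This is only a sketch: the `toy` data frame and the `shapes` dictionary are hypothetical names introduced here, and the toy values are a small slice chosen for illustration, not the full data set. Depending on your pandas version, `stack` may emit a `FutureWarning`; the shapes are unaffected for this toy input.

```python
import pandas as pd

# Hypothetical toy data: two species, two measurements, two rows.
columns = pd.MultiIndex.from_product(
    [["Gentoo", "Adelie"], ["bill_depth_mm", "body_mass_g"]]
)
toy = pd.DataFrame(
    [[16.3, 5400.0, 18.5, 3500.0], [15.8, 5050.0, 16.9, 3000.0]],
    columns=columns,
)

# Record the shape after each of the four steps from the exercise.
shapes = {"original": toy.shape}

toy.columns.names = ["species", None]
shapes["after naming column levels"] = toy.shape

toy = toy.stack(level="species")
shapes["after stack"] = toy.shape

toy = toy.reset_index(level="species")
shapes["after reset_index(level='species')"] = toy.shape

toy = toy.reset_index(drop=True)
shapes["after reset_index(drop=True)"] = toy.shape

for step, shape in shapes.items():
    print(f"{step}: {shape}")
```

Inspecting `toy` itself (its index and columns, not only its shape) after each step is a useful way to check your written explanation.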

Exercise 8.2

What is the difference between merging and concatenating data frames?
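
One way to explore this question is to apply both operations to the same pair of small data frames and compare the results. The sketch below assumes two hypothetical data frames, `left` and `right`, that share an `id` column; none of these names or values come from the penguin data set.

```python
import pandas as pd

# Hypothetical toy frames that share an "id" column.
left = pd.DataFrame({"id": [1, 2, 3], "bill_depth_mm": [16.3, 15.8, 14.2]})
right = pd.DataFrame({"id": [2, 3, 4], "body_mass_g": [5050.0, 4600.0, 4450.0]})

# Concatenate the two frames along the rows.
concatenated = pd.concat([left, right], ignore_index=True)

# Merge the two frames on the shared "id" column (default inner join).
merged = pd.merge(left, right, on="id")

print(concatenated)
print(merged)
```

Comparing the number of rows, the columns, and where missing values appear in `concatenated` versus `merged` should help in phrasing your answer.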

Exercise 8.3

Write down any questions or points of confusion that you have.

Computing environment

[7]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab
CPython 3.8.5
IPython 7.18.1

pandas 1.1.3
jupyterlab 2.2.6